Prototyping the MBTAC processor for the REPLICA CMP
Authors: Forsell Martti, Roivainen Jussi, Leppänen Ville
Editor: Manish Parashar
Conference: IEEE International Parallel and Distributed Processing Symposium Workshops
Year: 2014
Book title: Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
First page: 709
Last page: 716
Number of pages: 8
ISBN: 978-1-4799-4117-9
ISBN: 978-1-4799-4116-2
DOI: https://doi.org/10.1109/IPDPSW.2014.82
Current chip multiprocessors (CMPs) have mostly
been designed by replicating sequential/single-core processors
and providing some support for operating them with a shared
memory. As a result, they define an asynchronous computational
model of threads, often require maximizing the locality
of memory references to get decent performance, and feature
high intercommunication overheads that make parallel
programming tedious for general-purpose functionality.
Most of these problems can be eliminated by designing the
processor architecture for scalable general-purpose computing
from the very beginning, as is done in processors for configurable
emulated shared memory (CESM) CMPs. They provide
support for machine instruction-level synchronization,
make use of multithreading to support latency-insensitive
computation, and promote the concept of uniform synchronous
shared memory for easy variable allocation and convenient
data exchange. In our earlier work, we proposed the
first CESM architecture, TOTAL ECLIPSE, composed of early
MBTAC processors that make use of very low-overhead multithreading,
a functional unit organization suited to parallel computing,
support for fast synchronization between instructions
and threads, and highly efficient multioperations.
Unfortunately, certain key parts of these processors turned
out to be hard to implement, and overall they lacked support
for ordered multiprefix operations and full configurability
of the CESM scheme. In this paper we introduce a new
fully configurable version of the MBTAC processor for our
new REPLICA CESM architecture and its first FPGA
implementations. To evaluate it, we execute short test programs
on it and make a preliminary comparison against Intel Core i7 and
DLX processors. We also describe our FPGA design flow and
testing approach.