Prototyping the MBTAC processor for the REPLICA CMP




Forsell Martti, Roivainen Jussi, Leppänen Ville

Manish Parashar

IEEE international parallel and distributed processing symposium workshops

2014

Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International

Pages: 709-716 (8 pages)

ISBN: 978-1-4799-4117-9

ISBN: 978-1-4799-4116-2

DOI: https://doi.org/10.1109/IPDPSW.2014.82



Current chip multiprocessors (CMPs) have mostly been designed by replicating sequential, single-core processors and providing some support for operating them with a shared memory. As a result, they define an asynchronous computational model of threads, often require maximizing the locality of memory references to achieve decent performance, and feature high intercommunication overheads, all of which make parallel programming tedious for general-purpose functionalities. Most of these problems can be eliminated by designing the processor architecture for scalable general-purpose computing from the very beginning, as is done in processors for configurable emulated shared memory (CESM) CMPs. These processors provide support for machine-instruction-level synchronization, use multithreading to support latency-insensitive computation, and promote the concept of uniform synchronous shared memory for easy variable allocation and convenient data exchange. In our earlier work we proposed the first CESM architecture, TOTAL ECLIPSE, composed of early MBTAC processors that make use of very low-overhead multithreading, a parallel-computing-savvy functional unit organization, support for fast synchronization between instructions and threads, and highly efficient multioperations. Unfortunately, certain key parts of these processors turned out to be hard to implement, and overall they lacked support for ordered multiprefix operations and full configurability of the CESM scheme. In this paper we introduce a new, fully configurable version of the MBTAC processor for our new REPLICA CESM architecture, together with its first FPGA implementations. To evaluate it, we execute short test programs on it and compare it preliminarily against Intel Core i7 and DLX processors. Our FPGA design flow and testing approach are also described.




Last updated on 2024-26-11 at 15:08