A1 Refereed original research article in a scientific journal

Path-Based Partitioning Methods for 3D Networks-on-Chip with Minimal Adaptive Routing

Authors: Masoumeh Ebrahimi, Masoud Daneshtalab, Pasi Liljeberg, Juha Plosila, José Flich, Hannu Tenhunen

Publisher: IEEE COMPUTER SOC

Publishing place: LOS ALAMITOS; 10662 LOS VAQUEROS CIRCLE, PO BOX 3014, LOS ALAMITOS, CA 90720-1314 USA

Publication year: 2014

Journal: IEEE Transactions on Computers

Journal acronym: IEEE Trans.Comput.

Volume: 63

Issue: 3

First page: 718

Last page: 733

Number of pages: 16

ISSN: 0018-9340

DOI: https://doi.org/10.1109/TC.2012.255

Abstract

Combining the benefits of 3D ICs and Network-on-Chip (NoC) schemes provides a significant performance gain in Chip Multiprocessor (CMP) architectures. Because multicast communication is commonly used in cache coherence protocols for CMPs and in various parallel applications, the performance of these systems can be significantly improved if multicast operations are supported at the hardware level. In this paper, we present several partitioning methods for the path-based multicast approach in 3D mesh-based NoCs, each with a different level of efficiency. In addition, we develop novel analytical models for unicast and multicast traffic to explore the efficiency of each approach. To distribute the unicast and multicast traffic more efficiently over the network, we propose the Minimal and Adaptive Routing (MAR) algorithm for the presented partitioning methods. The analytical and experimental results show that one method, Recursive Partitioning (RP), outperforms the other approaches. RP recursively partitions the network until all partitions contain a comparable number of switches; the multicast traffic is thus distributed evenly among several subsets, and network latency decreases considerably. The simulation results reveal that the RP method improves performance across all workloads, and that performance can be improved further by using the MAR algorithm. An average latency reduction of 19 percent and a maximum of 42 percent are obtained on the SPLASH-2 and PARSEC benchmarks running on a 64-core CMP.
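
The idea of partitioning until every partition holds a comparable number of switches can be illustrated with a minimal sketch. This is not the RP algorithm from the article, whose details the abstract does not give. It is a generic median-split scheme written under stated assumptions: the 64-core CMP is modelled as a 4x4x4 mesh, the source switch at (1, 2, 0) is arbitrary, and the names `split_once`, `recursive_partition`, and `max_size` are hypothetical.

```python
from typing import List, Tuple

Node = Tuple[int, int, int]  # (x, y, z) coordinate of a switch in the 3D mesh


def split_once(nodes: List[Node]) -> Tuple[List[Node], List[Node]]:
    """Split a group at the median of the axis with the widest coordinate range."""
    axis = max(
        range(3),
        key=lambda a: max(n[a] for n in nodes) - min(n[a] for n in nodes),
    )
    ordered = sorted(nodes, key=lambda n: (n[axis], n))
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


def recursive_partition(nodes: List[Node], max_size: int) -> List[List[Node]]:
    """Halve groups recursively until none holds more than max_size nodes.

    Illustrative only: the stopping rule (max_size) is an assumption,
    not the criterion used in the article.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if not nodes:
        return []
    if len(nodes) <= max_size:
        return [list(nodes)]
    left, right = split_once(nodes)
    return recursive_partition(left, max_size) + recursive_partition(right, max_size)


# Assumed setup: 4x4x4 mesh (64 switches), one source, 63 multicast destinations.
mesh = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]
source = (1, 2, 0)
destinations = [n for n in mesh if n != source]

parts = recursive_partition(destinations, max_size=8)
sizes = [len(p) for p in parts]
print(sizes)
```

In this example the 63 destinations end up in eight partitions whose sizes differ by at most one switch, which is the balance property the abstract credits for spreading multicast traffic evenly. A path-based scheme would then send one multicast message per partition; how the path visits the switches inside each partition is outside this sketch.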