doi:10.1016/j.peva.2004.10.008
Copyright © 2004 Elsevier B.V. All rights reserved.
Performance analysis of a QoS capable cluster interconnect
Eun Jung Kim (a), Ki Hwan Yum (b), and Chita R. Das (c)
(a) Department of Computer Science, Texas A&M University, College Station, TX 77843, USA
(b) Department of Computer Science, University of Texas, San Antonio, TX 78249, USA
(c) Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
Available online 8 December 2004.
Abstract
The growing use of clusters in diverse applications, many of which have real-time constraints, requires quality-of-service (QoS) support from the underlying cluster interconnect. All prior studies on QoS-aware cluster routers/networks have used simulation for performance evaluation. In this paper, we present an analytical model for a wormhole-switched router with QoS provisioning. In particular, the model captures message blocking due to wormhole switching in a pipelined router, and bandwidth sharing due to a rate-based scheduling mechanism called VirtualClock. We then extend the model to a hypercube-style cluster network. Average message latency for different traffic classes and deadline missing probability (DMP) for real-time applications are computed using the model.
We evaluate a 16-port router and hypercubes of different dimensions with a mixed workload of real-time and best-effort (BE) traffic. Comparison with simulation results shows that both the single-router and the network models are quite accurate in estimating performance, and thus can be used as efficient design tools.
Keywords: Analytical model; Cluster network; Pipelined router architecture; Quality-of-service; VirtualClock; Wormhole switching
Fig. 1. The pipelined router architecture with a full crossbar.
Fig. 2. Two server organizations. (a) Server without internal buffer, (b) Server with internal buffer.
Fig. 3. State transition diagram with 2 classes of real-time traffic.
Fig. 4. Network latency comparison of analytical model and simulation model in a 16-port router with (a) varying real-time load and fixed best-effort load (0.01 msgs/cycle), and (b) varying best-effort load and fixed real-time load (R1: 0.005 msgs/cycle).
Fig. 5. Components of message latency in a 16-port router with varying real-time load and fixed best-effort load (0.01 msgs/cycle). (a) Analytical model, (b) simulation model.
Fig. 6. Comparison of VirtualClock, Fair Queueing, and Weighted Round Robin in the 16-port router with varying real-time load and fixed best-effort load (0.01 msgs/cycle).
Fig. 7. Network latency comparison of analytical and simulation models in (a) a 5-cube, (b) a 6-cube and (c) a 7-cube with varying real-time load and fixed best-effort load (0.002 msgs/cycle).
Fig. 8. Latency components of analytical model and simulation model in a 6-cube with varying real-time load and fixed best-effort load (0.002 msgs/cycle). (a) Analytical model, (b) simulation model.
Fig. 9. Best-effort network latency in a 6-cube as a function of both real-time and best-effort load. (a) Analytical model, (b) simulation model.
Fig. 10. Network latency in a 6-cube with variable best-effort message length (M = 64) (best-effort traffic load: 0.002 msgs/cycle).
Fig. 11. Network latency in a 6-cube with (a) different input/output buffer sizes (bs = 4, 16, 32 and 64 flits) and (b) 3 real-time classes under fixed best-effort load (0.002 msgs/cycle).
Fig. 12. Network latency comparison of analytical model and simulation model with ON/OFF real-time traffic (a) in a 16-port router with varying real-time load and fixed best-effort load (0.01 msgs/cycle), and (b) in a 6-cube with varying real-time load and fixed best-effort load (0.002 msgs/cycle).
Fig. 13. DMP comparison of analytical and simulation models in a single router with fixed best-effort load (0.01 msgs/cycle). (a) Single router with deadline 42 cycles, (b) single router with deadline 47 cycles.
Fig. 14. DMP comparison of analytical and simulation models in a 6-cube with varying real-time load and fixed best-effort load (0.002 msgs/cycle). (a) Deadline: 55 cycles (2 hops), 70 cycles (5 hops), (b) deadline: 60 cycles (2 hops), 75 cycles (5 hops).
Table 1. Simulation parameters

A preliminary version of this paper was presented at the 11th GI/ITG Conference on Measuring, Modelling and Evaluation of Computer and Communication Systems (MMB 2001), September 2001.

Corresponding author.