doi:10.1016/j.comnet.2006.05.003
Copyright © 2006 Elsevier B.V. All rights reserved.
Modeling and generating realistic streaming media server workloads
Wenting Tang (a, 1), Yun Fu (b, 2), Ludmila Cherkasova (a) and Amin Vahdat (b)
(a) Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA 94303, United States
(b) Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, United States
Received 28 May 2004; revised 27 February 2006; accepted 2 May 2006.
Responsible Editor: U. Krieger.
Available online 14 June 2006.
Abstract
Internet hosting centers and content distribution networks currently leverage statistical multiplexing to meet the performance requirements of many competing hosted network services. Developing efficient resource allocation mechanisms for such services requires an understanding of both the short-term and the long-term behavior of client access patterns to these competing services. At the same time, streaming media services are becoming increasingly popular, presenting new challenges for designers of shared hosting services. These challenges stem from characteristics of streaming media that differ fundamentally from those of traditional web objects: principally, different client access patterns and a significantly larger computational and bandwidth overhead per streaming request. To understand the characteristics of these new workloads, we use two long-term traces of streaming media services to develop MediSyn, a publicly available streaming media workload generator. In summary, this paper makes the following contributions: (i) we propose a framework for modeling the long-term behavior of network services by capturing the process of file introduction, the non-stationary popularity of media accesses, file duration, encoding bit rate, and session duration; (ii) we propose a variety of practical models based on the study of the two workloads; and (iii) we develop an open-source synthetic streaming service workload generator that demonstrates the ability of our framework to capture these models.
Keywords: Streaming media server workload; Synthetic workload generator; Media access patterns; Temporal and static properties; Non-stationary popularity; Zipf–Mandelbrot law; File life span; Modeling
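The abstract names session duration as one of the modeled properties, and Figs. 9–14 relate the fraction of complete sessions (rc) and the access prefix to file duration. The following is a minimal sketch, not MediSyn's code, of how a generator could draw the length played in one session: with probability rc the client plays the whole file, otherwise it stops after a prefix. The function name `session_length`, the exponential prefix model and its mean are hypothetical assumptions, not the paper's fitted models.

```python
import random


def session_length(duration_s, rc, mean_prefix_s, rng):
    """Return how many seconds of a file one session plays.

    Hypothetical model (an assumption, not the paper's fitted model):
    with probability rc the session is complete and plays the whole file;
    otherwise the played prefix is exponential with mean mean_prefix_s,
    capped at the file duration.
    """
    if not 0.0 <= rc <= 1.0:
        raise ValueError("rc must be a probability in [0, 1]")
    if rng.random() < rc:
        return duration_s  # complete session
    prefix = rng.expovariate(1.0 / mean_prefix_s)
    return min(prefix, duration_s)  # an incomplete session cannot outlast the file


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical inputs: a 10-minute file with rc = 0.3.
    plays = [session_length(600.0, 0.3, 120.0, rng) for _ in range(10_000)]
    print(sum(p == 600.0 for p in plays) / len(plays))
```

Here rc is passed in directly; a generator following Fig. 11 would instead look rc up as a function of the file's duration.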
Fig. 1. PDF of the HPC duration distribution: (a) four normal distributions to capture the four peaks and (b) the aggregate distribution of the four normal distributions.
Fig. 2. PDF of the HPL duration distribution: (a) five normal distributions to capture the five peaks and (b) the aggregate distribution of the five normal distributions.
Fig. 3. PDF of the MediSyn duration distribution: (a) four normal distributions to capture the four peaks and (b) the aggregate distribution of the four normal distributions.
Fig. 4. The original popularity distributions of the HPC trace and the HPL trace on a log–log scale.
Fig. 5. The popularity distributions after Zipf k-transformation on a log–log scale.
Fig. 6. Comparison between the popularity distribution generated by MediSyn and the original traces on a log–log scale.
Fig. 7. Popularity rank vs. duration for the HPC trace.
Fig. 8. Popularity rank vs. duration for the HPL trace.
Fig. 9. A typical access prefix distribution of short duration media file.
Fig. 10. A typical access prefix distribution of long duration media file.
Fig. 11. Fraction of complete sessions (rc’s) versus the corresponding media file durations.
Fig. 12. … of all bins versus the corresponding bin duration on a log–log scale.
Fig. 14. CDFs of the prefixes generated by MediSyn and the HPC trace.
Fig. 15. (a) Reference distances calculated over the entire trace period in the original HPC trace and (b) reference distances calculated over the entire trace period in the permuted HPC trace.
Fig. 16. (a) Reference distances calculated within each day in the original HPC trace and (b) reference distances calculated within each day in the permuted HPC trace.
Fig. 17. New file introduction gaps measured in days for the HPC trace.
Fig. 18. New file introduction gaps measured in days for the HPL trace.
Fig. 19. The number of new files introduced per introduction day.
Fig. 20. The histogram and the PDF of the new file introduction gap for the HPC trace.
Fig. 21. The histogram and the PDF of the new file introduction gap for the HPL trace.
Fig. 22. The histogram and the PDF for the number of new files introduced per day.
Fig. 23. New file introduction time gaps within an introduction day.
Fig. 24. The start times of new file introduction within introduction days.
Fig. 25. The rotated start times of new file introduction within introduction days.
Fig. 26. The correlation of file introduction time between the HPC workload and the workload generated by MediSyn.
Fig. 27. A regular lifespan.
Fig. 28. A news-like lifespan.
Fig. 29. PDF that a file at a certain rank has a Pareto life span distribution.
Fig. 30. The PDF of session access interarrival time gaps for a file measured in a day.
Fig. 31. The PDF of session access interarrival time gaps for a file measured in an hour.
Fig. 32. The session access diurnal pattern for the HPC trace. Each bin is an hour.
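Fig. 32 shows an hourly diurnal pattern of session accesses. One common way to reproduce such a pattern, sketched here under assumptions, is a piecewise-constant non-homogeneous Poisson process: the day's expected number of sessions is split across 24 hourly bins by a weight vector, and arrivals within each hour form a Poisson process with that hour's rate. The hourly weights below are hypothetical placeholders, not values measured from the HPC trace, and `diurnal_arrivals` is an illustrative name.

```python
import random


def diurnal_arrivals(sessions_per_day, hourly_weights, rng):
    """Return sorted session start times, in seconds since midnight.

    Arrivals form a Poisson process whose rate is constant within each hour
    and proportional to that hour's weight (a piecewise-constant
    non-homogeneous Poisson process).
    """
    if len(hourly_weights) != 24:
        raise ValueError("need exactly one weight per hour")
    total_w = sum(hourly_weights)
    if any(w < 0 for w in hourly_weights) or total_w <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    if sessions_per_day <= 0:
        return []
    times = []
    for hour, w in enumerate(hourly_weights):
        if w == 0:
            continue  # no sessions start in this hour
        rate = sessions_per_day * (w / total_w) / 3600.0  # sessions per second
        t, end = hour * 3600.0, (hour + 1) * 3600.0
        while True:
            t += rng.expovariate(rate)  # exponential gap to the next arrival
            if t >= end:
                break
            times.append(t)
    return times  # hours are visited in order, so the list is already sorted


if __name__ == "__main__":
    # Hypothetical hourly weights: quiet at night, busiest around midday.
    weights = [1] * 6 + [2, 3, 5, 6, 7, 7] + [8, 8, 7, 6, 5, 4] + [3, 3, 2, 2, 1, 1]
    starts = diurnal_arrivals(8600, weights, random.Random(7))
    print(len(starts))
```

Restarting the exponential clock at each hour boundary is valid because exponential gaps are memoryless, so each hour can be generated independently.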
Table 1. Summary of the two media logs used to develop property models in MediSyn

Table 2. Parameters of the normal distributions for the HPC trace

Table 3. Parameters of the normal distributions for the HPL trace
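Figs. 1–3 model the file-duration PDF as a weighted sum of normal distributions, one per peak, and Tables 2 and 3 hold the fitted parameters for the two traces. Because those table values are not reproduced here, the sketch below uses hypothetical (weight, mean, standard deviation) triples. It samples a duration by first picking a component with probability proportional to its weight and then drawing from that normal; redrawing non-positive values is an added assumption, since durations must be positive.

```python
import random

# Hypothetical mixture components: (weight, mean_s, std_s).
# Placeholders only; the fitted HPC/HPL values are in Tables 2 and 3.
COMPONENTS = [
    (0.35, 180.0, 60.0),
    (0.30, 1200.0, 300.0),
    (0.25, 2700.0, 400.0),
    (0.10, 5400.0, 600.0),
]


def sample_duration(components, rng):
    """Draw one file duration (seconds) from a mixture of normals."""
    weights = [w for w, _, _ in components]
    # Step 1: choose a component with probability proportional to its weight.
    _, mean, std = rng.choices(components, weights=weights, k=1)[0]
    # Step 2: draw from that component, rejecting non-positive durations.
    while True:
        d = rng.gauss(mean, std)
        if d > 0:
            return d


if __name__ == "__main__":
    rng = random.Random(3)
    print([round(sample_duration(COMPONENTS, rng)) for _ in range(5)])
```

Picking the component first and then sampling it is the standard way to draw from a finite mixture, and it keeps each peak's weight directly interpretable as the share of files near that duration.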

Table 4. Parameters of Zipf k-transformation of the HPC and the HPL traces
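The keywords list the Zipf–Mandelbrot law, and Figs. 4–6 and Table 4 concern the file popularity distribution. As a hedged illustration, the sketch below assigns accesses to ranked files under the standard Zipf–Mandelbrot form p(r) ∝ 1/(r + q)^α. It does not implement the paper's Zipf k-transformation, and the values of α and q used here are hypothetical.

```python
import random


def zipf_mandelbrot_probs(n_files, alpha, q):
    """Return normalized p(r) proportional to 1 / (r + q) ** alpha, r = 1..n_files."""
    if q <= -1:
        raise ValueError("q must exceed -1 so that r + q > 0 for r >= 1")
    weights = [1.0 / (r + q) ** alpha for r in range(1, n_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_accesses(n_accesses, probs, rng):
    """Assign each access to a file rank and return per-rank access counts."""
    ranks = range(1, len(probs) + 1)
    counts = [0] * len(probs)
    for r in rng.choices(ranks, weights=probs, k=n_accesses):
        counts[r - 1] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical parameters: 100 files, alpha = 0.8, q = 5.
    probs = zipf_mandelbrot_probs(100, 0.8, 5.0)
    counts = sample_accesses(10_000, probs, random.Random(0))
    print(counts[:5], counts[-5:])
```

With q = 0 the law reduces to a plain Zipf distribution; a positive q flattens the head of the curve, which is the usual reason for preferring the Mandelbrot variant when the most popular files receive fewer accesses than a pure Zipf fit predicts.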

Table 5. Correlation coefficient between file popularity and file duration

Table 6. Parameters of the normal distributions that describe the parameters of the lognormal and Pareto life span distributions
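Table 6 and Figs. 27–29 indicate that each file receives either a lognormal or a Pareto life span, that the parameters of these distributions themselves follow normal distributions, and that the chance of a Pareto life span depends on popularity rank. The sketch below treats a file's life span as the distribution of its accesses over days since introduction, which is an assumption made for illustration. All hyper-parameters and the function names are hypothetical placeholders, not the values in Table 6, and the rank-dependent probability is simply passed in as `p_pareto`.

```python
import random

# Hypothetical (mean, std) of the normal distributions from which each
# file's life-span parameters are drawn. Placeholders for Table 6.
LOGNORMAL_MU = (1.5, 0.3)
LOGNORMAL_SIGMA = (0.8, 0.1)
PARETO_ALPHA = (1.5, 0.2)


def draw_positive(mean, std, rng):
    """Draw from N(mean, std), redrawing until the value is positive."""
    while True:
        x = rng.gauss(mean, std)
        if x > 0:
            return x


def make_life_span(p_pareto, rng):
    """Pick a life-span model for one file and return (kind, day_sampler).

    day_sampler() returns the day, counted from the file's introduction
    day (day 0), on which one access to the file occurs.
    """
    if rng.random() < p_pareto:
        alpha = draw_positive(*PARETO_ALPHA, rng)
        # paretovariate returns values >= 1; shift so day 0 is the intro day.
        return "pareto", lambda: int(rng.paretovariate(alpha) - 1.0)
    mu = rng.gauss(*LOGNORMAL_MU)
    sigma = draw_positive(*LOGNORMAL_SIGMA, rng)
    return "lognormal", lambda: int(rng.lognormvariate(mu, sigma))


if __name__ == "__main__":
    rng = random.Random(11)
    kind, sampler = make_life_span(p_pareto=0.4, rng=rng)
    print(kind, sorted(sampler() for _ in range(20)))
```

Drawing per-file parameters from a distribution, rather than giving every file the same life span, lets two files of the same type still age at different speeds.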

Table 7. Properties generated for each file
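The abstract lists the process of file introduction among the modeled properties, and Figs. 17–25 characterize it in three steps: gaps between introduction days, the number of new files per introduction day, and start times within an introduction day. The sketch below chains these steps to produce an introduction time for each file. The distributions it uses (geometric day gaps, one plus a Poisson count of files per introduction day, uniform start times inside a fixed daily window) are hypothetical stand-ins for the fitted models, and all names are illustrative.

```python
import math
import random


def poisson(lam, rng):
    """Sample a Poisson count with Knuth's method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def geometric_gap(p, rng):
    """Days until the next introduction day (>= 1); p = daily success prob."""
    gap = 1
    while rng.random() >= p:
        gap += 1
    return gap


def introduction_schedule(n_files, p_intro_day, mean_extra_files, rng,
                          window=(9 * 3600, 18 * 3600)):
    """Return sorted (day, second_of_day) introduction times for n_files files."""
    times = []
    day = 0
    while len(times) < n_files:
        # At least one new file appears on every introduction day.
        count = 1 + poisson(mean_extra_files, rng)
        for _ in range(min(count, n_files - len(times))):
            times.append((day, rng.uniform(*window)))  # hypothetical window
        day += geometric_gap(p_intro_day, rng)
    return sorted(times)


if __name__ == "__main__":
    schedule = introduction_schedule(20, 0.4, 2.0, random.Random(5))
    for day, sec in schedule[:5]:
        print(day, round(sec))
```

Separating "which days" from "how many files" and "what time of day" mirrors the way the figures decompose the process, so each step can later be replaced by a fitted distribution without touching the others.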
