doi:10.1016/j.peva.2004.07.008
Copyright © 2004 Elsevier B.V. All rights reserved.
Variable heavy tails in Internet traffic
ᵃ Department of Computer Science, University of North Carolina at Chapel Hill, NC 27599-3175, USA
ᵇ Department of Statistics, University of North Carolina at Chapel Hill, NC 27599-3260, USA
ᶜ Department of Statistical Science, Cornell University, Ithaca, NY 14853, USA
ᵈ School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY 14853, USA
Available online 11 September 2004.
Abstract
This paper studies the tails of the size distribution of Internet data flows and their “heaviness”. Data analysis motivates the concepts of moderate, far and extreme tails for understanding the richness of information available in the data. The data analysis also motivates a notion of “variable tail index”, which leads to a generalization of the existing theory by which heavy-tailed durations lead to long-range dependence.
Keywords: Heavy-tailed distributions; Long-range dependence; Extreme value theory; World Wide Web
Fig. 1. Mice and elephants visualization of IP flows. Shows how heavy-tail durations can lead to long-range dependence of aggregated traffic.
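The mechanism in Fig. 1 is usually quantified through a standard result for on/off and M/G/∞ source models. If flow durations have a Pareto-type tail with index α ∈ (1, 2), so they have a finite mean but infinite variance, then the aggregated traffic is long-range dependent with Hurst parameter H = (3 − α)/2. The helper below is a minimal sketch that only encodes this classical constant-α relation. The function name is hypothetical, and the relation is not the variable-tail-index generalization developed in this paper.

```python
def hurst_from_tail_index(alpha: float) -> float:
    """Hurst parameter of aggregate traffic driven by heavy-tailed durations.

    Classical constant-alpha relation H = (3 - alpha) / 2, stated for a
    duration tail index alpha strictly between 1 and 2; outside that range
    this relation does not apply.
    """
    if not 1.0 < alpha < 2.0:
        raise ValueError("relation requires 1 < alpha < 2")
    return (3.0 - alpha) / 2.0


if __name__ == "__main__":
    for a in (1.2, 1.5, 1.8):
        print(f"alpha = {a}: H = {hurst_from_tail_index(a):.2f}")
```

Heavier duration tails (α closer to 1) push H toward 1, which means stronger long-range dependence. For example, α = 1.2, 1.5 and 1.8 give H = 0.90, 0.75 and 0.60.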
Fig. 2. Pareto Q–Q plot (solid) for the Thursday morning HTTP response size data. Compare to the 45° line (dashed) and simulated versions (dotted). Shows that the Pareto fit is not perfect.
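A Pareto Q–Q plot relies on a simple fact. If P(X > x) = x^(−α) for x ≥ 1, then log X is exponential with mean 1/α, so plotting the sorted log sizes against standard exponential quantiles should give a straight line with slope 1/α. A different scale parameter only shifts the intercept. Below is a minimal sketch, assuming a plain list of positive sizes. The names `pareto_qq_points` and `ls_slope` are illustrative, and the least-squares slope is only a rough diagnostic, not the procedure used to produce Fig. 2.

```python
import math
import random


def pareto_qq_points(data):
    """(theoretical, empirical) pairs for a Pareto Q-Q plot.

    Theoretical quantiles are standard exponential, -log(1 - i/(n+1));
    empirical quantiles are the logs of the sorted data. A Pareto sample
    gives an approximately straight line with slope 1/alpha.
    """
    xs = sorted(data)
    n = len(xs)
    return [(-math.log(1.0 - i / (n + 1)), math.log(x))
            for i, x in enumerate(xs, start=1)]


def ls_slope(points):
    """Ordinary least-squares slope of y on x through the Q-Q points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    sxx = sum((p[0] - mx) ** 2 for p in points)
    return sxy / sxx


if __name__ == "__main__":
    rng = random.Random(0)
    # random.paretovariate(alpha) satisfies P(X > x) = x**(-alpha), x >= 1.
    sample = [rng.paretovariate(1.5) for _ in range(20000)]
    print("estimated alpha:", 1.0 / ls_slope(pareto_qq_points(sample)))
```

Real response sizes are not Pareto in the body. In practice the plot is therefore often restricted to the largest observations, and the departures from the 45° line are the quantity of interest.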
Fig. 3. LLCD (log-log complementary distribution) plot of the Thursday morning HTTP response size data. Horizontal axis is size in bytes. Shows a wobbly tail, inconsistent with most standard distributions, such as the Pareto, which would be linear on this scale.
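An LLCD plot shows the empirical survival function P(X > x) on log–log axes. A Pareto tail P(X > x) ∝ x^(−α) therefore appears as a straight line with slope −α. Below is a minimal sketch of computing the plotted points, assuming the sizes are positive numbers in a Python list. The name `llcd_points` is illustrative.

```python
import math
from collections import Counter


def llcd_points(data):
    """Points of a log-log complementary distribution (LLCD) plot.

    For each distinct value x (sizes assumed positive), returns
    (log10(x), log10(fraction of observations strictly greater than x)).
    The largest value is omitted because its empirical tail is zero.
    """
    n = len(data)
    counts = Counter(data)
    remaining = n  # observations >= the current distinct value
    points = []
    for x in sorted(counts):
        remaining -= counts[x]  # now: observations strictly > x
        if remaining == 0:
            break
        points.append((math.log10(x), math.log10(remaining / n)))
    return points
```

Ties are collapsed into one point per distinct size. For an exact Pareto sample the points fall near a line of slope −α, and the wobbles in Fig. 3 are departures from such a line.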
Fig. 4. LLCD plots of distributions of HTTP response size data for all 21 traces. Note the very similar patterns, showing that the “wobbles” are not sampling artifacts.
Fig. 5. Effective tail index plots for the Thursday morning HTTP response size data. The upper left panel is the same as Fig. 3; the others are essentially different numerical derivatives. Shows that while the effective tail index does not stabilize, it still “mostly” stays in the critical range α ∈ (1, 2).
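One simple way to turn an LLCD into an effective tail index is to take minus the slope of the log empirical survival function between neighboring points of a geometric grid. For an exact Pareto tail this quantity is constant and equal to α. The sketch below uses such a secant. It is a hypothetical scheme for illustration: the names, the grid endpoints and `num` are assumptions, not the numerical derivatives behind the panels of Fig. 5.

```python
import bisect
import math
import random


def empirical_survival(sorted_data, x):
    """Fraction of observations strictly greater than x."""
    n = len(sorted_data)
    return (n - bisect.bisect_right(sorted_data, x)) / n


def effective_tail_index(data, x_min, x_max, num=20):
    """Secant slopes of the LLCD on a geometric grid (hypothetical scheme).

    Returns (x_mid, alpha_eff) pairs, where alpha_eff is minus the slope
    of log S(x) against log x between consecutive grid points and x_mid is
    their geometric mean. Intervals touching a zero survival are skipped.
    """
    xs = sorted(data)
    ratio = (x_max / x_min) ** (1.0 / num)
    grid = [x_min * ratio ** j for j in range(num + 1)]
    out = []
    for a, b in zip(grid, grid[1:]):
        sa, sb = empirical_survival(xs, a), empirical_survival(xs, b)
        if sa == 0.0 or sb == 0.0:
            continue
        slope = (math.log(sb) - math.log(sa)) / (math.log(b) - math.log(a))
        out.append((math.sqrt(a * b), -slope))
    return out


if __name__ == "__main__":
    rng = random.Random(2)
    data = [rng.paretovariate(1.5) for _ in range(50000)]
    for x_mid, a in effective_tail_index(data, 1.0, 100.0, num=10):
        print(f"x ~ {x_mid:8.2f}: alpha_eff = {a:.2f}")
```

On simulated Pareto data the estimates scatter around the true α and become noisier far in the tail, where few observations remain. A variable effective tail index in real data is meaningful only when it clearly exceeds this sampling noise.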
Fig. 6. LLCD plot for the distribution of Thursday morning HTTP response sizes, together with a visually fit double Pareto log-normal. The envelope of LLCDs simulated from the double Pareto log-normal shows that the response size distribution is significantly different.
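The double Pareto log-normal of Reed and Jorgensen can be simulated through its representation log X = ν + τZ + E₁/α − E₂/β. Here Z is standard normal and E₁, E₂ are independent standard exponentials, so the upper tail of X decays like x^(−α). The sketch below draws such samples and then forms a pointwise envelope of simulated log survival curves on a fixed grid, in the spirit of the envelope in Fig. 6. The function names are illustrative. The parameter values, grid and number of replications in the usage block are hypothetical placeholders, not the visually fitted values behind the figure.

```python
import bisect
import math
import random


def dpln_sample(rng, alpha, beta, nu, tau):
    """One draw from a double Pareto log-normal distribution.

    Uses log X = nu + tau*Z + E1/alpha - E2/beta, where Z ~ N(0, 1) and
    E1, E2 ~ Exp(1) are independent.
    """
    y = (nu + tau * rng.gauss(0.0, 1.0)
         + rng.expovariate(1.0) / alpha
         - rng.expovariate(1.0) / beta)
    return math.exp(y)


def log10_survival(sample, grid):
    """log10 of the empirical P(X > g) at each grid point (-inf if zero)."""
    xs = sorted(sample)
    n = len(xs)
    out = []
    for g in grid:
        k = n - bisect.bisect_right(xs, g)
        out.append(math.log10(k / n) if k > 0 else float("-inf"))
    return out


def simulated_envelope(draw, grid, n, reps):
    """Pointwise min and max of `reps` simulated log survival curves.

    `draw` is a zero-argument function returning one simulated size.
    """
    lows = [float("inf")] * len(grid)
    highs = [float("-inf")] * len(grid)
    for _ in range(reps):
        curve = log10_survival([draw() for _ in range(n)], grid)
        lows = [min(lo, c) for lo, c in zip(lows, curve)]
        highs = [max(hi, c) for hi, c in zip(highs, curve)]
    return lows, highs


if __name__ == "__main__":
    rng = random.Random(3)

    def draw():
        # Hypothetical parameters, for illustration only.
        return dpln_sample(rng, alpha=1.5, beta=2.0, nu=8.0, tau=1.0)

    grid = [10 ** (2 + 0.25 * j) for j in range(21)]  # 1e2 .. 1e7 bytes
    lows, highs = simulated_envelope(draw, grid, n=2000, reps=20)
```

Comparing `log10_survival(data, grid)` with `lows` and `highs` then shows where the observed LLCD leaves the band that the fitted model typically produces.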
Fig. 7. LLCD plot for the Thursday morning HTTP response size data, together with a visually fit mixture of three double Pareto log-normals. This shows that the fit is surprisingly good, since the data LLCD lies essentially within the simulated envelope.
Fig. 8. LLCD plot for the Thursday morning HTTP response size data, together with a visually fit mixture of three log-normals. Again the theoretical fit is surprisingly good.
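Figs. 7 and 8 both use finite mixtures of three components. Sampling from such a mixture is a two-stage draw: pick a component with probability equal to its weight, then draw from that component. The sketch below does this with only the standard library for three log-normal components, as in Fig. 8. The weights and (μ, σ) pairs are hypothetical, not the fitted values. The same illustrative `mixture_sampler` would accept double Pareto log-normal component samplers for the setting of Fig. 7.

```python
import random


def mixture_sampler(rng, weights, components):
    """Return a zero-argument sampler for a finite mixture distribution.

    Each call picks component k with probability weights[k] / sum(weights)
    and returns one draw from components[k], a zero-argument function.
    """
    indices = list(range(len(components)))

    def draw():
        k = rng.choices(indices, weights=weights)[0]
        return components[k]()

    return draw


if __name__ == "__main__":
    rng = random.Random(4)
    # Hypothetical mixture: weights and (mu, sigma) of log size per component.
    weights = [0.6, 0.3, 0.1]
    params = [(7.0, 1.0), (9.0, 1.5), (12.0, 2.0)]
    # Default arguments bind each (mu, sigma) pair when the lambda is created.
    components = [lambda m=m, s=s: rng.lognormvariate(m, s) for m, s in params]
    draw = mixture_sampler(rng, weights, components)
    sizes = [draw() for _ in range(10000)]
    print("median size:", sorted(sizes)[len(sizes) // 2])
```

Because each component contributes its own range of sizes, a mixture can produce an LLCD with several bends. This offers one way to see how such fits can track the wobbles that a single standard distribution cannot.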
Fig. 9. LLCD plot for the HTTP response size data, together with a visually fit mixture of three double Pareto log-normals, for data from the New Zealand Internet Exchange. Again the fit is surprisingly good, showing that the observed phenomenon occurs for data outside of UNC.
Fig. 10. LLCD plot for the Tuesday afternoon HTTP response size data, together with a visually fit mixture of three double Pareto log-normals, for data from the University of Auckland. This again suggests that fitting this distribution applies generally to such data.

Corresponding author.