ABSTRACT
Passive traffic measurement increasingly employs sampling at the packet level. Many high-end routers form flow statistics from a sampled substream of packets. Sampling is necessary in order to control the consumption of resources by the measurement operations. However, knowledge of the statistics of flows in the unsampled stream remains useful for understanding both the characteristics of source traffic and the consumption of resources in the network. This paper provides methods that use flow statistics formed from a sampled packet stream to infer the absolute frequencies of lengths of flows in the unsampled stream. A key part of our work is inferring the numbers and lengths of flows in the original traffic that evaded sampling altogether. We achieve this through statistical inference, and by exploiting protocol-level detail reported in flow records. The method has applications to the detection and characterization of network attacks: we show how to estimate, from sampled flow statistics, the number of compromised hosts that are sending attack traffic past the measurement point. We also investigate the impact on our results of different implementations of packet sampling.
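To make the inference problem concrete, the following is a minimal sketch and not necessarily the estimator developed in the paper. It assumes each packet is sampled independently with probability p, so a flow of i packets yields k sampled packets with binomial probability, and flows with k = 0 are never reported. It then applies a generic EM-style multiplicative update to recover the original length frequencies. The function name `estimate_flow_lengths`, its parameters, and the cap `max_len` on original flow length are all illustrative assumptions.

```python
from math import comb


def estimate_flow_lengths(sampled_counts, p, max_len, iterations=200):
    """Estimate how many original flows had each length 1..max_len.

    sampled_counts: dict mapping k (sampled packets per flow, k >= 1)
                    to the number of sampled flows observed with that k.
    p:              assumed independent per-packet sampling probability.
    max_len:        assumed upper bound on original flow length.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not sampled_counts or max(sampled_counts) > max_len:
        raise ValueError("max_len must cover the largest sampled flow length")

    lengths = range(1, max_len + 1)
    ks = sorted(k for k, g in sampled_counts.items() if g > 0)

    # c[i][k]: probability that a flow of i packets yields k sampled packets.
    c = {
        i: {k: comb(i, k) * p**k * (1.0 - p) ** (i - k) for k in ks if k <= i}
        for i in lengths
    }
    # seen[i]: probability that a flow of i packets is sampled at least once.
    seen = {i: 1.0 - (1.0 - p) ** i for i in lengths}

    f = {i: 1.0 for i in lengths}  # flat starting guess
    for _ in range(iterations):
        # Sampled-flow counts predicted by the current estimate.
        predicted = {
            k: sum(f[i] * c[i].get(k, 0.0) for i in lengths) for k in ks
        }
        # Reweight each length by how well it explains the observed counts,
        # then divide by seen[i] to account for flows that evaded sampling.
        f = {
            i: f[i] / seen[i]
            * sum(c[i][k] * sampled_counts[k] / predicted[k] for k in c[i])
            for i in lengths
        }
    return f


if __name__ == "__main__":
    est = estimate_flow_lengths({1: 60, 2: 15, 3: 4}, p=0.25, max_len=12)
    unseen = sum(est[i] * (1.0 - 0.25) ** i for i in est)
    print("estimated total flows:", sum(est.values()))
    print("estimated flows that evaded sampling:", unseen)
```

Under these assumptions, the estimated number of flows that evaded sampling is the sum over lengths of the estimated frequency times (1 - p)^i. The inversion is poorly conditioned when p is small, which is one reason additional protocol-level information in flow records can help.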
Index Terms
- Estimating flow distributions from sampled flow statistics