ABSTRACT
Personal cloud storage services are gaining popularity. With providers rushing to enter the market and cheap storage space increasingly on offer, cloud storage can be expected to generate a large amount of Internet traffic soon. Yet very little is known about the architecture and performance of such systems, or about the workload they face. This understanding is essential for designing efficient cloud storage systems and predicting their impact on the network.
This paper presents a characterization of Dropbox, the leading solution in personal cloud storage in our datasets. By means of passive measurements, we analyze data from four vantage points in Europe, collected over 42 consecutive days. Our contributions are threefold. First, we are the first to study Dropbox, which we show to be the most widely used cloud storage system, already accounting for a volume equivalent to around one third of YouTube traffic at campus networks on some days. Second, we characterize the workload that users in different environments generate on the system, highlighting how it is reflected in network traffic. Third, our results reveal possible performance bottlenecks caused by both the current system architecture and the storage protocol; these bottlenecks are worse for users connected far from the storage data centers.
All measurements used in our analyses are publicly available in anonymized form at the SimpleWeb trace repository: http://traces.simpleweb.org/dropbox/
Supplemental Material
Summary Review Documentation for "Inside Dropbox: Understanding Personal Cloud Storage Services", Authors: I. Drago, M. Mellia, M. Munafo, A. Sperotto, R. Sadre, A. Pras