DOI: 10.1145/3173162.3173200
research-article

Skyway: Connecting Managed Heaps in Distributed Big Data Systems

Published: 19 March 2018

ABSTRACT

Managed languages such as Java and Scala are widely used to develop large-scale distributed systems. Under a managed runtime, transferring data across machines, a task performed frequently in a Big Data system, requires the system to serialize a sea of objects into a byte sequence before sending them over the network. The remote node receiving the bytes then deserializes them back into objects. This process is both inefficient and labor-intensive: (1) object serialization/deserialization makes heavy use of reflection, an expensive runtime operation, and/or (2) serialization/deserialization functions must be hand-written, which is error-prone. This paper presents Skyway, a JVM-based technique that can directly connect the managed heaps of different (local or remote) JVM processes. Under Skyway, objects in the source heap can be written directly into a remote heap without changing their format. Skyway provides performance benefits to any JVM-based system by completely eliminating the need (1) to invoke serialization/deserialization functions, thus saving CPU time, and (2) for developers to hand-write serialization functions.
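
To make the conventional path concrete, the following is a minimal sketch of the serialize, send, and deserialize round trip using the standard `java.io` serialization API. It illustrates the baseline process the abstract describes, not Skyway's interface, which is not shown here. The names `SerializationRoundTrip`, `Point`, `serialize`, and `deserialize` are hypothetical and chosen only for illustration; the byte array stands in for the network transfer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationRoundTrip {
    // Hypothetical data type, used only for illustration.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Sender side: flatten the object graph into a byte sequence.
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Receiver side: rebuild objects from the received bytes.
    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] wire = serialize(new Point(3, 4)); // stands in for bytes sent over the network
        Point copy = (Point) deserialize(wire);   // work done by the remote node
        System.out.println(copy.x + "," + copy.y);
    }
}
```

Every object crossing the wire in this style pays for both calls, which is the per-object cost the abstract says Skyway aims to remove.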


        Published in

        ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems
        March 2018
        827 pages
        ISBN: 9781450349116
        DOI: 10.1145/3173162

        ACM SIGPLAN Notices, Volume 53, Issue 2 (ASPLOS '18)
        February 2018
        809 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/3296957

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        ASPLOS '18 paper acceptance rate: 56 of 319 submissions (18%). Overall acceptance rate: 535 of 2,713 submissions (20%).
