ABSTRACT
Managed languages such as Java and Scala are widely used to develop large-scale distributed systems. Transferring data across machines is a frequent task in a Big Data system, and under a managed runtime it requires the system to serialize a sea of objects into a byte sequence before sending them over the network. The remote node receiving the bytes then deserializes them back into objects. This process is both slow and labor-intensive: (1) object serialization/deserialization makes heavy use of reflection, an expensive runtime operation, and/or (2) serialization/deserialization functions must be hand-written, which is error-prone. This paper presents Skyway, a JVM-based technique that directly connects the managed heaps of different (local or remote) JVM processes. Under Skyway, objects in the source heap can be written directly into a remote heap without changing their format. Skyway provides performance benefits to any JVM-based system by completely eliminating (1) the need to invoke serialization/deserialization functions, thus saving CPU time, and (2) the need for developers to hand-write serialization functions.
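For illustration only, the conventional round trip that the abstract describes can be sketched with the standard `java.io` serialization API. This is the baseline path (serialize on the sender, deserialize on the receiver), not Skyway's mechanism; the class name `SerializationRoundTrip` and the `Edge` record type are hypothetical and chosen just for this sketch, and the network transfer is replaced by an in-memory byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationRoundTrip {
    // Hypothetical data type, used only for this illustration.
    static class Edge implements Serializable {
        private static final long serialVersionUID = 1L;
        final int src;
        final int dst;

        Edge(int src, int dst) {
            this.src = src;
            this.dst = dst;
        }
    }

    // Sender side: flatten the object graph into a byte sequence.
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Receiver side: rebuild fresh objects from the received bytes.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Edge[] edges = { new Edge(1, 2), new Edge(2, 3) };
        byte[] wire = serialize(edges);            // stands in for the bytes sent over the network
        Edge[] copy = (Edge[]) deserialize(wire);  // what the remote node would reconstruct
        System.out.println(copy.length + " edges, first = " + copy[0].src + "->" + copy[0].dst);
    }
}
```

Every object crosses the wire in a different format from its in-heap layout, so both ends pay a conversion cost; this is the step the abstract says Skyway eliminates.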
Skyway: Connecting Managed Heaps in Distributed Big Data Systems
ASPLOS '18