HotML: A DSM-based machine learning system for social networks

https://doi.org/10.1016/j.jocs.2017.09.006

Highlights

  • Flexible consistency models are designed to boost machine learning algorithms.

  • HotML provides consistent and flexible checkpoint methods for fault tolerance.

  • HotML can achieve up to 1.9× performance compared to Petuum.

Abstract

In the big data era, social networks such as Twitter, Weibo, and Facebook are becoming more and more popular worldwide. To support social network analysis, many machine learning (ML) algorithms have been adopted for tasks such as user classification, link prediction, sentiment analysis, and recommendation. However, the dataset can be so large that training a model on a machine learning system may take days, so performance must be addressed to speed up training. In this paper, we propose HotML, a general machine learning system. HotML follows the parameter server (PS) architecture, in which the servers manage the globally shared parameters, organized in a tabular structure, and the workers process the dataset in parallel and update the global parameters. HotML is based on our prior work DPS, which provides a high-level data abstraction, a lightweight task scheduling system, and stale synchronous parallel (SSP) consistency. HotML improves the DPS design by physically decoupling the PS server and the PS worker, and it provides flexible consistency models (SSPPush and SSPDrop in addition to SSP), fault tolerance (a consistent server-side checkpoint and a flexible worker-side checkpoint), and workload balancing.

To demonstrate the performance and scalability of the proposed system, a series of experiments are conducted and the results show that HotML can reduce networking time by about 74%, and achieve up to 1.9× performance compared to the popular ML system, Petuum.

Introduction

In recent years, social networks such as Twitter, Weibo, and Facebook have become more and more popular worldwide. Social network data has grown dramatically and has become invaluable to both academia and industry for research and commerce. To support social network analysis, many machine learning (ML) algorithms have been widely adopted for tasks such as spam bot detection [1], user classification [2], event detection [3], [4], [5], link prediction [6], [7], sentiment analysis [8], [9], and topic learning [10], among many others. These ML algorithms have achieved good results.

However, in the so-called big data era, a social network may have billions of users, and the volume of user-generated content (UGC) is extremely large. Two challenges have therefore arisen in machine learning for social networks. One is big data: the set of training samples is extremely large. The other is the big model: ML algorithms often have to train up to billions of parameters, a consequence of the large number of training samples and the deep architecture of neural networks. Together, big data and big models create a serious computational performance problem that must be addressed systematically.

Faced with these big data and big model challenges, the parameter server (PS) has gained much attention as a high-throughput machine learning architecture for social networks. The parameter server is a key-value store, built in a distributed shared memory fashion, that lets clients easily share access to the global model parameters, which are striped across multiple servers.
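To make the key-value view concrete, the following is a minimal single-process sketch of parameters striped across several servers. It is an illustration only, not HotML's or DPS's actual interface: the class name `ShardedParameterStore`, its methods, and the modulo placement rule are all assumptions introduced here.

```python
from collections import defaultdict


class ShardedParameterStore:
    """Toy in-process stand-in for a parameter server (hypothetical API)."""

    def __init__(self, num_servers):
        self.num_servers = num_servers
        # One dict per simulated server; unseen parameters read as 0.0.
        self.shards = [defaultdict(float) for _ in range(num_servers)]

    def server_of(self, key):
        # Assumed placement rule: integer keys are striped round-robin.
        return key % self.num_servers

    def get(self, key):
        return self.shards[self.server_of(key)][key]

    def inc(self, key, delta):
        # Additive updates commute, so workers can apply them in any order.
        self.shards[self.server_of(key)][key] += delta


store = ShardedParameterStore(num_servers=3)
store.inc(7, 0.5)    # update from one worker
store.inc(7, 0.25)   # update from another worker
value = store.get(7)
```

In a real deployment each shard would live on a different machine and `get`/`inc` would be remote operations; the sketch only shows how one logical table maps onto several servers.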

In our prior work, DPS [11], we introduced a novel parameter server built on Grappa [12], a recently proposed high-performance distributed shared memory (DSM) system. Grappa aims to provide a uniform memory view across machines so that programmers can write distributed programs as if for a single machine. DPS offers a high-level data abstraction, a user-friendly programming interface with data-flow operations such as map/reduce, a lightweight task scheduling system, and SSP consistency. However, DPS has several drawbacks. (1) The server and the worker are tightly coupled on a single node, which limits overall performance because parameter requests to the server may be delayed while the node executes worker tasks. (2) The SSP consistency model may not make full use of the network bandwidth, and many trivial parameter updates waste bandwidth. (3) DPS provides no fault tolerance and thus lacks high availability, even though ML algorithms for social networks may experience failures during long-running training. (4) DPS performs no load balancing, which can be very important in a heterogeneous cluster whose machines have different computing resources.
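The text does not spell out the SSP rule, so the sketch below follows the stale synchronous parallel model as it is generally described: a worker may run ahead of the slowest worker by at most a fixed number of clock ticks. The class `SSPClock`, its method names, and the staleness value are assumptions for illustration and are not taken from DPS or HotML.

```python
class SSPClock:
    """Bounded-staleness gate over per-worker iteration clocks (hypothetical)."""

    def __init__(self, num_workers, staleness):
        self.clocks = [0] * num_workers
        self.staleness = staleness

    def tick(self, worker):
        # Called when a worker finishes one iteration.
        self.clocks[worker] += 1

    def can_proceed(self, worker):
        # A worker may continue only while its lead over the slowest
        # worker does not exceed the staleness bound.
        return self.clocks[worker] - min(self.clocks) <= self.staleness


ssp = SSPClock(num_workers=2, staleness=1)
ssp.tick(0)
one_ahead = ssp.can_proceed(0)       # lead of 1 is within the bound
ssp.tick(0)
two_ahead = ssp.can_proceed(0)       # lead of 2 exceeds the bound: wait
ssp.tick(1)
after_catch_up = ssp.can_proceed(0)  # slow worker advanced: lead is 1 again
```

The time a fast worker spends blocked in the second case is the kind of SSP waiting time that a stricter or looser staleness bound trades against parameter freshness.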

To overcome these drawbacks and further improve performance, in this paper we propose HotML, a novel DSM-based distributed machine learning system for social networks that builds on DPS. HotML supports both data parallelism and model parallelism, i.e., the global parameters are stored across the server nodes and each worker node takes a partition of the training data. HotML contains many important components that cover the whole pipeline of machine learning for social networks.

The main contributions of HotML are as follows:

  1. The design of the parameter server component in DPS is improved by physically decoupling the PS servers and workers. A dedicated parameter server is introduced to make full use of the computing resources and to improve server throughput as well as the overall performance of HotML.

  2. Flexible consistency models are designed to speed up the convergence of machine learning algorithms for social networks. SSP is implemented to relax consistency, as in DPS. In HotML, SSPPush, an improved version of SSP, is introduced on the servers; it uses idle network bandwidth to push global parameters to workers in advance, which reduces SSP waiting time. SSPDrop is designed on the worker side to drop trivial parameter updates and thereby reduce network communication. SSPDrop works transparently with either SSP or SSPPush.

  3. A flexible worker-side checkpoint mechanism and a consistent server-side checkpoint mechanism are introduced to improve the availability of HotML, because DPS and the underlying DSM system Grappa provide no fault tolerance mechanism and may be unreliable.

  4. A worker workload balancer is introduced to deal with the straggler problem.
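The idea of dropping trivial updates on the worker side can be sketched as a simple filter applied before updates are sent to the servers. The text above does not say how HotML decides that an update is trivial, so the magnitude threshold, the function name `filter_updates`, and the sample values below are all assumptions made for illustration.

```python
def filter_updates(updates, threshold):
    """Return the updates worth sending and the number of dropped ones.

    Assumed rule (not from the paper): an update is trivial when its
    absolute value is below `threshold`.
    """
    kept = {key: delta for key, delta in updates.items()
            if abs(delta) >= threshold}
    dropped = len(updates) - len(kept)
    return kept, dropped


# Hypothetical per-parameter deltas computed by one worker in one iteration.
updates = {"w0": 0.3, "w1": 1e-6, "w2": -0.02, "w3": -1e-7}
kept, dropped = filter_updates(updates, threshold=1e-4)
```

Because the filter runs entirely on the worker before anything is sent, it is independent of how the server synchronizes clocks, which is consistent with the statement that SSPDrop works alongside either SSP or SSPPush.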

A series of experiments are conducted to demonstrate the performance of the proposed system HotML. Experimental results show that HotML can reduce networking time by about 74%, and achieve up to 1.9× performance compared to the popular ML system, Petuum.

The rest of the paper is organized as follows. In Section 2, the background and related work are introduced. Section 3 describes the design and implementation of HotML. Section 4 presents performance evaluation results and analysis of HotML. Finally, the conclusion is in Section 5.

Background and related work

In this section, we will introduce consistency models, distributed shared memory, existing big data machine learning systems, and existing fault tolerance methods.

Design and implementation of HotML

In this section, we first give an overview of the HotML system and then describe the implementation details of its key features, including the flexible consistency models, the lightweight task scheduler, the GlobalTable data abstraction and programming interface, and fault tolerance.

Applications and evaluation

HotML was evaluated on machine learning algorithms for social networks: matrix factorization (SGD MF) and logistic regression. We compared the performance of the proposed system, HotML, with the popular parameter server system Petuum [32], as well as with our prior work, DPS [11]. Additional experiments were conducted to demonstrate the checkpointing, scalability, and workload balancing of HotML.

Conclusion

In this paper, we described a DSM-based machine learning system with high performance for social networks, HotML. HotML is based on our prior work DPS and adopts DPS's high-level data abstraction and programming interfaces, user-level task scheduling and SSP consistency. To further improve the performance and availability, in HotML, the PS design is improved by decoupling PS server and PS worker physically to improve server throughput; SSPPush and SSPDrop consistency are adopted; consistent

Acknowledgements

This work is supported by the China 973 Fundamental R&D Program (No. 2014CB340300), the NSFC program (Nos. 61472022, 61421003), SKLSDE-2016ZX-11, and partly by the Beijing Advanced Innovation Center for Big Data and Brain Computing. We would also like to extend our gratitude to the reviewers for their valuable comments and suggestions, which helped improve the quality of this manuscript.

Yangyang Zhang is currently a Ph.D. student at the School of Computer Science and Engineering, Beihang University, China. His research interests include virtualization, machine learning, and distributed systems.

References (51)

  • S. Scellato et al. Exploiting place features in link prediction on location-based social networks. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2011)
  • J. Zhao et al. MoodLens: an emoticon-based sentiment analysis system for Chinese tweets. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2012)
  • A. Saha et al. Learning evolving and emerging topics in social media: a dynamic NMF approach with temporal regularization. Proceedings of the Fifth ACM International Conference on Web Search and Data Mining (2012)
  • C. Sun et al. DPS: a DSM-based parameter server for machine learning. 14th International Symposium on Pervasive Systems, Algorithms and Networks (2017)
  • J. Nelson et al. Latency-tolerant software distributed shared memory. Usenix Conference on Usenix Technical Conference (2015)
  • Apache Hadoop, https://hadoop.apache.org/ (accessed:...
  • Apache Spark, https://spark.apache.org/ (accessed:...
  • Spark ML, https://spark.apache.org/mllib/ (accessed:...
  • H. Cui et al. Exploiting bounded staleness to speed up big data analytics. 2014 USENIX Annual Technical Conference (USENIX ATC 14) (2014)
  • Q. Ho et al. More effective distributed ML via a stale synchronous parallel parameter server. Adv. Neural Inf. Process. Syst. (2013)
  • W. Dai et al. High-performance distributed ML at scale through parameter server consistency models. AAAI (2015)
  • G. Alverson et al. Exploiting heterogeneous parallelism on a multithreaded multiprocessor. International Conference on Supercomputing (1992)
  • R. Alverson et al. The Tera computer system. ACM SIGARCH Computer Architecture News (1990)
  • X. Meng et al. MLlib: machine learning in Apache Spark. J. Mach. Learn. Res. (2015)
  • M. Zaharia et al. Spark: cluster computing with working sets. Usenix Conference on Hot Topics in Cloud Computing (2010)
    Jianxin Li received the PhD degree in January 2008. He is a professor in the School of Computer Science and Engineering, Beihang University. He was a visiting scholar in the Machine Learning Department, CMU, in 2015, and a visiting researcher at MSRA in 2011. His current research interests include virtualization and cloud computing, data analysis, and processing. He is a member of the IEEE and the ACM.

    Chenggen Sun received the M.S. degree in Computer Science from Beihang University, China in 2017. His research interests include machine learning and distributed systems.

    Md Zakirul Alam Bhuiyan received the PhD degree. He is currently an Assistant Professor of the Department of Computer and Information Sciences, Fordham University. Previously, he worked as an Assistant Professor with Temple University and a postdoctoral research fellow with Central South University, China. His research focuses on dependable cyber physical systems, WSN applications, big data, and cyber security. He has served as a lead guest editor of key journals including the IEEE Transactions on Big Data, the ACM Transactions on Cyber-Physical Systems, the Information Sciences, the IEEE IoT Journal. He has also served as the general chair, program chair, workshop chair, publicity chair, TPC member, and reviewer of various international journals/conferences. He is a member of IEEE and the ACM.

    Weiren Yu received the BE degree from the School of Advanced Engineering at Beihang University, China, in 2011. He has been a PhD candidate in the Department of Computer Science, Beihang University, since 2011. His research interests include distributed machine learning systems, scalable graphical models, and graph mining models for emerging event detection on social media.

    Richong Zhang received the BS and MASc degrees from Jilin University, China, in 2001 and 2004, respectively, the MS degree from Dalhousie University, Canada, in 2006, and the PhD degree from the School of Information Technology and Engineering, University of Ottawa, Canada, in 2011. He is currently an Associate Professor in the School of Computer Science and Engineering, Beihang University, China. His research interests include artificial intelligence and data mining. He is a member of the IEEE.
