Abstract
Low-rank matrix approximation is an important tool in data mining with a wide range of applications, including recommender systems, clustering, and identifying topics in documents. When the matrix to be approximated originates from a large distributed system, such as a network of mobile phones or smart meters, a challenging problem arises due to the strongly conflicting yet essential requirements of efficiency, robustness, and privacy preservation. We argue that although collecting sensitive data in a centralized fashion may be efficient, it is not an option when considering privacy and efficiency at the same time. Thus, we do not allow any sensitive data to leave the nodes of the network. The local information at each node (personal attributes, documents, media ratings, etc.) defines one row in the matrix, and because this information must stay local, all computations have to be performed at the edge of the network. Known parallel methods that respect this locality constraint, such as synchronized parallel gradient search or distributed iterative methods, either require synchronized rounds or have inherent load-balancing issues, and thus they are not robust to failure. Our distributed stochastic gradient descent algorithm overcomes these limitations. During execution, all sensitive information remains local, whereas the global features (e.g., the factor model of movies) converge to the correct value at all nodes. We present a theoretical derivation and a thorough experimental evaluation of our algorithm. We demonstrate that the convergence speed of our method is competitive while it does not rely on synchronization and is robust to extreme and realistic failure scenarios. To show the feasibility of our approach, we present trace-based simulations, an analysis of real smartphone user behavior, and tests over real movie recommender system data.
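The locality constraint can be illustrated with a minimal Python/NumPy sketch; this is not the authors' algorithm. It assumes a factorization A ≈ XYᵀ in which node i privately stores its row a_i and its factor row x_i, while only the shared factor Y (the "global features", e.g., the movie factors) travels from node to node and is updated by stochastic gradient descent wherever it arrives. The single circulating model copy, the uniform random choice of the next node, the step size `lr`, the message-loss parameter `drop_prob`, and the function name `random_walk_sgd` are all hypothetical choices made for illustration; the paper's actual protocol may differ in each of these respects.

```python
import numpy as np


def random_walk_sgd(A, k, steps, lr=0.05, drop_prob=0.0, seed=0):
    """Toy decentralized factorization A ~= X @ Y.T (illustrative sketch).

    Row i of A and row i of X belong to simulated node i and never move;
    only the shared factor Y is passed between nodes, one hop per step.
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape
    X = 0.1 * rng.standard_normal((n, k))  # private factor rows, one per node
    Y = 0.1 * rng.standard_normal((m, k))  # the single travelling model
    for _ in range(steps):
        # Hypothetical failure model: the hop is lost, Y stays with the
        # sender, and no update happens in this step.
        if rng.random() < drop_prob:
            continue
        node = rng.integers(n)              # next node, chosen uniformly (assumption)
        a = A[node]                         # local row: never leaves the node
        x = X[node].copy()                  # local factor row (copy, not a view)
        err = a - Y @ x                     # residual of the local row, shape (m,)
        # One SGD step on ||a - Y x||^2 (constant factor 2 folded into lr),
        # updating the private row and the travelling model from the same residual.
        X[node] = x + lr * (Y.T @ err)
        Y = Y + lr * np.outer(err, x)
    return X, Y


# Usage: an exactly rank-2 matrix with singular values 3 and 2.
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((50, 2)))
V, _ = np.linalg.qr(rng.standard_normal((10, 2)))
A = U @ np.diag([3.0, 2.0]) @ V.T
X, Y = random_walk_sgd(A, k=2, steps=25_000, drop_prob=0.3)
rel_err = np.linalg.norm(A - X @ Y.T) / np.linalg.norm(A)
print(f"relative Frobenius error: {rel_err:.2e}")
```

In this toy setting a lost message only delays the walk instead of blocking a global round, which gives some intuition for why avoiding synchronization helps robustness; it does not reproduce the convergence analysis or the failure scenarios evaluated in the paper.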
Index Terms
- Robust Decentralized Low-Rank Matrix Decomposition
Recommendations
Low-Rank Matrix Approximation Using the Lanczos Bidiagonalization Process with Applications
Low-rank approximation of large and/or sparse matrices is important in many applications, and the singular value decomposition (SVD) gives the best low-rank approximations with respect to unitarily-invariant norms. In this paper we show that good low-rank ...
A New Privacy-Preserving Data Mining Method Using Non-negative Matrix Factorization and Singular Value Decomposition
Data analysis and mining are becoming more and more powerful with rapidly growing data sizes, and publishing data for researchers is becoming more valuable. This process raises an important problem: privacy protection. In recent decades, many methods for ...
A structured rank-revealing method for Sylvester matrix
We propose a fast algorithm for computing the numeric ranks of Sylvester matrices. Let S denote the Sylvester matrix and H denote the Hankel-like-Sylvester matrix. The algorithm is based on a fast Cholesky factorization of S^T S or H^T H and relies on a ...