ABSTRACT
Matrix factorization is one of the fundamental techniques for analyzing latent relationships between two entities. In particular, it is widely used for recommendation because of its high accuracy. Efficient parallel SGD matrix factorization algorithms have been developed to speed up the convergence of factorization on large matrices. However, most of them are designed for a shared-memory environment and thus fail to factorize a matrix that is too large to fit in memory, and their performance is also unreliable when the matrix is skewed.
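The abstract does not spell out the SGD update itself. As a minimal sketch, assuming the standard squared-error objective with L2 regularization (an assumption here, not taken from the paper), one sequential SGD pass over the observed entries can be written as below; `sgd_epoch`, `rmse`, `lr`, and `reg` are illustrative names. Parallel variants distribute these per-entry updates across threads, and two updates conflict only when they touch the same row of `P` or the same column factor in `Q`.

```python
import random


def sgd_epoch(ratings, P, Q, lr=0.01, reg=0.05):
    """One SGD pass over observed entries (i, j, r_ij).

    Assumed objective: sum (r_ij - p_i . q_j)^2 + reg * (|p_i|^2 + |q_j|^2);
    the constant factor 2 of the gradient is folded into lr.
    """
    for i, j, r in ratings:
        p, q = P[i], Q[j]
        err = r - sum(a * b for a, b in zip(p, q))
        for f in range(len(p)):
            pf, qf = p[f], q[f]  # old values, so both updates use the same point
            p[f] += lr * (err * qf - reg * pf)
            q[f] += lr * (err * pf - reg * qf)


def rmse(ratings, P, Q):
    """Root-mean-square error of p_i . q_j against the observed ratings."""
    se = sum((r - sum(a * b for a, b in zip(P[i], Q[j]))) ** 2 for i, j, r in ratings)
    return (se / len(ratings)) ** 0.5


if __name__ == "__main__":
    random.seed(0)
    # Toy 3x3 matrix with six observed entries (illustrative data only).
    ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
    k = 2  # number of latent factors
    P = [[random.uniform(0, 1) for _ in range(k)] for _ in range(3)]
    Q = [[random.uniform(0, 1) for _ in range(k)] for _ in range(3)]
    before = rmse(ratings, P, Q)
    for _ in range(500):
        random.shuffle(ratings)  # visit entries in random order each epoch
        sgd_epoch(ratings, P, Q, lr=0.05, reg=0.01)
    print(f"RMSE before: {before:.3f}, after: {rmse(ratings, P, Q):.3f}")
```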
This paper proposes a fast and robust parallel SGD matrix factorization algorithm, called MLGF-MF, which is robust to skewed matrices and runs efficiently on block-storage devices (e.g., SSDs) as well as in shared memory. MLGF-MF uses the Multi-Level Grid File (MLGF) to partition the matrix and minimizes the cost of scheduling parallel SGD updates on the partitioned regions by exploiting partial match query processing. Thereby, MLGF-MF produces reliable results efficiently even on skewed matrices. MLGF-MF also builds asynchronous I/O into the algorithm so that the CPU keeps executing without waiting for I/O to complete. As a result, MLGF-MF overlaps CPU and I/O processing, which offsets the I/O cost and maximizes CPU utilization. Recent flash SSDs support high-performance parallel I/O and are therefore well suited to executing asynchronous I/O.
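The scheduling constraint can be sketched as follows: two regions can be updated in parallel only if they share no rows and no columns, since otherwise concurrent SGD updates would write the same user or item factors. This sketch assumes regions are half-open rectangles and uses a plain linear scan in place of MLGF's partial match queries; `Region` and `find_free_regions` are hypothetical names, not the MLGF-MF implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical matrix partition covering rows [r0, r1) and cols [c0, c1).

    Real MLGF regions come from the grid file's splitting policy and need
    not be uniform; this is only a stand-in.
    """
    r0: int
    r1: int
    c0: int
    c1: int


def overlaps(a0, a1, b0, b1):
    """True if half-open intervals [a0, a1) and [b0, b1) intersect."""
    return a0 < b1 and b0 < a1


def find_free_regions(regions, busy):
    """Return regions sharing no row and no column with any busy region.

    A linear scan stands in for the partial match queries on the row and
    column dimensions. The result is a candidate list: a worker takes one
    candidate and marks it busy before the next lookup, because two
    candidates may still conflict with each other.
    """
    free = []
    for region in regions:
        if region in busy:
            continue
        conflict = any(
            overlaps(region.r0, region.r1, b.r0, b.r1)
            or overlaps(region.c0, region.c1, b.c0, b.c1)
            for b in busy
        )
        if not conflict:
            free.append(region)
    return free


if __name__ == "__main__":
    # 2x2 grid over a 4x4 matrix; the top-left region is being updated.
    grid = [Region(r, r + 2, c, c + 2) for r in (0, 2) for c in (0, 2)]
    print(find_free_regions(grid, {Region(0, 2, 0, 2)}))
    # [Region(r0=2, r1=4, c0=2, c1=4)]
```

With the top-left region busy, only the bottom-right region qualifies: the other two share either its rows or its columns.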
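The CPU/I/O overlap can be illustrated with double buffering: while one partition is being processed, the next is already being read. This Python sketch simulates both steps with `time.sleep`; `load_block`, `process_block`, `run_overlapped`, and `run_sequential` are hypothetical placeholders, and a single background thread stands in for the asynchronous, parallel SSD I/O described above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_block(block_id):
    """Hypothetical stand-in for reading one partition from an SSD."""
    time.sleep(0.05)  # simulated I/O latency
    return [block_id] * 3


def process_block(data):
    """Hypothetical stand-in for the SGD updates on one partition."""
    time.sleep(0.05)  # simulated compute time
    return sum(data)


def run_overlapped(block_ids):
    """Process blocks in order while the next block is read in the background."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as io_pool:
        pending = io_pool.submit(load_block, block_ids[0]) if block_ids else None
        for idx in range(len(block_ids)):
            data = pending.result()  # blocks only if the prefetch is not done yet
            if idx + 1 < len(block_ids):
                # Issue the next read before computing on the current block.
                pending = io_pool.submit(load_block, block_ids[idx + 1])
            results.append(process_block(data))
    return results


def run_sequential(block_ids):
    """Baseline: read, then compute, with no overlap."""
    return [process_block(load_block(b)) for b in block_ids]


if __name__ == "__main__":
    ids = list(range(8))
    for fn in (run_sequential, run_overlapped):
        start = time.perf_counter()
        fn(ids)
        print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s")
```

With eight blocks and equal simulated read and compute times, the overlapped loop should take a little over half as long as the sequential one, because every read after the first is hidden behind the previous block's computation. In CPython this works for real file reads too, since blocking I/O releases the GIL.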
Our extensive evaluations show that MLGF-MF significantly outperforms (or converges faster than) the state-of-the-art algorithms in both shared-memory and block-storage environments. In addition, the outputs of MLGF-MF are significantly more robust to skewed matrices. Our implementation of MLGF-MF is available as executable files at http://dm.postech.ac.kr/MLGF-MF.