research-article

Fast and Robust Parallel SGD Matrix Factorization

Authors:
Jinoh Oh, POSTECH, Pohang, South Korea
Wook-Shin Han, POSTECH, Pohang, South Korea
Hwanjo Yu, POSTECH, Pohang, South Korea
Xiaoqian Jiang, University of California at San Diego, San Diego, CA, USA

KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2015, pages 865–874. https://doi.org/10.1145/2783258.2783322

Published: 10 August 2015


ABSTRACT

Matrix factorization is one of the fundamental techniques for analyzing latent relationships between two kinds of entities. In particular, it is widely used in recommender systems because of its high accuracy. Efficient parallel SGD matrix factorization algorithms have been developed to speed up the convergence of factorization on large matrices. However, most of them are designed for a shared-memory environment and thus fail to factorize a matrix that is too big to fit in memory; their performance is also unreliable when the matrix is skewed.
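For readers new to SGD matrix factorization, the following Python sketch shows the basic serial update that parallel algorithms distribute across workers. It assumes the common regularized squared-error objective over the observed entries of R ≈ PQᵀ; the abstract does not state the paper's exact loss, learning rate, or regularization, so the function names `sgd_mf` and `rmse` and all hyperparameter values are illustrative assumptions, not MLGF-MF itself.

```python
import random


def sgd_mf(ratings, n_users, n_items, k=8, lr=0.01, reg=0.05, epochs=50, seed=0):
    """Serial SGD for R ~ P Q^T (illustrative sketch, not the paper's code).

    Assumes a regularized squared-error loss on the observed entries.
    ratings: list of (user, item, value) triples, i.e. the observed entries of R.
    Returns P (n_users x k) and Q (n_items x k) as lists of lists.
    """
    rng = random.Random(seed)
    # Small random initialization of the latent factors.
    P = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)  # visit the observed entries in random order
        for u, i, r in order:
            pu, qi = P[u], Q[i]
            err = r - sum(a * b for a, b in zip(pu, qi))
            # One gradient step on this entry's regularized squared error.
            for f in range(k):
                p_old = pu[f]
                pu[f] += lr * (err * qi[f] - reg * p_old)
                qi[f] += lr * (err * p_old - reg * qi[f])
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error of P Q^T on the observed entries."""
    se = sum((r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5
```

Each step reads and writes only row `u` of P and row `i` of Q, so two updates conflict only when their entries share a row or a column of R. This is what allows parallel SGD to process blocks of the matrix that share no rows and no columns at the same time.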

This paper proposes a fast and robust parallel SGD matrix factorization algorithm, called MLGF-MF, which is robust to skewed matrices and runs efficiently on block-storage devices (e.g., SSDs) as well as in shared memory. MLGF-MF uses the Multi-Level Grid File (MLGF) to partition the matrix and minimizes the cost of scheduling parallel SGD updates on the partitioned regions by exploiting partial match query processing. As a result, MLGF-MF produces reliable results efficiently even on skewed matrices. MLGF-MF is designed with asynchronous I/O built into the algorithm, so that the CPU keeps executing without waiting for I/O to complete. MLGF-MF thereby overlaps CPU and I/O processing, which offsets the I/O cost and maximizes CPU utilization. Recent flash SSDs support high-performance parallel I/O and are thus well suited to executing asynchronous I/O.
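The scheduling constraint can be illustrated with a small Python sketch: a worker may take a region only if its row range and its column range are both disjoint from every region currently being processed. This is a simplified stand-in, not MLGF-MF's method. The paper answers this question with partial match queries on an MLGF index, whereas the hypothetical `Region` and `RegionScheduler` classes below simply scan a list of rectangles, and asynchronous I/O is not modeled.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A rectangular partition of the rating matrix: rows [r0, r1) x cols [c0, c1)."""
    r0: int
    r1: int
    c0: int
    c1: int


def overlaps(a0, a1, b0, b1):
    """True if the half-open intervals [a0, a1) and [b0, b1) intersect."""
    return a0 < b1 and b0 < a1


class RegionScheduler:
    """Hypothetical scheduler: hands out regions sharing no rows/columns with busy ones."""

    def __init__(self, regions):
        self.pending = list(regions)  # regions not yet processed in this epoch
        self.busy = []                # regions currently held by some worker
        self.lock = threading.Lock()

    def acquire(self):
        """Return a conflict-free pending region, or None if none is free right now."""
        with self.lock:
            for reg in self.pending:
                # Linear scan standing in for a partial match query: accept the region
                # only if both its row interval and its column interval are free.
                if all(not overlaps(reg.r0, reg.r1, b.r0, b.r1)
                       and not overlaps(reg.c0, reg.c1, b.c0, b.c1)
                       for b in self.busy):
                    self.pending.remove(reg)  # safe: we return right after removing
                    self.busy.append(reg)
                    return reg
            return None

    def release(self, reg):
        """Mark a region as finished so its rows and columns become free again."""
        with self.lock:
            self.busy.remove(reg)
```

A worker would loop: `reg = sched.acquire()`, run the SGD updates for the entries inside `reg`, then call `sched.release(reg)`. Regions may have different sizes, so an adaptive, skew-aware partition can be plugged in unchanged; the linear scan, however, costs time proportional to the number of regions, which is the scheduling cost the paper reduces with MLGF partial match queries.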

Our extensive evaluations show that MLGF-MF significantly outperforms (or converges faster than) the state-of-the-art algorithms in both shared-memory and block-storage environments. In addition, the outputs of MLGF-MF are significantly more robust to skewed matrices. Our implementation of MLGF-MF is available as executable files at http://dm.postech.ac.kr/MLGF-MF.


Index Terms

Information systems → Information systems applications


Published in
KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2015
2378 pages
ISBN: 9781450336642
DOI: 10.1145/2783258
General Chairs: Longbing Cao (University of Technology, Sydney), Chengqi Zhang (University of Technology, Sydney)
Program Chairs: Thorsten Joachims (Cornell University), Geoff Webb (Monash University), Dragos D. Margineantu (Boeing Research), Graham Williams (Australian Taxation Office)
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
matrix factorization
stochastic gradient descent

Acceptance Rates
KDD '15 Paper Acceptance Rate: 160 of 819 submissions, 20%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

Article Metrics
- 27 Total Citations
- 1,179 Total Downloads
- Downloads (Last 12 months): 19
- Downloads (Last 6 weeks): 1