ABSTRACT
Recommender systems research is often based on comparisons of predictive accuracy: the better the evaluation scores, the better the recommender. However, it is difficult to compare results from different recommender systems because of the many options in the design and implementation of an evaluation strategy. Additionally, algorithm implementations can diverge from the standard formulation due to manual tuning and modifications that work better in some situations.
In this work we compare common recommendation algorithms as implemented in three popular recommendation frameworks. To provide a fair comparison, we have complete control over the evaluation dimensions being benchmarked: dataset, data splitting, evaluation strategies, and metrics. We also include results obtained with the internal evaluation mechanisms of these frameworks. Our analysis points to large differences in recommendation accuracy across frameworks and strategies, i.e., the same baselines may perform orders of magnitude better or worse across frameworks. Our results show the need for clear guidelines when reporting the evaluation of recommender systems, to ensure reproducibility and comparability of results.
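The claim that the evaluation strategy alone can shift reported accuracy can be illustrated with a toy example. The sketch below is a minimal illustration under stated assumptions, not the paper's protocol. It fixes one hypothetical recommender's scores for a single user and computes precision@k under two assumed candidate-item strategies: ranking only the items the user rated in the test split, versus ranking every item the user did not rate in training. The data, the relevance labels, and the names `precision_at_k` and `rank_candidates` are all hypothetical.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def rank_candidates(scores: Dict[str, float], candidates: Set[str]) -> List[str]:
    """Order candidate items by descending model score; ties broken by item id."""
    return sorted(candidates, key=lambda item: (-scores[item], item))


# Hypothetical toy data for a single user (not taken from the paper).
all_items = {f"i{n}" for n in range(10)}
train = {"i0", "i1"}          # items the user rated in the training split
test = {"i2", "i3", "i4"}     # items the user rated in the held-out test split
relevant = {"i2", "i3"}       # test items treated as relevant (e.g. high ratings)

# Hypothetical scores from one fixed recommender; the model never changes below.
scores = {
    "i0": 0.99, "i1": 0.97, "i2": 0.80, "i3": 0.70, "i4": 0.60,
    "i5": 0.95, "i6": 0.90, "i7": 0.50, "i8": 0.40, "i9": 0.30,
}

# Strategy A (assumed): rank only the items the user rated in the test split.
strategy_a = precision_at_k(rank_candidates(scores, test), relevant, k=2)

# Strategy B (assumed): rank every item the user did not rate in training.
strategy_b = precision_at_k(rank_candidates(scores, all_items - train), relevant, k=2)

print(f"P@2, test items only:     {strategy_a:.2f}")  # 1.00
print(f"P@2, all non-train items: {strategy_b:.2f}")  # 0.00
```

In this toy setup neither number is wrong; the two strategies answer different questions about the same model. That is why the candidate-item strategy, together with the data split and the metric, needs to be reported for results to be comparable.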
Recommendations
Research paper recommender system evaluation: a quantitative literature survey
RepSys '13: Proceedings of the International Workshop on Reproducibility and Replication in Recommender Systems Evaluation
Over 80 approaches for academic literature recommendation exist today. The approaches were introduced and evaluated in more than 170 research articles, as well as patents, presentations and blogs. We reviewed these approaches and found most evaluations ...
A Scalable, Accurate Hybrid Recommender System
WKDD '10: Proceedings of the 2010 Third International Conference on Knowledge Discovery and Data Mining
Recommender systems apply machine learning techniques for filtering unseen information and can predict whether a user would like a given resource. There are three main types of recommender systems: collaborative filtering, content-based filtering, and ...
A 3D approach to recommender system evaluation
CSCW '13: Proceedings of the 2013 conference on Computer supported cooperative work companion
In this work we describe an approach at multi-objective recommender system evaluation based on a previously introduced 3D benchmarking model. The benchmarking model takes user-centric, business-centric and technical constraints into consideration in ...