Abstract
We investigate the impact of popularity bias on false-positive metrics in the offline evaluation of recommender systems. Unlike their true-positive counterparts, false-positive metrics reward systems that minimize recommendations disliked by users. Our analysis is, to the best of our knowledge, the first to show that false-positive metrics tend to penalize popular items, the opposite behavior of true-positive metrics, which causes the two types of metric to disagree in the presence of popularity biases. We present a theoretical analysis that identifies why the metrics disagree and determines the rare situations in which they might agree: the key lies in the relationship between the popularity and relevance distributions, specifically in their agreement and steepness, two fundamental concepts that we formalize. We then examine three well-known datasets, applying multiple popular true- and false-positive metrics to 16 recommendation algorithms. The datasets are chosen to allow us to estimate both biased and unbiased metric values. The results of the empirical study confirm and illustrate our analytical findings. With the conditions for disagreement between the two types of metrics established, we determine the circumstances under which researchers performing offline evaluation of recommender systems should use true-positive or false-positive metrics.
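The contrast between the two metric families can be illustrated with a small sketch (a toy example of our own, not taken from the paper): a true-positive metric such as precision@k counts recommended items the user likes, while a false-positive counterpart such as anti-precision@k counts recommended items the user dislikes, so lower is better. Because a popular item tends to accumulate both positive and negative feedback, a popularity-oriented ranking can score well on the former while also scoring badly on the latter.

```python
# Toy illustration of true- vs. false-positive metrics (hypothetical data).

def precision_at_k(recommended, liked, k):
    """True-positive metric: fraction of the top-k items the user likes."""
    return sum(item in liked for item in recommended[:k]) / k

def anti_precision_at_k(recommended, disliked, k):
    """False-positive metric: fraction of the top-k items the user dislikes
    (lower is better)."""
    return sum(item in disliked for item in recommended[:k]) / k

# Popular items attract both likes and dislikes; niche items mostly go unrated.
liked = {"popular_hit", "niche_gem"}
disliked = {"popular_flop"}
popularity_ranking = ["popular_hit", "popular_flop", "niche_gem"]

print(precision_at_k(popularity_ranking, liked, 2))          # 0.5
print(anti_precision_at_k(popularity_ranking, disliked, 2))  # 0.5
```

Here the popularity-based ranking earns a respectable precision@2 of 0.5 yet also incurs an anti-precision@2 of 0.5: the same bias toward popular items that helps the true-positive metric hurts the false-positive one, which is the disagreement the paper studies.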
Index Terms
- Popularity Bias in False-positive Metrics for Recommender Systems Evaluation