Neurocomputing

Volume 286, 19 April 2018, Pages 214-225

Boolean kernels for collaborative filtering in top-N item recommendation

https://doi.org/10.1016/j.neucom.2018.01.057

Abstract

In many personalized recommendation problems, the available data consist only of positive interactions (implicit feedback) between users and items. This setting is also known as One-Class Collaborative Filtering (OC-CF). Linear models usually achieve state-of-the-art performance on OC-CF problems, and many efforts have been devoted to building more expressive and complex representations able to improve the recommendations. Recent analyses show that collaborative filtering (CF) datasets have peculiar characteristics such as high sparsity and a long-tailed distribution of the ratings. In this paper we propose a Boolean kernel, called Disjunctive kernel, which is less expressive than the linear one but is able to alleviate the sparsity issue in CF contexts. The embedding of this kernel is composed of all the combinations of a given arity d of the input variables, and these combined features are semantically interpreted as disjunctions of the input variables. Experiments on several CF datasets show the effectiveness and the efficiency of the proposed kernel.

Introduction

Collaborative Filtering (CF) is the de facto approach for making personalized recommendations. CF techniques exploit historical information about user-item interactions in order to improve future recommendations to users. User-item interactions can be of two types: explicit or implicit. Explicit feedback is unambiguous information about how much a user likes/dislikes an item, and it is usually represented by a rating (e.g., a 1–5 star scale, or thumbs up vs. thumbs down). Conversely, implicit feedback is binary information that simply records the presence or absence of an interaction between a user and an item, and it is by its nature ambiguous.

Even though the explicit setting has received most of the attention of the research community, recently the focus has been drifting towards the implicit feedback context (also known as the One-Class CF problem, OC-CF) for two main reasons: (i) implicit data are much easier to gather, as they do not require any active action by the user, and (ii) they are simply more common.

Unlike the explicit feedback setting, where the recommendation task is rating prediction, in the implicit feedback case the goal is to produce an ordered list of items in which the items most likely to have a future interaction with the user are at the top. In this work we focus on this latter type of task, which is known as top-N recommendation.

The first approaches developed for OC-CF problems were the neighborhood-based ones [1]. Although they do not employ any kind of learning, they have been shown to be very effective [2], [3]. Afterwards, methods employing learning have been proposed, such as SLIM [4], WRMF [5], CF-OMD [6], [7], and, more recently, LRec [8] and GLSLIM [9]. Both the neighborhood-based and the learning-based methods mentioned above exploit linear relations between users and/or items. The effectiveness of linear models in CF is further underlined in [10], where the authors propose an online linear model and demonstrate its performance guarantees.

Recently, an efficient kernel-based method for OC-CF has been proposed [7], [11]. This method, called CF-KOMD, is based on the approach presented in [6], which showed better performance than other well-known recommendation algorithms such as SLIM, WRMF, and BPR. CF-KOMD has been tested using different kernels, such as the linear, polynomial, and Tanimoto kernels, and it has shown state-of-the-art performance on many CF datasets. From the reported results it is possible to notice that the linear kernel achieves very good results despite being the least expressive.

This behavior is typical of CF datasets because they usually exhibit the following two characteristics [11], [12], [13]: they are very sparse (∼1% density), and the distribution of the interactions over the items and/or the users is long-tailed. For these reasons, on this kind of data it is not ideal to use more complex and, consequently, even sparser representations.

Kernel-based methods have also been used for rating prediction and, in general, for matrix factorization. Liu et al. [14] proposed a kernel-based collaborative filtering method for rating prediction, as well as a multiple kernel learning variation of the same approach. Previously, Ding et al. [15] proposed a convex and semi-nonnegative matrix factorization technique based on kernels. More recently, other latent-factor based methods for rating prediction have shown promising results. In particular, Luo et al. [16] proposed a non-negative latent factor model for very high-dimensional and sparse matrices, as well as a variant suitable for undirected networks [17]. Other latent-factor based methods are reported in [18], [19], [20].

In [21], Donini et al. characterized the notion of expressiveness of a representation (kernel function): more general representations correspond to kernels constructed on simpler features (e.g., single variables, as in the linear kernel), while more specific representations correspond to kernels defined on elaborate features (e.g., products of variables, as in polynomial kernels). Intuitively, less expressive representations tend to emphasize the similarities between the examples, with the extreme case being the constant kernel matrix, in which each example is equal to all the others. On the contrary, more expressive kernels highlight the differences between the examples, with the extreme case being the identity (kernel) matrix, in which the examples are orthogonal to each other. Different data can require different levels of expressiveness and, given the considerations above, it is reasonable that CF contexts tend to favor more general representations (e.g., linear).

In this paper, we propose a new representation for Boolean-valued data which is less expressive than the linear one. Specifically, we propose a (Boolean) kernel, called Disjunctive kernel (D-Kernel), in which the feature space is composed of all the combinations of the input variables of a given arity d, and these combined features are semantically interpreted as logical disjunctions of the input variables.

The idea underpinning the proposed D-Kernel is to define higher-level features that are a kind of generalization of the linear ones, so as to obtain more general representations that can hopefully alleviate the sparsity issue. It is easy to see that the associated feature space cannot be explicitly constructed because of the combinatorial explosion of the number of dimensions. For this reason, we give an efficient and elegant way to compute the kernel values which does not require an explicit mapping of the examples onto the feature space.
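
As an illustration of why no explicit mapping is needed, the kernel value can be obtained by a counting argument: a disjunction over a d-subset of variables is true on a binary vector iff the subset contains at least one of its active variables, so the number of d-subsets whose disjunction is true on both inputs follows by inclusion-exclusion. The sketch below is ours (the function name and the naive use of binomial coefficients are assumptions for illustration; the paper's recursive, scaled computation, which avoids numerical issues, is not reproduced here):

```python
from math import comb

import numpy as np

def d_kernel(x, z, d):
    """Disjunctive kernel of arity d between binary vectors x and z.

    Counts the d-subsets of the n variables whose disjunction is true
    (i.e., the subset hits at least one active variable) on both x and
    z, via inclusion-exclusion over the two supports.
    """
    n = len(x)
    nx = int(np.sum(x))          # |x|: number of active variables in x
    nz = int(np.sum(z))          # |z|
    nxz = int(np.dot(x, z))      # variables active in both x and z
    n_union = nx + nz - nxz      # variables active in x or z
    # subsets hitting both = all - (missing x) - (missing z) + (missing both)
    return (comb(n, d) - comb(n - nx, d)
            - comb(n - nz, d) + comb(n - n_union, d))

x = np.array([1, 1, 0, 0, 1])
z = np.array([0, 1, 1, 0, 0])
print(d_kernel(x, z, 2))  # -> 6
```

Note that for d = 1 the formula reduces to ⟨x, z⟩, i.e., the linear kernel, which matches the intended reading of the D-Kernel as a generalization of the linear features.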

Besides the kernel definition, we also discuss its properties and demonstrate, both theoretically and empirically, that the expressiveness of the D-Kernel is monotonically decreasing with the arity d.

For the sake of completeness, and to confirm the ineffectiveness of more complex representations in OC-CF contexts, we also take into consideration an existing kernel that we here call Conjunctive kernel (C-Kernel). The embedding of this kernel is the same as that of the D-Kernel, that is, all possible combinations of d variables, but, as the name suggests, the combined input variables are semantically interpreted as logical conjunctions. As opposed to the disjunctive case, we demonstrate that the expressiveness of the C-Kernel is monotonically increasing with the arity d.
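
Under the same reading, a conjunction over a d-subset of variables is true on a binary vector iff the subset is entirely contained in its set of active variables, so it is true on both inputs iff it lies in the intersection of the two supports. A minimal sketch of this count (the function name is ours, and this is the naive closed form, not the paper's scaled recursive computation):

```python
from math import comb

import numpy as np

def c_kernel(x, z, d):
    """Conjunctive kernel of arity d between binary vectors x and z.

    Counts the d-subsets of variables whose conjunction is true on
    both x and z, i.e., the d-subsets contained in both supports:
    C(<x, z>, d).
    """
    return comb(int(np.dot(x, z)), d)

x = np.array([1, 1, 0, 0, 1])
z = np.array([0, 1, 1, 0, 0])
print(c_kernel(x, z, 1))  # d = 1 again recovers the linear kernel <x, z>
```

As d grows, fewer and fewer d-subsets fit inside the shared support, which matches the intuition that the C-Kernel becomes more expressive (and its kernel matrix sparser) with the arity.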

Finally, we empirically assess the effectiveness and the efficiency of the proposed disjunctive kernel against other Boolean kernels on six large CF datasets. Results show that, when using CF-KOMD [7] as the kernel-based ranker, our kernel achieves the best AUC performance on almost all the datasets.

To summarize, the main contributions of the paper are the following:

  • the definition of a Boolean kernel, called disjunctive kernel, to tackle the sparsity issue typical of CF datasets;

  • the theoretical analysis of the expressiveness of the D-Kernel and the C-Kernel, which shows that the expressiveness, and consequently the sparsity of the representation, decreases (resp. increases) with the arity of the D-Kernel (resp. C-Kernel);

  • the proposal of an efficient method for computing both the C-Kernel and the D-Kernel even with a huge number of examples. Moreover, the provided recursive (and scaled) method is able to overcome the numerical issues typical of this kind of Boolean kernel;

  • a wide empirical assessment on six large-scale datasets, which shows that the D-Kernel achieves better performance than the linear kernel on most of the datasets, while always achieving better results than all the other tested kernels. These experiments show the viability of using kernels on large-scale datasets, e.g., Netflix with 100M ratings.

The remainder of the paper is organized as follows. In Section 2 we introduce the notation used throughout the paper and the background knowledge needed to fully understand it. Section 3 presents the existing Boolean kernels and how they relate to our work. Our main contributions are reported in Section 4; in particular, the Disjunctive kernel is presented in Section 4.2. Finally, in Section 5, an extensive empirical study involving six different CF datasets is presented and, in Section 6, final considerations and future lines of research are proposed.


Notation and background

In this section, we present the notation and the background notions useful to fully understand the remainder of this work.

Throughout the paper we generally consider learning problems with training examples {(x_1, y_1), ..., (x_m, y_m)}, where x_i ∈ B^n, with B = {0, 1}, and y_i ∈ {−1, +1}. X ∈ B^{m×n} denotes the binary matrix where the examples are arranged in rows, with the corresponding label vector y ∈ {−1, +1}^m.

A generic entry of a matrix M is indicated by M_ij. 1_n represents the column vector of dimension n with all entries equal to 1.

Boolean kernels

Boolean kernels are kernels that interpret the input vectors as sets of Boolean variables and apply some Boolean function to them.

If we restrict the input space to the set B^n, then several kernels can be interpreted as Boolean kernels, since monomials (i.e., products of features) can be seen as conjunctions of positive Boolean variables. In the following, we sketch some examples.

The first example is the linear kernel, i.e., κ_LIN(x, z) = ⟨x, z⟩, in which the features correspond to the boolean
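
In this Boolean reading, the linear kernel on binary data simply counts the variables active in both inputs, i.e., each feature is a single Boolean variable. A toy illustration (the vectors below are hypothetical):

```python
import numpy as np

# Two binary "user profile" vectors (hypothetical toy data)
x = np.array([1, 0, 1, 1, 0])
z = np.array([1, 1, 0, 1, 0])

# Linear kernel: each feature is one Boolean variable, so the dot
# product counts the variables that are true in both vectors.
print(int(np.dot(x, z)))  # -> 2
```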

Structured boolean kernel families

In this section, we present two families of parametric Boolean kernels in which the parameter plays a key role in the expressiveness of the kernel.

Experiments and results

In this section we present our extensive experimental work. All the experiments are based on the CF-KOMD framework. The implementation of the framework and all the datasets used are available at https://github.com/makgyver/pyros.

Conclusion and future work

In this work we have proposed a new Boolean kernel, called D-Kernel, able to deal with the sparsity issue in a collaborative filtering context. We leveraged the observations made in our previous works [7], [11] to come up with the idea of creating a data representation less expressive than the linear one, in order to mitigate the sparsity and long-tail issues that are common in CF datasets. We presented a very efficient way of calculating the D-Kernel and we have also demonstrated its

M. Polato received his Bachelor's and Master's degrees in Computer Science from the University of Padova, in 2010 and 2013, respectively. He is currently a PhD student in Brain, Mind and Computer Science at the University of Padova. His research interests include recommender systems, with a particular focus on collaborative filtering, kernel methods, and machine learning in general.

References (35)

  • M. Polato et al., Kernel based collaborative filtering for very large scale top-N item recommendation, in: Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), 2016.
  • S. Sedhain et al., On the effectiveness of linear models for one-class collaborative filtering, in: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016.
  • E. Christakopoulou et al., Local item-item models for top-N recommendation, in: Proceedings of the 10th ACM Conference on Recommender Systems, 2016.
  • G. Bresler et al., A latent source model for online collaborative filtering, in: Proceedings of the 27th International Conference on Neural Information Processing Systems, 2014.
  • M. Polato et al., Exploiting sparsity to build efficient kernel based collaborative filtering for top-N item recommendation, Neurocomputing, 2017.
  • M. Grčar, D. Mladenič, B. Fortuna, M. Grobelnik, Data sparsity issues in the collaborative filtering framework, ...
  • X. Liu, C. Aggarwal, Y.-F. Li, X. Kong, X. Sun, S. Sathe, Kernelized matrix factorization for collaborative filtering, ...


    F. Aiolli received a Master’s Degree and a PhD in Computer Science both from the University of Pisa. He was Post-doc at the University of Pisa, Paid Visiting Scholar at the University of Illinois at Urbana-Champaign (IL), USA, and Post-doc at the University of Padova. He is currently Assistant Professor at the University of Padova. His research activity is mainly in the area of Machine Learning and Information Retrieval.
