Neurocomputing

Volume 286, 19 April 2018, Pages 214-225

Boolean kernels for collaborative filtering in top-N item recommendation

https://doi.org/10.1016/j.neucom.2018.01.057

Abstract

In many personalized recommendation problems, the available data consist only of positive interactions (implicit feedback) between users and items. This setting is also known as One-Class Collaborative Filtering (OC-CF). Linear models usually achieve state-of-the-art performance on OC-CF problems, and many efforts have been devoted to building more expressive and complex representations able to improve the recommendations. Recent analyses show that collaborative filtering (CF) datasets have peculiar characteristics such as high sparsity and a long-tailed distribution of the ratings. In this paper we propose a Boolean kernel, called Disjunctive kernel, which is less expressive than the linear one but is able to alleviate the sparsity issue in CF contexts. The embedding of this kernel is composed of all the combinations of a given arity d of the input variables, and these combined features are semantically interpreted as disjunctions of the input variables. Experiments on several CF datasets show the effectiveness and the efficiency of the proposed kernel.

Introduction

Collaborative Filtering (CF) is the de facto approach for making personalized recommendations. CF techniques exploit historical information about user-item interactions in order to improve future recommendations to users. User-item interactions can be of two types: explicit or implicit. Explicit feedback is unambiguous information about how much a user likes/dislikes an item, and it is usually represented by a rating (e.g., a 1–5 star scale, or thumbs up vs. thumbs down). Conversely, implicit feedback is binary information that simply records the presence or absence of an interaction between a user and an item, and it is by its nature ambiguous.

Even though the explicit setting has received most of the attention of the research community, recently the focus has been drifting towards the implicit feedback context (also known as the One-Class CF problem, OC-CF) for two main reasons: (i) implicit data are much easier to gather, as they do not require any active action by the user, and (ii) they are simply more common.

Unlike the explicit feedback setting, where the recommendation task is rating prediction, in the implicit feedback case the goal is to produce an ordered list of items in which the items most likely to have a future interaction with the user are at the top. In this work we focus on this latter type of task, which is known as top-N recommendation.

The first approaches developed for OC-CF problems were the neighborhood-based ones [1]. Although they do not employ any kind of learning, they have been shown to be very effective [2], [3]. Afterwards, methods employing learning have been proposed, such as SLIM [4], WRMF [5], CF-OMD [6], [7], and, more recently, LRec [8] and GLSLIM [9]. Both the neighborhood-based and the learning-based methods mentioned above exploit linear relations between users and/or items. The effectiveness of linear models in CF is further underlined in [10], where the authors propose an online linear model and demonstrate its performance guarantees.

Recently, an efficient kernel-based method for OC-CF has been proposed [7], [11]. This method, called CF-KOMD, is based on the approach presented in [6], which showed better performance than other well-known recommendation algorithms such as SLIM, WRMF, and BPR. CF-KOMD has been tested using different kernels, such as the linear, polynomial, and Tanimoto kernels, and it has shown state-of-the-art performance on many CF datasets. From the reported results it is possible to notice that the linear kernel achieves very good results despite being the least expressive.

This behavior is typical of CF datasets because they usually exhibit the following two characteristics [11], [12], [13]: they are very sparse (∼1% density), and the distribution of the interactions over the items and/or the users is long-tailed. For these reasons, on this kind of data it is not ideal to use more complex and, consequently, even sparser representations.

Kernel-based methods have also been used for rating prediction and, in general, for matrix factorization. Liu et al. [14] proposed a kernel-based collaborative filtering method for rating prediction, as well as a multiple kernel learning variation of the same approach. Previously, Ding et al. [15] proposed a convex and semi-nonnegative matrix factorization technique based on kernels. More recently, other latent-factor based methods for rating prediction have shown promising results. In particular, Luo et al. [16] proposed a non-negative latent factor model for very high-dimensional and sparse matrices, as well as a variant suitable for undirected networks [17]. Other latent-factor based methods are reported in [18], [19], [20].

In [21], Donini et al. characterized the notion of expressiveness of a representation (kernel function): more general representations correspond to kernels constructed on simpler features (e.g., single variables, as in the linear kernel), while more specific representations correspond to kernels defined on elaborate features (e.g., products of variables, as in polynomial kernels). Intuitively, less expressive representations tend to emphasize the similarities between the examples, with the extreme case being the constant kernel matrix, in which each example is equal to all the others. On the contrary, more expressive kernels highlight the differences between the examples, with the extreme case being the identity (kernel) matrix, in which the examples are orthogonal to each other. Different data can require different levels of expressiveness and, given the considerations above, it is reasonable that CF contexts tend to favor more general representations (e.g., linear).

In this paper, we propose a new representation for Boolean-valued data which is less expressive than the linear one. Specifically, we propose a (Boolean) kernel, called Disjunctive kernel (D-Kernel), in which the feature space is composed of all the combinations of the input variables of a given arity d, and these combined features are semantically interpreted as logical disjunctions of the input variables.

The idea underpinning the proposed D-Kernel is to define higher-level features that are a kind of generalization of the linear ones, so as to obtain more general representations that can hopefully alleviate the sparsity issue. It is easy to see that the associated feature space cannot be explicitly constructed because of the combinatorial explosion of the number of dimensions. For this reason, we give an efficient and elegant way to compute the kernel values which does not require an explicit mapping of the examples onto the feature space.
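
As an illustration of why no explicit mapping is needed, the kernel value can be obtained by a counting argument: a disjunction over a d-subset of variables is true on a binary vector iff the subset contains at least one of its active variables, so the number of d-subsets whose disjunction is true on both inputs follows by inclusion-exclusion. The sketch below is ours (the function name and the naive use of binomial coefficients are assumptions for illustration; the paper's recursive, scaled computation, which avoids numerical issues, is not reproduced here):

```python
from math import comb

import numpy as np

def d_kernel(x, z, d):
    """Disjunctive kernel of arity d between binary vectors x and z.

    Counts the d-subsets of the n variables whose disjunction is true
    (i.e., the subset hits at least one active variable) on both x and
    z, via inclusion-exclusion over the two supports.
    """
    n = len(x)
    nx = int(np.sum(x))          # |x|: number of active variables in x
    nz = int(np.sum(z))          # |z|
    nxz = int(np.dot(x, z))      # variables active in both x and z
    n_union = nx + nz - nxz      # variables active in x or z
    # subsets hitting both = all - (missing x) - (missing z) + (missing both)
    return (comb(n, d) - comb(n - nx, d)
            - comb(n - nz, d) + comb(n - n_union, d))

x = np.array([1, 1, 0, 0, 1])
z = np.array([0, 1, 1, 0, 0])
print(d_kernel(x, z, 2))  # -> 6
```

Note that for d = 1 the formula reduces to ⟨x, z⟩, i.e., the linear kernel, which matches the intended reading of the D-Kernel as a generalization of the linear features.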

Besides the kernel definition, we also discuss its properties and demonstrate, both theoretically and empirically, that the expressiveness of the D-Kernel is monotonically decreasing with the arity d.

For the sake of completeness, and to confirm the ineffectiveness of more complex representations in OC-CF contexts, we also take into consideration an existing kernel that we here call Conjunctive kernel (C-Kernel). The embedding of this kernel is the same as that of the D-Kernel, that is, all possible combinations of d variables, but, as the name suggests, the combined input variables are semantically interpreted as logical conjunctions. As opposed to the disjunctive case, we demonstrate that the expressiveness of the C-Kernel is monotonically increasing with the arity d.
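
Under the same reading, a conjunction over a d-subset of variables is true on a binary vector iff the subset is entirely contained in its set of active variables, so it is true on both inputs iff it lies in the intersection of the two supports. A minimal sketch of this count (the function name is ours, and this is the naive closed form, not the paper's scaled recursive computation):

```python
from math import comb

import numpy as np

def c_kernel(x, z, d):
    """Conjunctive kernel of arity d between binary vectors x and z.

    Counts the d-subsets of variables whose conjunction is true on
    both x and z, i.e., the d-subsets contained in both supports:
    C(<x, z>, d).
    """
    return comb(int(np.dot(x, z)), d)

x = np.array([1, 1, 0, 0, 1])
z = np.array([0, 1, 1, 0, 0])
print(c_kernel(x, z, 1))  # d = 1 again recovers the linear kernel <x, z>
```

As d grows, fewer and fewer d-subsets fit inside the shared support, which matches the intuition that the C-Kernel becomes more expressive (and its kernel matrix sparser) with the arity.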

Finally, we empirically assess the effectiveness and the efficiency of the proposed disjunctive kernel against other Boolean kernels on six large CF datasets. Results show that, when using CF-KOMD [7] as the kernel-based ranker, our kernel achieves the best AUC performance on almost all the datasets.

To summarize, the main contributions of the paper are the following:

  • the definition of a Boolean kernel, called disjunctive kernel, to tackle the sparsity issue typical of CF datasets;

  • the theoretical analysis of the expressiveness of the D-Kernel and the C-Kernel, which shows that the expressiveness, and consequently the sparsity of the representation, decreases (resp. increases) with the arity of the D-Kernel (resp. C-Kernel);

  • the proposal of an efficient method for computing both the C-Kernel and the D-Kernel even with a huge number of examples. Moreover, the provided recursive (and scaled) method is able to overcome the numerical issues typical of this kind of Boolean kernel;

  • a wide empirical assessment on six large-scale datasets, which shows that the D-Kernel achieves better performance than the linear kernel on most of the datasets, while always achieving better results than all the other tested kernels. These experiments show the viability of using kernels on large-scale datasets, e.g., Netflix with 100M ratings.

The remainder of the paper is organized as follows. In Section 2 we introduce the notation used throughout the paper and the background knowledge needed to fully understand it. Section 3 presents the existing Boolean kernels and how they relate to our work. Our main contributions are reported in Section 4; in particular, the Disjunctive kernel is presented in Section 4.2. Finally, in Section 5, an extensive empirical study involving six different CF datasets is presented and, in Section 6, final considerations and future lines of research are proposed.


Notation and background

In this section, we present the notation and the background notions useful to fully understand the remainder of this work.

Throughout the paper we generally consider learning problems with training examples {(x_1, y_1), ..., (x_m, y_m)}, where x_i ∈ B^n, with B = {0, 1}, and y_i ∈ {−1, +1}. X ∈ B^{m×n} denotes the binary matrix where the examples are arranged in rows, with the corresponding label vector y ∈ {−1, +1}^m.

A generic entry of a matrix M is indicated by M_ij. 1_n represents the column vector of dimension n with all entries equal to 1.

Boolean kernels

Boolean kernels are kernels that interpret the input vectors as sets of Boolean variables and apply some Boolean function to them.

If we restrict the input space to the set B^n, then several kernels can be interpreted as Boolean kernels, since monomials (i.e., products of features) can be seen as conjunctions of positive Boolean variables. In the following, we sketch some examples.

The first example is the linear kernel, i.e., κ_LIN(x, z) = ⟨x, z⟩, in which the features correspond to the boolean
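
In this Boolean reading, the linear kernel on binary data simply counts the variables active in both inputs, i.e., each feature is a single Boolean variable. A toy illustration (the vectors below are hypothetical):

```python
import numpy as np

# Two binary "user profile" vectors (hypothetical toy data)
x = np.array([1, 0, 1, 1, 0])
z = np.array([1, 1, 0, 1, 0])

# Linear kernel: each feature is one Boolean variable, so the dot
# product counts the variables that are true in both vectors.
print(int(np.dot(x, z)))  # -> 2
```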

Structured boolean kernel families

In this section, we present two families of parametric Boolean kernels in which the parameter plays a key role in the expressiveness of the kernel.

Experiments and results

In this section we present our extensive experimental work. All the experiments are based on the CF-KOMD framework. The implementation of the framework and all the datasets used are available at https://github.com/makgyver/pyros.

Conclusion and future work

In this work we have proposed a new Boolean kernel, called D-Kernel, able to deal with the sparsity issue in a collaborative filtering context. We leveraged the observations made in our previous works [7], [11] to come up with the idea of creating a data representation less expressive than the linear one, in order to mitigate the sparsity and long-tail issues that are common in CF datasets. We presented a very efficient way of calculating the D-Kernel and we have also demonstrated its

M. Polato received his Bachelor's and Master's degrees in Computer Science from the University of Padova, in 2010 and 2013, respectively. He is currently a PhD student in Brain, Mind and Computer Science at the University of Padova. His research interests include recommender systems, with a particular focus on collaborative filtering, kernel methods, and machine learning in general.

References (35)

  • M. Polato et al., Kernel based collaborative filtering for very large scale top-N item recommendation, in: Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), 2016.
  • S. Sedhain et al., On the effectiveness of linear models for one-class collaborative filtering, in: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016.
  • E. Christakopoulou et al., Local item-item models for top-N recommendation, in: Proceedings of the 10th ACM Conference on Recommender Systems, 2016.
  • G. Bresler et al., A latent source model for online collaborative filtering, in: Proceedings of the 27th International Conference on Neural Information Processing Systems, 2014.
  • M. Polato et al., Exploiting sparsity to build efficient kernel based collaborative filtering for top-N item recommendation, Neurocomputing, 2017.
  • M. Grčar, D. Mladenič, B. Fortuna, M. Grobelnik, Data sparsity issues in the collaborative filtering framework, ...
  • X. Liu, C. Aggarwal, Y.-F. Li, X. Kong, X. Sun, S. Sathe, Kernelized matrix factorization for collaborative filtering, ...


    F. Aiolli received a Master’s Degree and a PhD in Computer Science both from the University of Pisa. He was Post-doc at the University of Pisa, Paid Visiting Scholar at the University of Illinois at Urbana-Champaign (IL), USA, and Post-doc at the University of Padova. He is currently Assistant Professor at the University of Padova. His research activity is mainly in the area of Machine Learning and Information Retrieval.
