Elsevier

Decision Support Systems

Volume 46, Issue 1, December 2008, Pages 388-398
A maximum entropy approach to feature selection in knowledge-based authentication

https://doi.org/10.1016/j.dss.2008.07.008

Abstract

Feature selection is critical to knowledge-based authentication. In this paper, we adopt a wrapper method in which the learning machine is a generative probabilistic model, and the objective is to maximize the Kullback–Leibler divergence between the true empirical distribution defined by the legitimate knowledge and the approximating distribution representing an attacking strategy, both in the same feature space. The closed-form solutions to this optimization problem lead to three adaptive algorithms, unified under the principle of maximum entropy. Our experimental results show that the proposed adaptive methods are superior to the commonly used random selection method.

Introduction

Knowledge-based authentication (KBA) refers to the method of verifying a user's identity by matching one or more pieces of information (also called factoids) provided by an individual (the claimant) against information sources associated with that claimant [14], [30]. KBA has several advantages over conventional methods of authentication for both claimants and verifiers [25]. First, no prior relationship needs to be established between the claimant and the verifier specifically for the sake of authentication, since the knowledge required is available from previous transactions or from public data sources such as the Social Security Administration (SSA) and consumer reporting agencies (CRAs). Second, factoids are relatively easy to remember compared with strong passwords [17]. KBA has consequently found wide use in e-commerce and e-government applications, both as a primary method of authentication, as in consumer credit checks, and as a secondary method that augments strong authentication mechanisms such as one-time passcodes and security tokens. KBA is usually implemented as a challenge–response system; a practical example is VeriSign's Consumer Authentication Service (CAS) deployed by eBay [35]. A comprehensive review of the KBA literature is available in [8].
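The challenge–response flow can be illustrated with a minimal matching check. This is a toy sketch; the factoid names and pass threshold are illustrative rather than drawn from the paper or from CAS:

```python
def verify(claimant_answers, stored_factoids, required=3):
    """Toy challenge-response check: pass if at least `required` of the
    claimant's answers match the stored factoid values."""
    matches = sum(claimant_answers.get(f) == v for f, v in stored_factoids.items())
    return matches >= required

stored = {"birth_city": "Madison", "first_pet": "Rex", "mother_maiden": "Smith"}
answers = {"birth_city": "Madison", "first_pet": "Rex", "mother_maiden": "Jones"}
print(verify(answers, stored, required=2))  # True: two of three factoids match
```

A production verifier would additionally normalize answers and rate-limit attempts; the sketch shows only the matching step.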

The major challenges in KBA research and practice are: (1) the definition of key metrics such as guessability and memorability; (2) the estimation of model parameters; and (3) a unified framework that also captures the dependency relationships among factoids. Chen and Liginlal [8] proposed entropy-based metrics and a Bayesian network model of KBA as an attempt to solve the model selection problem. Their underlying intuition is appealing: in the context of KBA, Shannon entropy [32] can be interpreted as a measure of the security strength of a single factoid, or of a KBA system consisting of a selected subset of factoids. From a methodological perspective, their primary contribution is a sound probabilistic approach to an information security problem of practical significance. The guessability metric is defined as a probability and estimated by maximum likelihood estimation (MLE) and a game-theoretic derivation in a probabilistically rational setup. Building on the same probabilistic modeling approach [8], the problem of selecting secure KBA factoids admits a principled measure of factoid relevance, namely the Kullback–Leibler (KL) divergence between the true distribution underlying the legitimate knowledge and the distribution conceived by attackers.
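The entropy-based view of factoid strength is easy to make concrete: a factoid whose observed values are near uniform is hard to guess, while a skewed one is weak. A minimal sketch (the empirical-distribution estimate here is plain MLE, not the paper's full metric):

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of observed factoid values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(shannon_entropy(["a", "b", "c", "d"]))  # 2.0 bits: uniform over 4 answers
print(shannon_entropy(["a", "a", "a", "b"]))  # ~0.811 bits: skewed, easier to guess
```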

The purpose of this paper is to address the other fundamental problem in KBA modeling, namely feature selection. In the language of KBA, feature selection is the task of selecting a relevant subset of factoids such that the resulting KBA system can best distinguish attackers from legitimate users. At a conceptual level, this mirrors the general twofold goal of feature selection as widely applied in machine learning: to gain a better understanding of the underlying statistical regularity and thereby improve predictive performance. In this regard, we shall formally define the relevance of the selected feature subset in terms of the assurance level guaranteed by the resulting KBA system, as represented by the guessability metric. At a practical level, feature selection is necessary both to keep computation tractable when hundreds of thousands of variables are present and to mitigate the so-called “curse of dimensionality” when the training set is relatively small in a high-dimensional feature space. Although this practical concern is less severe in the KBA domain, since it is uncommon to have hundreds of factoids in knowledge sources, computational efficiency has implications for the response time of a KBA system, especially in online e-commerce applications. Moreover, from a usability standpoint, it is necessary to focus on a small but relevant subset of factoids: in general, most users are considered unwilling to tolerate more than about five questions [26].
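In its simplest form, restricting a KBA system to about five questions is a top-k selection over some relevance score. The sketch below uses hypothetical factoid names and scores; the paper's actual relevance measure is the KL-divergence-based one developed later:

```python
def select_factoids(scores, k=5):
    """Keep the k highest-scoring factoids; `scores` maps factoid name -> relevance."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

scores = {"mother_maiden": 3.1, "first_pet": 2.4, "birth_city": 2.9,
          "favorite_color": 1.2, "last_tx_amount": 4.0, "street_grew_up": 3.5}
print(select_factoids(scores, k=5))  # drops the weakest factoid, "favorite_color"
```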

However, an important distinction should be made between the KBA problem in particular, or security tasks in general, and the traditional statistical classification problem. From a model learning point of view, we can neither readily obtain training data from real attackers, nor should we assume that inherently unpredictable future attacking behaviors can be extrapolated from past fraudulent data. This challenge precludes us from directly employing state-of-the-art classification techniques such as Support Vector Machines (SVMs), which are discriminative models and are known to be sensitive to unseen data [5]. Instead, we take the generative view: first model our opponent, i.e., potential attackers, and then model the KBA domain to deliver strong authentication systems. We do not have training data on how attackers will carry out their malicious behaviors, but we do know exactly what they are targeting; it is therefore natural to use a game-theoretic analysis to speculate on their attacking strategies, based on some rationality assumptions. This leads to a maximum entropy approach to the feature selection problem in KBA. Our method is well grounded in probabilistic modeling and information theory. In fact, the principle of maximum entropy, with proper underlying probabilistic semantics for user authentication, provides a unified framework for the measurement, feature selection, and modeling tasks needed to design a secure and adaptive KBA system.
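The KL objective named above can be computed directly for a single factoid. One useful identity: against a maximum-entropy (uniform) attacker over a support of size n, KL(p‖uniform) = log2 n − H(p). The sketch below only illustrates the quantity; it is not the paper's closed-form solution:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits; p and q are dicts over a shared finite support."""
    return sum(p[x] * math.log2(p[x] / q[x]) for x in p if p[x] > 0)

p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # true empirical distribution
q = {x: 0.25 for x in p}                           # uniform attacker guess
print(kl_divergence(p, q))  # 0.25 = log2(4) - H(p), with H(p) = 1.75 bits
```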

The rest of the paper is structured as follows. In Section 2, we formulate the feature selection problem in the KBA context, followed by a literature review of the major approaches to feature selection in general. In Section 3, we deliberate upon the generative model and the rational attacker assumptions. Section 4 presents three algorithms for feature selection, based on the principle of maximum entropy and characterized by increasing levels of adaptivity. In Section 5, we report the results of a computational evaluation that demonstrate the better authentication performance of the three adaptive algorithms over the random feature selection method commonly used in practical KBA deployments. We conclude in Section 6 by identifying several areas for future research.

Section snippets

Notation and terminology

We approach the KBA feature selection problem from a statistical modeling perspective, in which the knowledge with respect to authentication is embedded in a collection of discrete data. The data sources can be characterized at three hierarchical levels: factoid, identity, and domain. Formally:

  • 1.

    A factoid describes one specific characteristic of a legitimate user for the purpose of authentication, and is denoted by a pair of random variables (f, x). Here f is the name and x is the value of the factoid.

KBA model and metrics

Having formulated the feature selection problem, we now turn our attention to the KBA problem as a whole. We adopt a generative view [27], particularly Bayesian classifiers [11] as the underlying inductive paradigm for the KBA problem. To relax the naïve Bayes assumption, we model factoid dependency as a high-order Markov chain. Next, we briefly present the KBA model for the sake of readability and completeness of this work. For a detailed treatment, we refer the reader to [8].
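As background for the dependency model, a Markov chain assigns a joint probability to a sequence of factoid values through an initial distribution and conditional transition probabilities. The paper uses a high-order chain; a first-order sketch with made-up tables keeps the idea compact:

```python
def chain_probability(sequence, initial, transition):
    """Joint probability of a value sequence under a first-order Markov chain:
    P(x1..xn) = P(x1) * prod_i P(x_i | x_{i-1})."""
    p = initial[sequence[0]]
    for prev, cur in zip(sequence, sequence[1:]):
        p *= transition[prev][cur]
    return p

initial = {"A": 0.6, "B": 0.4}
transition = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.5, "B": 0.5}}
print(chain_probability(["A", "B", "B"], initial, transition))  # 0.6*0.3*0.5 = 0.09
```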

Adaptive feature selection

In this section, we present three feature selection algorithms of increasing adaptivity based on the principle of maximum entropy, namely, domain-adaptive, identity-adaptive, and response-adaptive methods. To understand their differences, we show the graphical model representations in Fig. 1. All three feature selection schemes shown in Fig. 1(a), (b), and (c), are three-level hierarchical models; K, F, and I are domain-level variables; id and x are identity-level variables; and fi and xi are factoid-level variables.
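The idea of adaptivity can be illustrated with a toy re-ranking loop in which each pick changes the scores of the remaining factoids, so later questions are conditioned on what has already been asked. This is only a sketch of the conditioning idea, not any of the three algorithms; the names, scores, and correlation values are invented:

```python
def adaptive_select(scores, correlation, k=3, penalty=1.0):
    """Toy adaptive selection: after each pick, discount remaining factoids
    that are correlated with those already chosen, so later questions add
    new information. `correlation` maps (chosen, remaining) pairs to [0, 1]."""
    remaining = dict(scores)
    chosen = []
    for _ in range(min(k, len(scores))):
        nxt = max(remaining, key=remaining.get)
        chosen.append(nxt)
        del remaining[nxt]
        for g in remaining:
            remaining[g] -= penalty * correlation.get((nxt, g), 0.0)
    return chosen

scores = {"birth_city": 3.0, "zip_code": 2.9, "first_pet": 2.0}
correlation = {("birth_city", "zip_code"): 1.0}  # the zip code reveals the birth city
print(adaptive_select(scores, correlation))  # ['birth_city', 'first_pet', 'zip_code']
```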

Experiments

To empirically validate the contributions of our proposed approach to adaptive feature selection in KBA, we consider two hypotheses:

Hypothesis 1

Adaptive feature selection methods are superior to random selection in terms of authentication accuracy and error rates.

Hypothesis 2

As the level of adaptivity increases, the corresponding feature selection methods improve significantly in terms of authentication accuracy and error rates.

Results

The replication means of the performance measures under different thresholds are plotted in Fig. 4. In terms of accuracy, the three adaptive feature selection methods markedly outperform random selection, as shown in Fig. 4(a), especially when thres ≥ 0.5. In terms of error rates, the three adaptive methods have significantly lower FRR than random selection, as shown in Fig. 4(b), and also obtain comparably small FAR. By closely examining Fig. 4(b), we can
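For reference, the error metrics reported here can be computed from match scores at a decision threshold as follows; the scores below are made up for illustration and are not the paper's data:

```python
def far_frr(attacker_scores, legit_scores, thres):
    """False acceptance rate (attackers scoring >= thres) and false
    rejection rate (legitimate users scoring < thres)."""
    far = sum(s >= thres for s in attacker_scores) / len(attacker_scores)
    frr = sum(s < thres for s in legit_scores) / len(legit_scores)
    return far, frr

attackers = [0.1, 0.3, 0.4, 0.6]
legit = [0.7, 0.8, 0.5, 0.9]
print(far_frr(attackers, legit, thres=0.5))  # (0.25, 0.0)
```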

Conclusion

In this paper, we first formally addressed the problem of feature selection in the context of KBA. Based on a Bayesian network model of KBA, the proposed feature selection methods find their root in statistical modeling and information theory. Further, under some rationality assumptions about the attacker's behavior, feature selection can be formulated as an optimization problem to maximize the KL divergence between the guessing distribution and the true empirical distribution over the same feature space.

References (37)

  • C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery (1998)
  • W.E. Burr et al., Electronic authentication guideline: recommendations of the National Institute of Standards and Technology
  • Y. Chen, A Bayesian Network Model of Knowledge-Based Authentication, Ph.D. dissertation, University of Wisconsin-Madison, ...
  • Y. Chen et al., Bayesian networks for knowledge-based authentication, IEEE Transactions on Knowledge and Data Engineering (2007)
  • Y. Chen et al., An empirical investigation of knowledge-based authentication
  • S. Chokhani, Knowledge based authentication (KBA) metrics
  • N. Friedman et al., Bayesian network classifiers, Machine Learning (1997)
  • W.R. Gilks, Markov Chain Monte Carlo in Practice (1995)
Ye Chen is a Sr. Scientist in the Data Mining and Research group at Yahoo! Inc. He received his Ph.D. degree in Information Systems from the Operations and Information Management department, University of Wisconsin-Madison. Dr. Chen's research interests lie in statistical machine learning, particularly Bayesian methods, to solve problems in data mining, exploratory data analysis, security in e-commerce and online social networking. His research work has been published in journals such as IEEE Transactions on Knowledge and Data Engineering and Journal of Database Management.

Divakaran Liginlal is an Assistant Professor in the School of Business, University of Wisconsin, Madison. He received a B.S. in Telecommunication Engineering from the University of Kerala, an M.S. in Computer Science from the Indian Institute of Science at Bangalore, India, and a Ph.D. in Management Information Systems from the University of Arizona. Prof. Liginlal's research interests are in computational and cognitive models of decision making and problem solving, and information security technologies and strategies. His research work has been published in journals such as Fuzzy Sets and Systems, IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Systems, Man, and Cybernetics, and the European Journal of Operational Research. His teaching and research have received funding from organizations such as Microsoft Corporation, Hewlett-Packard, Cisco, and the International Center for Automated Information Research at the University of Florida.
