On the differential benchmarking of promotional efficiency with machine learning modeling (I): Principles and statistical comparison

doi:10.1016/j.eswa.2012.04.017

Expert Systems with Applications

Volume 39, Issue 17, 1 December 2012, Pages 12772-12783

https://doi.org/10.1016/j.eswa.2012.04.017

Abstract

In recent years, sales promotions have become a paramount issue in the marketing strategies of many companies, and they are even more relevant in the present economic situation. Empirical models aimed at assessing consumer behavior in response to certain sales promotion activities, such as temporary price reductions, are currently receiving growing attention in this research field, mainly for two reasons: (1) the complexity of the interactions among the different elements incorporated in promotional campaigns; and (2) the increased availability of electronic records of sales history. Hence, it is important that the performance description and comparison of the available machine learning promotion models, as well as the selection of their design parameters, be performed using a robust and statistically rigorous procedure, while keeping functionality and usefulness. In this paper, we first propose a simple nonparametric statistical tool, based on paired bootstrap resampling, to allow an operative comparison of results among different learning-from-samples promotional models. Secondly, we use the bootstrap statistical description to evaluate the models in terms of average and scatter measurements, for a more complete efficiency characterization of the promotional sales models. These statistical characterizations allow us to readily work with the distribution of the actual risk, in order to avoid overoptimistic performance evaluation of the machine learning based models. We also present the analysis performed to determine whether the figure of merit has a significant impact on the final result, together with an in-depth design parameter selection to optimize the final results during promotion evaluation using statistical learning techniques. No significant difference was obtained in terms of the choice of figure of merit, and Mean Absolute Error was selected for performance measurement. In summary, the applied technique clarifies the design of the promotional sales models for a real database (milk category), according to the influence of the figure of merit used for design parameter selection, and shows the robustness of the machine learning techniques in this setting. The results obtained in this paper are subsequently applied in the companion paper, which is devoted to a more detailed quality analysis, to evaluate four well-known machine learning algorithms on real databases for two categories with different promotional behavior.

Highlights

► Empirical models of sales promotion are relevant for marketing strategies.
► A simple statistical tool allows operative comparisons among promotional models.
► Bootstrap statistical description is used to evaluate the models in terms of average and scatter measurements.
► Different figures of merit, and structured parameter selection, allowed an optimized promotion modeling.
► Prediction quality was robust with respect to the design parameter selection.

Introduction

The current economic landscape, characterized by financial instability and the consequent changes in consumer behavior, is driving a transformation in food retailers’ decisions, leading to a new and more aggressive promotional perspective (Quelch, 2008). One example of this situation is the dramatic reduction in sales of food products in Spain, which has led retailers in the industry to implement new approaches, such as the intense use of private label products. In addition, retailers have sought to increase the frequency of consumers’ purchases through promotional activities, such as promotional discounts, feature advertising, and promotional packs (e.g., “buy 3 get 1 free”) (Quelch, 2008). Sales promotion has therefore become in recent years a key tool for marketing strategies in retail food markets, and for this reason investment in this area has strongly increased, reaching values over 50% of marketing budgets in relation to other communication tools (Villalba & Iñaki, 2002).

The present economic situation, along with food retailers’ strategies and increasing investment in promotional activities, has motivated a considerable number of research efforts to characterize sales promotion and to measure promotional efficiency. Existing models for analysing sales promotion effects can be classified into two separate groups. In the first group, namely theoretical models, consumer behavior is basically evaluated from a sociological and psychological perspective, whilst in the second group, empirical models, promotional structures are usually built from empirical information extracted from historical databases.

Within the latter group, efforts during the last decades have focused on understanding sales promotion dynamics through classical statistical analysis methods, whereas more recent works concentrate on machine learning algorithms and data mining techniques as powerful tools to extract information from existing recorded data (Mitchell, 1997, Van Heerde et al., 2000). Machine learning techniques aim to find recurring patterns, trends, or rules that can explain the behavior of the data in a given context, and thereby allow us to extract new knowledge on consumer behavior, to improve the performance of marketing operations, and to estimate the so-called Deal Effect Curve (DEC). A vast amount of knowledge has been extracted with machine learning techniques, although not all promotional behaviors have been studied and there is still room for further in-depth studies (Bell et al., 1999, Blattberg et al., 1995, Leeflang and Wittink, 2000). More specifically, operational problems arise in machine learning promotional modeling based on nonlinear estimation techniques when evaluating and demonstrating working hypotheses (Liu et al., 2004, Martínez-Ruiz et al., 2005, Martínez-Ruiz et al., 2006b, Martínez-Ruiz et al., 2006a, Van Heerde et al., 2001, Wang et al., 2008). In this paper we present, prior to a detailed analysis of machine learning performance, a pre-evaluation of different figures of merit to assess their impact on the final result, together with an in-depth analysis of design parameter selection. The results obtained here allowed us to establish and validate the figures of merit and the design parameter selection procedure to be applied in the pricing promotion study. Specifically, we applied these results in the companion paper (Soguero-Ruiz, Gimeno-Blanes, Mora-Jiménez, Martínez-Ruiz, & Rojo-Álvarez, 2012), to evaluate four well-known machine learning algorithms on two real databases for two categories presenting dramatically different promotional behavior.

The paper is organized as follows. Section 2 includes a review of basic concepts of sales promotion in retailer environments, as well as a summary of previous work on machine learning in the context of marketing. Section 3 gives a short description of nonparametric paired hypothesis tests based on bootstrap resampling, and introduces the actual risk evaluation of a set of adequate figures of merit. The figure of merit benchmark represents a relevant contribution of this paper, as it ends up becoming an operative tool for decision support in promotional modeling with machine learning techniques. Section 4 summarizes experiments and results based on real data. Finally, Section 5 presents conclusions and remarks.

Section snippets

Background

Though many definitions have been published for the term sales promotion (Blattberg and Neslin, 1990, Kotler and Keller, 2005, Yeshin, 2006), none of them is generally accepted; the general consensus, however, is that sales promotions consist basically of short-term sales incentives (Blattberg and Neslin, 1990, Kotler and Keller, 2005). For instance, the American Marketing Association (AMA) defines sales promotion as a media and non-media marketing pressure applied for a predetermined, limited

Paired bootstrap resampling of actual risk

In this section, we present the data model for sales promotion, the statistical learning-from-samples techniques used in this study to design the sales promotion model, and the design parameter selection, which usually has to be addressed by means of some kind of cross-validation technique and which raises the concepts of empirical risk and actual risk. Additionally, different figures of merit are presented, as well as the bootstrap resampling technique, to yield an estimate of the actual risk.
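As a rough illustration of the idea, the following Python sketch compares two models with a paired bootstrap on the Mean Absolute Error. It is not the authors' exact procedure, which this snippet does not show: the function name `paired_bootstrap_mae`, its parameters, and the use of a percentile confidence interval on a held-out test set are all assumptions made for the example. The same resampled indices are applied to both models (hence "paired"), and the difference is flagged as significant when the interval for the MAE difference excludes zero. The mean and standard deviation of each model's resampled MAE give the average and scatter measurements.

```python
import numpy as np


def paired_bootstrap_mae(y_true, pred_a, pred_b, n_resamples=2000,
                         alpha=0.05, seed=0):
    """Paired bootstrap comparison of two models using MAE (illustrative sketch).

    All names here are hypothetical; the predictions are assumed to come
    from a held-out test set that neither model saw during training.
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true, dtype=float)
    # Per-sample absolute errors of each model on the same test samples.
    err_a = np.abs(y_true - np.asarray(pred_a, dtype=float))
    err_b = np.abs(y_true - np.asarray(pred_b, dtype=float))
    n = y_true.shape[0]

    # Each row holds one bootstrap resample: n indices drawn with replacement.
    idx = rng.integers(0, n, size=(n_resamples, n))

    # Pairing: the SAME indices are used for both models in each resample.
    mae_a = err_a[idx].mean(axis=1)
    mae_b = err_b[idx].mean(axis=1)
    delta = mae_a - mae_b

    # Percentile confidence interval for the MAE difference.
    lo, hi = np.quantile(delta, [alpha / 2, 1 - alpha / 2])
    return {
        "mae_a_mean": float(mae_a.mean()), "mae_a_std": float(mae_a.std()),
        "mae_b_mean": float(mae_b.mean()), "mae_b_std": float(mae_b.std()),
        "delta_ci": (float(lo), float(hi)),
        # Significant when zero lies outside the confidence interval.
        "significant": not (lo <= 0.0 <= hi),
    }


if __name__ == "__main__":
    # Synthetic data for demonstration only; not taken from the paper.
    demo_rng = np.random.default_rng(1)
    y = demo_rng.normal(100.0, 10.0, size=200)
    model_a = y + demo_rng.normal(0.0, 1.0, size=200)  # small errors
    model_b = y + demo_rng.normal(0.0, 5.0, size=200)  # larger errors
    print(paired_bootstrap_mae(y, model_a, model_b))
```

Pairing the resamples removes the variability that both models share because they are scored on the same test samples, so the interval for the difference is typically narrower than one built from two independent bootstraps.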

Experiments

This section presents the first set of experimental studies, using two food product databases, namely, the milk and beer categories. In Experiment 1, some examples of the estimation of the own-effect given by the DEC in the absence of other explanatory effects were examined, in order to evaluate whether additional exogenous effects need to be considered to provide an adequate promotional sales model. Then, the relevant issue of the design parameter selection is analyzed for the proposed

Conclusions

This work first aimed to analyze the multiple and simultaneous effects of different promotional activities by retailers, with an individual DEC per product. The complexity of the cross-effects among different promotional activities taking place at the same time was subsequently reported. These promotional activities and discount strategies represent a second-order effect of the current economic situation, where retailers are seeking strategies that lead to better performance – especially in

Acknowledgment

This work was supported by Fundación Ramón Areces (Spain).

References (41)

W.A. Kamakura et al.
Chain-wide and store-level analysis for cross-category management
Journal of Retailing
(2007)
V. Kumar et al.
Assessing the competitive impact of type, timing, frequency, and magnitude of retail promotions
Journal of Business Research
(1997)
P.S.H. Leeflang et al.
Building models for marketing decisions: Past, present and future
International Journal of Research in Marketing
(2000)
P.S.H. Leeflang et al.
Decomposing the sales promotion bump accounting for cross-category effects
International Journal of Research in Marketing
(2008)
M.P. Martínez-Ruiz et al.
Using daily store-level data to understand price promotion effects in a semiparametric regression model
Retailing and Consumer Services
(2006)
V. Shankar et al.
Relating price sensitivity to retailer promotional variables and pricing policy: An empirical analysis
Journal of Retailing
(1996)
G.B. Voss et al.
Exploring the effect of retail sector and firm characteristics on retail price promotion strategy
Journal of Retailing
(2003)
L. Araujo
Stochastic parsing and evolutionary algorithms
Applied Artificial Intelligence
(2009)
D.R. Bell et al.
The decomposition of promotional response: an empirical generalization
Marketing Science
(1999)
E.A. Blair et al.
The effects of reference prices in retail advertisements
The Journal of Marketing
(1981)

R.C. Blattberg et al.
Sales promotion: Concepts, methods, and strategies
(1990)
Blattberg, R. G. & Neslin, S. (1993). Sales promotions models. In Handbook in operations research and management...
R.G. Blattberg et al.
How promotions work
Marketing Science
(1995)
S.A. Caraballo et al.
New figures of merit for best-first probabilistic chart parsing
Computational Linguistics
(1998)
T. Cover et al.
Nearest neighbor pattern classification
IEEE Transactions on Information Theory
(1967)
R.O. Duda et al.
Pattern classification
(2001)
B. Efron et al.
An introduction to the bootstrap
(1997)
R.C. Goodstein
Marketing for the entrepreneur: Customer focus to multiple constituencies
Entrepreneurship and Economic Growth in the American Economy
(2000)
S. Haykin
Neural networks
(1999)
Van Heerde, H. J., Leeflang, P. S. H. & Wittink, D. R. (2000). Building models for marketing decision: Past, present...

