More bang for their bucks: assessing new features for online advertisers

ABSTRACT
Online search systems that display ads continually offer new features that advertisers can use to fine-tune and enhance their ad campaigns. An important question is whether a new feature actually helps advertisers. In an ideal world for statisticians, we would answer this question by running a statistically designed experiment. But that would require randomly choosing a set of advertisers and forcing them to use the feature, which is not realistic. Accordingly, in the real world, new features for advertisers are seldom evaluated with a traditional experimental protocol. Instead, customer service representatives select advertisers who are invited to be among the first to test a new feature (i.e., whitelisted), and then each whitelisted advertiser chooses whether or not to use the new feature. Neither the customer service representative nor the advertiser chooses at random.
This paper addresses the problem of drawing valid inferences from whitelist trials about the effects of new features on advertiser happiness. We are guided by three principles. First, statistical procedures for whitelist trials are likely to be applied in an automated way, so they should be robust to violations of modeling assumptions. Second, standard analysis tools should be preferred over custom-built ones, both for clarity and for robustness. Standard tools have withstood the test of time and have been thoroughly debugged. Finally, it should be easy to compute reliable confidence intervals for the estimator. We review an estimator that has all these attributes, allowing us to make valid inferences about the effects of a new feature on advertiser happiness. In the example in this paper, the new feature was introduced during the holiday shopping season, thereby further complicating the analysis.
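The abstract does not spell out the estimator it reviews, but the setting it describes — non-random feature adoption, a desire for robustness, and easy confidence intervals — is the classic observational-study problem. As a hedged illustration only, the sketch below shows one standard tool for that problem: an inverse-propensity-weighted (Horvitz-Thompson style) estimate of the average effect of feature adoption, with a percentile-bootstrap confidence interval. Everything here is a synthetic assumption (the covariate, the propensity model, the true effect of 2.0); the paper's actual estimator and data may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "whitelist trial": one covariate x drives both whether an
# advertiser adopts the feature and the outcome, so adopters are not
# comparable to non-adopters.  All numbers below are illustrative.
n = 5000
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-(0.5 + x)))         # true adoption propensity
t = rng.binomial(1, p_true)                       # adoption indicator
tau = 2.0                                         # assumed true feature effect
y = 1.0 + 3.0 * x + tau * t + rng.normal(size=n)  # "advertiser happiness"

def fit_logistic(X, t, iters=25):
    """Fit a logistic-regression propensity model by Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        H = X.T @ (X * W[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta += np.linalg.solve(H, X.T @ (t - p))
    return beta

X = np.column_stack([np.ones(n), x])
p_hat = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, t)))

# Naive adopter-vs-non-adopter comparison: biased, since adoption is
# correlated with x, which also drives the outcome.
naive = y[t == 1].mean() - y[t == 0].mean()

# Inverse-propensity-weighted estimate of the average treatment effect.
ipw = np.mean(t * y / p_hat - (1 - t) * y / (1 - p_hat))

# Percentile bootstrap CI, refitting the propensity model each resample.
boots = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    Xb, tb, yb = X[idx], t[idx], y[idx]
    pb = 1.0 / (1.0 + np.exp(-Xb @ fit_logistic(Xb, tb)))
    boots.append(np.mean(tb * yb / pb - (1 - tb) * yb / (1 - pb)))
lo, hi = np.percentile(boots, [2.5, 97.5])

print(f"naive: {naive:.2f}  IPW: {ipw:.2f}  95% CI: ({lo:.2f}, {hi:.2f})")
```

On the synthetic data the naive difference overstates the effect substantially, while the weighted estimate recovers something near the assumed value; the bootstrap supplies the "easy, reliable confidence intervals" the abstract asks for without any custom variance derivation.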