Stopping rules as experimental design


Abstract

A “stopping rule” in a sequential experiment is a rule or procedure for deciding when that experiment should end. Accordingly, the “stopping rule principle” (SRP) states that, in a sequential experiment, the evidential relationship between the final data and an hypothesis under consideration does not depend on the experiment’s stopping rule: the same data should yield the same evidence, regardless of which stopping rule was used. In this essay, I reconstruct and rebut five independent arguments for the SRP. Reminding oneself that the stopping rule is a part of an experiment’s design and is no more mysterious than many other design aspects helps elucidate why some of these arguments for the SRP are unsound.
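
To see concretely what is at stake, consider a toy simulation (an editor's illustration, not drawn from the paper): under a true null hypothesis, an experimenter who tests after every observation and stops as soon as p < 0.05 will reject far more often than the nominal 5% rate, even though any particular final data set receives the same likelihood it would receive under a fixed-sample design. Frequentist measures of evidence are sensitive to this design fact; the SRP denies that such sensitivity is evidentially relevant.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def optional_stopping_rejects(n_max=100, alpha=0.05):
        """Draw N(0, 1) data under the null (true mean 0), testing after each
        new observation and stopping as soon as a one-sample t-test yields
        p < alpha. Returns True if the run ever rejects the null."""
        xs = []
        for _ in range(n_max):
            xs.append(rng.normal())
            if len(xs) >= 2:
                _, p = stats.ttest_1samp(xs, popmean=0.0)
                if p < alpha:
                    return True
        return False

    trials = 2000
    rate = sum(optional_stopping_rejects() for _ in range(trials)) / trials
    # Prints a rate well above 0.05; a fixed-n design would reject about 5% of the time.
    print(f"Null rejection rate with optional stopping: {rate:.3f}")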


Notes

  1. Despite their name, sequential experiments need not involve any robust experimenter control or manipulation.

  2. Technically, this is restricted to non-informative stopping rules: ones that, when learned, provide no more information about the hypothesis of interest than the data themselves. All parties agree that the SRP does not apply to informative stopping rules. See Raiffa and Schlaifer (1961, pp. 36–42) and Berger and Wolpert (1988, §4.2.7) for formal definitions, examples, and discussion.
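
     A standard textbook illustration of non-informativeness (my gloss, not a quotation from these sources): compare observing 9 heads in a fixed run of 12 coin flips with flipping until the third tail and happening to need 12 flips. Writing θ for the chance of heads,

     \[
     P(\text{9 heads} \mid \theta,\ \text{fixed } n = 12) = \binom{12}{9}\,\theta^{9}(1-\theta)^{3},
     \qquad
     P(n = 12 \mid \theta,\ \text{stop at 3rd tail}) = \binom{11}{2}\,\theta^{9}(1-\theta)^{3}.
     \]

     Both likelihoods are proportional to θ⁹(1 − θ)³: the stopping rule contributes only a θ-free constant, which is exactly what makes it non-informative.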

  3. I have chosen to focus on the SRP rather than the likelihood principle in this essay because of the SRP's concreteness and because of the arguments in the literature concerning it in particular.

  4. Arguments related to this one had been stated much earlier (Edwards et al. 1963, p. 237), and its conclusion was well known (Savage 1962, p. 17), but Steel (2003) was, as far as I know, the first to point out the implicit assumption that the evidential measure depends only on the priors and posteriors. (This is not because, e.g., Savage and others might have been focusing more on decision than on evidence; they seem simply to have assumed as a matter of course that the evidence for a hypothesis provided by data is given by the posterior probability of that hypothesis.) When this assumption does not hold, Bayesian measures of evidence need not satisfy the SRP.
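
     To make the implicit assumption explicit, here is a standard reconstruction (mine, not a quotation): by Bayes's theorem,

     \[
     P(H \mid x) = \frac{P(x \mid H)\,P(H)}{\sum_{H'} P(x \mid H')\,P(H')},
     \]

     so if two non-informative stopping rules yield likelihoods c₁ f(x | H) and c₂ f(x | H), the constants cancel in the normalization and the posteriors coincide. Any evidential measure that is a function of priors and posteriors alone thus satisfies the SRP; a measure that depends on anything further (say, the design's error characteristics) escapes the argument.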

  5. The example is an amalgam of those by Savage (1962, pp. 17–8) and Mayo and Kruse (2001, pp. 387–8).

  6. Savage continues: “Never having been comfortable with that argument, I am not advancing it myself.” However, he does advance it shortly thereafter (Edwards et al. 1963). (See also the discussion by Mayo (1996, pp. 346–7).)

  7. Perhaps it’s a hangover from radical behaviorism?

  8. Perhaps it goes without saying, but this is emphatically not a fringe or speculative research program in psychology: as of the end of September 2018, when this passage was written, these five works collectively have over 176,000 citations according to Google Scholar.

  9. They also point out that replication is not necessary in some sciences, but this is beside the point: as long as it is a concern in some sciences, it helps block the argument from intentions.

  10. Another possible response, suggested by Livengood (2019), is that in fact two experiments were being performed, since sometimes creative intentions can matter to what exists. This would be the case when the design is part of the experiment itself, so that two different designs entail two different experiments. The evidential import of each experiment, then, can be evaluated separately. It is not yet clear to me how one should understand these two experiments with respect to the problem of use-novelty or double-counting of data, so I won’t discuss it further.

  11. Although credibility can be considered a property of a scientist qua epistemic agent, it seems more important in scientific endeavors as attached to particular claims; Shapin (2010), for example, defends the particularity of credibility claims in science.

  12. Except for insisting on modifying “feasibility” to “approximate feasibility,” I will not challenge this premise further, although one could (Southwood 2016).

  13. The issue of “correctness” is actually orthogonal to the issues here; it concerns instead whether the statistical model and experimental design for the sequential experiment were misspecified.

  14. Steele (2013, p. 945) suggests that all one needs to do is include the different stopping events in the total outcome space for the experiment, but this does not rebut the argument from the practical impossibility of specifying what these all are, as she seems to suggest.
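
     Formally (my gloss of her suggestion), this amounts to taking the outcome space of the sequential experiment to be

     \[
     \Omega = \bigcup_{n \ge 1} \{n\} \times \mathcal{X}^{n},
     \]

     with the stopping rule determining which outcomes (n, x₁, …, xₙ) are attainable; the difficulty is the practical one of enumerating every stopping event an experimenter might have acted on.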

  15. One can also treat more complicated stopping rules (Raiffa and Schlaifer 1961, pp. 39–40).

  16. See also the discussion in Mayo (1996, pp. 350–1).

  17. Data sets so intractable as to be insusceptible even to analyses conditional on various plausible stopping rules have, on our substitution heuristic, a substantial part of their experimental design unarticulated. In such cases I do not see any necessity to analyze them. But even if there were such a necessity, and even setting aside the other problems with the stronger version of the descriptive premise, establishing that there are some such intractable examples is logically insufficient to establish the SRP, for the same reasons as for the argument from impracticality discussed in Section 4.1.

  18. Sprenger (2009, p. 647) attributes it to Teddy Seidenfeld.

  19. As described in these references, the likelihood principle is equivalent to the conjunction of two other principles, sufficiency and conditionality. Some arguments for the conditionality principle are similar to those for the SRP.

References

  • Ajzen, I., & Fishbein, M. (1980). Understanding attitudes and predicting social behavior. Englewood Cliffs: Prentice-Hall.

  • Ajzen, I. (1985). From intentions to actions: A theory of planned behavior. In Kuhl, J., & Beckmann, J. (Eds.), Action control (pp. 11–39). Berlin: Springer.

  • Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50(2), 179–211.

  • Backe, A. (1999). The likelihood principle and the reliability of experiments. Philosophy of Science, 66(Proceedings), S354–S361.

  • Berger, J. O., & Wolpert, R. L. (1988). The likelihood principle (2nd edn.). Hayward: Institute of Mathematical Statistics.

  • Berry, S., & Viele, K. (2008). A note on hypothesis testing with random sample sizes and its relationship with Bayes factors. Journal of Data Science, 6, 75–87.

  • Birnbaum, A. (1962). On the foundations of statistical inference. Journal of the American Statistical Association, 57(298), 269–306.

  • Carnap, R. (1950). Empiricism, semantics, and ontology. Revue Internationale de Philosophie, 4(11), 20–40.

  • Cohen, A. J. (2010). A conceptual and (preliminary) normative exploration of waste. Social Philosophy and Policy, 27(2), 233–273.

  • Cox, D. R. (1958). Some problems connected with statistical inference. Annals of Mathematical Statistics, 29, 357–372.

  • Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70(3), 193–242.

  • Fishbein, M., & Ajzen, I. (1975). Belief, attitude, intention, and behavior: An introduction to theory and research. Reading: Addison-Wesley.

  • Fishbein, M., & Ajzen, I. (2011). Predicting and changing behavior: The reasoned action approach. New York: Psychology Press.

  • Fletcher, G. P. (1998). Basic concepts of criminal law. New York: Oxford University Press.

  • Franklin, A. (1994). How to avoid the experimenters’ regress. Studies in History and Philosophy of Science, 25(3), 463–491.

  • Franklin, A. (2010). Gravity waves and neutrinos: The later work of Joseph Weber. Perspectives on Science, 18(2), 119–151.

  • Gandenberger, G. (2015). Differences among noninformative stopping rules are often relevant to Bayesian decisions. arXiv:1707.00214.

  • Gillies, D. (1990). Bayesianism versus falsificationism. Ratio (New Series), III(1), 82–98.

  • Hacking, I. (1965). The logic of statistical inference. Cambridge: Cambridge University Press.

  • Howson, C., & Urbach, P. (2006). Scientific reasoning: The Bayesian approach (3rd edn.). Chicago: Open Court.

  • Huber, F. (n.d.). Confirmation theory. In The internet encyclopedia of philosophy. Accessed 30 March 2018.

  • Kelly, T. (2016). Evidence. In Zalta, E. N. (Ed.), The Stanford encyclopedia of philosophy (Winter 2016 edition). Stanford: Metaphysics Research Lab, Stanford University.

  • Koike, D. A. (1989). Pragmatic competence and adult L2 acquisition: Speech acts in interlanguage. The Modern Language Journal, 73(3), 279–289.

  • Livengood, J. (2019). Counting experiments. Philosophical Studies, 176(1), 175–195.

  • Malinsky, D. (2015). Hypothesis testing, “Dutch book” arguments, and risk. Philosophy of Science, 82(5), 917–929.

  • Mayo, D. G. (1996). Error and the growth of experimental knowledge. Chicago: University of Chicago Press.

  • Mayo, D. G., & Kruse, M. (2001). Principles of inference and their consequences. In Corfield, D., & Williamson, J. (Eds.), Foundations of Bayesianism (pp. 381–403). Dordrecht: Kluwer.

  • Raiffa, H., & Schlaifer, R. (1961). Applied statistical decision theory. Boston: Harvard University.

  • Reiss, J., & Sprenger, J. (2017). Scientific objectivity. In Zalta, E. N. (Ed.), The Stanford encyclopedia of philosophy (Winter 2017 edition). Stanford: Metaphysics Research Lab, Stanford University.

  • Romeijn, J.-W. (2017). Philosophy of statistics. In Zalta, E. N. (Ed.), The Stanford encyclopedia of philosophy (Spring 2017 edition). Stanford: Metaphysics Research Lab, Stanford University.

  • Savage, L. J. (1962). The foundations of statistical inference: A discussion. London: Methuen.

  • Schervish, M. J., Seidenfeld, T., & Kadane, J. B. (2002). A rate of incoherence applied to fixed-level testing. Philosophy of Science, 69(Proceedings), S248–S264.

  • Shapin, S. (2010). Never pure: Historical studies of science as if it was produced by people with bodies, situated in time, space, culture, and society, and struggling for credibility and authority. Baltimore: Johns Hopkins University Press.

  • Siegmund, D. (1985). Sequential analysis: Tests and confidence intervals. New York: Springer.

  • Southwood, N. (2016). Does “ought” imply “feasible”? Philosophy and Public Affairs, 44(1), 7–45.

  • Sprenger, J. (2009). Evidence and experimental design in sequential trials. Philosophy of Science, 76(5), 637–649.

  • Steel, D. (2003). A Bayesian way to make stopping rules matter. Erkenntnis, 58, 213–227.

  • Steele, K. (2013). Persistent experimenters, stopping rules, and statistical inference. Erkenntnis, 78, 937–961.

  • Wald, A. (1947). Sequential analysis. New York: Wiley.

  • Whitehead, J. (1997). The design and analysis of sequential clinical trials (2nd edn.). New York: Wiley.


Acknowledgements

Thanks to Greg Gandenberger, Kasey Genin, Jonathan Livengood, Dan Malinsky, Conor Mayo-Wilson, Jan Sprenger, and an anonymous referee for comments on a previous version, and to audiences at Minnesota, Munich, Bologna (SILFS2017), Edinburgh (BSPS2017), and Exeter (EPSA2017) for their insightful comments. Part of this work was completed with the support of a European Commission Marie Curie Fellowship (PIIF-GA-2013-628533).

Author information

Correspondence to Samuel C. Fletcher.

Additional information

This article is part of the Topical Collection on EPSA17: Selected papers from the biannual conference in Exeter

Guest Editors: Thomas Reydon, David Teira, Adam Toon

Cite this article

Fletcher, S.C. Stopping rules as experimental design. Euro Jnl Phil Sci 9, 29 (2019). https://doi.org/10.1007/s13194-019-0252-x
