You Can’t Always Get What You Want: Some Considerations Regarding Conditional Probabilities


Abstract

The standard treatment of conditional probability leaves conditional probability undefined when the conditioning proposition has zero probability. Nonetheless, some find the option of extending the scope of conditional probability to include zero-probability conditions attractive or even compelling. This article reviews some of the pitfalls associated with this move, and concludes that, for the most part, probabilities conditional on zero-probability propositions are more trouble than they are worth.


Notes

  1. We will also use the notation \(P(A|C)\), when convenient.

  2. That, in probability theory, we can’t always get what we want is a familiar fact. We might want our probability function to be defined on arbitrary subsets of our probability space, but, as is well known, we cannot always do so while satisfying desiderata such as symmetry conditions and countable additivity. Consider, for example, the task of defining a uniform distribution on the unit circle, that is, one invariant under all rotations. There can be no distribution that is invariant under rotations, is countably additive, and is defined on all subsets of the unit circle. The proof is found in many probability texts, e.g., Billingsley (2012, p. 47). The standard response is to preserve countable additivity and to restrict the domain of definition of the probability function to certain subsets of the probability space, the measurable sets, leaving the probability of other sets undefined. In one and two dimensions, as Banach (1923) showed, one can extend the probability function to one defined on arbitrary subsets, if one is willing to give up countable additivity. The well-known Banach–Tarski paradox shows that we can’t do so in three-dimensional space; there is no finitely additive set function that is defined on all subsets and invariant under translations and rotations.
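    For readers who want the shape of that impossibility argument, here is a standard Vitali-type sketch (ours; Billingsley’s presentation differs in detail). Call two points of the circle equivalent when the angle between them is a rational multiple of \(2\pi\), and use the axiom of choice to pick a set \(V\) containing exactly one point of each equivalence class. The countably many rational rotations \(V_q\) of \(V\) are disjoint and cover the circle, so a rotation-invariant, countably additive probability function defined on all subsets would have to satisfy

    $$\begin{aligned} 1 = P\left( \bigcup _q V_q \right) = \sum _q P(V_q) = \sum _q P(V), \end{aligned}$$

    which is impossible: the sum is \(0\) if \(P(V) = 0\), and infinite otherwise.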

  3. Adapted from Hájek (2003).

  4. Based on Kolmogorov (1950, §V.2), which in turn is based on Borel (1909, §45) (§8.6 of Borel 1965). See also Jaynes (2003, §15.7), Hájek (2003, §4.4).

  5. To see this: let \(X\) be any random variable, with distribution \(\mu _X\), and take \(g(X)\) to be the function of \(X\) given by

    $$\begin{aligned} g(X) = \int \limits _{- \infty }^X d \mu _X(x). \end{aligned}$$
    (5)

    Then \(g\) has range in \([0, 1]\), and, for any \(a \in [0, 1]\),

    $$\begin{aligned} P(g(X) \le a) = a. \end{aligned}$$
    (6)

    That is, \(g(X)\) is uniformly distributed on \([0, 1]\).
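    A quick numerical check of this fact (a sketch in Python, our illustration, not part of the paper; it assumes a continuous distribution, for which \(g\) is just the cumulative distribution function):

    ```python
    import numpy as np

    # X is exponential with scale 2; its CDF is g(x) = 1 - exp(-x/2).
    rng = np.random.default_rng(0)
    x = rng.exponential(scale=2.0, size=100_000)
    u = 1.0 - np.exp(-x / 2.0)  # g(X)

    # If g(X) is uniform on [0, 1], P(g(X) <= a) should be close to a.
    for a in (0.1, 0.5, 0.9):
        print(a, round(float(np.mean(u <= a)), 3))  # values near a
    ```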

  6. If \(C\) is a finite set, we can have a probability function that always assigns equal probabilities to sets of equal cardinality. This is not possible if \(C\) is infinite. In the infinite case, there must be measurable sets \(A,\,B\), of equal cardinality, with \(P(A) \ne P(B)\). We can then choose some mapping that takes \(A\) to \(B\).

  7. This is necessary because, if one has nonzero credence that the coin is not fair, or that the tosses are not independent, then conditionalization on either \(F\) or \(P\) will send credence that the setup is as described to zero.

  8. See Appendix 1 for definitions of any terms that might be unfamiliar.

  9. In this section, we use \(Pr\) for our probability function to avoid confusion with the proposition \(P\).

  10. In this section, we will find ourselves conditionalizing on some fairly complex propositions, and so it will be convenient to switch from the subscript notation for conditional probabilities used in the rest of the paper to the slash notation.

  11. That is, take

    $$\begin{aligned} A_n = \bigcup _{u \in A} C_{K_n}(u). \end{aligned}$$
    (9)
  12. Perhaps. The more one thinks about what is required to give values to these conditional probabilities, the less clear it becomes that we have intuitions about them at all.

  13. Oddly enough, this has been disputed. In connection with this example, Jaynes (2003, p. 470) writes,

    Nearly everybody feels that he knows perfectly well what a great circle is; so it is difficult to get people to see that the term ‘great circle’ is ambiguous until we specify what limiting operation is to produce it.

    This strikes me as confused. One and the same great circle can be the limit of many different decreasing sequences of subsets of the sphere, but the circle is not itself produced by the limiting operation. Not so with probabilities conditional on a great circle, which, unless stipulated as primitive, are obtained via some limiting operation.

  14. In his discussion of the Borel paradox, Kolmogorov writes, “This shows that the concept of a probability conditional on an isolated given hypothesis whose probability equals 0 is inadmissible” (Kolmogorov 1950, p. 51).

  15. The notation is intended to be both reminiscent of, and distinct from, the notation used for conditional probabilities.

  16. This is example 33.11 of Billingsley (2012).

  17. This is even easier to see in these days in which laboratory equipment has digital readout than it was in the old days of pointers and dials!

  18. To see this: suppose that the probability distribution \(Q\) over the parameters is given by a density function \(\mu \). That is, for all Borel subsets \(\Delta \) of the parameter space,

    $$\begin{aligned} Q(\Theta \in \Delta ) = \int \limits _{\Delta } \mu (\theta ) \, d \theta . \end{aligned}$$
    (35)

    Then, by (32),

    $$ \begin{aligned} P(E \, \& \, \Theta \in \Delta ) = \int \limits _\Delta \mathcal {L}_E(\theta ) \, \mu (\theta ) \, d\theta . \end{aligned}$$
    (36)

    Combining this with (34), we get

    $$ \begin{aligned} P_E(\Theta \in \Delta ) = \frac{P(E \, \& \, \Theta \in \Delta )}{P(E)} = \int \limits _\Delta \frac{\mathcal {L}_E(\theta ) \, \mu (\theta )}{P(E)} \, d\theta . \end{aligned}$$
    (37)

    Suppose, now, that \(P_E\) is given by a density function \(\mu _E\); that is,

    $$\begin{aligned} P_E(\Theta \in \Delta ) = \int \limits _{\Delta } \mu _E(\theta ) \, d \theta . \end{aligned}$$
    (38)

    Then, setting (37) and (38) equal to each other, we get

    $$\begin{aligned} \int \limits _{\Delta } \mu _E(\theta ) \, d \theta = \int \limits _\Delta \frac{\mathcal {L}_E(\theta ) \, \mu (\theta )}{P(E)} \, d\theta . \end{aligned}$$
    (39)

    Since this must be true for every Borel set \(\Delta \), the integrands must be equal almost everywhere.
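    Spelling out the conclusion: for almost every \(\theta \),

    $$\begin{aligned} \mu _E(\theta ) = \frac{\mathcal {L}_E(\theta ) \, \mu (\theta )}{P(E)}. \end{aligned}$$

    That is, the posterior density is the prior density reweighted by the likelihood and renormalized.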

  19. Though inspired and instructed by Rényi’s treatment, this definition departs from Rényi in two ways. First, Rényi requires \(P_B(A)\) to be defined for every \(A \in \mathcal {A}\). This may be undesirable; see Appendix 2. Second, Rényi requires countable additivity, and we leave open the possibility of conditional probability functions that are merely finitely additive.

    In his later book (2007a), Rényi revises the definition of a conditional probability space to exclude zero-probability conditions, and further requires that the set \(\mathcal {B}\) of conditions be closed under finite disjunctions, and that it contain a sequence \(\{B_n\}\) that covers \(\Omega \) (see §2.2). A subset of a \(\sigma \)-algebra \(\mathcal {A}\) satisfying these two conditions, and not containing the null set, Rényi calls a bunch of sets. In this work, Rényi calls conditional probability spaces in which conditionalization on null propositions is permitted generalized conditional probability spaces (see Problem 2.8).

References

  • Banach, S. (1923). Sur le problème de la mesure. Fundamenta Mathematicae, 4, 7–33.

  • Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions, 53, 370–418.

  • Billingsley, P. (2012). Probability and measure (Anniversary ed.). Hoboken, NJ: Wiley.

  • Borel, E. (1909). Éléments de la théorie des probabilités. Paris: Librairie Scientifique A. Hermann & Fils. English translation in Borel (1965).

  • Borel, E. (1965). Elements of the theory of probability. Englewood Cliffs, NJ: Prentice-Hall. English translation of Borel (1909).

  • Carnap, R. (1950). The logical foundations of probability. Chicago: The University of Chicago Press.

  • Dorr, C. (2010). The eternal coin: A puzzle about self-locating conditional credence. Philosophical Perspectives, 24, 189–205.

  • Easwaran, K. (2011). The varieties of conditional probability. In P. Bandyopadhyay & M. Forster (Eds.), Handbook of the philosophy of science: Philosophy of statistics (pp. 137–148). Amsterdam: North-Holland.

  • Easwaran, K. K. (2008). The foundations of conditional probability. Doctoral dissertation, Department of Philosophy, University of California, Berkeley. http://www.kennyeaswaran.org/research.

  • Hájek, A. (2003). What conditional probability could not be. Synthese, 137, 273–323.

  • Harper, W., & Hájek, A. (1997). Full belief and probability: Comments on van Fraassen. Dialogue, 36, 91–100.

  • Harper, W. L. (1975). Rational belief change, Popper functions, and counterfactuals. Synthese, 30, 221–262.

  • Jaynes, E. T. (2003). Probability theory: The logic of science. Cambridge: Cambridge University Press.

  • Kolmogorov, A. (1950). Foundations of the theory of probability (N. Morrison, Trans.). New York: Chelsea Publishing Company.

  • Lewis, D. (1980). A subjectivist’s guide to objective chance. In R. C. Jeffrey (Ed.), Studies in inductive logic and probability (Vol. II, pp. 263–293). Berkeley, CA: University of California Press.

  • Popper, K. R. (1938). A set of independent axioms for probability. Mind, 47, 275–277.

  • Popper, K. R. (1955). Two autonomous axiom systems for the calculus of probabilities. The British Journal for the Philosophy of Science, 6, 51–57.

  • Popper, K. R. (1959). The logic of scientific discovery. London: Hutchinson.

  • Rényi, A. (1955). On a new axiomatic theory of probability. Acta Mathematica Hungarica, 6, 265–333.

  • Rényi, A. ([1970] 2007a). Foundations of probability. Mineola, NY: Dover Publications. Reprint of the edition published by Holden-Day.

  • Rényi, A. ([1970] 2007b). Probability theory. Mineola, NY: Dover Publications. Reprint of the edition published by North-Holland.

  • van Fraassen, B. C. (1976). Representation of conditional probabilities. Journal of Philosophical Logic, 5, 417–430.


Acknowledgments

I thank Alan Hájek, Bill Harper and Joshua Luczak for helpful discussions on these matters. I also thank two anonymous referees for Erkenntnis, who read the manuscript with extraordinary care and saved me from numerous errors (some minor, some less so). This work was sponsored by a grant from the Social Sciences and Humanities Research Council of Canada (SSHRC).

Correspondence to Wayne C. Myrvold.


But if you try, sometimes, you might find you get what you need.

Appendices

Appendix 1 Terminology

1.1 Probability Spaces

For any set \(S\), an algebra of subsets of \(S\) is a set of subsets of \(S\) that contains \(S\) and is closed under complementation and unions. A \(\sigma\)-algebra of subsets of \(S\) is an algebra that is closed under countable unions. For the real line \(\mathbb {R}\), we define the Borel sets as the smallest \(\sigma \)-algebra containing all open intervals.

If \(\mathcal {A}\) is an algebra of subsets of \(S\), a function \(P: \mathcal {A} \rightarrow \mathbb {R}\) is additive iff, for any disjoint \(A, B \in \mathcal {A}\),

$$\begin{aligned} P(A \cup B) = P(A) + P(B). \end{aligned}$$

If \(\mathcal {A}\) is a \(\sigma \)-algebra of subsets of \(S\), a function \(P: \mathcal {A} \rightarrow \mathbb {R}\) is countably additive iff, for any sequence \(\{A_i\}\) of disjoint sets in \(\mathcal {A}\),

$$\begin{aligned} P\left( \bigcup _{i=1}^\infty A_i\right) = \sum _{i = 1}^\infty P(A_i). \end{aligned}$$

A probability space is a triple \(\langle S, \mathcal {A}, P \rangle \), where \(S\) is a set, to be thought of as the set of elementary events, \(\mathcal {A}\) is an algebra of subsets of \(S\), which are the sets of events (propositions) to which probabilities will be ascribed, and \(P: \mathcal {A} \rightarrow \mathbb {R}\) is a probability function, that is, a positive, additive set function with \(P(S) = 1\). Since we will have reasons to consider probability functions that are not countably additive, we depart from tradition in not assuming countable additivity unless explicitly stated. If we require countable additivity, then \(\mathcal {A}\) is required to be a \(\sigma \)-algebra, and we will refer to \(P\) as a probability measure.

If \(\langle S, \mathcal {A}, P \rangle \) is a probability space, a random variable is a measurable function \(X: S \rightarrow \mathbb {R}\), that is, a function such that, for any Borel set \(B\), the set

$$\begin{aligned} X^{-1}(B) = \{ \omega \in S \, | \, X(\omega ) \in B \} \end{aligned}$$

is in \(\mathcal {A}\). A random variable \(X\) generates a subalgebra of \(\mathcal {A}\), called \(\sigma (X)\), which is the set of all \(X^{-1}(B)\), as \(B\) ranges over Borel subsets of the real line.
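
On a finite space, where every subset can be taken to be measurable, these notions are easy to make concrete. Here is a minimal sketch in Python (our illustration; the names are ours) that computes \(\sigma (X)\) for a two-toss coin space:

```python
from itertools import chain, combinations

# Finite space: outcomes of two coin tosses; X = number of heads.
S = ['HH', 'HT', 'TH', 'TT']
X = {s: s.count('H') for s in S}

# The preimages X^{-1}({v}) form the partition generating sigma(X).
cells = [frozenset(s for s in S if X[s] == v) for v in set(X.values())]

# sigma(X) consists of all unions of cells (the empty union included).
sigma_X = {frozenset(chain.from_iterable(sel))
           for r in range(len(cells) + 1)
           for sel in combinations(cells, r)}

print(len(sigma_X))  # 2**3 = 8 sets, closed under complement and union
```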

1.2 Conditional Probability Spaces

Following Rényi (1955, 2007b),Footnote 19 we define a conditional probability space as a quadruplet \(\langle S, \mathcal {A}, \mathcal {B}, P \rangle \), where \(S\) is a set of events, \(\mathcal {A}\) an algebra of subsets of \(S,\,\mathcal {B}\) a subset of \(\mathcal {A}\), to be thought of as the set of events on which we may conditionalize, and \(P\) is a function that takes \(B \in \mathcal {B}\) to a function \(P_B: \mathcal {A}_B \rightarrow \mathbb {R}\), where, for each \(B,\,\mathcal {A}_B\) is a subalgebra of \(\mathcal {A}\), and

  1. (i)

    For each \(B \in \mathcal {B}\)

    1. (a)

      \(P_B(A) \ge 0\) for all \(A \in \mathcal {A}_B\).

    2. (b)

      For all \(A \in \mathcal {A}\), if \(B \subseteq A\), then \(A \in \mathcal {A}_B\) and \(P_B(A) = 1\).

    3. (c)

      For disjoint \(A, A' \in \mathcal {A}_B,\,P_B(A \cup A') = P_B(A) + P_B(A')\).

  2. (ii)

    For all \(B, C \in \mathcal {B}\), if \(BC \in \mathcal {B}\), then

    $$\begin{aligned} P_C(AB) = P_{BC}(A) \, P_C(B), \end{aligned}$$

    provided that \(B \in \mathcal {A}_C\) and \(A \in \mathcal {A}_{BC}\).

A conditional probability space can be thought of as a family of probability spaces \(\{ \langle S, \mathcal {A}_B, P_B \rangle \; | \; B \in \mathcal {B} \}\), required to mesh with each other via (ii).

It is an immediate consequence of (ii) that, for any \(C \in \mathcal {B}\) and \(B \subseteq C\), if \(B \in \mathcal {A}_C\) and \(P_C(B) > 0\), then, for all \(A \in \mathcal {A}_C\),

$$\begin{aligned} P_{B}(A) = \frac{P_C(AB)}{P_C(B)} \end{aligned}$$
(51)

provided that \(B \in \mathcal {B}\) and \(A \in \mathcal {A}_{B}\). This allows us to define probabilities conditional on \(B\), provided they don’t clash with those yielded by some other \(D \in \mathcal {B}\) such that \(B \subseteq D,\,B \in \mathcal {A}_D\), and \(P_D(B) > 0\). For this reason, we will usually assume the further condition,

  1. (iii)

    For all \(C \in \mathcal {B}\) and \(B \subseteq C\), if \(B \in \mathcal {A}_C\) and \(P_C(B) > 0\), then \(B \in \mathcal {B}\) and \(\mathcal {A}_C \subseteq \mathcal {A}_{B}\).

Given a probability space \(\langle S, \mathcal {A}, P \rangle \), let \(\mathcal {A}^*\) be the subset of \(\mathcal {A}\) consisting of sets \(B\) with \(P(B) > 0\). Let \(P^*\) be the function that maps \(B \in \mathcal {A}^*\) to the probability function \(P_B: \mathcal {A} \rightarrow [0, 1]\), given by

$$\begin{aligned} P_B(A) = \frac{P(AB)}{P(B)}. \end{aligned}$$
(52)

Then \(\langle S, \mathcal {A}, \mathcal {A}^*, P^* \rangle \) is a conditional probability space, corresponding to the standard choice of having conditional probability defined only when the condition has nonzero probability.
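
On a finite space, this standard construction, and the meshing condition (ii), can be checked directly. A minimal sketch in Python (our illustration, not part of the paper):

```python
from fractions import Fraction

# Uniform probability on S = {1,...,6}; Eq. (52): P_B(A) = P(AB)/P(B).
S = {1, 2, 3, 4, 5, 6}

def P(A):
    return Fraction(len(A), len(S))

def P_cond(A, B):
    assert P(B) > 0, "defined only for conditions of positive probability"
    return P(A & B) / P(B)

# Check axiom (ii): P_C(AB) = P_{BC}(A) * P_C(B).
A, B, C = {2, 4, 6}, {1, 2, 3, 4}, {2, 3, 4, 5, 6}
assert P_cond(A & B, C) == P_cond(A, B & C) * P_cond(B, C)
print(P_cond(A & B, C))  # 2/5
```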

We will say that a probability space \(\langle S, \mathcal {A}, P \rangle \) is invariant under a bijection \(\mathsf {T}: S \rightarrow S\) if and only if \(\mathsf {T}\) leaves the set \(\mathcal {A}\) of measurable sets invariant, and, for all \(A \in \mathcal {A},\,P(\mathsf {T}(A)) = P(A)\), where \(\mathsf {T}(A)\) is \(\{\mathsf {T}(x) \, | \, x \in A \}\). Similarly, a conditional probability space \(\langle S, \mathcal {A}, \mathcal {B}, P \rangle \) is invariant under \(\mathsf {T}\) if and only if \(\mathsf {T}(\mathcal {A}) = \mathcal {A},\,\mathsf {T}(\mathcal {B}) = \mathcal {B}\), and, for all \(B \in \mathcal {B},\,\mathcal {A}_{\mathsf {T}(B)} = \mathsf {T}(\mathcal {A}_B)\) and \(P_{\mathsf {T}(B)}(\mathsf {T}(A)) = P_B(A)\) for all \(A \in \mathcal {A}_B\).

1.3 Lebesgue Measure

Consider the unit interval \(I = (0, 1]\). Let \(\mathcal {B}\) be the smallest \(\sigma \)-algebra that contains all intervals \((a, b]\). These are the Borel sets. Extension to the entire real line, or to \(\mathbb {R}^n\), is straightforward; the Borel subsets of \(\mathbb {R}^n\) are the elements of the smallest \(\sigma \)-algebra containing all rectangles of the form

$$\begin{aligned} \{ (x_1, \ldots , x_n) \; | \; x_i \in (a_i, b_i], i = 1, \ldots , n \}. \end{aligned}$$

The uniform measure, or Lebesgue measure, on \(\langle I, \mathcal {B} \rangle \) is the unique countably additive measure that assigns measure \(b - a\) to each interval \((a, b]\). Call this measure \(\lambda \). Though the Borel sets include all sets of interest for most purposes, we can extend our measure to a wider \(\sigma \)-algebra, called the Lebesgue measurable sets. We define an outer measure \(P^*\), for arbitrary subsets \(A\) of the unit interval, by

$$\begin{aligned} P^*(A) = \inf \sum _i \lambda (A_i), \end{aligned}$$
(53)

where the infimum is taken over all countable collections \(\{ A_i \}\) of intervals such that \(A \subseteq \bigcup _i A _i\). The set \(\mathcal {L}\) of Lebesgue measurable sets consists of all sets \(A \subseteq I\) such that

$$\begin{aligned} P^*(AE) + P^*(A^c E) = P^*(E) \end{aligned}$$
(54)

for all \(E \subseteq I\), where \(A^c\) is the complement of \(A\) in \(I\). It can be shown that this is a \(\sigma \)-algebra, and that \(\lambda \) has a unique countably additive extension to \(\mathcal {L}\); this extension is also called Lebesgue measure. Again, extension of this concept to \(\mathbb {R}^n\) is straightforward.
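
As a quick illustration of the outer measure (53) at work (our example): any countable set \(A = \{x_1, x_2, \ldots \} \subseteq I\) has \(P^*(A) = 0\), since, for any \(\epsilon > 0\), we may cover \(x_n\) with an interval of length \(\epsilon \, 2^{-n}\), so that

$$\begin{aligned} P^*(A) \le \sum _{n=1}^\infty \epsilon \, 2^{-n} = \epsilon . \end{aligned}$$

Since sets of outer measure zero satisfy (54) automatically, every countable subset of \(I\) is Lebesgue measurable, with measure zero.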

It can be shown that, if \(A\) is a Lebesgue-measurable subset of \(\mathbb {R}^n\) with Lebesgue measure zero, then every subset of \(A\) is Lebesgue measurable (and, of course, also has Lebesgue measure zero).

Appendix 2 A Rotationally Invariant Conditional Probability Space

On the \(n\)-sphere \(S^n\) (that is, the \(n\)-dimensional space of points at unit distance from a fixed point in \((n+1)\)-dimensional Euclidean space, which is a circle for \(n=1\), and the surface of a sphere for \(n=2\)), we construct a uniform spherical measure \(\sigma _n\). One way to characterize these measures is in terms of Lebesgue measure on the ambient \((n+1)\)-dimensional space. A subset \(A\) of \(S^n\) is \(\sigma _n\)-measurable if and only if the wedge subtended by \(A\) (that is, the set of points on straight lines between points of \(A\) and the origin) is Lebesgue measurable, and we take the measure of \(A\) to be proportional to the Lebesgue measure of the wedge it subtends. Let \(\mathcal {L}_n\) be the set of all \(\sigma _n\)-measurable subsets of \(S^n\).

Let \(S\) be a \(2\)-sphere (that is, the surface of a sphere in three-dimensional space), let \(\mathcal {L}_S\) be the set of all \(\sigma _2\)-measurable subsets of \(S\), and let \(\sigma _S\) be \(\sigma _2\) measure on \(\langle S, \mathcal {L}_S \rangle \). Let \(\mathcal {C}\) be the set of all circles on the sphere \(S\). For each circle \(C \in \mathcal {C}\), let \(\mathcal {L}_C\) be the set of \(\sigma _1\)-measurable subsets of \(C\), and let \(\sigma _C\) be \(\sigma _1\) measure on \(\langle C, \mathcal {L}_C \rangle \).

We can construct standard conditional probability spaces \(\langle S, \mathcal {L}_S, \mathcal {L}_S^*, P_S^* \rangle \) and \(\langle C, \mathcal {L}_C, \mathcal {L}_C^*, P_{C}^* \rangle \), where, as in Appendix 1.2, \(\mathcal {L}_S^*\) is the set of sets in \(\mathcal {L}_S\) with positive probability. We want to extend \(\langle S, \mathcal {L}_S, \mathcal {L}_S^*, P^*_S \rangle \) to include other sets among our conditions, at a minimum all circles, in such a way that conditionalizing on any circle yields uniform probabilities on that circle.

Let us take \(\mathcal {B}\) to be

$$\begin{aligned} \mathcal {B} = \mathcal {L}_S^* \cup \bigcup _{C \in \mathcal {C}} \mathcal {L}^*_C. \end{aligned}$$
(55)

Note that each element of \(\mathcal {B}\) is either in \(\mathcal {L}_S^*\) or is a subset of a unique circle \(C\) (this is because the members of \(\mathcal {L}^*_C\) have positive measure, and hence contain infinitely many points, and no two circles share more than two points). Take \(\mathcal {A}_B\) to be \(\mathcal {L}_S\) for \(B \in \mathcal {L}_S^*\). For \(B \in \mathcal {L}^*_C\), take \(A \in \mathcal {A}_B\) iff \(AB \in \mathcal {L}_C\). Define

$$\begin{aligned} P_B(A) = \left\{ \begin{array}{ll} \frac{\sigma _S(AB)}{\sigma _S(B)}, &{} \quad B \in \mathcal {L}_S^*; \\ \\ \frac{\sigma _C(AB)}{\sigma _C(B)}, &{} \quad B \in \mathcal {L}_C^*. \end{array} \right. \end{aligned}$$
(56)

We have constructed a conditional probability space that is invariant under all rigid rotations, and includes conditionalization on circles and some subsets of circles.
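
As a concrete instance (our example): let \(C\) be a great circle, \(B\) a semicircle of \(C\), and \(A\) a quarter-arc contained in \(B\). Then (56) gives

$$\begin{aligned} P_B(A) = \frac{\sigma _C(AB)}{\sigma _C(B)} = \frac{1/4}{1/2} = \frac{1}{2}, \end{aligned}$$

and, for any rigid rotation \(\mathsf {T}\), the same computation applied to \(\mathsf {T}(A)\) and \(\mathsf {T}(B)\) yields the same value, as invariance requires.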

For any \(A \in \mathcal {L}_S\) with \(\sigma _S(A) = 0\), every subset of \(A\) is a measurable set, and is assigned measure 0. Since each circle \(C\) has \(\sigma _S(C) = 0\), this means that every subset of \(C\) is in \(\mathcal {L}_S\). Since not every subset of \(C\) is in \(\mathcal {L}_C\), \(P_C(A)\) is not defined for arbitrary \(A \in \mathcal {L}_S\). We might want to extend \(P_C\) so that it is defined on all \(A \in \mathcal {L}_S\). But, as already mentioned (see footnote 2), we can do so, and preserve rotational invariance, only at the price of sacrificing countable additivity. We can’t get all that we want.

Appendix 3 The Eternal Coin: Proof of Dorr’s Theorem

We will speak in general terms, but readers should think of the example at hand, that of the Eternal Coin. We assume Axiom (iii) of Appendix 1.2. In this appendix, we will find ourselves conditionalizing on complex propositions, and it will be convenient to shift from the subscript notation \(Pr_C\) for conditional probabilities to the slash notation \(Pr( \; \cdot \; | \, C)\).

Suppose there is a proposition \(P\), and a transformation \(\mathsf {T}\), such that \(\mathsf {T}(P) \models P\). If \(Pr(P) > 0\), then

$$\begin{aligned} Pr(\mathsf {T}(P) \, | \, P) = \frac{Pr(\mathsf {T}(P))}{Pr(P)}, \end{aligned}$$
(57)

and so \(\mathsf {T}\)-invariance would entail that \(Pr(\mathsf {T}(P) \,|\, P) = 1\). Furthermore, if there exists a proposition \(Z\) such that \(\mathsf {T}(Z) = Z,\,P \models Z\), and \(Pr(P | Z) > 0\), then

$$\begin{aligned} Pr(\mathsf {T}(P) \, | \, P) = \frac{Pr(\mathsf {T}(P) \, | \, Z)}{Pr(P \, | \, Z)}, \end{aligned}$$
(58)

and so, once again, \(\mathsf {T}\)-invariance would entail that \(Pr(\mathsf {T}(P) \, |\,P) = 1\).

But \(\mathsf {T}\)-invariant propositions of the right sort may be hard to come by, and there may be no such \(Z\). Suppose, however, that there exist propositions \(X,\,Z\), such that \(P \models X \models Z\), and \(\mathsf {T}(P) \models \mathsf {T}(X) \models Z\). Then, if \(P,\,\mathsf {T}(P),\,X\), and \(\mathsf {T}(X)\) are all in \(\mathcal {A}_Z\), and \(Pr(P|Z) > 0\), and if \(Pr(\mathsf {T}(P) \, | \, \mathsf {T}(X))\) is defined, we have

$$\begin{aligned} \begin{array}{l} Pr(P \,| \,Z) = Pr(P \,| \,X) \, Pr(X \,| \,Z); \\ \\ Pr(\mathsf {T}(P)\,|\, Z) = Pr(\mathsf {T}(P) \, | \mathsf {T}(X)) \, Pr(\mathsf {T}(X) \, |\, Z), \end{array} \end{aligned}$$
(59)

and so, still assuming that \(\mathsf {T}(P) \models P\), we have

$$\begin{aligned} Pr(\mathsf {T}(P) \,| \,P) = \frac{Pr(\mathsf {T}(P) \, | \, Z)}{Pr(P \, | \, Z)} = \frac{Pr(\mathsf {T}(P)\,|\,\mathsf {T}(X))}{Pr(P\,|\,X)} \; \frac{Pr(\mathsf {T}(X)\,|\,Z)}{Pr(X\,|\,Z)}. \end{aligned}$$
(60)

Now suppose that there is also a proposition \(F\) such that \(\mathsf {T}^{-1}(F) \models F\), with \(Pr(F|Z) > 0\). Suppose, also, that \(\mathsf {T}^{-1}(F) \models X\). From this it follows that \(F \models \mathsf {T}(X)\), and we have

$$\begin{aligned} Pr(\mathsf {T}^{-1}(F)\,| \,F) = \frac{Pr(\mathsf {T}^{-1}(F) \,| \, X)}{Pr(F \,| \, \mathsf {T}(X))} \; \frac{Pr(X\, | \,Z)}{Pr(\mathsf {T}(X) \,| \,Z)}. \end{aligned}$$
(61)

Multiplying (60) and (61), the factor \(Pr(\mathsf {T}(X)\,|\,Z)/Pr(X\,|\,Z)\) cancels against its reciprocal, and we get

$$\begin{aligned} Pr(\mathsf {T}(P) \, | \,P) \, Pr(\mathsf {T}^{-1}(F)\,| \,F) = \frac{Pr(\mathsf {T}(P) \,| \, \mathsf {T}(X))}{Pr(P \,| \,X)} \; \frac{Pr(\mathsf {T}^{-1}(F) \,| \,X)}{Pr(F \,| \, \mathsf {T}(X))}. \end{aligned}$$
(62)

So far, we haven’t invoked any symmetry assumptions. If we impose \(\mathsf {T}\)-invariance, we have

$$\begin{aligned} \begin{array}{l} Pr(\mathsf {T}(P) \,| \, \mathsf {T} (X)) = Pr(P \,| \, X); \\ Pr(\mathsf {T}^{-1}(F) \, | \,X) = Pr(F \,| \, \mathsf {T}(X)), \end{array} \end{aligned}$$
(63)

and (62) becomes

$$\begin{aligned} Pr(\mathsf {T}(P) \, | \,P) \; Pr(\mathsf {T}^{-1}(F) \,| \,F) = 1, \end{aligned}$$
(64)

from which it follows that

$$\begin{aligned} Pr(\mathsf {T}(P) \, | \, P) = Pr(\mathsf {T}^{-1}(F) \,| \,F) = 1. \end{aligned}$$
(65)

Now, since we have assumed that \(\mathsf {T}(P) \models P\) and \(\mathsf {T}^{-1}(F) \models F\), there always do exist \(Z, X\) satisfying the conditions stipulated. Take \(Z\) to be \(P \vee F\), and take \(X\) to be \(P \vee \mathsf {T}^{-1}(F)\). Then \(\mathsf {T}(X)\) is \(\mathsf {T}(P) \vee F\), and, since \(\mathsf {T}(P) \models P\) and \(\mathsf {T}^{-1}(F) \models F\), we have \(P \models X \models Z\) and \(\mathsf {T}(P) \models \mathsf {T}(X) \models Z\), as required.

To sum up: we have established

Proposition 1

Let \(\langle \Omega , \mathcal {A}, \mathcal {B}, Pr \rangle \) be a conditional probability space that satisfies condition (iii) and is invariant under a transformation \(\mathsf {T}\). Suppose there are propositions \(P, F\), with \(Z = P \vee F \in \mathcal {B}\) and \(P, F \in \mathcal {A}_{Z}\), such that

  1. (i)
    1. (a)

      \(\mathsf {T}(P) \models P\);

    2. (b)

      \(\mathsf {T}^{-1}(F) \models F\);

  2. (ii)
    1. (a)

      \(Pr(P|Z) > 0\);

    2. (b)

      \(Pr(F|Z) > 0\).

Then

$$\begin{aligned} Pr(\mathsf {T}(P) \,| \,P) = Pr(\mathsf {T}^{-1}(F) \,| \,F) = 1. \end{aligned}$$

Applied to the Eternal Coin, let \(\mathsf {T}\) be \(\mathsf {S}_1\), which shifts everything forward one day. \(P\), as before, is the proposition that the coin landed Heads every day in the past, and \(F\), the proposition that the coin will land Heads every day in the future. Let \(H\) be the proposition that the coin lands Heads today. Then \(\mathsf {S}_1(P)\) is \(HP\), and \(\mathsf {S}_1^{-1}(F)\) is \(HF\). Clearly, \(HP \models P\) and \(HF \models F\). If

$$\begin{aligned} \begin{array}{l} Pr(P | P \vee F) > 0; \\ Pr(F | P \vee F) > 0; \end{array} \end{aligned}$$
(66)

and if

$$\begin{aligned} \begin{array}{l} Pr(HP | HP \vee F) = Pr( P | P \vee HF); \\ Pr(F | HP \vee F) = Pr( HF | P \vee HF), \end{array} \end{aligned}$$
(67)

then

$$\begin{aligned} Pr(H|P) = Pr(H|F) = 1. \end{aligned}$$
(68)

We can run the same argument with \(\mathsf {S}_k\), for any positive \(k\), yielding the conclusion that, for every \(n \ge 0\), the probability conditional on \(P\) that the coin lands Heads today and \(n\) days into the future is 1, as is the probability, conditional on \(F\), that the coin lands Heads today and \(n\) days into the past.


Cite this article

Myrvold, W. C. You Can’t Always Get What You Want: Some Considerations Regarding Conditional Probabilities. Erkenntnis 80, 573–603 (2015). https://doi.org/10.1007/s10670-014-9656-3
