1 Prologue

A few years ago, our group had the great pleasure and privilege of receiving Juraj Hromkovič as a guest in Ticino. He had announced a discourse on the topic “What is information?” [16]. It was a fascinating lecture in which Juraj advocated viewing information as complexity. This has been inspiring for me, and it is probably not a coincidence that soon afterwards I realized that using Kolmogorov complexity [18]Footnote 1 instead of probability distributions in the context of the fascinating but strange “non-local” correlations that quantum physics comes with — and that had already been my main object of study for over a decade at that time — offers a significant conceptual advantage: Non-counterfactuality, i.e., no necessity to talk about the outcomes of unperformed experiments.

John Stewart Bell showed in 1964 [6] that quantum theory predicts correlations between measurement outcomes that are too strong to be explained by shared classical information. This is as if identical twins not only looked alike — such correlations can easily be explained by their identical DNA sequences and do not confront us with a “metaphysical” problem — but also behaved in ways so strongly correlated that genetic explanations fail. Bell’s result was a late reply to an attack on quantum theory by Einstein, Podolsky, and Rosen (“EPR”) in 1935 [11], who remarked that if the outcomes of measurements on quantum systems are correlated, then they cannot be spontaneously random as predicted by the theory, but must already have been determined beforehand, on the occasion of the creation of the entangled pair. To stay with our analogy: If twins look the same, their looks must be determined by their genes; if they also behave the same, then so must their behavior. This is a natural thought — but it is insufficient, and realizing this is Bell’s breakthrough: “Genetic” explanations are too weak for quantum correlations. But then, where do they come from? What mechanism establishes them?

The basis of EPR’s argument was later called “Reichenbach’s principle” [22]: A correlation in a causal structure is established either by a common cause in the common past or by a direct influence from one event to the other. Bell’s celebrated argument rules out the first possibility if that common cause is to be a piece of classical information. Influence stories for establishing quantum correlations cannot be ruled out entirely, but they require the speed of that influence to be infinite, and they are unnatural in other respects, reflecting the fact that explaining a non-signaling phenomenon with a signaling mechanism is shooting sparrows with cannons. Ultimately, the fundamentality of the causal structure itself is in questionFootnote 2 — the only assumption behind Reichenbach’s principle. So motivated, models of relaxed causality have been studied [21] and disclosed an unexpectedly rich world [3] between fixed causal orders and logical inconsistency (à la “grandfather paradox” — you travel to the past and kill your own grandfather — etc.), much like the Bell world between locality and signaling.

The fall of rigid causality comes with a further victim, randomness: Common physical definitions of the freeness of randomness [9] are based on that very structure and fall with it. One way out is to consider freeness of randomness as fundamental and causality as emerging from it, via: “What is correlated with a perfectly free bit must be in its future.” For single bits, this is the best we can hope for. For bit strings, however, there can be an intrinsic notion of randomness depending only on the data themselves and not (otherwise) on the process leading up to them. In our search for such a non-contextual view, we land in a field traditionally tied to probability distributions and ensembles but one that knows a “non-counterfactual” viewpoint as well: Thermodynamics.

2 From the Steam Engine ...

The storyFootnote 3 of the second law of thermodynamics starts with Sadi Carnot (1796–1832) and his study of heat engines such as James Watt’s steam engine. To conclude that the law manifests itself only for such engines and their circular processes is to underestimate a general combinatorial fact. The second law was discovered through steam engines because they first permitted a precise, unclouded view of it — but the law is as little restricted to heat engines as Jupiter’s moons depend on telescopes.Footnote 4

Carnot discovered that the maximal efficiency of a heat engine between two heat baths depended solely on the two temperatures involved. This was his only publication; it appeared when he was 28 and was entitled: “Réflexions sur la puissance motrice du feu et sur les machines propres à développer cette puissance.”

Rudolf Clausius’ (1822–1888) version of the second law reads: “Es kann nie Wärme aus einem kälteren in einen wärmeren Körper übergehen, ohne dass eine andere damit zusammenhängende Änderung eintritt.” — “Heat can never pass from a colder to a warmer body unless some other change, connected therewith, occurs.”

William Thomson (Lord Kelvin) (1824–1907) formulated his own version of the second law and then concluded that the law may have consequences more severe than what is obvious at first sight: “Restoration of mechanical energy without dissipation [...] is impossible. Within a finite period of time past, the earth must have been, and within a finite period of time to come the earth must again be, unfit for the habitation of man.”

For Clausius, too, it was only a single further thinking step from his version of the law to the conclusion that all temperature differences in the entire universe will vanish — the “Wärmetod” — and that then, no change will happen anymore. He speaks of a general tendency of nature to change in a specific direction: “Wendet man dieses auf das Weltall im Ganzen an, so gelangt man zu einer eigentümlichen Schlussfolgerung, auf welche zuerst W. Thomson aufmerksam machte, nachdem er sich meiner Auffassung des zweiten Hauptsatzes angeschlossen hatte.Footnote 5 Wenn [...] im Weltall die Wärme stets das Bestreben zeigt, [...] dass [...] Temperaturdifferenzen ausgeglichen werden, so muss es sich mehr und mehr dem Zustand annähern, wo [...] keine Temperaturdifferenzen mehr existieren.” — “If one applies this to the universe as a whole, one arrives at a peculiar conclusion, to which W. Thomson first drew attention after he had adopted my view of the second law. If, in the universe, heat always shows the tendency to equalize temperature differences, then the universe must approach more and more the state in which no temperature differences exist anymore.”Footnote 6

Ludwig Boltzmann (1844–1906) brought our understanding of the second law closer to combinatorics and probability theory: The second law was for him the expression of the fact that it is more likely to end up in a large set of possible states than in a small one: The higher the number of particular situations (microstates) which belong to a general one (macrostate), the more likely it is to be in that general situation. In other words, time evolution does not decrease a closed system’s entropy, which is proportional to the logarithm of the number of corresponding microstates: Things do not get “more special” with time.Footnote 7

Boltzmann’s notions of macrostate and entropy are subjective, and it is not obvious how to define them in general, e.g., for non-equilibria. We propose instead a version of the second law that is broader and more precise at the same time, avoiding probabilities and ensembles. Crucial steps in that direction were made by Wojciech Zurek [32]. We follow Rolf Landauer [19] whose choice of viewpoint about thermodynamics can be compared with Ernst Specker’s [24] about quantum theory: Logic.

3 ...to the Turing Machine

Landauer [19] investigated the thermodynamic price of logical operations. He was correcting a belief by John von Neumann that every bit operation required free energy \(kT\ln 2\):Footnote 8 According to Landauer — as confirmed by Fredkin and Toffoli’s “ballistic computer” [14] —, that price is unavoidable only for logically irreversible operations such as the AND or the OR. On the positive side, it has been observed [7] that every function, bijective or not, can in principle be evaluated in a logically reversible way, using only “Toffoli gates,” i.e., made-reversible-and-then-universal AND gates; this computation can then be thermodynamically neutral, i.e., it does not dissipate heat.
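As a minimal sketch of this logical reversibility, the following Python fragment (names and the exhaustive check are ours, for illustration only) shows that the Toffoli gate computes AND onto a target bit initialized to 0 while acting as a bijection on three bits, and that it is its own inverse:

```python
from itertools import product

def toffoli(a, b, c):
    """Toffoli (controlled-controlled-NOT): flips c if and only if a = b = 1."""
    return a, b, c ^ (a & b)

# With the target initialized to 0, the third output bit is AND(a, b);
# unlike a plain AND gate, no information is lost: the inputs are carried
# along, and applying the gate a second time undoes it.
for a, b in product((0, 1), repeat=2):
    out = toffoli(a, b, 0)
    assert out[2] == (a & b)
    assert toffoli(*out) == (a, b, 0)

# The map on all 3-bit strings is a bijection, i.e., logically reversible:
images = {toffoli(a, b, c) for a, b, c in product((0, 1), repeat=3)}
assert len(images) == 8
```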

Landauer’s principle states that the erasure (setting the corresponding binary memory cells to 0) of N bits costs \(kTN\ln 2\) free energy which must be dissipated as heat to the environment, a thermal bath of temperature T. The dissipation is crucial in the argument: Heating up the environment compensates for the loss of entropy within the memory cell which is materialized by some physical system (spin, gas molecule, etc.). Landauer’s principle is a direct consequence of Boltzmann’s view of the second law.
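To get a feeling for the orders of magnitude, here is a small back-of-the-envelope calculation; the temperature and the number of bits are arbitrary illustrative choices of ours, not values from the text:

```python
import math

k = 1.380649e-23      # Boltzmann constant in J/K (exact in the 2019 SI)
T = 300.0             # assumed bath temperature in kelvin
N = 8 * 10**9         # assumed number of bits to erase, here one gigabyte

heat = k * T * N * math.log(2)   # minimal heat dissipated, by Landauer's principle
print(f"Erasing {N} bits at {T} K dissipates at least {heat:.2e} J")
# About 2.3e-11 J, far below what present-day hardware dissipates for the same task.
```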

Fig. 1. Bennett’s resolution of the Maxwell-demon paradox.

Charles Bennett [8] used Landauer’s slogan “Information is Physical” for making the key contribution to the resolution of the paradox of “Maxwell’s demon” (see, e.g., [25]). That demon had been thought of as violating the second lawFootnote 9 by adaptively handling a frictionless door with the goal of “sorting a gas” in a container. Bennett took the demon’s memory (imagined to be in the all-0-state before sorting) into account, which is in the end filled with “random” information remembering the original state of the gas: The growth of disorder within the demon compensates for the order she creates outside (i.e., in the gas) — the second law is saved. The initial 0-string is the demon’s resource allowing for her order creation (see Fig. 1): If we break Bennett’s argument apart in the middle, we end up with the converse of Landauer’s principle: The all-0-string has work value, i.e., if we allow the respective memory cells to become “randomized,” we can extract \(kTN\ln 2\) free energy from the environment (of temperature T).

Bennett [7] had already implied that for some strings S, the erasure cost is less than Landauer’s len\((S)\cdot kT\ln 2\): Besides the obvious \(00\ldots 0\) and \(11\ldots 1\), this is also true, e.g., for the string formed by the first N digits of the binary expansion of \(\pi \): The reason is that there is a short program generating the sequence or, in other words, a logically reversible computation between (essentially) \(0^N\) and that string which can be carried out thermodynamically reversibly [14]. Generally, if a string can be compressed in a lossless fashion, then its erasure cost shrinks accordingly.

Let us consider a model for the erasure process (see Fig. 2) in which there is, besides the string S to be erased, another string X on the tape acting as a “catalyst”: it summarizes possible a priori “knowledge” about S, thus helping to compress it, but itself remains unchanged in the process. The universal Turing machine \(\mathcal {U}\)’s tape is assumed to be finite — there would be infinite “work value” on it otherwise — but can be extended arbitrarily as needed, and the resulting erasure cost is EC\(_{\mathcal {U}}(S|X)\).

Fig. 2. Erasing S with catalyst X at cost EC\(_{\mathcal {U}}(S|X)\): First, S is reversibly compressed (given X) to the shorter string P for free; then that “program” P is erased at cost \(kT\ln 2\cdot \text {len}(P)\).

Bennett [7] claims that the erasure cost of a string is, actually, its Kolmogorov complexity times \(kT\ln 2\); in our model, this would translate to the erasure cost equalling the conditional complexity of the string to be erased, given the side information: EC\(_{\mathcal {U}}(S|X)=K_{\mathcal {U}}(S|X)\cdot kT\ln 2\). Unfortunately, this is in general not achievable: Kolmogorov complexity and the corresponding compression transformation are uncomputable, and we assume the erasure to be carried out by a Turing machine, in the spirit of the Church/Turing thesis [17]. It is, however, true that Kolmogorov complexity leads to a lower bound on the erasure cost since it represents the ultimate limit on the lossless compressibility of the string by \(\mathcal {U}\). In the same way, we can argue that any concrete and computable compression algorithm (with side information) C, e.g., Lempel/Ziv [31], leads to an upper bound on the erasure cost: First, we reversibly compress (at no energy cost) and then erase the compression.Footnote 10

Landauer’s principle, revisited. Let C be a computable function,

$$ C:\{0,1\}^*\times \{0,1\}^* \rightarrow \{0,1\}^*\ , $$

such that

$$ (V,W)\mapsto (C(V,W),W) $$

is injective. Then the cost of the erasure of S with catalyst X, carried out by the universal Turing machine \(\mathcal {U}\), is bounded by

$$ K_{\mathcal {U}}(S|X)\cdot kT\ln 2\ \le \ \text {EC}_{\mathcal {U}} (S|X)\ \le \ \text {len}(C(S,X)) \cdot kT\ln 2\ . $$
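The following sketch instantiates the upper bound with one concrete computable, injective compressor, namely DEFLATE (Python’s zlib) with the catalyst X supplied as a preset dictionary; this choice of C, and all names and parameters, are our own illustrative assumptions. The lower bound \(K_{\mathcal {U}}(S|X)\cdot kT\ln 2\) remains uncomputable and is therefore not evaluated here.

```python
import math
import zlib

k = 1.380649e-23   # Boltzmann constant in J/K

def erasure_cost_upper_bound(S: bytes, X: bytes, T: float = 300.0) -> float:
    """Upper bound on EC(S|X): reversibly compress S using X as a preset
    dictionary (lossless, hence injective given X), then charge kT*ln(2)
    per remaining bit for erasing the compressed 'program' P = C(S, X)."""
    comp = zlib.compressobj(level=9, zdict=X)
    P = comp.compress(S) + comp.flush()
    return 8 * len(P) * k * T * math.log(2)

# Illustrative strings: the catalyst X shares structure with S and helps compress it.
X = b"the quick brown fox jumps over the lazy dog " * 50
S = b"the quick brown fox jumps over the lazy dog " * 200
T = 300.0
with_catalyst = erasure_cost_upper_bound(S, X, T)
naive = 8 * len(S) * k * T * math.log(2)   # Landauer cost without compression
print(f"upper bound with catalyst: {with_catalyst:.2e} J   naive: {naive:.2e} J")
```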

The principle can be extended to an arbitrary computation starting with input A and leading up to output B with side information X, where (A, X) and (B, X) are the only contents of the tape, besides 0s, before and after the computation, respectively. Our result is an algorithmically constructive modification of entropic results [12] and a generalization of less constructive but also complexity-based claims [33].

Landauer’s principle, generalized. Let C be a computable function,

$$ C:\{0,1\}^*\times \{0,1\}^* \rightarrow \{0,1\}^*\ , $$

such that

$$ (V,W)\mapsto (C(V,W),W) $$

is injective. Assume that the Turing machine \(\mathcal {U}\) carries out a computation such that A and B are its initial and final configurations. Then the energy cost of this computation with side information X, \(\text {Cost}_{\mathcal {U}}(A\rightarrow B\, |\, X)\), is at least

$$ \text {Cost}_{\mathcal {U}} (A\rightarrow B\, |\, X)\ \ge \ [K_{\mathcal {U}}(A|X)-\text {len}(C(B,X)) ] \cdot kT\ln 2\ . $$

Proof

The erasure cost of A, given X, is at least \(K_{\mathcal {U}}(A|X)\cdot kT\ln 2\) according to the above. One possibility of realizing this complete erasure of A is to first transform it to B (given X), and then erase B — at cost at most len\((C(B,X))\cdot kT\ln 2\). Therefore, the cost to get from A to B given X cannot be lower than the difference between \(K_{\mathcal {U}}(A|X)\cdot kT\ln 2\) and len\((C(B,X))\cdot kT\ln 2\).    \(\square \)
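As a consistency check (an observation of ours that follows directly from the statement), complete erasure is the special case in which B is the empty string; choosing a compressor with \(C(\epsilon ,X)=\epsilon \), the bound becomes

$$ \text {Cost}_{\mathcal {U}} (A\rightarrow \epsilon \, |\, X)\ \ge \ K_{\mathcal {U}}(A|X) \cdot kT\ln 2\ , $$

which is exactly the lower bound on the erasure cost from the revisited principle above.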

The complexity reductions in these statements quantify the “amount of logical irreversibility” inherent in the respective process, and the quantity of required work — the price for this non-injectivity — is proportional to that. The picture is now strangely hybrid: The environment must pay a thermodynamic (macroscopic) price for what happens logically (microscopically) in one of its parts. What prevents us from looking at the environment with a microscope? If we let ourselves be inspired by John Archibald Wheeler’s [27] “It from Bit,” we see the possibility of a compact argument: The price that the environment has to pay in compensation for the irreversibility of the computation in one of its parts is such that the overall computation is reversible.


Let us first note that this condition on time evolutions is, like traditional “second laws,” asymmetric in time: Logical reversibility only requires the future to uniquely determine the past, not vice versa. So if “reality” is such an injective computation, its reverse may be injective as well (e.g., the computation of a deterministic Turing machine, or the Everett/Bohm interpretations of quantum theory), or this may fail to hold because the forward direction has splitting paths (e.g., the computation of a probabilistic Turing machine).

The second law has often been linked to the emergence of an arrow of time. How is that compatible with our view? In determinism, logical reversibility holds both ways. What could then be possible origins of our ability to distinguish past and future? (Is it the limited precision of our sense organs and the resulting coarse-graining?) Indeterminism is easier in that sense since it comes with objective asymmetry in time: Randomness points to the future.

4 Consequences

Logical Reversibility Implies Quasi-Monotonicity

The logical reversibility of a computation implies that the overall complexity of the Turing machine’s configuration at time t can be smaller than that at time 0 by at most \(K(C_t)+O(1)\) if \(C_t\) is a string encoding the time span t. The reason is that one possibility of describing the configuration at time 0 is to give the configuration at time t, plus t itself; the rest is exhaustive search using only a constant-length program simulating forward time evolution.
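Spelled out, with \(\varGamma _0\) and \(\varGamma _t\) denoting the configurations at times 0 and t (notation introduced here for convenience): since \(\varGamma _0\) can be recovered from \(\varGamma _t\) and \(C_t\) by simulating every candidate configuration forward for t steps and returning the unique preimage guaranteed by injectivity, we get

$$ K_{\mathcal {U}}(\varGamma _0)\ \le \ K_{\mathcal {U}}(\varGamma _t)+K_{\mathcal {U}}(C_t)+O(1)\ , \qquad \text {hence}\qquad K_{\mathcal {U}}(\varGamma _t)\ \ge \ K_{\mathcal {U}}(\varGamma _0)-K_{\mathcal {U}}(C_t)-O(1)\ . $$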

Logical Reversibility Implies a Boltzmann-Like Second Law

The notion of a macrostate can be defined in an objective way, using a “structure” function based on the Kolmogorov sufficient statistics [1, 4]. Roughly speaking, the macrostate is the structure or “compressible part” of a microstate: Given the macrostate — being a set of microstates that is, ideally, small, and its description at the same time short (such as: “a gas of volume V, temperature T, and pressure p in equilibrium”) —, the particular microstate is a “typical element” of it, specifiable only through stubborn binary coding. This notion of a macrostate is typically unrelated to the second law except when the initial and final macrostates both have very short descriptions, like for equilibria: Then, logical reversibility implies that their size is essentially non-decreasing in time. This is Boltzmann’s law.

Logical Reversibility Implies a Clausius-Like Second Law

If we have a circuit — the time evolution — using only logically reversible Toffoli gates, then it is impossible that this circuit computes a transformation of the following nature: Any pair of strings — one with higher Hamming weight than the other — is mapped to a pair of equally long strings where the heavy string has become heavier and the light one lighter. Such a mapping, which accentuates differences, cannot be injective for counting reasons. We illustrate this with a toy example (see Fig. 3).

Example 1

Let a circuit consisting of Toffoli gates map an \(N(=2n)\)-bit string to another string — which must then be N bits long as well, due to reversibility. We consider the mapping separately on the first and second halves of the full string. We assume the computed function to be conservative, i.e., to leave the Hamming weight of the full string unchanged. We look at the excess of 1’s in one of the halves (which equals the deficit of 1’s in the other). We observe that the probability (with respect to the uniform distribution over all strings of some Hamming-weight couple \((wn,(1-w)n)\)) that the imbalance grows substantially is exponentially small. The key ingredient in the argument is the function’s injectivity. Explicitly, the probability that the weight couple goes from \((wn,(1-w)n)\) to \(((w+\varDelta )n,(1-w-\varDelta )n)\) — or to a more extreme one —, for \(1/2\le w<1\) and \(0<\varDelta \le 1-w\), is

$$ \frac{\left( {\begin{array}{c}n\\ (w+\varDelta )n\end{array}}\right) \left( {\begin{array}{c}n\\ (1-w-\varDelta )n\end{array}}\right) }{\left( {\begin{array}{c}n\\ wn\end{array}}\right) \left( {\begin{array}{c}n\\ (1-w)n\end{array}}\right) } =2^{-\varTheta (n)}:$$

Logical reversibility is incompatible with the tendency of amplification of differences.
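A quick numerical illustration of the exponent (code of ours; it evaluates only the probability of landing exactly on the more extreme weight couple, which already decays exponentially):

```python
from math import comb, log2

def log2_prob_imbalance_growth(n: int, w: float, delta: float) -> float:
    """log2 of the (maximal) probability that an injective, conservative circuit
    maps a uniformly random string of weight couple (wn, (1-w)n) to one of
    weight couple ((w+delta)n, (1-w-delta)n): the ratio of the two set sizes."""
    a, b = round(w * n), round((1 - w) * n)
    c, d = round((w + delta) * n), round((1 - w - delta) * n)
    return log2(comb(n, c) * comb(n, d)) - log2(comb(n, a) * comb(n, b))

# The exponent grows linearly in n, i.e., the probability is 2^(-Theta(n)):
for n in (20, 40, 80, 160):
    print(n, round(log2_prob_imbalance_growth(n, w=0.6, delta=0.2), 1))
```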

Fig. 3. Logical reversibility does not accentuate differences: Clausius.

Logical Reversibility Implies a Kelvin-Like Second Law

Finally, logical reversibility also implies statements resembling Kelvin’s version of the second law: “A single heat bath alone has no work value.” This, again, follows from a counting argument: There exists no reversible circuit that concentrates redundancy to some pre-chosen bit positions.

Example 2

The probability that a fixed circuit maps a string of length N and Hamming weight w to another such that the first n positions contain only 1’s, and such that the Hamming weight of the remaining \(N-n\) positions is \(w-n\), is

$$ \frac{ \left( {\begin{array}{c}N-n\\ w-n\end{array}}\right) }{\left( {\begin{array}{c}N\\ w\end{array}}\right) }=2^{-\varTheta (n)}\ . $$

In a sense, Kelvin’s law is a special case of Clausius’ formulation.
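Again a small numerical check of the exponent (illustrative code of ours, evaluated for \(N=2n\) and \(w=n\)):

```python
from math import comb, log2

def log2_prob_concentrate(N: int, w: int, n: int) -> float:
    """log2 of the (maximal) probability that a fixed injective circuit maps a
    uniformly random N-bit string of Hamming weight w to one whose first n
    positions are all 1: at most C(N-n, w-n) / C(N, w)."""
    return log2(comb(N - n, w - n)) - log2(comb(N, w))

# Exponentially small in n (here half of all bits are 1):
for n in (10, 20, 40, 80):
    print(n, round(log2_prob_concentrate(N=2 * n, w=n, n=n), 1))
```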

5 Epilogue

A priori, the relation between physical reality and computation is two-fold: Turing’s famous machine model is physical — apart from the tape’s infiniteness —, and the generalized Church/Turing thesis conjectures physical processes to be simulatable on a Turing machine (see Fig. 4). This circle has been the background of our considerations.

Fig. 4. Physics and information: Scenes from a marriage.

In the “Church/Turing view,” a physical law is a property of a Turing machine’s computation: The second law of thermodynamics is logical reversibility.Footnote 11

What can be said about the validity of the Church/Turing thesis? Nothing absolute, just as on “determinism.” But exactly like for that latter problem, an all-or-nothing statement creates, at least, a clear dichotomy [2, 28].


Such a dichotomy results when we pursue the idea that Kolmogorov complexity measures intrinsic randomness and apply it to Bell correlations: If we have access to some “super-Turing machine,” we let that machine choose the particular measurement bases for the two parts of an entangled quantum system. The correlations then imply that the sequence of measured values is Turing-uncomputable as well. This analysis [29] resembles well-known arguments but replaces randomness by complexity. The new approach has two conceptual advantages. The first is its non-counterfactuality: We do not talk about the outcomes of unperformed measurements. (Any Bell inequality does.) The second is context-independence: No process-based notion of randomness is required. Such definitions are typically embedded into a causal structure — but the non-local correlations themselves, with their inexplicability by a “reasonable” mechanism within such a structure, are among the strongest arguments against fundamental causality. The alternative is to consider “free will” (randomness, if you will) as more fundamental and causality as emerging from it through: “If Y is correlated with X, and X is freely random, then Y must be in X’s future.”Footnote 12 Then the arrow of time appears as an accumulation of irreversible binary decisions. The reverse transformation of such splits is not logically reversible and, hence, violates the second law:

$$\text {Logical Reversibility}\ +\ \text {Randomness}\ =\ \text {Thermodynamic Irreversibility.} $$

This equation suggests why a law that we read here as reversibility is so often linked to its exact opposite.