1 Introduction

The use of global climate models to design policies for the mitigation or amelioration of anthropogenic climate change depends upon their (quantitative) representational and predictive adequacy. The principal uses of global climate models (GCMs) that are relevant to policy design are the detection, attribution, and prediction of global climate change, and in order to be effective in any of these capacities the GCMs must be quantitatively reliable. To ‘detect’ a change in climate, observations must be shown to be unlikely to have occurred due to internal variability in the climate system. And to make this assessment, it is essential to have an estimate of the climate’s internal variability. According to the IPCC, “internal climate variability is usually estimated from long control simulations from coupled climate models” (Hegerl et al. 2007, 668). These control simulations lay down a baseline that establishes what range of numerical fluctuations in some variable can be expected to occur randomly within some fixed climate (the pre-industrial climate, for example). When the observed values of the climate variables fall outside of this range, this is evidence that the climate has changed.

Once a climate change has been detected, the ‘attribution’ of that change to some particular cause, typically anthropogenic emissions, likewise depends on the quantitative reliability of the GCMs. Attribution studies take a variety of forms, but they typically involve some variation on the idea of comparing the outputs of GCMs run using only natural forcings (that is, orbital wobbles, changes in solar output, volcanoes, etc.) to the outputs of GCMs that include anthropogenic factors. If the observations cannot be reproduced by the GCMs using only natural factors, but can be reproduced by the models that include anthropogenic factors as well, then this is evidence that anthropogenic factors are the cause of the detected climate change (Hegerl et al. 2007, 702-3).

Similarly, projections of future climate change are based on simulations run according to a series of idealized climate scenarios.Footnote 1 These scenarios are run on multiple different GCMs maintained by research groups scattered throughout the world. The outputs of these models are averaged to give a best estimate of how the climate will respond under a particular emissions scenario. Furthermore, the spread in model results can be used to help assess the appropriate level of confidence in these projections, even though “a statistical interpretation of the model spread is problematic” (Meehl et al. 2007, 753-4).

Policies designed to address climate change obviously presuppose that the climate is in fact changing, and this assessment—as we have seen—depends on the quantitative reliability of GCMs. Furthermore, the appropriateness of a policy intervention designed to mitigate a detected or projected climate change will typically depend on what is causing that change—and this attribution likewise rests on the quantitative adequacy of the models. Lastly, assessments of the urgency, pace and scope of a policy option evidently depend upon the quantitative estimates of what would happen should the policy, or one of its proposed rivals (including the null-policy), be adopted.
For these reasons, GCMs are, as Paul Edwards has claimed, the “epistemic core of the climate science/policy community” (Edwards 2001, 64).Footnote 2 Furthermore, it is not enough for these GCMs to be “a means to explore what-if scenarios” (Oreskes 2007, 86); rather, they must be quantitatively reliable, at least to a certain extent, if they are to play their epistemic role.
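To make the detection step described above concrete, the following sketch shows, with entirely synthetic numbers, how an observed warming trend might be compared against a null distribution of trends generated by internal variability alone. The autoregressive surrogate standing in for a long control run, the fifty-year window, and the 95% threshold are illustrative assumptions of my own, not features of any published detection study.

```python
# Toy illustration of 'detection': is an observed warming trend outside the
# range of trends that arise from internal variability alone?  All numbers
# here are synthetic; a real study would use a long pre-industrial control
# run from a coupled model and an observed temperature record.
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(50)  # a 50-year window

def trend_per_decade(series, years):
    """Least-squares linear trend, in degrees per decade."""
    slope = np.polyfit(years, series, 1)[0]
    return slope * 10.0

def control_segment():
    """Surrogate 'control run': red noise standing in for internal variability."""
    noise = np.zeros(len(years))
    for t in range(1, len(years)):
        noise[t] = 0.6 * noise[t - 1] + rng.normal(0.0, 0.1)
    return noise

# Null distribution of 50-year trends built from many control segments.
null_trends = np.array([trend_per_decade(control_segment(), years)
                        for _ in range(2000)])

# A hypothetical 'observed' series: the same kind of internal variability
# plus a forced warming of 0.15 degrees per decade.
observed = control_segment() + 0.015 * years
obs_trend = trend_per_decade(observed, years)

# Detection criterion: the observed trend lies outside the central 95%
# of the trends that internal variability alone can produce.
lo, hi = np.percentile(null_trends, [2.5, 97.5])
print(f"observed trend: {obs_trend:.3f} deg/decade")
print(f"95% range from internal variability: [{lo:.3f}, {hi:.3f}]")
print("change detected" if not (lo <= obs_trend <= hi) else "no detection")
```

The logic is the same however the null distribution is obtained: only when the observed trend falls outside the range that internal variability can plausibly produce does it count as a detected change.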

Not surprisingly, then, the quantitative representational and predictive adequacy of GCMs has often been the subject of skeptical attacks by those bent upon undermining climate change policies. What is more surprising, however, is the frequency with which epistemological challenges to the adequacy of such models originate from philosophers and those in the science studies community who reflect on the methodology of global climate modeling (e.g. Oreskes 2003, 2007; Haller 2002; and to a certain extent Edwards 2001; Frigg et al. 2013). Many who have thought philosophically about how climate models are used to generate predictions of future climate change have come away puzzled and unable to endorse the position of the IPCC that “there is considerable confidence that Atmosphere-Ocean General Circulation Models provide credible quantitative estimates of future climate change” (Randall et al. 2007, 591).Footnote 3 Typically, in the face of a broad, global consensus among those scientists who work in an area, the epistemological anxieties of a couple of philosophers would not be cause for concern.Footnote 4 However, in the contentious political environment surrounding climate change, these epistemological worries can be, and have been, co-opted by skeptics or contrarians who hope to undermine the political will to enact policies that address climate change. Philosophers who are of a more humble strain, unwilling—from their armchairs—to contest the considered opinion of the global community of climate scientists, should be concerned and disturbed at how some of those within their discipline are, inadvertently perhaps, undermining informed consideration of climate change policy. One way to counteract this disturbing new application of philosophy would be to articulate alternative, and perhaps more appropriate, ways to think about global climate modeling and its quantitative reliability.

2 Strategies for Confronting Modeling Skeptics

There are a variety of possible strategies open to a philosopher who hopes to address the epistemological concerns that have been raised about the reliability of climate models. One strategy that is not helpful is to try to disengage the evidence for global climate change and the predictions relevant to policy decisions from the quantitative reliability of GCMs. As I sketched earlier, and as is openly asserted in the IPCC reports, both our understanding of current climate change and our predictions of future change depend on this quantitative reliability.Footnote 5 This means that one has already conceded too much to the skeptic if one is willing to insist on the “reality of human induced climate change,” but unwilling to endorse the reliability of estimates of the “tempo and mode” of such changes (see Oreskes 2007, 74). Even if a skeptic concedes that human induced climate change is real, if he or she maintains that such changes are dwarfed by natural variations due to insolation changes, then this is enough to undermine policies designed to reduce greenhouse gas emissions. In order to be helpful, philosophers must make the case that GCMs are capable of meeting the quantitative demands placed on them in the detection, attribution, and projection of climate change. A sensible first step would be to analyze the epistemological assumptions behind the worries that philosophers have articulated about the quantitative reliability of GCMs. Insofar as the epistemological anxieties about GCMs can be shown to be rooted in implausible assumptions about what a quantitatively reliable theory or model must be like, this therapeutic approach may work to relieve some of these anxieties.Footnote 6

Even if such a therapeutic approach were wholly successful, however, it would be ideal to be able to supplement such an approach with a positive story about why global climate models can, or should, be understood to be quantitatively reliable. There seem to be at least two different positive strategies for supplying an account of why global climate models are predictively reliable: an absolute strategy and a relative strategy. An absolute strategy would involve showing that the kinds of evidence that global climate modelers present for the predictive reliability of their models do, in fact, constitute good evidence. This, in turn, would require situating the evidence provided by climate modelers against some well-developed backdrop of evidential norms that are independently recognized to be legitimate. In a context broader than, but including, global climate modeling, Winsberg (1999, 2003) has advocated something like this approach. He asks, “What are the factors that contribute to the notion that a computational model constructed on a set of approximations and idealizations is valid?” (Winsberg 1999, 287). He then proceeds to describe many of the important components of the methodology of computer simulation. These components, he stresses, “need to be subjected to epistemological scrutiny,” because, “[i]f the influences and possible pitfalls of each element are not properly understood and managed by the simulationist, then they represent a potential threat to the credibility of simulation results” (Winsberg 1999, 288). Unfortunately, nothing approaching a general and convincing account of why the idealizations, approximations, and other methodological components of computer simulation modeling should be regarded as justified, or reliable, appears forthcoming. Instead, Winsberg has been clear about where such a justification does not come from—namely, the background physical theory (at least not by itself). I think that he is right about this, and it is an important insight, but it does not help in fending off skepticism about global climate models. Furthermore, some of Winsberg’s other work makes it clear, I think, why this sort of absolute strategy for establishing the quantitative reliability of GCMs is unlikely to be successful any time soon.

When he comes to the project of providing an epistemological account of the “techniques” of simulation, Winsberg offers a description of the process by which such techniques are credentialed. These techniques, he claims, “have a life of their own” (Winsberg 2003, 121). This means that, “they carry with them their own history of prior successes and accomplishments, and, when properly used, they can bring to the table independent warrant for belief in the models they are used to build” (Winsberg 2003, 122). Again, this seems entirely correct, but it has no normative force at all. It does nothing to establish the legitimacy of such techniques, but instead reports that their acceptance as legitimate is bound up with their detailed history of use. If this description is right, however, then it suggests that any convincing normative accounts of why particular simulation techniques are legitimate, or reliable, will be very local and detailed. Perhaps a simulation modeler could explain to his peers why it was legitimate and rational to use a certain approximation technique to solve a particular problem, but this explanation would—if Winsberg is right—appeal to very context specific reasons and particular features of the history of the technique.Footnote 7 This does not suggest that such techniques are illegitimate or unjustified, but it does make it unlikely that the legitimacy of such techniques can be reconstructed, or rationalized, in terms of generally recognized, ahistorical evidential norms. As is often the case, it may be difficult or impossible to “step back” from the practice of global climate modeling and argue, independently, for its legitimacy; and this makes the absolute strategy for establishing the predictive reliability of global climate models unlikely to be successful. This should not be too troubling, however, since the well-developed background of evidential norms independently recognized to be legitimate has not yet led, among philosophers at any rate, to a consensus on whether, or under what circumstances, our inductive inferences are justified.

Whether or not the absolute strategy ever pans out, there are other options for providing positive support for the quantitative reliability of GCMs. For instance, a relative strategy would proceed by attempting to show that the use of global climate models in prediction (or quantitative description, more generally) is relevantly similar to some class of predictions that are already accepted as reliable. Presumably, insofar as skeptics about GCMs are willing to accept the quantitative reliability of this other set of predictions, they should also accept the reliability of the relevantly similar predictions generated by GCMs. This strategy has its limitations. It depends upon finding a set of predictions that are broadly recognized to be reliable and which can be plausibly argued to be relevantly similar to the sorts of quantitative predictions generated by GCMs. Even if such a set of predictions can be identified, this strategy is always open to the charge that the analogous predictions are not similar in the appropriate epistemological sense. Nonetheless, it offers the potential for a convincing response to skeptical worries about the predictive adequacy of global climate models. At the least, such a relative strategy would shift the burden of proof to the skeptic, who would have to articulate what it is about the predictions generated by GCMs that makes them significantly different from the already accepted class of predictions.

A natural place to start in trying to develop such a relative strategy is by examining the substantial literature that argues for epistemological parallels between traditional experimentation and computer simulations. If a plausible analogy could be drawn between the uses of GCMs in the detection, attribution, and prediction of climate change and some feature of more traditional scientific experimentation, then perhaps the epistemological prestige of such experiments could act as a fulcrum for the relative strategy. I will explore two common ways that this parallel has been fleshed out, and then evaluate their potential for supporting the relative strategy.

One common way of relating computer simulation and traditional experimentation is to regard computer simulation studies as being experiments, that is, as being “investigative activit[ies] that involve intervening on a system in order to see how properties of interest of the system change…in light of that intervention” (Parker 2009, 487). According to this view,Footnote 8 in any experiment, observations about the behavior of one system—the experimental system—are used to license inferences about another system, the target system. These inferences are justified, or legitimate, insofar as “the experimental system is similar to the target system in whatever respects are relevant, given the particular question they want to answer about the target system” (Parker 2009, 493). In the case of using GCMs to generate predictions about future climate, the experimental system is a digital computer coded with the GCM. This computer is intervened upon by running it with boundary conditions prescribed by a particular emissions scenario. The output from this run of the computer (perhaps in combination with additional runs on this and other similar experimental systems) is then used to license an inference about the future climate of the earth, which is the target system. Conclusions drawn about the future climate of the earth, on the basis of such a computer simulation experiment, are then reliable insofar as the programmed digital computer is relevantly similar to the earth’s climate. This all seems plausible, but it brackets the issue of whether predictions generated in GCM experiments are quantitatively reliable. Insofar as we think of standard scientific experimentation as a reliable way of making quantitative predictions about target systems—and so might appeal to such experimentation as a fulcrum in our relative strategy—it seems that this reliability is grounded in the obvious or presumptive relevant similarity between the experimental and target systems. For instance, everyone would accept the legitimacy of the prediction that a certain kind of donut has about 500 calories based on an experiment in which the caloric content of a random sample of such donuts was measured to be 500 calories. This prediction is accepted because we presume that all such donuts are relevantly similar—they have about the same caloric content. No such presumption seems plausible in the case of the GCMs, though it may still be true that the relevant similarities obtain. In fact, because computer simulation studies on a GCM are performed on an experimental system that is so different from its target system (a programmed computer versus a planetary atmosphere), this way of thinking about the epistemological relationship between predictions based on GCMs and traditional experimentation actually emphasizes their dissimilarities. Computer simulation experiments cannot exploit unresolved similarities between the experimental system and the target system that are grounded in their material similarity. Instead, all of the relevant similarities on which the reliability of the GCM predictions is based must be imposed on the experimental system by way of the computer model that it realizes. So while thinking of GCM studies as experiments in this way is plausible, it does not seem to be a promising way of proceeding with a relative strategy for supporting the quantitative reliability of GCMs.

Another common way of making an epistemological connection between GCMs (or other simulation models) and traditional experimentation is to regard GCMs as scientific instruments. A prominent example of this analogy occurs in Norton and Suppe (2001), where the authors argue that “simulation models are scientific instruments” and thus that they are “just another source of empirical data” (Norton and Suppe 2001, 87). If this is right, then “the epistemological challenges facing simulation models are…identical with those of any other experimental situation involving instrumentation” (Norton and Suppe 2001, 92). In order to address skeptical worries about the quantitative predictive adequacy of global climate models using the epistemological analogy drawn by Norton and Suppe, then, one would just have to identify the sorts of experimental measurement situations that are analogous to the predictions and other quantitative descriptions generated using climate models and then argue that we have no qualms with the epistemic credentials of data obtained in such situations. But here Norton and Suppe run into some awkward consequences of having conceived of climate modeling as analogous to experimental data collection. We don’t measure either how things will be in the future or what the world would have been like had things been different. Instead, experimental data collection results in descriptions of the way the world is or was (though, admittedly, this often requires some antecedent assumptions about, roughly, the causal structure of the world). These measured features of the actual world may then, in a step requiring additional justification, be used to infer facts about how related phenomena will be, or may have been. Climate models, on the other hand, are typically used to directly generate descriptions of how the world might have been (without anthropogenic greenhouse gas emissions, say) or how it might yet be (under a variety of potential emission scenarios). Taking the quantitative results of these sorts of predictions or descriptions seriously requires confidence in the counterfactual robustness of the global climate model—the model must not only accurately reproduce collected empirical data, but must also give the right results for reasons that hold up across a variety of ways that the world might have been or may yet be. This demand on global climate models—which is a consequence of their theoretical role—makes their assimilation to scientific instruments strained. Even if it is true that lots of modeling and theoretical assumptions go into data collection using instruments, it also seems clear that additional demands are placed on our scientific theories or models when they are used not just as probes of the actual world, but also as diviners of our possible futures or pasts.Footnote 9 So while Norton and Suppe approach the issue of how to confront skepticism about climate modeling in a promising way—by a relative argument with empirical experimentation as a fulcrum—the epistemological analogy seems forced at just the wrong moment, when the predictive or counterfactual quantitative reliability of global climate models is at stake.Footnote 10

If the relative strategy is to succeed as a defense of the quantitative reliability of GCMs, then it will have to avoid the pitfalls encountered in using analogies between GCMs and experimentation as the basis of this sort of argument. In the case of thinking of simulation studies as experiments, it was the stark differences between the experimental and target systems that made it unconvincing to think of simulation studies on GCMs as just like any other case of inferring how one part of the world will behave based on observations of another. Whatever their formal similarities, inferring the future state of the climate from the state of a computer seems significantly different from inferring the material properties of a kind of steel, say, from experiments on samples of that steel. Similarly, in the case of thinking of GCMs as scientific instruments, it was the use of these models to characterize non-actual, or not yet actualized, worlds that made them seem importantly different from other sorts of indirect probes. However complicated the business of constructing global temperature data from a scattered array of observations, projecting such temperatures into the future, or describing how they might have been, seemingly requires confidence in our understanding of the climate over and above that required to arrive at a reliable temperature record. Are there, then, cases where scientific theories or models, perhaps realized on a computer, are used to generate descriptions of what the world might be, or have been, like? If so, are these descriptions standardly recognized to be quantitatively reliable? Such cases, if they could be identified, might act as a successful fulcrum in the relative strategy for defending the quantitative reliability of GCMs. Fortunately, I think that the answer to both of these questions is “Yes”, and that it is possible to bring out a rich domain of potential cases by thinking of the use of GCMs in support of climate change policies in a slightly different way.

3 GCMs as Applied Science

The philosophical literature on models and modeling has long been intertwined with work on the philosophy of experiment. This, along with the tendency of scientists who work with simulation models to describe what they do in experimental terms, has made it natural for philosophers to think of computer simulation models against a backdrop provided by work in the philosophy of experiment (e.g. Norton and Suppe 2001; Winsberg 1999, 2003; Lenhard 2007; Parker 2009; Guillemot 2010). Though this way of thinking about computer simulations has been fruitful, I think that, at least for the purposes of defending their quantitative reliability, it might be more productive to think of the uses of GCMs in policy design as a kind of applied science. Applied science, roughly speaking, is the use of scientific knowledge in technological design and development. There is a growing philosophical literature—which also substantially overlaps with work on modeling—that attempts to understand the procrustean efforts required to get our simple scientific theories to apply to the complex world that must be considered during the engineering (or applied) design process.Footnote 11 Not surprisingly, many of the same issues that arise in thinking about how it is possible to make reliable predictions about our future climate also arise when trying to understand how engineers are able to make reliable estimates of the flight characteristics of wings that no one has ever built, or to calculate the effects of turbulence in the pipes of a proposed chemical plant. Furthermore, there is a natural way of regarding GCMs as analogous to the sorts of theories used by engineers for “explaining, predicting, and (mathematically) describing physical phenomena that occur in—or are relevant to—technological artifacts” (Boon 2006, 34). GCMs, just like engineering theories, are used in the policy context as design tools. Their role is to supply descriptions of the possible climate that allow for sensible decisions about how to craft a policy that aims at a particular goal—maintaining a livable climate for the foreseeable future.Footnote 12 Though a climate policy is not a technological artifact, it is still the (potential) product of human design, and so can be thought about as the object of the applied science of global climate modeling. In the remainder of this paper, I will characterize some of the epistemological similarities between more standard engineering theories and global climate models, and finish with some suggestions about what particular sorts of engineering theories might make an effective fulcrum in a relative argument for the quantitative reliability of GCMs.

Applied science is different from experimental science (though they surely overlap in places) in that it aims not to establish general claims about how or why things work, but instead to produce particular information in support of the design of a potential technological artifact or procedure. One consequence of this is that applied science is often used in circumstances other than those appropriate for testing the science from which it originates. Testing a theory involves finding and arranging some circumstances in which the consequences of the theory are clear, whereas applying a theory involves generating useful descriptions or predictions in circumstances dictated by the design problem at hand. A convincing experimental test of a theory might typically involve isolating one causal factor by some combination of experimental and vicarious control and observing its effects, and then comparing the results of the experiment with the predictions of the theory. The demand placed on a theory in order for it to support this kind of experimental test is that it predicts the effects of the one causal factor when it acts alone. In applications, on the other hand, “we want to work in the heterogeneous domains of different theories and we lack a practicable hypertheory that tells us the upshot of interacting elements” (Cartwright 1976, 714). The business of generating predictions, or quantitative descriptions, in such heterogeneous domains involves stitching together laws that may have been tested individually in simplified circumstances, but which must be applied in cases where multiple causal factors are relevant. The techniques for “modifying and applying laws” (Cartwright 1976, 716) so that they generate quantitatively reliable descriptions often include the introduction of “phenomenological correction factors” (Cartwright 1983, 111) and may include making assumptions in explicit conflict with a relevant background theory.Footnote 13 Furthermore, these techniques may be applied in an “ad hoc fashion [as] demanded by each new situation” (Cartwright 1976, 716). Still, as Cartwright reports, “[w]e have a very large number of phenomenological laws in all areas of applied physics and engineering that give highly accurate, detailed descriptions of what happens in realistic situations” (Cartwright 1983, 127).
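A small worked example may help fix ideas. The calculation below estimates the pressure drop in a pipe, the kind of problem mentioned earlier in connection with a proposed chemical plant, using the Darcy-Weisbach relation together with the Haaland correlation for the friction factor, an explicit empirical fit to turbulent-flow data rather than a deduction from fundamental fluid dynamics. The pipe dimensions and fluid properties are illustrative, not drawn from any actual design problem.

```python
# A standard engineering calculation of the kind Cartwright describes:
# pressure drop in a pipe via the Darcy-Weisbach equation, with the
# friction factor supplied by the Haaland correlation -- an empirical,
# phenomenological fit to measured turbulent-flow data rather than a
# deduction from the Navier-Stokes equations.  Values are illustrative.
import math

def haaland_friction_factor(reynolds, rel_roughness):
    """Explicit empirical approximation to the Colebrook equation."""
    inv_sqrt_f = -1.8 * math.log10((rel_roughness / 3.7) ** 1.11
                                   + 6.9 / reynolds)
    return 1.0 / inv_sqrt_f ** 2

def pressure_drop(length, diameter, velocity, density, viscosity, roughness):
    """Darcy-Weisbach pressure drop (Pa) for fully developed turbulent flow."""
    reynolds = density * velocity * diameter / viscosity
    f = haaland_friction_factor(reynolds, roughness / diameter)
    return f * (length / diameter) * 0.5 * density * velocity ** 2

# Water at room temperature through 100 m of 5 cm commercial steel pipe.
dp = pressure_drop(length=100.0, diameter=0.05, velocity=2.0,
                   density=998.0, viscosity=1.0e-3, roughness=4.5e-5)
print(f"estimated pressure drop: {dp / 1000:.1f} kPa")
```

The correlation was fitted to experiments on other pipes in other circumstances, yet engineers routinely trust it, within stated error bounds, for pipes that have not yet been built.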

When considered as a kind of applied science, many of the features of GCMs that have been sources of concern about their quantitative reliability appear instead to be standard hallmarks of a theory, or model, that has been crafted to make predictions about complex, real-world phenomena in support of a design process. GCMs are, quite explicitly, attempts to stitch together into one model the array of causal factors that are thought to have an important role in central features of the climate, such as temperature, precipitation and their distributions. They are ‘autonomous’—the result of a complex and creative process of model building, rather than the simple deductive consequences of some more fundamental theory. The generation of quantitatively reliable descriptions of possible climates using this autonomous model has required the use of techniques of simplification, idealization, and approximation that are quite common in engineering science, but poorly understood by epistemologists and philosophers of science. Parameterization, which is endemic in global climate modeling and the frequent source of epistemological qualms about its reliability, can be thought of as just the standard business of introducing correction factors that do not derive from fundamental theory. The flux adjustments that were the source of such concern about coupled atmosphere-ocean models appear to be just another curious case where a phenomenological model must make assumptions in conflict with more fundamental theories in order to get on with the business of generating usable predictions. Even the fact that many of the techniques used by climate modelers have “a life of their own” that appears from the outside to be a series of kludges would not make them exceptional from the point of view of engineering science. Like most models or theories intended to serve practical or design purposes, GCMs are constructed and modified in a context that includes limited resources and time pressure. This results in a willingness, or perhaps a need, to sacrifice elegance and detail for solutions that rapidly give the sorts of general information needed to supply answers to design questions.
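The general shape of a parameterization can be conveyed with a schematic example. The sketch below uses a standard bulk-aerodynamic formula for surface sensible heat flux, in which a dimensionless exchange coefficient plays the role of a phenomenological correction factor, tuned against observations rather than derived from the model's governing equations. The functional form is standard, but the coefficient value and the inputs are illustrative and are not taken from any particular GCM.

```python
# Schematic of a parameterization: an unresolved, sub-grid process (here,
# turbulent surface heat exchange) is represented by a bulk formula whose
# exchange coefficient C_H is an empirically tuned number, not a quantity
# derived from the model's underlying dynamical equations.  The form of
# the bulk formula is standard; the coefficient and inputs are illustrative.

RHO_AIR = 1.2     # air density, kg/m^3
CP_AIR = 1004.0   # specific heat of air, J/(kg K)

def sensible_heat_flux(wind_speed, t_surface, t_air, c_h=1.2e-3):
    """Bulk-aerodynamic sensible heat flux (W/m^2) into the atmosphere.

    c_h is the dimensionless exchange coefficient: a phenomenological
    'correction factor' fitted to observations and model tuning rather
    than deduced from first principles.
    """
    return RHO_AIR * CP_AIR * c_h * wind_speed * (t_surface - t_air)

# A grid cell with the surface 2 K warmer than the overlying air, 8 m/s wind.
flux = sensible_heat_flux(wind_speed=8.0, t_surface=289.0, t_air=287.0)
print(f"parameterized sensible heat flux: {flux:.1f} W/m^2")
# Doubling the tuned coefficient doubles the flux -- the kind of adjustable
# knob that raises epistemological qualms, yet is routine in applied modeling.
print(f"with c_h doubled: {sensible_heat_flux(8.0, 289.0, 287.0, 2.4e-3):.1f} W/m^2")
```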

Still, one might think, some of the distinctive features of global climate models make them importantly different from standard engineering theories. GCMs are simulation models of one historical and ongoing complex system carried out entirely on a computer, and they cannot be tested in the central cases relevant to the design problem they are intended to address (e.g. the future climate) by comparing their quantitative predictions with observations. It is probably true that there are some features of global climate modeling that are not common in engineering science, such as, perhaps, the lack of any direct experimental control over planetary climates (though there are natural experiments used to test climate models). However, the whole point of engineering theories is to make estimates, accurate enough for design purposes, of how some particular possible entity for which there is no direct experimental access would behave in certain conditions. These estimates are based on theories crafted from observations of related entities in similar conditions. Furthermore, there is always the risk that some causal factors important to the performance of the design object are missing in the theory (remember Galloping Gertie). At this general level, then, global climate models are not so different from more standard examples of applied science. Even though philosophers of science are just beginning to tease out the kinds of simplifications, approximations, and idealizations that facilitate applied science and can do little to explain why they work, there is no broad epistemological anxiety about the predictive reliability of these sorts of applied theories—they are implicitly relied upon every time one boards a plane or crosses a bridge. This makes engineering models, and the predictions generated from them, a rhetorically forceful analog for global climate models and the potential backbone of a relative strategy for developing predictive confidence in global climate models.

Of course some engineering theories are more closely analogous to global climate models than others, and the ideal fulcrum for a relative argument would be an engineering theory or model that is as close as possible to a climate model, but which is still presumed to be epistemically trustworthy. Because the physical theories that are the starting point of climate models are fluid dynamics and thermodynamics, it is natural to look for engineering theories that have this same physical core. It would likewise be most rhetorically effective if the chosen applied science also made use of simulation models run entirely on the computer. These requirements suggest that the use of computational fluid dynamics (CFD) (in aviation, for example) would be the ideal fulcrum for a relative defense of the predictive reliability of global climate models. Climate modeling skeptics, like the rest of us, frequently bet their lives on the reliability of the aviation design process, which increasingly depends on CFD simulations (see Moin and Kim 1997). Finishing off the relative argument requires not only a convincing account of how the numerical predictions of CFD are used in, say, aviation design, but also a demonstration that the simulations in CFD share the features that have been sources of epistemological anxiety with global climate models.Footnote 14 There will, of course, be substantial differences between the use of CFD in other engineering contexts (such as aerodynamics or aeronautics design) and global climate modeling, but the hope would be that they could be understood as differences in degree rather than as differences in kind. It helps that the design demands placed on GCMs are so much weaker than the demands placed on other typical CFD simulations. Whereas CFD in aeronautical contexts may have to get lift and drag estimates correct within some small error bar in order to be useful, GCMs may just need to get the signs and orders of magnitude of various climate variables correct in order to be useful to climate policy designers. Differences in the rigor and completeness of the robustness testing, for instance, that GCMs undergo relative to CFD models may be understood to be differences in degree appropriate to, or at least compatible with, the different design objectives.Footnote 15 Of course it is hard to predict how convincing such an argument will be until it is actually produced in detail.
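The contrast in design tolerances can be illustrated with a toy calculation. The zero-dimensional energy-balance sketch below combines a widely used simplified expression for the radiative forcing from a doubling of CO2 with an illustrative spread of climate feedback parameters; even under that wide parameter uncertainty, the sign and rough magnitude of the estimated equilibrium warming are stable, which is roughly the level of quantitative reliability the policy context demands. None of the numbers is intended to reproduce any particular model's estimate.

```python
# Toy zero-dimensional energy-balance estimate of equilibrium warming for a
# doubling of CO2: delta_T = F / lam, where F is the radiative forcing and
# lam a climate feedback parameter.  The logarithmic forcing expression is a
# widely used simplification; the feedback range below is illustrative.
import math

def co2_forcing(c_ratio):
    """Simplified radiative forcing (W/m^2) for a CO2 concentration ratio."""
    return 5.35 * math.log(c_ratio)

forcing = co2_forcing(2.0)  # doubling of CO2, roughly 3.7 W/m^2

# Illustrative spread of feedback parameters (W/m^2 per K).
for lam in (0.8, 1.2, 1.8, 2.3):
    warming = forcing / lam
    print(f"lambda = {lam:.1f} W/m^2/K  ->  equilibrium warming ~ {warming:.1f} K")
```

Every line of the loop prints a positive warming of the same order of magnitude, despite a nearly threefold spread in the uncertain parameter; a lift or drag estimate with comparable uncertainty would be useless for aircraft design.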

In conclusion, I have described—and distinguished from its alternatives—a strategy for defending the quantitative reliability of global climate models as they are used to support policy design. I hope to have made it plausible that this strategy can succeed by sketching its general contours and describing how it might be defended against some plausible objections. In addition, even if a relative argument of this sort is not wholly convincing, it will have the benefit of forcing epistemological concerns about global climate modeling to focus on what distinguishes this sort of modeling from the much less epistemologically suspect(ed?) sorts of modeling that take place all the time in engineering design contexts. Getting clear about what is distinctive about these particular modeling efforts can only help philosophers hoping to understand the epistemology of climate science.