Do people evaluate the quality of explanations differently depending on how well the explanations suit their needs in a given context? Suppose, for instance, that Ana and Bob are both interested in marsupials. Ana is studying marsupials to diagnose their ailments; Bob is interested in understanding their adaptations. When it comes to explaining why kangaroos have large tails, will Ana find mechanistic explanations (for instance, in terms of development or genes) more compelling than Bob? Will Bob find teleological explanations (for instance, that appeal to balance) more compelling than Ana?

Research increasingly supports the idea that (many) representations and judgments are sensitive to contextual factors, including an individual’s goals and the task at hand (e.g., Aarts & Elliot, 2012; Barsalou, 1983; Markman & Ross, 2003). This raises the possibility that judgments concerning the quality of explanations are similarly flexible. Moreover, some accounts of explanation can naturally accommodate forms of context sensitivity. Lombrozo and Carey (2006), for example, suggest that one function of explanation is to support future reasoning and behavior by highlighting generalizable or “exportable” relationships (see also Craik, 1943; Heider, 1958). Given that information is differentially useful in different contexts, one might expect judgments of explanation quality to reflect contextual utility: Explanations should be perceived as better to the extent they contain information that’s inductively useful given one’s current or expected context (see also Leake, 1995, for a relevant discussion).

Within philosophy, so-called pragmatic accounts of explanation also allow for the possibility of substantial context sensitivity. For example, van Fraassen (1980) proposes that context shapes the contrast class—that is, the set of possible alternatives to the target observation that the explanation needs to account for—as well as the relevance relationship between the explanation and what it explains (that is, the relationship that makes the answer explanatory with respect to a particular question in a particular context; see also Gorovitz, 1965; Hilton, 1990; Hilton & Erb, 1996; Lipton, 1990, 1993 for theories of explanatory relevance). Asking why kangaroos have long tails (as opposed to short tails) involves a different contrast class from asking why they have long tails (as opposed to, say, long front paws), whereas providing a mechanistic explanation arguably involves a different relevance relationship from a teleological explanation. Such proposals raise the possibility that different contexts call for different kinds of explanations to account for one and the same observation. More concretely: Ana might be right, given her context, to favor a mechanistic explanation for the kangaroo’s tail, and Bob might be right, given his context, to favor a teleological explanation.

In the current studies we test the hypothesis that judgments of explanation quality are sensitive to contextual utility. More specifically, we test the prediction that explanations of a given kind (e.g., mechanistic vs. teleological) will receive a relative boost when they highlight explanatory relationships that are likely to support the evaluator’s inductive aims in the context of a given task. In the remainder of the introduction, we clarify our notion of “contextual utility,” briefly review past work on related questions, and provide an overview of the three experiments we go on to report.

Explanations in context

Explanation generation and evaluation are affected by many factors, including the explainer or evaluator’s beliefs (e.g., Hilton, 1990; Pennington & Hastie, 1993), the intended recipient of the explanation (e.g., Vlach & Noll, 2016), and social/motivational considerations, such as “saving face” or persuasion (see Patterson, Operskalski, & Barbey, 2015, for review). Explanations are also generated and evaluated in contexts that involve various (potentially inconsistent) goals operating at multiple scales. In explaining why kangaroos have long tails, Ana might want to diagnose a specific medical condition, understand marsupial physiology, do well on a veterinary exam, and impress her instructor all at once. While these forms of “context sensitivity” are of interest in their own right, the focus of the present research is on the extent to which an explanation of a particular kind is privileged in virtue of highlighting a generalization that is relevant to the recipient’s immediate task. In other words, we examine whether the perceived value of explanations is proportional to the degree of guidance they provide for inferences anticipated in the current context. This is what we refer to as contextual utility. To isolate this facet of explanation, it is important to keep other factors fixed, including the question being asked, the source of the explanation, and the background knowledge of the participant evaluating a given explanation.

Prior work provides evidence that the evaluation and generation of explanations are indeed sensitive to context, but context has been varied alongside participants’ explanatory task or background beliefs. For example, shifts in contrast class have been shown to influence both the generation and the evaluation of causal explanations (McGill, 1989; Hilton & Erb, 1996), but in such cases the explanations are effectively answering different questions. Chin-Parker and Bradner (2010) found that the frequency with which participants generated mechanistic and teleological explanations for an event sequence was influenced by changing background conditions, but this manipulation also changed background knowledge. Finally, Hale and Barsalou (1995) had participants complete a task with an initial system-learning phase followed by a trouble-shooting phase, and found that the types of explanation that participants generated varied across phases. However, phase was confounded with several factors, including task order, changes in background knowledge, and task instructions (think aloud vs. explanation). It thus remains an open question whether contextual utility—as we have defined it—affects the perceived quality of explanations.

If anything, research to date suggests that when all of these factors are held constant, explanatory preferences are quite stable (e.g., Kelemen, Rottman, & Seston, 2013; Lombrozo, 2007). Moreover, mainstream accounts of explanation from philosophy have typically set pragmatic and contextual considerations to the side, instead focusing on a specification of formal relationships or features that are constitutive of explanations, such as deductive arguments of a particular form (Hempel & Oppenheim, 1948) or causal processes that generate an effect (Salmon, 1984). On these views, pragmatic factors have a limited influence, perhaps in what one chooses to explain or in the level at which an explanation is pitched.

Here we investigate whether people evaluate an explanation differently depending on its contextual utility, that is, the degree of guidance the explanation is expected to provide for the kinds of inferences that a person anticipates making in the context of a given task. We aim to provide a direct test of this effect by keeping background knowledge and what is being explained fixed across evaluators’ tasks. We expect any effects, if found, to be relatively small, as they should operate on top of relatively stable preferences determined by the features that are held constant across our experimental manipulations.

We report three experiments in which we ask people to evaluate explanations of different kinds: formal (which appeal to category membership; see Prasada & Dillingham, 2009a), mechanistic (which appeal to proximate causes), and teleological (which appeal to goals or functions). Importantly, we experimentally manipulate the contextual utility of each kind of explanation by varying whether the relationships that underwrite each type of explanation—that is, between a property and category membership (formal), its proximate causes (mechanistic), or its function (teleological)—are more or less useful in light of participants’ task. If judgments of explanation quality are sensitive to the contextual utility of the generalization that underwrites a given explanation, then formal, mechanistic and teleological explanations should receive higher ratings in the context of tasks involving generalizations along corresponding dimensions. For example, if a task requires participants to predict the presence of a given feature on the basis of its function (as opposed to its category membership or a proximate cause), this should make participants value generalizations that relate the feature to the function (e.g., “long tails support balance”). Because this is a generalization that underwrites a teleological explanation (“kangaroos have long tails because they improve balance”), the perceived quality of teleological explanations in this context should be boosted relative to their perceived quality in contexts involving category-based or cause-based generalizations.

Experiment 1

Participants learned about novel artifacts and biological kinds with target features that supported multiple explanations. For instance, participants read about a microorganism with a property (rises to the ocean's surface) supporting a formal explanation (because it is a glenta), a mechanistic explanation (because it has special photosensitive receptors), and a teleological explanation (because doing so helps it replenish oxygen reserves). These explanations were evaluated in the context of a generalization task that required participants to predict the presence of the target feature in a new item based on information about either its category membership (category-based task), its proximate causal structure (cause-based task), or its functions (function-based task). As additional reference points, we included circular explanations and a baseline condition in which participants evaluated explanations in the absence of any additional task. We predicted that ratings of explanation “goodness” would be affected by the type of generalization that the specified task involved, with a boost for explanations congruent with that task.

To isolate the effects of contextual utility (as opposed to background knowledge), all participants received the same information about the target phenomena. Thus, all explanations (formal, mechanistic, and teleological) drew from the same pool of stated facts but pointed out different regularities that would support different generalizations (i.e., inferences based on shared category, cause, or function).

Method

Participants

Four hundred and twelve participants were recruited on Amazon Mechanical Turk in exchange for $1.65; an additional 95 participants were excluded for failing a memory check that consisted of classifying descriptions of living things and artifacts as seen vs. unseen (12 descriptions in Experiments 1 and 3, allowing for up to two errors; 36 descriptions in Experiment 2, allowing for up to six errors). In all experiments, participation was restricted to workers with an IP address within the United States and with an approval rating of 95% or higher from at least 50 previous tasks on Mechanical Turk.

Materials, design, and procedure

Participants were presented with descriptions of 16 fictional living things and artifacts, each described with a label and three features organized into a causal chain (see Table 1 for an example, and Supplementary Materials for the full list of stimuli). For each entity, participants evaluated one of four explanations for the middle feature in the causal chain (formal, mechanistic, teleological, or circular) using a 9-point scale anchored at very bad explanation (1) and very good explanation (9). All explanations cited information familiar from the item description. During training, participants were specifically instructed to rate explanation goodness rather than truth (see Supplementary Materials for details). Each participant evaluated four explanations of each type, with item-explanation pairings counterbalanced across participants.
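To make the counterbalancing concrete, the sketch below shows one way the item–explanation pairings could be rotated across participant groups; the Latin-square scheme, list contents, and function name are our own illustrative assumptions rather than details reported in the Method.

```python
# Illustrative sketch (not the authors' code): pair 16 items with four
# explanation types so that each participant evaluates four explanations of
# each type, rotating pairings across four counterbalancing groups.
EXPLANATION_TYPES = ["formal", "mechanistic", "teleological", "circular"]
N_ITEMS = 16  # fictional living things and artifacts

def assign_explanations(counterbalance_group: int) -> dict:
    """Map each item index (0-15) to an explanation type for one group."""
    assignment = {}
    for item in range(N_ITEMS):
        # Rotate the pairing by group so that, across the four groups,
        # each item appears with every explanation type.
        type_index = (item + counterbalance_group) % len(EXPLANATION_TYPES)
        assignment[item] = EXPLANATION_TYPES[type_index]
    return assignment

# Example: item 0 is paired with a formal explanation in group 0 and with a
# mechanistic explanation in group 1.
print(assign_explanations(0)[0], assign_explanations(1)[0])
```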

Table 1 Sample instructions, explanations, and task-reinforcers used in Experiment 1. Task was manipulated between subjects; explanation type and domain were manipulated within subjects (with one explanation presented for evaluation on each trial). Text referring to artifacts is provided in square brackets

Crucially, participants rated explanations in either a baseline condition, which did not involve an additional task, or in a generalization condition that specified one of three additional tasks: category-based, cause-based, or function-based generalization. In each generalization condition, participants were informed that after evaluating explanations (as illustrated with two training trials), they would be making predictions about new objects and organisms, where the predictions would be based on known category membership, cause features, or function features. This served as the manipulation of contextual utility, and it was reinforced after each explanation evaluation by having participants perform an inference of the promised type. Specifically, participants were given information about an entity behind a black box and had to rate how likely it was that the target feature of the original item generalized to the occluded item (see Table 1 for examples). The information provided varied across tasks: participants were told whether the occluded entity belonged to the same category as the original (category-based), shared the same cause feature (cause-based), or shared the same function (function-based). The main purpose of this step was to maintain participants’ focus on the task. Ratings were therefore not analyzed and are not reported.

Participants completed 16 trials, each consisting of an explanation evaluation (in which they rated the quality of an explanation), and for participants in one of the three generalization conditions, a subsequent prediction to reinforce the specified task. Because domain was not a variable of central theoretical interest, and because it did not interact with the effect of task in Experiments 1 or 2, we collapsed across this variable for analyses.

Results and discussion

Explanation ratings were analyzed in an ANOVA with explanation type as a within-subjects factor and task as a between-subjects factor. This revealed significant main effects of both explanation type, F(3, 1224) = 1365.60, p < .001, ηp² = .770, and task, F(3, 408) = 6.81, p < .001, ηp² = .048. Overall, participants preferred mechanistic and teleological explanations over formal explanations, all of which were preferred over circular explanations, all ps < .001 (see Table 2). Mechanistic and teleological ratings did not differ from each other, t(411) = .63, p = .531. We take this pattern to reflect chronic explanatory preferences, which form the basic profile on top of which we might expect to see shifts driven by contextual utility (Footnote 1). Ratings were also higher under the categorical task than the causal task (Tukey’s HSD p = .039) and baseline (p < .001) conditions; however, this main effect of task has no bearing on the questions we investigate here.
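For readers who want to see the structure of this analysis, the following is a minimal sketch of a 4 (explanation type, within subjects) × 4 (task, between subjects) mixed ANOVA in Python; the data file, column names, and the choice of the pingouin package are assumptions for illustration, not part of the original study.

```python
# Hypothetical analysis sketch: mixed ANOVA on explanation goodness ratings,
# with explanation type as a within-subjects factor and task as a
# between-subjects factor (column and file names are assumed).
import pandas as pd
import pingouin as pg

# Long-format data: one row per participant x explanation type, with 'rating'
# as the mean goodness rating across the four items of that type.
df = pd.read_csv("experiment1_ratings.csv")  # hypothetical file

aov = pg.mixed_anova(
    data=df,
    dv="rating",
    within="explanation_type",
    subject="participant",
    between="task",
    effsize="np2",  # partial eta-squared, as reported in the text
)
print(aov[["Source", "F", "p-unc", "np2"]])
```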

Table 2 Mean explanation goodness ratings as a function of explanation type and task in Experiments 1, 2 and 3 (SDs in brackets)

Most importantly, we found a significant interaction between explanation type and task, F(9, 1224) = 5.73, p < .001, ηp² = .040. A series of planned contrasts supported our prediction that explanation ratings would be boosted in the context of a congruent task. Three separate contrasts compared ratings of formal, mechanistic, and teleological explanations in the context of the congruent task versus the average of ratings for that explanation type in the other two (incongruent) task conditions. As predicted, each explanation type was rated as significantly better under the congruent task compared to the other generalization conditions: formal explanations, F(1, 408) = 9.85, p = .002, ηp² = .024; mechanistic explanations, F(1, 408) = 7.36, p = .007, ηp² = .018; teleological explanations, F(1, 408) = 7.23, p = .006, ηp² = .019 (see Fig. 1). Circular explanations were not significantly influenced by task (one-way ANOVA, F(3, 408) = 2.48, p = .061).
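The planned contrasts can be read as congruent-versus-incongruent comparisons for each explanation type; the sketch below shows one simple way to approximate them, with hypothetical column names and a Welch t-test standing in for the F-based contrasts reported above.

```python
# Sketch of the congruence contrasts (assumed data layout): for each
# explanation type, compare ratings under its congruent task against ratings
# pooled across the two incongruent generalization tasks.
import pandas as pd
from scipy import stats

df = pd.read_csv("experiment1_ratings.csv")  # hypothetical file

CONGRUENT_TASK = {
    "formal": "category-based",
    "mechanistic": "cause-based",
    "teleological": "function-based",
}

def congruence_contrast(data: pd.DataFrame, explanation: str):
    """Welch t-test of congruent vs. pooled incongruent conditions
    (a simplification of the reported planned contrasts)."""
    rows = data[(data.explanation_type == explanation) & (data.task != "baseline")]
    congruent = rows.loc[rows.task == CONGRUENT_TASK[explanation], "rating"]
    incongruent = rows.loc[rows.task != CONGRUENT_TASK[explanation], "rating"]
    return stats.ttest_ind(congruent, incongruent, equal_var=False)

for explanation in CONGRUENT_TASK:
    print(explanation, congruence_contrast(df, explanation))
```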

Fig. 1 Explanation goodness ratings as a function of explanation type and task in Experiments 1 and 2; error bars represent 1 SEM; asterisks indicate contrasts significant at p < .05

As a further test of the relationship between explanatory preferences and task context, we classified participants based on the explanation type for which they gave the highest average ratings. Twenty ties (18 between mechanistic and teleological explanations) were excluded. As shown in Fig. 2, the distribution of explanation preferences varied significantly across tasks, χ²(9, N = 392) = 31.87, p < .001. Standardized residuals indicate that the effect was driven by participants being more likely to favor mechanistic and, marginally, teleological explanations within the corresponding congruent task contexts (standardized residuals = 2.5 and 1.9), and less likely to favor these explanations within incongruent task contexts (standardized residuals = -2.3 and -2.4). The latter pattern, which suggests competition between cause- and function-based reasoning, was additionally supported by a negative correlation between ratings of mechanistic and teleological explanations, r(410) = -.19, p < .001. No other pair of explanation ratings was significantly negatively correlated.
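A sketch of the preference analysis described above, again with hypothetical names: classify each participant by their top-rated explanation type, drop ties, and test whether the distribution of preferences depends on task.

```python
# Sketch of the preference-distribution analysis (assumed column names):
# classify participants by their highest-rated explanation type, exclude ties,
# cross-tabulate preference against task, and inspect standardized residuals.
import numpy as np
import pandas as pd
from scipy import stats

df = pd.read_csv("experiment1_ratings.csv")  # hypothetical long-format file

# Mean rating per participant and explanation type, reshaped to wide format.
means = df.groupby(["participant", "task", "explanation_type"])["rating"].mean()
wide = means.unstack("explanation_type")

# Preferred type per participant; drop participants whose top two types tie.
preferred = wide.idxmax(axis=1)
second_best = wide.apply(lambda row: row.nlargest(2).iloc[-1], axis=1)
preferred = preferred[wide.max(axis=1) > second_best]

table = pd.crosstab(preferred.index.get_level_values("task"), preferred)
chi2, p, dof, expected = stats.chi2_contingency(table)

# Pearson standardized residuals: (observed - expected) / sqrt(expected).
residuals = (table - expected) / np.sqrt(expected)
print(round(chi2, 2), round(p, 3), dof)
print(residuals.round(1))
```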

Fig. 2 Explanation preferences as a function of task in Experiments 1 and 2. Circular explanations were only favored by one participant across all experiments, and are not shown

The baseline condition was originally included to evaluate whether the perceived quality of explanations of a given type was improved, relative to baseline, in the context of a congruent task, or instead depressed, relative to baseline, in the context of an incongruent task. However, the experiments did not, as a whole, support a clear and consistent story; we therefore bracket consideration of this condition (see Supplementary Materials for details).

In sum, Experiment 1 reveals that contextual utility affects the perceived quality of explanations. Statements that explained an observation in terms of category membership (formal), in terms of proximal causal mechanisms (mechanistic), or in terms of functions (teleological) were perceived as better explanations in the context of tasks that called for the information provided by these explanations. As expected, these effects acted on top of more stable explanatory preferences (favoring mechanistic and teleological explanations over formal explanations), which likely reflect the chronic utility of these explanations across a variety of contexts (see Lombrozo & Rehder, 2012, for a relevant discussion of functional explanations). Finally, these effects were observed even though all participants received exactly the same information about the categorical, mechanistic, and functional relationships involving the explained features.

Experiment 2

Experiment 2 had two objectives. First, to verify the reliability of the small effects observed in Experiment 1, we aimed to replicate the interaction between task and explanation types, but with a different manipulation of contextual utility: Participants were given a task as a museum assistant, which involved classification (grouping items), proximate causes (identifying how something came about), or functions (identifying what functions properties serve). Second, we aimed to better understand the mechanism underlying the effect of contextual utility on explanations. On van Fraassen’s (1980) account, context can influence an explanation in several ways: by changing the general topic, the contrast class, or the relevance relation. Even when a topic and contrast class are fixed, however, the relevance relation can remain underspecified. For example, if one asks why blood circulates through the body (as opposed to not circulating through the body), either a mechanistic explanation (“because the heart pumps the blood through the arteries”) or a teleological explanation (“to bring oxygen to every part of the body tissue”) would stand in an appropriate relevance relation to the question, even if the contrast class is fixed to {blood circulates; blood does not circulate} (p. 142). Given our interest in effects of contextual utility, we aimed to investigate whether our task manipulation could influence the evaluation of formal, mechanistic, and teleological explanations even when the contrast class of the why-question was explicitly fixed across tasks.

Method

Participants

Four hundred and ninety-six participants were recruited on Amazon Mechanical Turk in exchange for $1.65. An additional 317 participants were excluded for failing a memory check.

Materials, design, and procedure

Experiment 2 mirrored Experiment 1, except as noted. First, we introduced a cover story in which the participant was a museum assistant who needed to figure out one of three things: how new objects or organisms should be grouped in the museum (categorization task), how objects or organisms come to possess certain properties (causal origin task), or what functions the properties of objects or organisms serve (functional task). The task reinforcers were adapted accordingly (Table 3 illustrates all changes; see also Supplementary Materials for a sample trial). Second, to test the possibility that effects of task on explanation judgments were produced (only) by a shift in the implied contrast class of the questions, we added a clarification to the explanation probes specifying the contrast class (e.g., “Why does this item lower its temperature when food is about to go bad (as opposed to not lowering it)?”). Finally, domain was manipulated between subjects (Footnote 2).

Table 3 Sample instructions, explanations, and task-reinforcers used in Experiment 2. Task and domain were manipulated between subjects; explanation type was manipulated within subjects (with one explanation presented for evaluation on each trial). Text referring to living things is provided in square brackets

Results

Explanation ratings were analyzed in an ANOVA, with explanation type as a within-subjects factor and task as a between-subjects factor. The main effect of explanation type was replicated, F(3, 1476) = 946.06, p < .001, ηp² = .658: participants preferred mechanistic and teleological explanations over formal explanations, which were all preferred over circular explanations, ps < .001 (see Table 2). Mechanistic and teleological explanation ratings did not differ from each other, t(495) = 1.64, p = .102. The task manipulation also produced a significant main effect (irrelevant for our hypothesis), F(3, 492) = 2.71, p = .044, ηp² = .016; it was driven by higher ratings in the categorical task than in the functional task (Tukey’s HSD p = .046, all remaining ps ≥ .190).

Most importantly, there was a significant interaction, F(9, 1476) = 4.00, p < .001, ηp² = .024. Planned contrasts showed that mechanistic and teleological explanations were rated significantly higher under the congruent task than under the incongruent tasks (mechanistic explanations: F(1, 492) = 4.49, p = .035, ηp² = .009; teleological explanations: F(1, 492) = 5.59, p = .018, ηp² = .011; see Fig. 1). However, the contrast did not reach significance for formal explanations, F(1, 492) = 1.97, p = .161. Ratings of circular explanations were not influenced by task (one-way ANOVA, F(3, 492) = .20, p = .898).

As in Experiment 1, we also found that the distribution of explanation preferences varied as a function of task, χ²(6, N = 433) = 26.19, p < .001. (This analysis excluded 60 ties, 50 of which were between mechanistic and teleological explanations, evenly spread across conditions.) As shown in Fig. 2, the effect was driven by the functional task, for which fewer participants preferred mechanistic explanations and more preferred teleological explanations (standardized residuals = -3.0 and 2.9), as in Experiment 1. In the causal task, differences were in the predicted directions, but did not reach significance (standardized residuals = 1.2 and -1.2). Once more, ratings of mechanistic and teleological explanations were negatively correlated, r(494) = -.19, p < .001. No other pair of explanation ratings was significantly negatively correlated.

Discussion

Mirroring Experiment 1, Experiment 2 revealed an interaction between task and explanation ratings: explanations that offered greater contextual utility were rated more highly, with significant effects for mechanistic and teleological explanations, and a matching trend for formal explanations (Footnote 3). These effects were found with explanation requests that fixed the contrast class across tasks.

Experiment 3

Experiments 1 and 2 support our primary prediction: The perceived quality of an explanation is affected by contextual utility. However, there are multiple ways to interpret this result. One possibility is that it reflects a beneficial feature of our capacity to assess explanations. By favoring explanations that provide task-relevant information, participants could be privileging relationships with high inductive utility, where the calculation of inductive utility is calibrated to context. In Experiment 3, we test this hypothesis by manipulating inductive utility via direct feedback: We provide participants with explicitly stated information about which kinds of features (causes or functions) are inductively useful, with the prediction that the corresponding explanations will receive a boost in perceived quality.

Experiment 3 also tests two alternative possibilities. First, it could be that the influence of task on explanation ratings reflects an effect of mere salience. For instance, drawing attention to some features could increase the fluency of processing explanations that contain those features, resulting in higher ratings. To investigate whether mere feature salience is sufficient to drive effects of task on explanation ratings, Experiment 3 also includes conditions in which causal or functional features are made salient, but not because they are inductively useful.

A second possibility is that effects of task on explanation ratings are driven by perceived task demands: Our task manipulations could be taken as a cue to the “correct” explanation the experimenter intends. To evaluate this possibility, we include a posttest asking participants what the experiment is about, and we analyze performance as a function of their assumptions.

Experiment 3 thus includes a total of four priming conditions in addition to a no-prime control, the result of crossing prime type (inductive utility vs. salience) with primed feature (cause vs. function). We focus on the contrast between mechanistic and functional explanations, dropping a manipulation of categorical relations and analyses of formal explanations, both to simplify the design and to maximize our chances of finding effects, as both Experiments 1 and 2 suggested more reliable effects of task on these explanation types.

Method

Participants

Two hundred and forty-six participants were recruited on Amazon Mechanical Turk in exchange for $2.25. An additional 35 participants were excluded for failing a memory check.

Materials, design, and procedure

Participants were randomly assigned to one of five conditions: a no-prime control, or one of four priming conditions that formed a 2 × 2 design: prime (inductive utility vs. salience) × target feature (cause vs. function).

To reduce the number of conditions, Experiment 3 focused on artifacts, using the same eight descriptions from Experiments 1 and 2. Participants first read the descriptions during a priming phase, and then completed an explanation evaluation phase. In each phase, the order of items was randomized for each participant.

In the control condition, participants were given instructions to simply read the eight descriptions, without performing another task. In the inductive utility conditions, participants played a “guessing game” in the course of which they learned that either causes or functions had high predictive validity. For example, after learning about a refrigerator that has a martion sensor that lowers the temperature in order to keep the food fresh longer, participants were told that they would need to guess whether a new refrigerator lowers its temperature (i.e., possesses the middle feature in the causal chain). They could then choose to learn one fact about the new refrigerator before making their guess, specifically whether it has the martion sensor (cause) or keeps the food fresh longer (function). After choosing, they received yes-or-no feedback regarding whether the object possessed the feature in question (in equal proportion, randomly assigned), and, crucially, they also received feedback about the rate of successful guesses after picking each cue, presented in the form of dot diagrams (see Table 4 for an example and Supplementary Materials for a sample trial). Across eight priming trials, either cause or function features were consistently indicated as having higher predictive utility. After the predictive utility feedback, participants made their guess about the new refrigerator, but were not told whether they had guessed correctly. At the end of the priming phase, participants were told that they would play another guessing game at the end of the experiment, in order to sustain the relevance of the inductive cues from the priming phase.
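As a way of making the feedback structure explicit, here is a small sketch of one inductive-utility priming trial; the function name, the random stand-in for the participant's cue choice, and the exact way feedback is assembled are our assumptions, with only the success-rate ranges taken from the description accompanying Table 4.

```python
# Illustrative sketch of one inductive-utility priming trial: the primed
# feature type (cause or function) is consistently associated with a high
# guessing-success rate, the other with a low one (ranges follow Table 4);
# the participant also receives yes/no feedback about the occluded item.
import random

def priming_trial(primed_feature: str) -> dict:
    """Assemble feedback for a single trial; primed_feature is 'cause' or 'function'."""
    high = random.uniform(0.90, 0.95)  # success rate shown for the primed cue
    low = random.uniform(0.55, 0.60)   # success rate shown for the unprimed cue
    success_rates = {
        "cause": high if primed_feature == "cause" else low,
        "function": high if primed_feature == "function" else low,
    }
    return {
        "chosen_cue": random.choice(["cause", "function"]),   # stands in for the participant's choice
        "cue_matches_new_item": random.choice([True, False]), # yes/no feedback, equal proportion
        "success_rates": success_rates,                       # shown as dot diagrams in the study
    }

# Across the eight priming trials, the same feature type is always the better predictor.
trials = [priming_trial("cause") for _ in range(8)]
```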

Table 4 Sample priming procedure used in Experiment 3: causal prime. The type of prime (inductive utility or salience) and target feature (cause or function) were manipulated between subjects. The left–right position of options was randomized for each item, for each participant. The feedback patterns were selected from a pool of pairs of one high (90%–95%) and one low (55%–60%) proportion of green dots

In the salience priming conditions, participants played a “shopping game” in the course of which either cause or function features were strongly emphasized (see Table 4 and Supplementary Materials). Matching the inductive utility primes, on each trial participants read a description, made one guess, and chose either a cause or a function feature. Cause or function features were made salient by capitalizing on the fact that most cause features involved novel terms (martion sensor, nordrum part, etc.); we could therefore draw attention to cause features by asking participants to select features that were harder to remember, or draw their attention to function features by asking them to select features that were easier to remember. (A pretest with 77 participants showed that asking people to select features that are harder vs. easier to remember reliably encouraged people to select either cause or function features, t(37) = 14.82, p < .001, Cohen’s d = 4.79, and also found that participants tended to have better memory for which features were primed in the salience condition than in the inductive utility condition, t(75) = 2.70, p = .009, Cohen’s d = .61, suggesting the prime was quite effective). To further reinforce the salience of causes or functions, participants received feedback indicating that the primed feature type was consistently selected by the majority of participants. Critically, however, the feature choice (regarding memorability) was not inductively relevant to the guess (regarding product cost); thus salience was manipulated independently of inductive utility. At the end of the priming phase, participants were told that they would play another shopping game at the end of the experiment.

For the explanation evaluation phase, all participants evaluated mechanistic and teleological explanations as in Experiments 1 and 2 (but without interspersed task reinforcers). Next, as promised, participants in priming conditions played a short guessing or shopping game (not analyzed).

Finally, to address the possibility that participants were responding to task demands, at the end of the experiment we asked them to guess what “this study was getting at.” We provided seven potential research questions in random order (listed in Fig. 6) and asked participants to rate how plausible they found each one, from 0 (not plausible at all) to 100 (very plausible) (Footnote 4).

Results

Attention manipulation check

We first verified that the salience primes were at least as successful as the inductive utility primes in drawing attention to the target feature. We compared the percentage of times participants chose causal features as a function of prime type (inductive utility vs. salience) and target feature (cause vs. function) in a 2 × 2 ANOVA. The analysis revealed significant main effects of prime type, F(1, 195) = 19.55, p < .001, ηp² = .091, and target feature, F(1, 195) = 230.95, p < .001, ηp² = .542, as well as a significant interaction, F(1, 195) = 44.81, p < .001, ηp² = .187. As shown in Fig. 3, participants chose cause features significantly more often when they were primed than when they were not, in both the inductive utility, t(99) = 4.96, p < .001, Cohen’s d = .99, and salience, t(96) = 21.93, p < .001, Cohen’s d = 4.43, conditions. Moreover, the effect was larger for the salience primes. These findings suggest the primes succeeded in drawing attention to the intended features and, if anything, that the salience condition was more effective in doing so. We next turn to results from the main task.

Fig. 3 Percentages of selected cause and function features by primed feature type, in inductive utility (left) and salience (right) priming conditions

Inductive utility versus salience

We first tested whether the effects of inductive utility differed from those of mere salience. We performed a mixed ANOVA with explanation type (mechanistic, teleological) as a within-subjects factor and target feature (cause, function) and prime type (inductive utility, salience) as between-subjects factors. The predicted three-way interaction was marginal, F(1, 195) = 2.77, p = .098, ηp² = .014, with effects in the predicted direction. We therefore ran separate analyses for the inductive utility and salience conditions, examining the relationship between primed relation (cause-based, function-based) and the evaluation of explanations.

Effects of inductive utility

To test whether the inductive utility of an explanatory relationship boosts the perceived quality of corresponding explanations, we ran a mixed ANOVA with target feature as a between-subjects factor and explanation type as a within-subjects factor. Both target feature, F(1, 99) = 4.38, p = .039, ηp² = .042, and explanation type, F(1, 99) = 20.80, p < .001, ηp² = .174, affected explanation evaluations. As shown in Fig. 4, both of these effects were qualified by a significant interaction, F(1, 99) = 5.47, p = .021, ηp² = .052: mechanistic explanations were rated significantly higher when they were inductively useful than when they were not (planned contrast p = .001), but ratings of teleological explanations were not affected by the inductive utility prime (planned contrast p = .656). These findings suggest that explanations containing inductively useful relationships are rated more highly, but the effect may depend on the kind of relationship.

Fig. 4 Explanation goodness ratings as a function of explanation type and target feature in the inductive utility and salience priming conditions. Error bars represent 1 SEM. The no-prime condition is not shown (see Table 2 for the means and SDs; and see Supplementary Materials for more detail)

We also analyzed the distribution of explanation preferences: The effect of target feature was not significant, χ²(1, N = 82) = .87, p = .352, but trends were in the predicted directions (see Fig. 5). As in Experiments 1 and 2, we found a significant negative correlation between ratings of mechanistic and teleological explanations, r(101) = -.21, p = .039.

Fig. 5 Explanation preferences as a function of primed feature and explanation type in Experiment 3. Formal explanations, not shown, were preferred over all other explanation types by 4% of participants. The analysis excluded 11 ties from the inductive utility condition and 14 ties from the salience condition

Effects of mere salience

To test whether mere salience boosts the perceived quality of corresponding explanations, we ran a 2 (target feature) × 2 (explanation type) mixed ANOVA, which showed no significant effects (explanation type: F(1, 96) = .21, p = .645; target feature: F(1, 96) = .11, p = .741; interaction: F(1, 96) = .01, p = .911). These findings suggest that merely making an explanatory feature salient is not sufficient to boost the perceived quality of corresponding explanations. The effect of target feature on the distribution of explanation preferences (see Fig. 5) was also not significant, χ²(1, N = 82) = .06, p = .803, nor was the correlation between ratings of mechanistic and teleological explanations, r(98) = -.09, p = .383.

Plausibility ratings

Participants’ average plausibility ratings for the seven potential research questions are shown in Fig. 6. Ratings did not vary as a function of condition: Question × Prime, F(6, 1128) = .59, p = .743; Question × Target Feature, F(6, 1128) = 1.66, p = .127; three-way interaction, F(6, 1128) = 1.20, p = .302. The option most accurately describing the inductive utility condition, labeled (a) in Fig. 6, was not among the most highly rated questions, and its ratings did not correlate with the mean difference between mechanistic and teleological ratings in any condition (all ps ≥ .118), suggesting that our findings were not a product of perceived task demands. The research question most accurately describing the salience condition, labeled (b) in Fig. 6, also did not correlate with ratings in any condition (all ps ≥ .172).

Fig. 6 Plausibility ratings of potential research questions; error bars represent 1 SEM

Discussion

Experiment 3 asked three questions. First, does manipulating the inductive utility of a feature type (cause vs. function) through feedback influence the perceived quality of explanations containing that feature (mechanistic vs. teleological)? The answer is “yes”: The inductive utility primes succeeded in boosting the quality of the primed explanation type relative to its alternative, although the effect was only significant for mechanistic explanations (Footnote 5). Second, is making a particular feature type salient sufficient to influence the perceived quality of explanations containing that feature? The answer is “no”: While our manipulation of salience successfully influenced choices (indeed, it did so more successfully than the inductive utility manipulation), it did not influence explanation ratings. Third, are shifts in explanation ratings a consequence of participants’ inferences about the experimenters’ expectations—that is, are they an artifact of task demands? Again, the answer is “no”: Participants were not especially skilled in guessing the true aims of the study, and their guesses did not predict performance.

Taken together with Experiments 1 and 2, these findings suggest that the perceived quality of an explanation depends on its contextual utility: Explanations seem better when they are underwritten by generalizations that are inductively useful in the context of a given task. Experiment 3 also offers preliminary evidence against an alternative explanation of our results in terms of mere salience. However, the results of Experiment 3 should be interpreted with some caution: it’s likely that explanation ratings are subject to other performance errors, and the statistical difference between effects of inductive utility and those of salience was marginal. Moreover, salience is likely to be an inductively relevant cue in many real-world contexts; in the present experiment, we took care to dissociate salience from inductive utility.

General discussion

Across three studies involving different manipulations of the contextual utility of explanations, we found that people prefer explanations that highlight the kinds of relationships that they expect to be useful for the task at hand. This was the case for formal, mechanistic, and teleological explanations in Experiment 1, for mechanistic and teleological explanations in Experiment 2, and for mechanistic explanations in Experiment 3. The reported effects were small but reliable, and for the most part driven by a large proportion of participants (as suggested by the analyses of explanation preferences). Not surprisingly, all three studies also supported the existence of relatively stable explanatory preferences, with mechanistic and teleological explanations rated reliably better than formal explanations, which were in turn better than circular explanations. These baseline preferences could reflect the global inductive utility of each explanation type across habitual contexts, with local contextual utility having a small but systematic effect on top of these general preferences.

Importantly, we found that embedding explanations within different tasks did not simply shift the implied contrast class for an explanation request (which was specified in Experiment 2), but instead affected the relative ratings for different kinds of explanations, with task-congruent explanations receiving a relative boost. We also found in Experiment 3 that neither variations in the mere salience of features nor intuitions about experimenters’ expectations can account for our findings. Instead, it appears that effects of contextual utility on explanation evaluation are driven by the inductive value of different kinds of explanatory relationships, consistent with the Explanation for Export proposal stating that good explanations supply information with high anticipated utility (Lombrozo & Carey, 2006).

These findings also have implications for philosophical accounts of explanation. One of the main critiques of pragmatic accounts is the lack of constraint on the relation between candidate explanations and what they explain (Kitcher & Salmon, 1987). Our work demonstrates that the task pursued by the explainer can systematically constrain that relation, which raises the possibility of a pragmatic approach that is appropriately constrained and descriptively adequate as an account of human judgments. That said, our findings do not rule out more traditional accounts of explanation. For instance, accounts that allow for incomplete (Hempel & Oppenheim, 1948) or partial explanations (Kitcher, 1989; Railton, 1978) could accommodate our results if our manipulation impacted which parts of the “complete” explanation were selected (but see Woodward, 2003). Alternatively, our results could be accommodated by allowing for pluralism in the patterns, covering laws, or other structures governing explanations, with contextual utility fixing the structure with respect to which explanations are evaluated at a given time.

Our findings also provide potential evidence for competition between mechanistic and function-based reasoning (see also Heussen, 2010; Lombrozo & Gwynne, 2014). In Experiment 1, teleological explanations were rated significantly lower under the cause-based task compared to other generalization conditions, and in Experiment 2, mechanistic explanations were rated significantly lower under the functional task relative to other generalization conditions, suggesting that in addition to boosting task-congruent explanations, contexts can also penalize task-incongruent explanations. Notably, this pattern of competition was restricted to mechanistic versus function-based reasoning: only the causal and functional tasks produced suppression effects, and only ratings of mechanistic and teleological explanations were significantly negatively correlated.

Relationship to prior work

Our findings are consistent with prior work suggesting a close relationship between explanation and inference. For example, Lombrozo and Gwynne (2014) and Vasilyeva and Coley (2013) found that different types of explanations predicted different patterns of property generalization (for similar effects in categorization, see Ahn, 1998; Lombrozo, 2009). These studies, however, did not investigate a relationship in the reverse direction, with (anticipated) inferences affecting explanation judgments.

Prior work also suggests that the production of teleological and mechanistic explanations can depend on context (Chin-Parker & Bradner, 2010; Hale & Barsalou, 1995), although these studies manipulated context quite differently from the studies reported here: specifically, they varied background conditions and participants’ knowledge about what they observed (for instance, whether the outcome to be explained was accidental and idiosyncratic versus intended and systematic). To our knowledge, our studies provide the first demonstration that contextual utility can affect the perceived quality of explanations even when participants’ background knowledge is held constant. In our studies, the relationships underlying the formal, mechanistic, and functional explanations always held; what varied was the contextual utility of that relationship, and the perceived quality of the corresponding explanation.

Our findings differ from those of Chin-Parker and Bradner (2010) in that they found effects of context on explanation generation, but not on explanation evaluation. We speculate that such effects were not found in their studies, but emerged in ours, due to methodological differences (Footnote 6). Overall, though, we agree with Chin-Parker and Bradner (2010) that “the constraints inherent within the tasks of evaluating and generating explanations are not equivalent” (p. 230). We would expect a partial overlap between such constraints, and anticipate valuable insights coming from a systematic investigation of shared versus unique constraints, using a range of tasks and experimental paradigms.

Future directions and conclusion

Our findings demonstrate that contextual utility can affect the perceived quality of different kinds of explanations, and that this is unlikely to be a product of low-level attentional mechanisms or intuitions about experimenters’ expectations. However, further work is needed to specify the scope and basis of this effect. With respect to scope, are people responsive to the contextual utility of an explanation for its intended recipient, or are they restricted to evaluating contextual utility from their own perspective? What are relevant markers of inductive utility, beyond past and anticipated inferences? (Salience may well turn out to be one such marker; we took special care to disentangle salience from inductive utility in Experiment 3, but in the real world there may exist a correlation between them that reasoners can exploit.) Finally, what are the mechanisms that underlie these effects? Context could plausibly influence memory, categorization, information search, the conscious or unconscious selection of alternatives, and much more (see Aarts & Elliot, 2012; Dijksterhuis & Aarts, 2009, for reviews)—which in combination could be said to induce different stances (Dennett, 1987).

Identifying the psychological processes that contribute to effects of contextual utility is an important question for future research. Although much work remains to be done, our studies take an important step towards developing a psychological account of explanation that recognizes the context-sensitive and flexible nature of human explanatory judgments.