1 Introduction

Gross Domestic Product (GDP) has been adopted by many national and international bodies over the last century as a proxy for the health and progress of a society (Kubiszewski et al. 2013; Van den Bergh 2009). However, it is widely acknowledged that this was never the intended purpose of the GDP indicator (Kubiszewski et al. 2013), and there have been countless efforts to devise better-suited measures that capture not only the economic but also the social and environmental components of our well-being. Notably, initiatives such as the OECD’s ‘Better Life’ initiative (OECD 2018) and the Commission on the Measurement of Economic Performance and Social Progress (Stiglitz et al. 2009) have made large strides towards identifying, articulating and measuring what makes society prosperous, equitable and sustainable (Jackson 2010). Although there is no universally agreed definition of societal wellbeing, we situate our understanding of ‘indicators of societal wellbeing’ in the context of these initiatives. It therefore captures both objective and subjective notions of wellbeing and encompasses all those indicators that attempt to measure the progress of our societies and the health of our ecosystems. These indicators take a wide range of forms and foci, with ongoing debates in the literature focusing on the monetisation of nature and wellbeing, the use of objective versus subjective measures of wellbeing, and whether and how to aggregate fundamentally incommensurable measures (Barrington-Leigh and Escande 2018; Yang 2014). However, the literature often concentrates on the technical characteristics of these new indicators, without due attention to the ecosystem surrounding the indicator: who the end-users are, how they interpret the indicators, and the role that the indicator and its end-users ultimately play in the policy-making process.

1.1 Use of Indicators in Policy Development

Much has been written about the policy process in different countries and policy domains. Authors have variously scrutinised the actors involved, the influence of power and politics in agenda setting (Gerston 2014; Birkland 2015), the (mis- or non-)use of different types of information and tools for designing and appraising policy (Marmot 2004), and what constitutes a policy cycle (Howlett et al. 2009), among other areas. Of particular interest to the scientific community have been the ways in which policy makers interact with and use different forms of evidence and information in policy making. From experiential or expert-based knowledge to public surveys, ad-hoc scientific studies, assessments and indicators, there is a rich literature devoted to this issue (Bauler 2012; Weible 2008). Here we consider specifically the use of indicators of societal wellbeing by civil servants.

Civil servants fill a range of key roles in policy development, appraisal and implementation. Their ability to positively affect societal well-being through these roles depends, in part, on how effectively they absorb, translate and apply relevant evidence and information to the policy problems they face. Indicators in particular offer a succinct and accessible form of information, able to track trends over time and to compare different sub-groups within the population. Given these characteristics, and the continued dominance of economic indicators such as GDP (Bell and Morse 2011), indicators of societal wellbeing may have an important part to play in centralising wellbeing and the environment in policy decision-making (Allin and Hand 2017, p. 17). The analyses and inputs of civil servants are among many factors (e.g. public opinion, political agendas, financial constraints) considered by high-level civil servants and government ministers, and indicators themselves are only one form of evidence that civil servants may choose to use. Nonetheless, understanding how and why civil servants use indicators in their work is one crucial facet of the policy process.

The literature on indicator use among policy makers has largely focused on issues of policy relevance or indicator content (Hezri and Dovers 2006), and on assessing the technical characteristics of the indicator, such as statistical robustness and accuracy (Lehtonen et al. 2016; Bauler 2012). Much of this research rests on the underlying assumption that indicators inform decisions in a direct and linear way, otherwise known as instrumental use of information (Lehtonen et al. 2016; Weible 2008; Hezri and Dovers 2006). However, where there is a high level of complexity and conflicting opinions—as there is in national-level policy making—such instrumental use of information is often impractical (Rinne et al. 2012). Instead, the information delivered through these indicators may lend itself more readily to conceptual or political use (Lehtonen et al. 2016; Bauler 2012; Hezri and Dovers 2006). In conceptual use, indicators operate as message carriers, shaping decision-makers’ “frameworks of thought”, rather than as direct tools for decision making (Lehtonen et al. 2016, p. 2). Political use, by contrast, describes indicators contributing to more complex types of learning; for example, serving as ammunition to influence political agendas and to redefine problems (Lehtonen et al. 2016).

This distinction in types of ‘use’ is important because it shapes what we see as relevant in determining who uses indicators and how. In particular, conceptual and political use of indicators brings into focus the importance of the characteristics of end-users and the political conditions in which the indicator is deployed. For example, Sébastien and Bauler (2013, p. 3) note that user-factors such as the “expectations, belief systems [and] mental models” of policy actors may be more significant in determining the use and influence of sustainable development indicators at the EU level than their technical characteristics (Sébastien et al. 2014). Crucially, they (Sébastien and Bauler 2013, p. 5) also suggest that the degree of resonance between the mental models of the end-users and the way in which the indicator “frames the reality and the problems in question” may be a key determinant of the likelihood the indicator will be used and embedded at the collective level. Of course, this is only one part of the complexity that forms end-user characteristics, with indicator literacy, organisational information cultures, and other factors also forming important parts of the puzzle.

The concept of bounded rationality helps us to understand how mental models, or ‘worldviews’, may play a role in determining the use or non-use of information by policy-makers (Turnhout et al. 2007). We briefly define worldviews as “general social, cultural and political attitudes toward the world and ‘orienting dispositions’ that guide individual responses in complex situations” (Leiserowitz 2006). Individual actors, including civil servants, often fail to make rational decisions in complex decision environments because of cognitive artefacts or limitations (e.g. the inability to calculate complex trade-offs accurately, attentional deficits, the influence of emotion, habit and unreliable memory), which interfere with their decision processes (Jones 2002). This results in the use of cognitive shortcuts which aid decision making (Jones 2002). In particular, individuals may discount certain types of information relative to others. For example, information from sources external to their network (Rich 1991), or information which contrasts with their worldview (Zagorin 1998), may be more readily rejected. Bell and Morse (2011) find that practitioners and policy-makers themselves recognise the importance of these factors, with many noting that the success of indicators is partially determined by “who has developed the [indicator] and who is championing it” (Bell and Morse 2011, p. 292).

The use of short-cuts for deciding which information is more or less trustworthy, combined with the high error costs associated with making the ‘wrong’ decision at the national policy level, may lead policy-makers to be heavily critical of new information which does not resonate with their existing worldview (Turnhout et al. 2007; Collingridge and Reeve 1986). One result of this is the pursuit of “endless technical debates” between scientists and policy-makers, as neither party fully recognises the role that these end-user characteristics play in determining whether an indicator will prove acceptable to its intended users (Turnhout et al. 2007, p. 223). Understanding the plurality of views that exist among civil servants may therefore be important in breaking this deadlock and designing indicators that are likely to have wider uptake.

1.2 Case Study

We take the UK as our case study for better understanding the (non-)use of indicators of societal wellbeing. The UK’s Measuring National Well-being (MNW) programme was launched in 2010 by the Office for National Statistics (ONS) in order to “start measuring our progress as a country, not just by how our economy is growing, but by how our lives are improving” (Cameron 2010). The MNW programme collects and reports on a dashboard of 41 measures of well-being, covering personal well-being, relationships, health, what we do, where we live, personal finance, economy, education and skills, governance and environment (Office for National Statistics 2018). This work has been complemented by a number of companion programmes including the ‘National Performance Framework’ in Scotland (Scottish Government 2018) and the ‘National Indicators for Wales’ (National Assembly for Wales 2015).

While it certainly sits ‘beyond GDP’, the MNW framework still faces some major limitations as a way of measuring societal well-being. Of particular interest for this study, there is limited evidence of the widespread uptake and use of the indicators produced by the MNW programme in driving UK policy. The intentions of government in creating the MNW programme were explicitly focused on measuring well-being, with no clear commitments made about how the new measures would be used, and by whom (Footnote 1) (Cameron 2010). Since its launch there have been only a handful of concrete examples of the MNW indicators being used to assess a specific policy problem (e.g. for the assessment of a series of airport schemes, PwC 2014). In 2013, the UK government stated that “it should be emphasised that this is a long-term programme… and as such we should not expect to have examples of major decisions that have been heavily influenced by wellbeing at this stage” (GOV.UK 2013). Nevertheless, accounting for policy effects on wellbeing has certainly been encouraged more generally in recent years, both within government (e.g. HM Treasury’s ‘The Green Book’ 2018) and by intermediaries (e.g. What Works Centre for Wellbeing’s ‘Wellbeing in Policy Analysis’ 2018). This may have impacted attitudes towards, and use of, national indicators of wellbeing by civil servants. However, the lack of publicly available evidence and guidelines for indicator use in policy making means that it is still unclear whether and how things have progressed in the 7 years since that statement.

Despite the likely importance of understanding the views and underlying mental models of indicator end-users, there appears to be little research looking at the views of civil servants about measuring societal well-being. The ONS talk about “engagement with policy departments” during the development of the MNW programme (Matheson 2011, p. 20). However, the contents of this engagement appear not to be documented in the ONS archives, meaning that indicator developers and the broader scientific community are not able to utilise its insights. There is, therefore, a space for transparent analysis of what views civil servants hold about measuring societal wellbeing, and how these might be affecting their use (or not) of indicators.

Our study begins to address this gap in the literature by asking: what views exist among civil servants in the UK about measuring societal well-being? From this point, we then aim to reflect on whether these views are adequately catered for by the MNW programme or other indicators of societal wellbeing. For this task we used Q methodology: an interview-based methodology lauded for its ability to explore and capture, in a formal way, the diversity of views that exist among a group of stakeholders about a particular topic (Gall and Rodwell 2016; Steelman and Maguire 1999). Because civil servants are a central group of indicator end-users, better understanding their views about measuring societal wellbeing may also contribute more broadly to understanding how we can improve the efficacy and universality of indicator use in policy making.

In the remaining sections of this paper we give a background to the methodology, including its benefits in the context of our study (Sect. 2), followed by a detailed account of our methods (Sect. 3). In Sect. 4 we present the results of our study. The significance of these results and their implications for measuring societal well-being in the UK and beyond are then discussed in Sect. 5, alongside some recommendations for future research.

2 Review of the Methodology

2.1 The Process

Q methodology is a quali-quantitative technique for eliciting, in a structured way (Gall and Rodwell 2016), the subjective views of participants about a topic, which are not ordinarily observable (Cross 2004). It achieves this by presenting participants with a set of carefully constructed, opinion-based statements, known as the ‘Q-set’, which in theory represent the full array of views held about the topic (Watts and Stenner 2005). Participants are then asked to sort these statements into a grid consisting of a series of numbered columns labelled from ‘least agree’ to ‘most agree’ (or some variant thereof), according to how they feel about each statement (Watts and Stenner 2005).

The grid shape, or distribution, is selected by the researcher and often takes a quasi-normal shape, with columns at the extremes of the grid holding fewer statements than those in the middle (see Fig. 1).

Fig. 1

This figure shows an example quasi-normal grid distribution into which Q statements are sorted. Each statement is given a number, and one number is allocated to each cell. For example, in this grid, only 2 statements can be sorted into the least agree (− 5) and most agree (+ 5) columns
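To make the forced distribution concrete, a grid such as the one in Fig. 1 can be represented simply as a vector of column capacities. The short R sketch below (R being the language used for our analysis; see “Appendix 3”) illustrates this; the capacities shown are assumptions for illustration, chosen so that they sum to 48 statements (the size of the Q-set used later in this study) and place two statements in each extreme column, and are not necessarily the exact shape pictured in Fig. 1.

```r
# Illustrative sketch of a quasi-normal forced distribution for an 11-column grid.
# The capacities are assumed for illustration: they sum to 48 (the size of our Q-set)
# and, as in Fig. 1, the extreme columns (-5 and +5) each hold two statements.
columns    <- -5:5
capacities <- c(2, 3, 4, 5, 6, 8, 6, 5, 4, 3, 2)
names(capacities) <- columns

stopifnot(sum(capacities) == 48)  # every statement must occupy exactly one cell

# A completed Q-sort can then be stored as a vector mapping each statement to the
# column in which the participant placed it, subject to these capacities.
```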

Participants’ ‘sorted’ grids (i.e. those for which one statement has been assigned to each grid cell) are analysed using Principal Component Analysis (PCA). PCA identifies similarities in the way that participants have sorted the statements, resulting in a set of participant groupings, or ‘factors’ (Watts and Stenner 2005). Information about each factor is then brought together with any qualitative data collected from interviews with participants to develop a ‘discourse’ (i.e. text that describes the views held by the participants associated with that factor). This process is detailed in Fig. 2.

Fig. 2

Q-study procedure, from statement sorting to discourse analysis
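To make the analysis step in Fig. 2 concrete, the sketch below shows, in base R, how the by-person PCA described above might look. It is a minimal illustration under assumed names: `qsorts` is a hypothetical matrix with one row per statement and one column per participant, each cell holding the grid position (− 5 to + 5) given to that statement; the varimax rotation and the choice of three components are illustrative rather than prescriptive.

```r
# Minimal base-R sketch of the by-person PCA at the heart of a Q analysis.
# `qsorts` is assumed: a statements-by-participants matrix of grid positions (-5 to +5).
pca <- prcomp(qsorts, scale. = TRUE)        # participants (columns) act as the 'variables'

# Rotate the first few components to sharpen the participant groupings ('factors').
loadings <- varimax(pca$rotation[, 1:3])$loadings

# Each participant's loading on each rotated factor indicates how closely their
# Q-sort resembles that factor's shared pattern of sorting.
round(unclass(loadings), 2)
```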

2.2 Can it Really Work?

Q methodology assumes a finite diversity in the ways that people express their views (Cross 2004), meaning that there are a limited number of discourses in circulation about a topic at any one time. This leads Q researchers to claim that the methodology can identify the full range of existing views held by a population about a specific topic, using a relatively small sample size (Brown et al. 1999; Stainton Rogers et al. 1995; Brown 1980). This idea is reflected in the literature, with more than a third of Q studies published in the last 10 years using fewer than 30 participants (“Appendix 1”). Central to this point is the argument that “Q methodology has no interest in estimating population statistics” and so has no need for a large or representative sample of participants (Cross 2004, p. 210). Instead it is more important to prioritise a diverse sample of participants likely to hold differing views (Zabala and Pascual 2016; Cuppen et al. 2010). Further, Q is considered to be structurally different to traditional R methodology, with the Q-set forming the equivalent of the ‘sample’, and the participants instead representing something akin to the ‘experimental condition’ (Cross 2004). In this way, criticisms based on sample size are often considered misguided (Brown et al. 2015).

The methodology has also been criticised as “impotent” to find all existing opinions within a population, owing to the limited nature of the Q-statements as compared to the potentially infinite nature of the opinion domain (Kampen and Tamás 2014, p. 3113; Cross 2004). However, Q methodology is a scientific tool, and as such there are, of course, limitations to both its accuracy and precision (Brown et al. 2015). This cannot fairly be levelled as a criticism against it. The more important question is whether the outputs of the study can be considered useful and reliable. More statements could be added to the Q-set to increase the ‘precision’ with which participant views are characterised. However, large numbers of statements can result in participant fatigue, risking the reliability of the study. In any case, the purpose of most Q studies is to identify broad commonalities in the viewpoints held by individuals in a population (Brown et al. 2015), with qualitative interviews providing more detailed information, where needed. Even those studies with large numbers of Q-statements rarely identify more than 3–6 distinct views (Brown et al. 2015), validating the position that current ‘best practice’ applications of Q methodology are perfectly adequate to meet their aims.

It is also important to note that the ability of a Q-study to capture the full range of opinions that exist within a population will in practice depend on a number of other factors, in addition to the number of statements presented to participants. For example, the construction of the statement set by the researcher (i.e. is it thorough and does it represent the diversity of discourses currently in use?), the size of the population, and the degree of heterogeneity of opinions within it, will all affect the efficacy of a Q-study. Variability in these factors is a limitation of Q methodology, not because a single study may not capture the full diversity of opinions within a population, but rather because it is difficult to confirm the validity of the results through further research. That is to say, we could only attempt to confirm that we have captured the full diversity of views within a population by conducting a Q-study with a very large number of statements, involving the whole population. Importantly, this limitation does not undermine the views that are revealed through the study, which still themselves represent valid expressions of opinion that exist within the population, given the set of statements presented to the participants. Rather it is a limitation that should be considered from the outset when deciding on the desired outcomes of a Q-study. In particular, if the study is exploratory in nature, there is no reason this limitation should present a barrier to such research, although it should be taken into consideration when drawing conclusions.

2.3 Example Applications

Q methodology has been used widely to inform policy development, most commonly in relation to specific environmental management issues (Ockwell 2008; Ellis et al. 2007; Steelman and Maguire 1999). However, it has only rarely been used in the development or appraisal of social, environmental and economic policy indicators, as we do here. Of particular relevance to our study, Doody et al. (2009) sought to identify publicly acceptable sustainable development indicators in the UK. Using Q methodology, the authors were able to identify key areas of concern for the public, and areas that appeared to be irrelevant or of little interest. This ultimately enabled them to develop indicators which better reflected the views of the public (Doody et al. 2009).

The study by Doody et al. (2009) highlights two significant benefits of using Q methodology for investigating a complex and multi-faceted issue, such as measuring societal well-being. First, participants can make clear and nuanced prioritisations by integrating complex trade-offs implicitly into their internal decision-making process (Zabala and Pascual 2016). Second, by presenting all participants with the same set of opinion statements, analysts can directly compare the views of participants on all of the issues covered. This allows for the identification of specific areas of consensus and conflict (Steelman and Maguire 1999), which can guide future research and indicator development. Specifically, by moving beyond ‘practiced’ rhetoric on a topic, which is often elicited using more traditional interview techniques, it becomes more straightforward to identify areas of common ground that can bridge differing views. This characteristic of Q has proven to be particularly useful in assessing environmental policy where there is pre-existing conflict (Barry and Proops 1999; Van Eeten 2000). These strengths make Q methodology a strong candidate for investigating the range of views that exist about measuring societal well-being within the UK civil service.

3 Methods

3.1 Study Design and Data Collection

Q is a flexible method that can be implemented in many different ways, from the types of items being sorted (e.g. O’Neill et al. 2013 used images instead of statements) to the interview technique and selected grid shape. For this reason, transparency is a key element of Q studies. We have therefore included a table below detailing each design component of this study and a justification for our selected approach (Table 1). In brief, participants were given a set of statements (the Q-set) which reflected the central debates in the literature, the media and among civil servants themselves around measuring societal wellbeing. We asked participants to sort these statements into an 11-column grid, ranging from − 5 (least agree) to + 5 (most agree) (as per Fig. 1 above). This process results in one completed grid, or ‘Q-sort’, per participant (see Fig. 2 for a diagram detailing the process). After the sorting exercise, each participant was interviewed to provide context to the quantitative results.

Table 1 Q methodology study design decisions

3.2 Data Analysis

We conducted a Principal Component Analysis of the completed grids, or Q-sorts (see “Appendix 3” for the full R code). The PCA identified clusters in the way that participants sorted their statements into the grid. Each of the identified clusters, or ‘factors’, represents a distinct group of Q-sorts, reflecting participants with similar views on the study topic (Zabala and Pascual 2016). In order for each of the factors in the PCA to be considered distinct from one another (i.e. for each to represent a genuinely unique viewpoint), they must all meet the set of criteria laid out in Table 2.

Table 2 Criteria used for factor extraction (Zabala and Pascual 2016; Davies and Hodge 2007)
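For orientation, the sketch below shows a stripped-down version of how such an analysis is commonly run with the qmethod R package (Zabala and Pascual 2016). It is an assumed, minimal call for illustration only; the full code actually used for this study, including all arguments, is given in “Appendix 3”.

```r
# Hedged sketch of a standard Q analysis with the qmethod package; the arguments shown
# are a minimal assumption and may differ from the full call reproduced in Appendix 3.
library(qmethod)

# `qsorts` is assumed: statements as rows, one column per participant's completed grid.
results <- qmethod(qsorts, nfactors = 3, rotation = "varimax")

# The summary reports factor loadings, explained variance and factor characteristics,
# which can be checked against the extraction criteria in Table 2.
summary(results)
```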

Once the final number of factors was decided on, a representative Q-sort was constructed for each factor. This reflects the mean view of the participants associated with the factor (Zabala and Pascual 2016). ‘Distinguishing’ and ‘consensus’ statements were also identified for each factor at this stage. Distinguishing statements are those statements (from the Q-set) for which one factor’s mean positioning of the statement in the sorting grid differs significantly, at the 5% level, from the other factors’ positioning of the same statement. Consensus statements, by contrast, are those statements for which the factors’ positionings do not differ significantly from one another; in other words, the factors’ views on the statement are not distinguishable. These representative (or ‘idealised’) Q-sorts and the distinguishing and consensus statements, along with the qualitative interview data provided by participants, formed the basis for discourse construction. One discourse was developed per factor extracted from the PCA.
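For readers unfamiliar with these tests, the sketch below shows the conventional calculation for distinguishing and consensus statements (following Brown 1980, as implemented in standard Q software). The z-score matrix `zsc` and the numbers of defining Q-sorts per factor are assumptions for illustration, not our study’s actual values.

```r
# Hedged sketch of the conventional test for distinguishing statements (after Brown 1980).
# `zsc` is an assumed statements-by-factors matrix of factor z-scores; the counts of
# defining Q-sorts per factor below are illustrative only.
n_defining <- c(f1 = 8, f2 = 6, f3 = 4)

rel <- (0.80 * n_defining) / (1 + (n_defining - 1) * 0.80)  # composite factor reliability
se  <- sqrt(1 - rel)                                        # standard error of factor z-scores

# Standard error of the difference between factors 1 and 2, and the test at the 5% level:
sed_12 <- sqrt(se["f1"]^2 + se["f2"]^2)
distinguishes_f1_f2 <- abs(zsc[, "f1"] - zsc[, "f2"]) > 1.96 * sed_12

# A consensus statement is one whose z-scores do not differ significantly between any
# pair of factors.
```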

3.3 Reliability Testing

We used reliability testing to better understand how stable our Q-study results were, i.e. how consistent the PCA outputs would be under repeated samples. We chose a bootstrapping methodology, which allowed us to calculate distributions and new standard errors for various key statistics, such as factor loadings and z-scores (Zabala and Pascual 2016) (see “Appendix 4” for a detailed explanation of the bootstrapping methodology). This enabled us to calculate more accurate measures of reliability through repeated re-sampling, with replacement, of Q-sorts (Zabala and Pascual 2016).

We opted for 1000 bootstrap repetitions, in line with the recommendation of “at least 40 times the size of the sample” (Zabala and Pascual 2016, p. 8). Because this Q-bootstrapping methodology is relatively new, and because our sample size is less than the 45 Q-sorts recommended to achieve highly accurate results (Zabala and Pascual 2016), we used the bootstrapping results primarily as a guide for interpretation. Hence, although we used the bootstrapping results to inform discourse development, we reported both the standard and bootstrapped PCA results, and supported the discourse development with the qualitative interview data. Further, we relaxed the range for Q-sort instability, such that a Q-sort must be flagged in between 20% and 75% of repetitions to be considered unstable (Footnote 2). This reflects our cautious approach to using this new methodology.
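To illustrate the principle behind this procedure, the toy sketch below re-samples Q-sorts with replacement and re-runs the PCA. It is a deliberately simplified, assumption-based illustration: the qmethod implementation we actually used also aligns factors across repetitions (addressing order and sign indeterminacy), which this loop ignores, and `qsorts` is again an assumed statements-by-participants matrix.

```r
# Toy base-R sketch of bootstrapping Q-sorts (assumed setup; the real analysis used the
# qmethod package, whose bootstrap also aligns factors across repetitions).
set.seed(42)
n_reps <- 1000                       # exceeds 40 x our 20 Q-sorts, per the recommendation
boot_loadings <- vector("list", n_reps)

for (i in seq_len(n_reps)) {
  resample <- sample(ncol(qsorts), replace = TRUE)        # draw participants with replacement
  pca_i    <- prcomp(qsorts[, resample], scale. = TRUE)   # re-run the by-person PCA
  boot_loadings[[i]] <- unclass(varimax(pca_i$rotation[, 1:3])$loadings)
}

# The spread of each loading across repetitions yields a bootstrap standard error, and
# Q-sorts whose factor assignment changes frequently across repetitions can be flagged
# as unstable.
```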

4 Results

4.1 Factor Scores and Distinguishing Statements

Forty-eight statements were selected from the concourse to form the final Q-set to be sorted by participants (see Table 1 for the Q-statement selection criteria; see “Appendix 5” for the list of statements and a breakdown by topic area). Thirty-five UK civil servants were contacted for participation in the study. We obtained a 59% response rate, with 20 civil servants ultimately taking part from a range of departments (see Table 3). Participants had a variety of job roles, largely focused on policy design, implementation and appraisal in domains relevant to societal wellbeing. Thirteen respondents were classified as mid-level civil servants, and seven as senior-level (Footnote 3). Twenty Q-sorts (or sorted grids) of 48 statements each were therefore analysed using standard and bootstrapped PCAs. From the standard PCA (i.e. without bootstrapping repetitions) we found that a three-factor solution met all the relevant criteria for extraction (see Table 2 for the extraction criteria; see “Appendix 6” for full factor results against each extraction criterion). Eighteen of the 20 Q-sorts loaded significantly onto one of the three factors, and two Q-sorts were confounded. Together the three factors accounted for 72% of the study variance, well above the threshold of 35% set out in Table 2 (see “Appendix 7” for full bootstrapping results, including bootstrapped factor scores and standard errors).

Table 3 Breakdown of participants, by government department

The factor scores calculated for each Q statement against each factor, in both the standard and bootstrapped PCAs, can be found in Table 4. Distinguishing and consensus statements from the standard and bootstrapped PCAs are also shown here. After applying the bootstrapping procedure, we found a number of unstable statements associated with each factor, whose factor score or status as a distinguishing or consensus statement changed (Table 4). Importantly, our analytical choice to use these bootstrapped factor scores in place of the standard factor scores when developing the discourses (see Sect. 3.3) did not dramatically change the interpretation of the factors. In particular, factors 1 and 2 were largely unaffected. However, it did lead to a slightly different emphasis for factor 3, with six distinguishing statements becoming no longer distinguishing. The bootstrapping analysis also highlighted a number of unstable Q-sorts with large standard errors or ambiguous flagging frequencies. The significance of these unstable Q-sorts is discussed in Sect. 4.2.

Table 4 Q statements and their standard (std.) and bootstrapped (bts.) factor scores, for each of the three factors (f1, f2 and f3)

4.2 Discourses

Qualitative data was collected from 18 of our Q-sort participants (Footnote 4) and used to aid construction of the final three discourses. Below we give brief summaries of each of the discourses; full discourses can be found in “Appendix 1”, alongside discussions of the implications of any unstable statements and Q-sorts.

4.2.1 #1 The Socio-Environmental Discourse

This discourse is defined largely by a concern that measurement of, and decision-making about, societal well-being should include the full range of natural, human and social capital; taking proper account of the potentially damaging effects of economic activity on each of them, in both the short and long term. Factor 1 formed the basis for this discourse, for which summary information can be found in Table 5.

Table 5 Summary information for factor 1

Participants who loaded onto this factor were concerned that GDP does not capture a holistic view of the world around us (S1: + 5*) (Footnote 5). In particular, they showed concern that certain elements of value generated by the environment are overlooked (S2: + 4*), and strongly supported better integration of the value of natural capital into decision making (S4: + 4*). In support of these ideas, participants commented that:

We should be measuring economic growth, but also natural capital, social capital, human capital. That just gives you a much more well-rounded view of society as a whole (Participant 14)

When things like health and education are clearly so important and so immediate, I think there’s a danger of some environmental things getting left out of the assessment of how we’re doing as a society (Participant 11)

While GDP remains (wrongly in my view) the indicator of choice of wellbeing it should at least include a value for the resources used so that sustainability is more central to policy making (Participant 12)

In this vein, the participants who loaded onto this factor strongly believed that building sustainable well-being is not only important but in fact necessary, both for future generations and for current generations too (S47: − 5). Even when challenged with the idea that the concept of sustainability may be poorly defined (S22: − 5*), these participants felt that:

Current sustainable well-being and future sustainable well-being are inextricably linked and if we make bad decisions… now, the impact for current and future generations is significant (Participant 6)

Although the concept of sustainability [is] ill-defined, [it is] crucial to understanding the state of our population, and we should make work to define [it] further rather than disregard [it] (Participant 3)

Additionally, those who were associated with this factor drew attention to the need to take proper account of the damage caused by economic activity (S31: + 5*), such as the negative health effects of the tobacco industry.

It seems to me that GDP and some other indicators or measures of progress completely neglect the damage that we cause in the process (Participant 11)

The tobacco industry is doing absolutely no good whatsoever for society, and yet being propped up and… allowed to function… Even doctors argue for it at times, reasoning that the taxes people pay on cigarettes funds the NHS. I think this is fundamentally twisted and flawed logic and we need to seriously re-think our society (Participant 6)

4.2.2 #2 The Self-determination Discourse

This discourse is defined by the strong belief that access to opportunity and the ability to define one’s own destiny are key determinants of wellbeing. Factor 2 formed the basis for this discourse, for which summary information can be found in Table 6.

Table 6 Summary information for factor 2

Participants who loaded onto this factor felt strongly that being empowered to make choices about your own destiny was central to well-being (S39: + 5). This was exemplified by the quote:

I think the key to happiness and ‘well-being’ is being in control of your own life and feeling as though you have the freedom to influence its direction and outcomes (Participant 18)

This concept was also reflected in their opinion that quality of life should be assessed in terms of the opportunities people have to achieve well-being, rather than whether or not they actually achieve it (S13: + 4*). This came from two distinct perspectives, one reacting against the idea of a ‘nanny state’—“I think it’s patronising to kind of prescribe ‘this is what makes people happy’” (Participant 2)—and another advocating the idea that a “[level] playing field” in terms of access to opportunity is key for societal well-being (Participants 7 and 8).

These participants also placed an emphasis on economic and job security (S9: + 4, S12: + 3), which is consistent with the ideas above about the importance of autonomous decision-making, commenting that:

The availability of a job lets you access all the other [elements of well-being] that might be measured. [For example], without a job you might not have the social life that you want… or [be able to] raise your children how you want (Participant 5)

This discourse was further distinguished by a more favourable view on subjective measures of well-being than the other two factors, which again supports the ideas expressed above that people generally know what is best for them. This was manifest in participants disagreeing that subjective measures were unreliable and might lead people to be contented with their ‘lot in life’, no matter how bad (S45: − 1*, S43: − 3). One participant stated:

I objected to the ones that suggested you shouldn’t trust people to know what they’re talking about when they give subjective opinions… Particularly when you aggregate them all, despite variations, they probably know what they’re saying (Participant 8)

Participants associated with this factor were also distinct from those associated with other factors in their consistent indifference towards GDP as a measure of societal well-being, and any adjustments to it (S15: 0; S16: 0; S17: 0*; S18: 0; S19: 0).

4.2.3 #3 The Technocratic Discourse

Participants associated with this factor gave close attention to the technical difficulties of measuring societal well-being and the potential pitfalls of trying to alter GDP. Factor 3 formed the basis of this discourse, for which summary information is included in Table 7. Of note, only one of the three Q-sorts associated with this factor using the standard PCA procedure was found to still be exemplary after bootstrapping. This calls into question the status of this factor as representing a unique viewpoint (as per the extraction criteria in Table 2). However, closer inspection of the qualitative data justifies maintaining three factors (instead of dropping to two): the interview data clearly support the idea that factor 3 brings a unique perspective compared to the other two factors, although the instability leads us to interpret the factor outputs cautiously when developing the discourse.

Table 7 Summary information for factor 3

Participants associated with this discourse acknowledged the complexity of the concept of societal well-being and the difficulty of capturing it adequately in a single measure (S5: + 5*, S7: + 5*). They further expressed that they felt GDP was not the best way to capture this complexity (S15: − 2, S17: − 5). However, participants from this discourse did not think that altering the way in which GDP is calculated would be the solution to this problem (S19: − 5*, S16: − 3). These sentiments were supported by interview quotes:

There’s always more than one number (Participant 10)

I don’t think [GDP] is enough to say [whether] someone has societal welfare or not. There are loads of other factors (Participant 1)

GDP is primarily an economic indicator and it is useful for that… It would be [better] to have multiple indices that look at different things, rather than changing something that essentially was never intended to be a measure of societal welfare (Participant 1)

Those who associated with this factor were also distinguished by a belief that we need to better capture the contribution of non-traditional sectors of the economy, such as the gig economy, to societal well-being (S20: + 4). This is again more of a technical issue than a value-based issue about what we should measure as part of societal well-being.

Finally, participants showed general indifference or indecision (particularly when compared to other factors) towards more moralistic issues such as: whether we should be concerned about sustainable well-being for future generations (S47: − 2*); whether empowerment is a key part of well-being (S39: 0*); the relative importance of community and interpersonal relationships (S35: 0, S34: − 1); and whether well-being can be expressed in monetary terms (S29: − 1). They were, further, reluctant to show strong views on the role of government in promoting stable relationships and parenting (S36: 0*), and whether government should prioritise economic growth over other (perhaps less well defined) factors, such as sustainability and wellbeing (S22: 0).

4.2.4 Areas of Consensus Between Discourses

There was a broad base of consensus among all factors, with 24 consensus statements identified after bootstrapping (Table 8). This means that there were 24 statements for which the mean positioning of the statement was indistinguishable between all three factors.

Table 8 Summary information for consensus statements

The need to measure inequality and basic human rights in the UK was shared across all factors (S25: − 2, − 4, − 4; S26: + 3, + 4, + 3; S27: − 4, − 4, − 4; S38: − 3, − 3, − 2). In particular, one participant felt that:

We might be better than many other countries on some of these measures, but we are a very long way from perfect. And actually, if we assess these [things] we might not find we are quite as good as we like to think (Participant 11).

All discourses also shared the standpoint that economic growth is not the foundation of well-being (S21: − 2, − 3, − 3) (Footnote 6). Qualitative data from participants suggest that the primary reason for disagreeing with statement 21 was the importance of other factors in determining well-being. In particular, the data indicate an aversion to the centrality of economic growth and GDP in measuring societal well-being, rather than a disagreement with economic growth having any role at all. This is demonstrated by the following quotes:

[I don’t believe that] economic growth is the essential foundation of everything, of all our wellbeing. I think there’s lots of other things that are important as well. (Participant 11)

There are things that don’t necessarily correlate with GDP like people’s mental health or people’s relationships, so I just wouldn’t call it a reliable measure at all (Participant 14)

In line with this, all discourses also agreed that GDP per capita is not a good measure of standard of living (S12: + 3, + 3, + 3), in particular because it does not give a fair reflection of living standards for most people in the UK.

Economic wealth is largely in the hands of a few individuals – so GDP doesn’t tell you much about the quality of life for the citizens of that country (Participant 12)

They also felt that not all aspects of well-being can be fairly expressed in monetary terms (S29: − 3, − 4, − 1) and that focusing on enhancing GDP as a way to improve well-being might, therefore, lead to unintended, negative consequences (S30: + 2, + 3, + 1). One participant highlighted some of the potential negative consequences of this ‘over-focus’ on monetary values and GDP, such as increasing inequality and environmental decline (Participant 19). This stance was exemplified by the following quote:

I think there are elements or aspects of wellbeing where it’s so difficult to put a monetary value on them that we don’t, and because there’s so much emphasis on the monetary value, those factors just get left out altogether (Participant 11)

5 Discussion and Conclusion

5.1 Recap of the Discourses

Using Q methodology, we have investigated the views that exist among civil servants about how we should measure societal well-being in the UK. The three discourses identified accounted for 72% of the study variance, with each representing a distinct perspective on measuring societal well-being. In brief, those participants who aligned with the socio-environmental discourse (#1) were concerned about the potential consequences of ignoring natural, social and human capital in decision making. Those associated with the self-determination discourse (#2) held the strong belief that access to opportunity and the ability to define one’s own destiny were key determinants of well-being, with an emphasis on economic security as a way to facilitate individual autonomy. Lastly, those participants associated with the technocratic discourse (#3) were reluctant to express strong views on moralistic issues or on statements about the role of government; instead they tended to focus on the merits and disadvantages of specific ways of measuring societal well-being.

5.2 Implications for Measuring Well-Being in the UK

There were very few statements on which the discourses directly contradicted one another; most distinguishing statements instead differentiated between strong feelings towards a statement and less strong, or neutral, feelings. The three discourses therefore represent different emphases regarding what civil servants consider most central to well-being in the UK, rather than direct disagreements about whether or not certain elements contribute to well-being at all. In many ways, this makes the differences between the discourses easier to resolve and highlights the value of Q methodology in allowing differences of opinion to be surfaced in a nuanced and transparent way.

Three recommendations can be drawn for indicator development from this work: first, to increase the use of a capitals-based approach; second, to use both outcome and opportunity-based metrics; and third, to include more disaggregated measures of inequality. We discuss each briefly below.

First, some civil servants clearly favour a more extensive use of the capitals model of national wellbeing, particularly with respect to natural capital. All discourses agreed with this sentiment to some degree, with discourse 1 showing a particularly strong preference for a capitals-based approach. Indicators of capital currently appear in the MNW programme in a very limited way (e.g. only one measure of natural capital is used). More comprehensive capital accounts already exist for the UK elsewhere (Office for National Statistics 2017, 2019a, b), and integrating them more fully into a centralised indicator would offer decision-makers in the civil service a more complete picture of the ‘stock’ of wellbeing in the UK today (Footnote 7). New Zealand, for example, has integrated a capitals-based dashboard into its national ‘Living Standards Framework’, and is using it to help identify budget priorities and distinguish between department funding bids (The Treasury 2019). In fact, there are many capitals-based indices from which the MNW programme could draw (e.g. the Index of Sustainable Economic Welfare (Cobb and Daly 1989), the Inclusive Wealth Index (Thiry and Roman 2014), etc.).

Second, the most contentious statement in our Q-study was that “we should be assessing quality of life in terms of the opportunities people have to achieve well-being… rather than whether or not they actually achieve it…” (S13). Here, the point of contention between discourses centred on whether it would be sufficient to assess opportunity, or whether this would be a meaningless measure in the face of a complex society, where other factors could hinder someone’s ability to fulfil that opportunity. In this instance, the MNW dashboard of indicators captures almost exclusively measures of outcome, with no significant inclusion of measures of opportunity (Office for National Statistics 2018). It is worth noting that attempting to aggregate outcome measures with measures of input or opportunity can lead to ‘double counting’ wellbeing, introducing sources of uncertainty into an index (Fu et al. 2011). However, given the dashboard structure of the MNW programme (i.e. measures are not aggregated), there is no theoretical reason not to add a sub-section to the dashboard that reflects citizens’ opportunities to flourish.

Third, there was a strong emphasis in all three discourses on the continuing need to measure inequality and human rights in the UK. Most indicators in the MNW dashboard are, for example, broken down by age and gender. However, the dashboard currently only reports headline figures for these subgroups, and not their spread; it gives no indication of the statistical significance of any differences between subgroups; and there is no breakdown by other important subgroups, such as ethnicity or socio-economic status (Office for National Statistics 2018). Given the apparent importance of disaggregated information for civil servants—a finding that is supported elsewhere in the literature (Sébastien and Bauler 2013)—this may be hindering the use and usefulness of such indicators. Other indicator frameworks do address this issue to some degree (e.g. the Global Gender Gap Index, the Inequality-Adjusted Human Development Index, the Index of Sustainable Economic Welfare and the Genuine Progress Indicator) (Yang 2014). However, their limited focus on a single axis of equality, such as income or gender, leaves room for further development.

In addition to these three concrete recommendations, our study also appeared to reveal a view about economic growth that was common to all three discourses: specifically, that economic growth is not the foundation of societal well-being, and that monetary expressions of that well-being, such as GDP, do not adequately reflect the standard of living of most people in the UK. These results indicate support for the existence of a view among civil servants that economic growth is not the central and sole driver of our well-being. However, discourse 3 in particular has a large number of statements with large standard errors (i.e. there was a low level of agreement between participants within the discourse) and, as a whole, is highly focused on technical issues. The combination of these two facts causes us to question the simple narrative of a shared sentiment about economic growth, and gives rise to two possible interpretations. The first possible interpretation is that the premise of our study—the need to measure societal well-being, beyond GDP—does not fit the worldview of the participants associated with discourse 3. This explanation draws from the “overcritical model” of the use of science in policy making (Turnhout et al. 2007, p. 223), where actors will try to “deconstruct, discredit and reject scientific knowledge that does not fit with already existing opinions, fixed interests or established consensus” (Turnhout et al. 2007, p. 223). This could hint at a partial explanation for why uptake of indicators of societal wellbeing is still low within government: if key actors within government are highly critical of the producers of, or the conceptual framework underpinning, the societal-wellbeing indicators, then no matter which specific indicators are chosen it is unlikely that the indicators will have any influence on policy, even if they become embedded in the policy process. The second possible interpretation is that these actors may simply have a clear understanding of the complexity of societal well-being and the limitations of GDP as a measure of it, reflected in their strong opinions about adjustments to GDP and their uncertainty about whether and how to include moralistic aspects of well-being. Although the unstable Q-sorts and statements caution us to tread lightly with any concrete conclusions from discourse 3, this result certainly points towards an interesting area for future research.

These insights offer some practical examples of how understanding the views of end-users can help with indicator development, and may support wider use, echoing what civil servants and practitioners have expressed in other studies: that “policy-makers need to become far more engaged in the [indicators] discourse if these tools are to succeed” (Bell and Morse 2011, p. 298).

5.3 Limitations and Future Research

This research was designed as an exploratory study, offering a practical example of the way that better understanding the views of indicator end-users could support improved indicator development. We hypothesise that this might then support more widespread use by civil servants. There is now a need for research which takes a less exploratory and more systematic approach, (1) to reveal the actual extent of use of indicators of societal wellbeing within government, and (2) to capture the prevalence of certain views among civil servants about those indicators. Further research that seeks to better understand how a range of factors—including worldviews, organisational culture, data literacy, seniority, supporting legislation, public opinion, and political agendas—affect the use and influence of indicators of societal wellbeing by civil servants in practice is also needed. Recent developments in New Zealand might offer a rich potential case study, as the government released its first “wellbeing budget” in 2019. This provides arguably the most advanced example of a national government integrating indicators of national wellbeing into policy decision-making. Of particular interest, the national indicator set—the ‘Living Standards Framework’—was developed by the Treasury itself (The Treasury 2018) and is now, in theory, being used to direct policy proposals and to inform budgetary decisions (The Treasury 2019). Further, the Scottish ‘National Performance Framework’ and the ‘National Indicators for Wales’ are both supported by legislation mandating that ministers set targets and monitor progress against a set of national wellbeing indicators (Community Empowerment (Scotland) Act 2015; Well-being of Future Generations (Wales) Act 2015). This offers the opportunity, for example, to derive insights about the effectiveness of specific tools for creating an environment that encourages indicator use.

A central limitation of this study is that it is not possible to verify how successful we have been in identifying the full diversity of opinions across the civil service about measuring societal wellbeing. In particular, we might have reason to examine the validity of our results given the very high level of consensus across the statements and factors. As stated in Sect. 4.2, there were 24 consensus statements identified through the bootstrapping analysis, and there was also a high level of correlation between factors 1 and 2 (see Table 11 in Appendix 6). This might suggest that there is a broad base of agreement which underpins all three of the discourses, and particularly discourses 1 and 2. However, it might also speak to the precision of our Q-study, indicating that the selected Q statements were too general to detect nuanced differences in viewpoint. Alternatively, it might even be a result of sample bias, introduced by the snowball sampling technique, where the inclusion of individuals with inter-relationships may “overemphasise [the] cohesiveness in [the] social network” (Atkinson and Flint 2001, p. 2). As we argued in Sect. 2, this does not invalidate the opinions expressed by participants, particularly given the exploratory nature of the study. However, providing a clearer answer as to why there was such a high level of consensus would be a useful next step for further research, perhaps by extending this research to a more diverse set of civil servants and adding more (and more specific) statements to improve precision (Brown et al. 2015), or by augmenting it with in-depth interviews (as per Steelman and Maguire 1999; Valenta and Wigger 1997; Brown 1993). Alternatively, as suggested above, a more systematic approach to identifying civil servants’ views (e.g. through survey-based methods) might now be appropriate. Nonetheless, the views identified here provide valuable first insights for indicator development.

5.4 Conclusion

Through this study we have explored the views of civil servants in the UK towards measuring societal well-being. Three distinct discourses emerged from our analysis: one concerned about the consequences of ignoring natural, social and human capital in decision making; one that emphasised opportunity and autonomy as key determinants of well-being; and one that focused on the technical aspects of measuring societal well-being. These discourses hold insights of particular relevance for the further development of the Measuring National Well-being programme, as the primary indicator framework used in the UK. The data gathering, valuation and aggregation methodologies are generally already advanced enough to implement these kinds of changes to an indicator framework like the MNW. This again draws attention to the need to shift the focus of the indicator literature away from issues of technical development and towards questions about how end-users’ worldviews, organisational culture, data literacy, supporting legislation, and political agendas affect indicator use and influence in policy making. This is not to negate the importance of improvements to data availability and valuation methodologies, but rather to acknowledge that they are one element in a complex indicator ecosystem, of which many parts have so far been understudied. We therefore hope that this paper has effectively highlighted the potential benefits that considering the views of end-users might bring to indicator development initiatives.