Abstract
Technologies that deploy data science methods are liable to result in epistemic harms involving the diminution of individuals with respect to their standing as knowers or their credibility as sources of testimony. Not all harms of this kind are unjust, but when they are, we ought to try to prevent or correct them. Epistemically unjust harms will typically intersect with other, more familiar and well-studied kinds of harm that result from the design, development, and use of data science technologies. However, we argue that epistemic injustices can be distinguished conceptually from these more familiar kinds of harm, and that epistemic harms are morally relevant even in cases where those who suffer them are unharmed in other ways. Via a series of examples from the criminal justice system, workplace hierarchies, and educational contexts, we explain the kinds of epistemic injustice that can result from common uses of data science technologies.
Notes
In some cases, decision makers have no choice but to rely on computational models and simulations as means of determining the best course of action. For a discussion of why this is the case see Boschetti et al. (2012).
Some of these issues predate the computational context. For an account of the development of data-tracking and large-scale record keeping that connects the pre-computational era to present concerns see Colin Koopman (2019). For a conceptual analysis of the development of statistical methods in general, see also Desrosières (1998). In their edited volume Life by Algorithms, Catherine Besteman and Hugh Gusterson provide an overview of the morally significant effects of what they call ‘roboprocesses’ on contemporary life (Besteman & Gusterson, 2019).
There are formal features of judgments with respect to the collective aspects of knowledge, for example, common knowledge, that epistemic logicians have shown are foundational for inclusion in certain kinds of norm-governed social behavior. Participation in some norms seems to require that one is judged capable of sharing in common knowledge (see Rendsvig, 2021).
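The notion of common knowledge invoked here has a standard formal characterization in epistemic logic (see Rendsvig, 2021). The following sketch is the usual textbook rendering, not anything specific to this paper:

```latex
% 'Everybody in group G knows phi' is the conjunction of individual knowledge:
%   E_G \varphi := \bigwedge_{i \in G} K_i \varphi
% Common knowledge is the infinite iteration of E_G: everyone knows phi,
% everyone knows that everyone knows it, and so on. It is standardly
% characterized as a fixed point:
\[
  C_G \varphi \;\leftrightarrow\; E_G(\varphi \wedge C_G \varphi)
\]
```

On this rendering, being excluded from the group G over which common knowledge is defined is precisely what bars an agent from participating in the norms that presuppose it.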
Notable exceptions are philosophers of technology who do not draw a conceptual line between social contexts and the technical artifacts that emerge from or are deployed within them (See Latour 1988, written under the pseudonym of J. Johnson; Simondon, 2017; Slater, 1980). For these philosophers, the ethical aspect of technological development is not addressed by exploring the relationship between societal harm and technological use, but rather by exploring the values and interests of society that bring these technologies into being. Hence, scholars who follow this approach, such as Amoore (2020), claim that “the algorithm already presents itself as an ethicopolitical arrangement of values, assumptions, and propositions about the world and cannot/should not be analyzed on its own”. Similarly, Green states that “Data scientists must recognize themselves as political actors engaged in normative constructions of society and, as befits political work, evaluate their work according to its downstream material impacts on people's lives” (2020). While we agree that treating technology in isolation from its societal context is limited, in this paper we also defend conceptual distinctions that allow us to identify harms particular to a specific technology independently of the settings in which they were developed or deployed. See our discussion of Jeroen van den Hoven’s (2000) taxonomy of the kinds of moral wrong-doing associated with different kinds of technologies in Section Five of this paper.
It is important here to note the originality of Miranda Fricker’s contribution while contextualizing it with other works that also deploy or function within an epistemic framework and that are an important contribution to the understanding of epistemic harms. Fricker’s original contribution lies in the fact that her account of epistemic injustice sought to identify the possible harms directly related to the unjust diminution of an agent’s epistemic status, due in part to irrelevant social factors (we thank an anonymous reviewer for helping us emphasize this point). As we will see below, similar concepts such as epistemic violence (Dotson, 2011) sought to capture and account for a different set of phenomena, such as physical or social harms done to agents in virtue of epistemic reasons or elements such as ignorance. Similarly, as we will see in detail below, distributive accounts of the exact same term ‘epistemic injustice’ seek to identify social harms and obstacles, such as poverty or segregation, that result in an unjust distribution of an epistemic good such as education. A social harm that has an epistemic source and an unfair distribution of epistemic goods that was caused by a social harm, though often contiguous or related, are not conceptually the same phenomenon as a discriminatory diminution of an agent’s epistemic status. It is this latter phenomenon that Fricker’s framework brings to the fore (2017) and it is this discriminatory account of epistemic injustice that best fits the kind of phenomenon we seek to account for in the context of our interactions with data science technologies and methods.
As we will see, this is an important distinction made by Fricker herself (2017) that addresses a prevalent conflation in the literature between discriminatory harms of an epistemic nature, distributive asymmetries of epistemic goods such as education, and other observable harms stemming from epistemic motives such as ignorance.
It is important to note that some epistemic injustices are so extreme that scholars, such as Medina (2017), use the term ‘hermeneutical death’ to refer to them. These are instances in which there is a total erasure of an agent’s voice and capacity to engage in meaning-making cultural activities. In hermeneutical injustices, the agent has difficulty articulating the harm they are being subjected to and in some cases may not even recognize their diminished condition as a harm. We thank one of our anonymous reviewers for suggesting that in these extreme cases, individuals will not be indignant.
In this sense, the concept of epistemic injustice differs substantially from terms such as ‘epistemic violence’ used by Kristie Dotson, for example. In particular, while the term ‘epistemic injustice’ entails an unjust diminution of someone’s epistemic status—whether by a discriminatory diminution of their testimony or by a systemic neglect of, or the imposition of obstacles to, epistemic participation—the term ‘epistemic violence’ picks out the wrongdoer’s pernicious ignorance. By contrast, ‘epistemic’ as an adjective in the term ‘epistemic injustice’ picks out the aspect of the agent whose status is being diminished. See Dotson (2011), p. 240, especially her example of the pyromaniac toddler. This distinction seems to also apply to other neighboring concepts such as “discursive harms” (Keyes, 2020).
Injustices can also happen in the event of wrongful ascription of an inflated epistemic status. We thank our anonymous reviewer for encouraging us to mention this point.
The sense of trust being discussed here is one close to the epistemology of science and in particular to the epistemology of computational methods in the sciences. Hence, we can call it ‘scientific trust’. In such settings, important debates are taking place regarding the relationship the term trust has to reliability, explanatory understanding, and transparency. For a thorough though deflationary account of these efforts, see Durán and Formanek (2018) and Durán and Jongsma (2021). By contrast, some theoretical frameworks assume that trust within scientific inquiry is necessarily linked to transparency and should not be decoupled from it. See Symons and Alvarado (2016) and Alvarado (2021a) for how and why this applies to issues of data science in particular, and Alvarado and Symons (2019) for why this applies to other software-intensive technology in scientific inquiry and policy-making. We thank our anonymous reviewers for encouraging this clarification.
By irreversibility Boschetti and Symons mean the fact that computational models can generally arrive at the same state via many possible sequences of previous states. “Thus, while in the natural world, it is generally assumed that physical states have a unique history, representations of those states in a computational model will usually be compatible with more than one possible history in the model” (2013, p. 809).
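The many-to-one character of such model dynamics can be shown with a minimal sketch (a toy update rule of our own devising, not one of Boschetti and Symons’s models):

```python
# Toy illustration of irreversibility: distinct prior states map to the
# same successor state, so a state in the model is compatible with more
# than one possible history. The update rule is purely hypothetical.

def step(state: int) -> int:
    """Toy dynamics: each step halves the state (integer division)."""
    return state // 2

# Two different initial states...
history_a = [13, step(13)]  # 13 -> 6
history_b = [12, step(12)]  # 12 -> 6

# ...converge to the same model state, so state 6 cannot be uniquely
# traced back to its origin: the mapping has no unique inverse.
assert history_a[-1] == history_b[-1] == 6

# Every state with more than one preimage under `step` has a
# non-unique history within the model.
preimages_of_6 = [s for s in range(100) if step(s) == 6]
print(preimages_of_6)  # [12, 13]
```

Whereas a physical state is assumed to have a unique history, the model state 6 above is compatible with two distinct histories, which is the sense of irreversibility at issue.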
Likewise, the norms governing marriage in some cultures make it difficult for victims of marital rape to explicitly articulate what has happened to them in an institutional context such as a court.
Note here that while COMPAS has been widely cited in research involving bias and fairness metrics, here we are talking about a different problematic aspect of this kind of technology: its opacity. While the accusations of bias against COMPAS have been widely challenged as artifacts of the metric by which the software’s results were measured (Corbett-Davies & Goel, 2018), the question of whether COMPAS is or is not biased may be more complicated than simply examining the COMPAS results or the metrics used by those who judge it as biased (for an insightful discussion of how this may be the case see Hübner, 2021).
Trade secrecy—which according to Burrell (2016) belongs to the category of social epistemic opacity—is at the root of the epistemic injustice in this case. As Wexler notes, this highlights another worrying aspect of these arrangements since “private companies increasingly purport to own the means by which the government decides what neighborhoods to police, whom to incarcerate, and for how long. And they refuse to reveal how these decisions are made—even to those whose life or liberty depends on them.” (Wexler, 2017).
Wexler notes that “We do know certain things about how COMPAS works. It relies in part on a standardized survey where some answers are self-reported and others are filled in by an evaluator. Those responses are fed into a computer system that produces a numerical score.” The important part is that the developers and owners of the software “consider the weight of each input, and the predictive model used to calculate the risk score, to be trade secrets. That makes it hard to challenge a COMPAS result” (Wexler, 2017).
Since we have no access to the algorithm itself, or to the way its weights and inputs are entered and computed, it is simply not possible for us to judge whether it was indeed this one question that was skewing results against Rodriguez. COMPAS, as a data science tool, takes into consideration many other attributes and features of those subjected to its processes. The qualitative survey referred to by Rodriguez is but one of the more tangible aspects of the system.
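The structural point can be made concrete with a deliberately hypothetical scoring sketch. The attribute names, weights, and numbers below are all invented, since COMPAS’s actual inputs, weights, and model are trade secrets; the sketch only shows why undisclosed weights make a score hard to challenge:

```python
# Hypothetical weighted-score sketch. Nothing here reflects the real
# COMPAS system; the names and numbers are invented for illustration.

SECRET_WEIGHTS = {  # unknown to the person being scored
    "prior_arrests": 0.9,
    "age_at_first_offense": -0.4,
    "survey_q_unstable_housing": 1.3,
}

def risk_score(answers: dict) -> float:
    """Weighted sum of survey answers under the undisclosed weights."""
    raw = sum(SECRET_WEIGHTS[k] * v for k, v in answers.items())
    return round(raw, 2)

answers = {
    "prior_arrests": 2,
    "age_at_first_offense": 19,
    "survey_q_unstable_housing": 1,
}
print(risk_score(answers))  # -4.5 with these toy numbers

# Without access to SECRET_WEIGHTS, the subject cannot determine whether
# any single survey question (e.g., housing) drove the final score.
```

Even in this three-variable toy, contesting the score requires knowing the weights; in a model with many inputs and a proprietary predictive model, the challenge Wexler describes is correspondingly harder.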
For a demonstration of the user interface of the system marketed by Academic Analytics see https://youtu.be/U_Li7ZEp3e0 (Last accessed Oct. 3rd, 2021).
To see why novel technical artifacts, such as computational methods, should not be granted the same levels of trust as human experts see Symons and Alvarado (2019).
The problems with such a position have been extensively addressed by historians of technology such as Lewis Mumford and Langdon Winner. Regarding computational processes in particular, a similar position regarding the neutrality of information technologies and methodologies was noted by Richard De George (2008, p. 5) in his discussion of “the myth of amoral computing”.
See Alvarado (2020) for an argument for this distinction.
Safiya Noble (2018) highlights the ways in which Google’s search algorithms exacerbate racial bias and the sexualization of young girls via its search results. Since racism and the sexualization of young girls existed independently of and prior to algorithmic technology, the moral problem with Google’s algorithms is that of exacerbating, enabling, or perpetuating an existing moral problem rather than creating it.
Arguably, the example of internet pornography is not a good choice for him in the formulation of this distinction given novel characteristics of the pornography industry that have arisen in conjunction with the emergence of internet technologies.
Similarly, as Hübner (2021) points out, some instances of algorithmic bias may have their sources in existing historical inequities while others may be the product of an analytic process.
References
Alvarado, R. (2020). Epistemic opacity, big data, artificial intelligence and machine learning. In K. Macnish & J. Galliot (Eds.), Big data and the democratic process. Edinburgh University Press.
Alvarado, R. (2021a). Should we replace radiologists with deep learning? Pigeons, error and trust in medical AI. Bioethics (Forthcoming).
Alvarado, R. (2021b). Explaining epistemic opacity. (Preprint).
Alvarado, R., & Humphreys, P. (2017). Big data, thick mediation, and representational opacity. New Literary History, 48(4), 729–749.
Amoore, L. (2011). Data derivatives: On the emergence of a security risk calculus for our times. Theory, Culture & Society, 28(6), 24–43.
Amoore, L. (2014). Security and the incalculable. Security Dialogue, 45(5), 423–439.
Amoore, L. (2020). Cloud ethics: Algorithms and the attributes of ourselves and others. Duke University Press.
Anderson, E. (2012). Epistemic justice as a virtue of social institutions. Social Epistemology, 26(2), 163–173.
Basken, P. (2018). UT-Austin professors join campaign against faculty-productivity company. Chronicle of Higher Education.
Barberousse, A., & Vorms, M. (2014). About the warrants of computer-based empirical knowledge. Synthese, 191(15), 3595–3620.
Benjamin, R. (2019). Race after technology: Abolitionist tools for the New Jim Code. Polity.
Besteman, C., & Gusterson, H. (Eds.). (2019). Life by algorithms: How roboprocesses are remaking our world. University of Chicago Press.
Boschetti, F., Fulton, E., Bradbury, R., & Symons, J. (2012). What is a model, why people don't trust them, and why they should. In Negotiating our future: Living scenarios for Australia to 2050, Vol. 2. Australian Academy of Science.
Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662–679.
Bratu, C., & Haenel, H. (2021). Varieties of hermeneutical injustice: A blueprint. Moral Philosophy and Politics.
Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency (pp. 77–91).
Burrell, J. (2016). How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Big Data & Society, 3(1), 2053951715622512.
Butterworth, M. (2018). The ICO and artificial intelligence: The role of fairness in the GDPR framework. Computer Law & Security Review, 34(2), 257–268.
Coady, D. (2010). Two concepts of epistemic injustice. Episteme, 7(2), 101–113.
Coady, D. (2017). Epistemic injustice as distributive injustice. In The Routledge handbook of epistemic injustice (pp. 61–68). Routledge.
Code, L. (2017). Epistemic responsibility. In J. Kidd, J. Medina, & G. Pohlhaus (Eds.), The routledge handbook of epistemic injustice (pp. 107–117). Routledge.
Collins, P. H. (2017). Intersectionality and epistemic injustice. In J. Kidd, J. Medina, & G. Pohlhaus (Eds.), The Routledge handbook of epistemic injustice (pp. 115–124). Routledge.
Corbett-Davies, S., & Goel, S. (2018). The measure and mismeasure of fairness: A critical review of fair machine learning. arXiv preprint arXiv:1808.00023.
Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2), 153–163.
De George, R. T. (2008). The ethics of information technology and business. Wiley.
Desrosières, A. (1998). The politics of large numbers: A history of statistical reasoning. Harvard University Press.
Dieterich, W., Mendoza, C., & Brennan, T. (2016). COMPAS risk scales: Demonstrating accuracy equity and predictive parity. Northpointe Inc, 7(74), 1.
Dotson, K. (2011). Tracking epistemic violence, tracking practices of silencing. Hypatia, 26(2), 236–257.
Durán, J. M., & Formanek, N. (2018). Grounds for trust: Essential epistemic opacity and computational reliabilism. Minds and Machines, 28(4), 645–666.
Durán, J. M., & Jongsma, K. R. (2021). Who is afraid of black box algorithms? On the epistemological and ethical basis of trust in medical AI. Journal of Medical Ethics, 47(5), 329–335.
Else, H. (2021). Row erupts over university's use of research metrics in job-cut decisions. Nature.
Feller, A., Pierson, E., Corbett-Davies, S., & Goel, S. (2016). A computer program used for bail and sentencing decisions was labeled biased against blacks. It’s actually not that clear. The Washington Post, 17.
Flores, A. W., Bechtel, K., & Lowenkamp, C. T. (2016). False positives, false negatives, and false analyses: A rejoinder to machine bias: There’s software used across the country to predict future criminals and it’s biased against blacks. Fed. Probation, 80, 38.
Fricker, M. (2007). Epistemic injustice: Power and the ethics of knowing. Oxford University Press.
Fricker, M. (2017). Evolving concepts of epistemic injustice. In I. J. Kidd, J. Medina, & G. Pohlhaus (Eds.), The Routledge handbook of epistemic injustice (pp. 53–60). Routledge.
Glick, P., & Fiske, S. T. (1997). Hostile and benevolent sexism: Measuring ambivalent sexist attitudes toward women. Psychology of Women Quarterly, 21, 119–135. https://doi.org/10.1111/j.1471-6402.1997.tb00104.x
Green, B. (2020). Data science as political action: grounding data science in a politics of justice. Available at SSRN 3658431.
Grasswick, H. (2018). Understanding epistemic trust injustices and their harms. Royal Institute of Philosophy Supplements, 84, 69–91.
Harding, S. (2016). Whose science? Whose knowledge? Cornell University Press.
Horner, J. K., & Symons, J. (2019). Understanding error rates in software engineering: Conceptual, empirical, and experimental approaches. Philosophy & Technology, 32(2), 363–378.
Horner, J. K., & Symons, J. F. (2020). Software engineering standards for epidemiological models. History and Philosophy of the Life Sciences, 42(4), 1–24.
Hubig, C., & Kaminski, A. (2017). Outlines of a pragmatic theory of truth and error in computer simulation. In M. Resch, A. Kaminski, & P. Gehring (Eds.), The science and art of simulation I. Cham: Springer. https://doi.org/10.1007/978-3-319-55762-5_9
Hübner, D. (2021). Two kinds of discrimination in AI-based penal decision-making. ACM SIGKDD Explorations Newsletter, 23(1), 4–13.
Humphreys, P. (2009). The philosophical novelty of computer simulation methods. Synthese, 169(3), 615–626.
Hutchinson, B., & Mitchell, M. (2019). 50 years of test (un)fairness: Lessons for machine learning. In Proceedings of the conference on fairness, accountability, and transparency (pp. 49–58).
Jo, E. S., & Gebru, T. (2020). Lessons from archives: Strategies for collecting sociocultural data in machine learning. In Proceedings of the 2020 conference on fairness, accountability, and transparency (pp. 306–316).
Kalluri, P. (2020). Don’t ask if artificial intelligence is good or fair, ask how it shifts power. Nature, 583(7815), 169.
Kaminski, A., Resch, M., & Küster, U. (2018). Mathematische opazität. Über rechtfertigung und reproduzierbarkeit in der computersimulation. In Arbeit und Spiel (pp. 253–278). Nomos Verlagsgesellschaft mbH & Co. KG.
Keyes, O., Hutson, J., & Durbin, M. (2019). A mulching proposal: Analysing and improving an algorithmic system for turning the elderly into high-nutrient slurry. In Extended abstracts of the 2019 CHI conference on human factors in computing systems (pp. 1–11).
Keyes, O. (2020). Automating autism: Disability, discourse, and Artificial Intelligence. The Journal of Sociotechnical Critique, 1(1), 8.
Kidd, I. J., Medina, J., & Pohlhaus, G. (2017). Introduction to the Routledge handbook of epistemic injustice (pp. 1–9). Routledge.
Kitchin, R. (2014). The data revolution: Big data, open data, data infrastructures and their consequences. Sage.
Koopman, C. (2019). How we became our data: A genealogy of the informational person. University of Chicago Press.
Latour, B., & Venn, C. (2002). Morality and technology. Theory, Culture & Society, 19(5–6), 247–260.
Leonelli, S. (2016). Locating ethics in data science: Responsibility and accountability in global and distributed knowledge production systems. Philosophical Transactions of the Royal Society a: Mathematical, Physical and Engineering Sciences, 374(2083), 20160122.
McKinlay, S. (2020). Trust and algorithmic opacity. In K. Macnish & J. Galliot (Eds.), Big data and the democratic process. Edinburgh University Press.
Medina, J. (2017). Varieties of hermeneutical injustice. In The Routledge handbook of epistemic injustice (pp. 41–52). Routledge.
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR), 54(6), 1–35.
Mittelstadt, B. D., Allo, P., Taddeo, M., Wachter, S., & Floridi, L. (2016). The ethics of algorithms: Mapping the debate. Big Data & Society, 3(2), 2053951716679679.
Neal, B. (2019). On the bias-variance tradeoff: Textbooks need an update. https://arxiv.org/abs/1912.08286
Noble, S. U. (2018). Algorithms of Oppression: How search engines reinforce racism. NYU Press.
O’Neil, C. (2017). Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books.
Origgi, G., & Ciranna, S. (2017). Epistemic injustice: The case of digital environments. In The Routledge handbook of epistemic injustice (pp. 303–312). Routledge.
Rendsvig, R., & Symons, J. (2021). Epistemic logic. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (Summer 2021 ed.). https://plato.stanford.edu/archives/sum2021/entries/logic-epistemic/
Rudin, C. (2019). Do simpler models exist and how can we find them? In KDD (pp. 1–2).
Rudin, C., & Ustun, B. (2018). Optimized scoring systems: Toward trust in machine learning for healthcare and criminal justice. Interfaces, 48(5), 449–466.
Ruiz, A. G. (2019). White knighting: How help reinforces gender differences between men and women. Sex Roles, 81(9), 529–547.
Saam, N. J. (2017). What is a computer simulation? A review of a passionate debate. Journal for General Philosophy of Science, 48(2), 293–309.
Saltz, J. S., & Stanton, J. M. (2017). An introduction to data science. Sage Publications.
Saxena, N. A., Huang, K., DeFilippis, E., Radanovic, G., Parkes, D. C., & Liu, Y. (2019). How do fairness definitions fare? Examining public attitudes towards algorithmic definitions of fairness. In Proceedings of the 2019 AAAI/ACM conference on AI, ethics, and society (pp. 99–106).
Slater, P. (Ed.). (1980). Outlines of a critique of technology. Inklinks.
Simondon, G. (2017). On the mode of existence of technical objects (p. 59). Univocal Publishing.
Spivak, G. C. (2003). Can the subaltern speak? Die Philosophin, 14(27), 42–58.
Symons, J., & Alvarado, R. (2016). Can we trust Big Data? Applying philosophy of science to software. Big Data & Society, 3(2), 2053951716664747.
Symons, J., & Alvarado, R. (2019). Epistemic entitlements and the practice of computer simulation. Minds and Machines, 29(1), 37–60.
Symons, J., & Boschetti, F. (2013). How computational models predict the behavior of complex systems. Foundations of Science, 18(4), 809–821.
Symons, J., & Horner, J. (2014). Software Intensive Science. Philosophy & Technology, 27(3), 461–477.
Symons, J., & Horner, J. (2019). Why there is no general solution to the problem of software verification. Foundations of Science, 1–17.
Suresh, H., & Guttag, J. V. (2019). A framework for understanding unintended consequences of machine learning. Preprint https://arxiv.org/abs/1901.10002
Van den Hoven, J. (2000). Moral Wrongdoing. Internet ethics, 127.
Vapnik, V. (2013). The nature of statistical learning theory. Springer.
Weltz, J. (2019). Over-Policing and Fairness in Machine Learning (Doctoral dissertation, Pomona College).
Wexler, R. (2017). When a computer program keeps you in jail: How computers are harming criminal justice. New York Times, 13.
Wexler, R. (2018). The odds of justice: Code of silence: How private companies hide flaws in the software that governments use to decide who goes to prison and who gets out. Chance, 31(3), 67–72.
Winner, L. (1980). Do artifacts have politics? Daedalus, 109(1), 121–136.
Yong, E. (2012). Nobel laureate challenges psychologists to clean up their act. Nature News.
Acknowledgements
This paper has benefited enormously from the critical feedback of three excellent referees for this journal; we are greatly indebted to them for their careful and rigorous work. We have presented earlier versions of this paper to a wide range of venues over the past three years and are deeply grateful to audiences for both their criticism and encouragement. Critical feedback from Markus Ahlers, Brooke Burns, Martin Cunneen, Nico Formanek, Luciano Floridi, Miranda Fricker, Stephanie Harvard, Jack Horner, Dietmar Hübner, Denisa Kera, Colin Koopman, Nicolae Morar, Kasper Lippert-Rasmussen, Camisha Russell, Laura Schelenz, Jake Searcy, Paul Showler, Irina Symons, Eran Tal, Michał Wieczorek, and Eric Winsberg has been especially helpful to us as this manuscript developed. John Symons’s work is partly supported by NSA Science of Security initiative contract #H98230-18-D-0009.
Cite this article
Symons, J., Alvarado, R. Epistemic injustice and data science technologies. Synthese 200, 87 (2022). https://doi.org/10.1007/s11229-022-03631-z