Introduction

It has become a commonplace in educational research and policy discourses to state that digital technology has ‘transformed’ the nature of higher education and even the university itself. It is also claimed that this may lead to more interactive or less hierarchical formats and engagement, in which ‘traditional’ modes of teaching, such as the lecture, are claimed to be obsolete. These narratives seem to express a widespread desire for overall ‘transformation’ in the university. In terms of digital education, this has been expressed variously in the apparently benign ideologies of ‘active learning’, ‘connectivism’ and ‘the flipped classroom’. These, I propose, share common values, in that they all prize student interaction and observable engagement, both online and in the face-to-face setting. Although these ideas might appear to be ‘student centred’ and progressive, critics (e.g., Macfarlane 2017) have pointed out that an emphasis on student performance of a particular form of engagement or identity may in fact threaten the fundamental values of academic freedom. I have argued elsewhere (Gourlay 2020) that this tendency is greater in the context of digital education, where performativity, surveillance and regulation are intensified via digital technologies. This might be seen as part of a broader ‘culture of surveillance’ in society (Lyon 2018). As commentators have argued (e.g., Williamson 2017a), the increased use of digital technology data to track and monitor student activity has the effect of ‘datafying’ students as human subjects.

As I have contended (Gourlay 2021), this tendency towards datafication arises from a very particular set of ideas about the university in the digital age. This centres on notions of academics and students as somewhat abstract, disembodied human subjects, removed from their social and material settings. It can be argued that digital technology is used here to express fantasies of human transcendence in higher education and to promote notions of extensions of human intellectual and embodied capacity. These ideas form part of a web of highly contradictory notions about the ontological status of the student, the lecturer, the text, the university and knowledge itself. Utopian desires for extended human agency, untrammelled by the ‘confines’ of embodiment, time and materiality, sit alongside increasingly prevalent digitally mediated regimes of surveillance and control in university settings. As algorithms and data play an increasingly all-pervasive role in society (e.g., Finn 2017; Cheney-Lippold 2017; Beer 2019, 2022), education has become increasingly datafied (Williamson 2017a, 2019; Selwyn and Gasevic 2020; Fawns et al. 2021). ‘Learning analytics’ is one such tendency: a growing approach to the monitoring of student activity online, typically as part of a learning management system, which, while justified in terms of ‘student support’ and the promotion of vulnerable and disadvantaged groups, may in fact be regarded as primarily driven by the logics of national audit systems imposed on higher education.

Meanwhile, all dimensions of academic practice are increasingly made subject to performative regimes of surveillance, as can be seen in the UK-based Research Excellence Framework and Teaching Excellence Framework. The marketised model of higher education arguably demands this level of surveillance, in order to record, document and make visible aspects of study, practices and subject positions, which in the past were not amenable to observation and audit. Digital technology has been co-opted as the primary site of surveillance, in which embodied, ephemeral and copresent epistemic practices at the heart of the educational process (face-to-face teaching, reading, independent study) are subject to processes which ultimately render them not only as data, but also as documents.

It is beyond the scope of this short piece to provide a full review of the relevant literature, but important critical work has emerged in recent years which has interrogated the role of datafication in education (e.g. Williamson 2019; Williamson et al. 2020; Selwyn and Gasevic 2020), as has work critiquing attempts to ‘measure excellence’ (e.g. Branković 2018; Hayes 2021; Jandrić 2020). There is arguably a need for this valuable strand of critique to look in more detail at the effects of datafication on day-to-day life and practices in the university. This commentary piece aims to contribute to this growing critical literature by proposing an analysis drawing on science and technology studies and documentation science, looking at two examples of surveillance in higher education: learning analytics and author-level publication metrics, using the example of the h-index.

Datafication and Informative Material Objects

Writing from a new materialist perspective, Kosciejew (2017) proposes the concept of material-documentary literacy, reminding us that one of the main functions of documentation is to materialise information. He points out that ‘information’ is commonly regarded as being an abstract, dematerialised entity, and there is a distancing from its materiality, which is regarded as secondary. In contrast, he foregrounds the materiality of documentation, in order to ‘help (re)configure our understanding of information, as something not immaterial and intangible, but something material and tangible’ (Kosciejew 2017: 97). Kosciejew also focuses on how documentation science ‘can illuminate bureaucratic tentacles that actually do, in a material sense, reach into and control ordinary lives, helping to ensure the effective functioning of governance and governmentality and to manage embodied subjectivities’ (Kosciejew 2017: 98). He emphasises the centrality and ubiquity of documents to contemporary life, also suggesting that their very ubiquity and apparent banality causes us to be inured to them. He points out that documents do not merely record, they are constitutive. For him, ‘[a] document does more than reconstitute. It constitutes different things, such as ideas or entities and materializes them in order that they can be analysed, classified, placed, routinized, viewed, and used’ (Kosciejew 2017: 101).

Kosciejew cites Suzanne Briet’s (1951) example of the antelope as document. Briet asks us—in a compelling manner—to consider the case of an antelope which is captured in Africa, brought to Europe, put in a zoo, and examined by experts and members of the public. She argues that the zoo in this case is effectively a laboratory in which the antelope is analysed, displayed and discussed like a document. As Kosciejew puts it, ‘[o]n its own, the antelope is just an antelope; however, when these material assemblages and components surround it, it becomes a document’ (Kosciejew 2017: 101). Briet refers to it as a ‘catalogued antelope’ (Briet 1951: 11), from which a series of secondary documents are derived. (See Tourney (2003) for an in-depth discussion of Briet’s contribution to the Documentation movement.) Kosciejew (2017) goes on to propose that in the field of library and information science, documentation has been neglected and information has been regarded as more important. This has led, he argues, to a conceptualization of information as either immaterial, or at least separate from its material instantiation.

Kosciejew (2017) also refers to Orom (2007), who argues that this shift towards information is a result of increased interest in digital technologies, and also the increased prominence of the concept of information processing in cognitive science. Orom argues that this emphasis has spread across society more broadly, including into the academic disciplines. He contends that we should shift ‘the object of study from mental phenomena of ideas, facts and opinion, to social phenomena of communication, documents and memory institutions’ (Orom 2007: 58 in Kosciejew 2017: 105), in particular the study of informative material objects.

This provides a conceptual starting point with which to examine the phenomenon of datafication in higher education in a manner which avoids the limitations of mainstream analyses in educational research so far. The concept of the informative material object allows us to analyse information and data as material phenomena which are embedded in specific sociomaterial instantiations and enmeshed with human agency. This contrasts with the dominant paradigm of data and information being abstract, disembodied entities. This, I propose, is a subtle but important distinction which moves the focus onto the entanglement of human, material, digital and analogue agency which constitutes the ‘datafied’ university.

Learning Analytics as a Documenting Practice

In order to provide an illustrative example, in this section I will examine a specific case of datafication: ‘learning analytics’. Learning analytics is described as follows in the executive summary of a review document produced by the UK government agency the Joint Information Systems Committee (JISC):

Every time a student interacts with their university - be that going to the library, logging into their virtual learning environment or submitting assessments online – they leave behind a digital footprint. Learning analytics is the process of using this data to improve learning and teaching. Learning analytics refers to the measurement, collection, analysis and reporting of data about the progress of learners and the contexts in which learning takes place. Using the increasing availability of big datasets around learner activity and digital footprints left by student activity in learning environments, learning analytics takes us further than data currently available can. (Sclater et al. 2016: 4)

What is immediately of interest in this introduction is the mention of the ‘digital footprint’, with the emphasis on the documenting of the footprint, and the corralling of the student’s steps. The JISC document makes a case for the expansion of the use of learning analytics in UK universities, suggesting four main uses, the first of which is ‘as a tool for quality assurance and quality improvement with … many institutions proactively using learning analytics as a diagnostic tool’ in the context of the state-run audit the Teaching Excellence Framework, in order to demonstrate compliance with this framework (Sclater et al. 2016: 5).

However, what is not discussed is how the pedagogic relationship between the teacher and student, where problems may have previously been identified and addressed by the teacher, has effectively been ‘contracted out’ to the technology, in response to massification of the system. It also shifts the locus of student engagement fully, or in large part, over to the digital setting of the learning management system, requiring intensive engagement in that as a primary, or even sole, marker of student engagement in general. Although this type of analysis may indeed have utility in identifying students who have disengaged, it would also render a student who chooses to work fully or mostly offline as deviant, or in need of remediation. The use of learning analytics risks making displays of interaction in learning management system discussion boards a formal requirement.

There are, however, critical voices in the educational literature and commentators who seek to establish a more nuanced understanding of the effects of learning analytics. Jandrić et al. (2017: 101) recognise the complexity of agencies, stating that in education studies ‘algorithmic cultures signal a shift away from the centrality of individual or social concerns and toward the complex relations between the human and nonhuman agencies that proliferate our digitally networked activities’. Williamson (2017b) flags up the political and economic implications, pointing out that educational data science has become a ‘trans-sector enterprise’, with ownership and power moving over to commercial vendors. He identifies learning analytics as arising from a ‘sociotechnical imaginary’ (Jasanoff 2015) and defines these imaginaries as ‘socially shared visions of technologically mediated progress, that have moved from single inspired individuals to much wider communities and fields of action’ (Williamson 2017b: 107). He argues that educational data science is driven by such an imaginary regarding the future of educational research, leading to claims of a ‘paradigm shift’ towards a position which assumes ‘the inherent truthfulness and unbiased, impartial agnosticism of numbers’ (Williamson 2017b: 109). This goes hand-in-hand, Williamson argues, with a disavowal of any need for educational theory, as the data are seen as able to ‘speak for themselves’.

Prinsloo (2017) also looks at this sociotechnical imaginary, framing his critique explicitly in terms of student surveillance. He refers to Latour (2012), who proposes that, in relation to the design and development of technologies, ‘unintended consequences are part and parcel of any action’ (Latour 2012: 25 in Prinsloo 2017: 139). Prinsloo explores our relationship to algorithms, comparing it to that of Frankenstein to the monster he created, following Latour. He also references the ‘claustrophobic maze’ (Prinsloo 2017: 139) of Kafka’s Trial, in which the protagonist finds himself trapped in a world with no way out, comparing this to a bureaucratic organization in possession of a large body of information about those within its ambit, such as a university using learning analytics. He refers to the concept of algocracy, coined by Aneesh (2006, 2009), in which ‘code appears to have … taken over the managerial function of supervision and guidance’ (Aneesh 2009: 355). Prinsloo explores in his paper the conditions in which algorithmic decision-making may collapse into algocracy. In educational settings, algorithms underpin learning analytics, as he reminds us. He quotes Williamson et al. (2014), who warn that the ‘algorithms that enable learning analytics appear to be “theory-free” but are loaded with political and epistemological assumptions. The data visualizations produced by learning analytics – data dashboards as they are frequently described—also act semiotically to create meanings’.

Prinsloo points out the prevalence of referring to algorithms in terms of human knowing and intentionality, by way of anthropomorphic metaphors such as ‘knowing’ or ‘acting’ (Dijkstra 1985). Turning to education, he reminds us that algorithms should not be regarded as neutral technical entities but are themselves both normative and political. He describes how human agency is encoded into them (Introna 2011), and how that encoding ‘becomes part of organisational architecture and shapes/informs/enacts decision-making that in turn shapes and informs human lives’ (Prinsloo 2017: 143), in particular the power of algorithms to prioritise what is to be regarded as important, and what should be visible. As Beer (2017: 97) puts it, ‘algorithms “govern” because they have the power to structure possibilities’. Prinsloo (2017: 145) sets out how increased digitization has combined with the proliferation of regimes of audit and quality, to lead to greater use of algorithmic decision-making in higher education. For him, learning analytics is ‘a structuring device. It is not neutral. It is informed by current beliefs about what counts as knowledge and learning’.

What is of relevance here is the process by which learning analytics operates, and in particular, I suggest, how it documents the student, rendering the student as document in Kosciejew’s (2017) terms. Here we see students under surveillance and subject to ideological and normative force, expected to exhibit certain types of behaviour and engagement in support of these ideologies. It is not sufficient for this behaviour to take place; it must also be observable, and ideally recordable. In addition to approved ‘teaching and learning’ behaviours, there are a range of other surveillance practices which have become prevalent in contemporary higher education, as Macfarlane (2017) points out. Returning to Kosciejew’s (2017) analysis, it could be argued that the students themselves are datafied through the processes of learning analytics and the algocracy.

However, I would suggest that this is not merely a process of documentation, with all the ethical complexities discussed by Prinsloo and others. I contend that its effect is more far-reaching, serious and fundamental, in that learning analytics, in my view, alters the very ontological status of the student, who unwittingly becomes a digital document. The student’s ontological status, her being, is in a sense contaminated by this intervention, and she can no longer exist outside of the baroque entanglements of digital surveillance, rather like Briet’s antelope in the zoo. The next section will consider a further example of datafication in higher education, with an analysis of the author-level metric h-index.

The H-Index as a Documenting Practice

The ‘h-index’ is an author-level metric which was proposed by Hirsch (2005) to measure an individual’s productivity and citation impact. An author’s h-index is the largest number h such that h of their articles have each been cited at least h times. For example, if an author has 15 papers, of which 10 have each been cited at least 10 times and the remaining 5 fewer than 10 times, their h-index is 10. In evaluative bibliometrics, measuring performance at the micro-level of the individual is regarded as problematic, as the individual’s output may not be sufficiently large to obtain statistically reliable indicators. It has also been critiqued as an approach because research productivity, publication numbers and citation impact are not necessarily correlated variables (Glanzel 2006; Bornmann and Daniel 2007). However, despite these shortcomings, the h-index was quickly adopted by the scientific community (e.g. Ball 2005) and has become a commonly used metric of academic achievement which can be calculated by setting up an account via Google Scholar, and may be referred to in applications for tenure, academic promotion and funding.
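The definition above can be made concrete with a short sketch. The Python below is illustrative only: the function name `h_index` and the sample citation counts are assumptions introduced here, not taken from Hirsch (2005) or from any bibliometric service, and real tools may differ in which publications and citations they count.

```python
def h_index(citations):
    """Return the largest h such that h papers each have at least h citations."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical author: 15 papers, 10 cited at least 10 times, 5 cited less often.
example = [40, 25, 18, 15, 12, 12, 11, 10, 10, 10, 6, 4, 3, 1, 0]
print(h_index(example))  # 10
```

Note that the score of 10 would be unchanged if the most cited paper had 400 citations rather than 40, which already hints at how much of a publication record the single number discards.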

Returning to the critiques of learning analytics, we can analyse the h-index in terms of Williamson’s (2017b) application of Jasanoff’s (2015) sociotechnical imaginary. His positing of a ‘paradigm shift’ towards a position which assumes ‘the inherent truthfulness and unbiased, impartial agnosticism of numbers’ (Williamson 2017b: 109) may also apply to this case. The complexities of an individual’s publication career, which has unfolded within the complex contexts, epistemologies and conventions of a particular discipline, and within that individual’s particular material and embodied life, are subject to a reductive methodology which results in a single numerical score. The metric favours a large number of papers which have garnered roughly equal numbers of citations, arising relatively quickly after a paper has been published. This can be contrasted with a writing career which has resulted in a small number of very highly cited pieces, or one in which there have been bursts of activity, punctuated with periods where productivity has been lower.
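The sensitivity of the score to the shape of a publication record, rather than to its total citations, can be sketched with two invented records. Both hypothetical authors below have 500 citations in total; the citation counts and the helper `h_index` are assumptions made for illustration, not data about real scholars.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    # In a descending ranking, "count >= rank" holds for exactly the first h papers.
    return sum(1 for rank, count in enumerate(ranked, start=1) if count >= rank)


# Hypothetical record A: many papers with roughly equal citation counts.
steady = [25] * 20
# Hypothetical record B: a small number of very highly cited pieces.
concentrated = [300, 195, 5]

print(sum(steady), h_index(steady))              # 500 20
print(sum(concentrated), h_index(concentrated))  # 500 3
```

Under these assumptions, identical citation totals yield scores of 20 and 3.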

This might be seen with female academics who have taken maternity leave, for example, or early-career academics on precarious contracts, or colleagues who have not been able to avail themselves of the funding, resources or time required to consistently publish to the pattern demanded by the h-index. Members of marginalised groups within academia may not be brought into powerful networks of senior scholars, through discrimination or exclusion. Work published in languages other than English may not attract the same numbers of citations as English-language publications. National and institutional contexts where digital technology and library access are limited may also blunt the distribution of papers. It can be argued, then, that although the h-index purports to demonstrate ‘impartial agnosticism’, it carries within it an ideology and values relating to what makes a ‘successful’ academic career, and what constitutes ‘impact’.

The imaginary is one of an implicitly privileged scholar working full-time in a position of high professional security and prestige, shielded from the pressures and blocks to success listed above. It is also arguably an imaginary based on the assumption of scientific publishing practices, as opposed to the ‘slow’ scholarship of humanities academics, who may be working as lone authors on books which may take several years to produce.

Turning to Prinsloo’s (2017) application of Aneesh (2006, 2009), we can consider whether the h-index is an instance in which ‘code appears to have … taken over the managerial function of supervision and guidance’ (Aneesh 2009: 355). The relationship between a scholar and the h-index is somewhat different to that of a student to a learning analytics platform, in that a scholar may choose whether to set up a Google Scholar profile or other means of deriving their score, although in some national or institutional contexts, individual bibliometric scores are strictly required. The h-index, unlike the learning analytics technology used within a university, does not represent a particular authority or form part of an assessment or monitoring process per se. However, it is commonly referred to in promotions and funding applications, and is compulsory in some contexts, and in that regard may be seen as part of assessment of performance in a broader sense, and so might be deemed to have a ‘supervisory’ function.

In terms of ‘guidance’, it is worth considering the normative effect that the h-index may have, insofar as it may shape authorial decision-making regarding the type of paper which will be written and when. It may also lead to a ‘gaming’ of the system or cronyism amongst associates, in order to boost one’s score. In that respect, the h-index may be implicated in ‘guidance’ which privileges performance, along the lines discussed above. It is therefore an algorithmic practice which is both political and normative, as Prinsloo proposes. Like learning analytics, the h-index ‘becomes part of organisational architecture and shapes/informs/enacts decision-making that in turn shapes and informs human lives’ (Prinsloo 2017: 143), structuring what comes to be seen as important, and what is made visible.

Returning to Kosciejew (2017), the h-index may also be seen as a documenting practice, in that the individual is rendered into what is effectively a report, which may be viewed on the screen. The embodied, intricate, extended and messy nature of academic writing, the data collected, the communities engaged with, the travel, the time spent, the reading, the emotions experienced, the interactions engaged with: all the complexity, mess and struggle of every academic paper or book published by that individual is reduced to the published text, and that text is further reduced to a score. These are combined to produce a number which refers not to texts, but to the human author who has produced these. The scholar is then displayed online for all to see, rather like Briet’s (1951) antelope which was rendered into a document by quantification, analysis and display in the zoo.

Discussion and Conclusions

This speculative commentary paper has considered critical literature which has looked at datafication and the effects of algorithmic practices in education, also drawing on the concept of the sociotechnical imaginary (Jasanoff 2015) from science and technology studies, and Kosciejew’s (2017) concept of documentation taken from library and information sciences. These theoretical perspectives were used to consider the nature of two datafying technologies in higher education and academia: learning analytics and the h-index metric. I argue that despite important differences, both of these technologies share features discussed in the literature; they both act as vehicles and drivers of a particular ideological position regarding what constitutes ‘good’ or even ‘ideal’ academic performance.

They are both necessarily reductive, and in that act of reduction inevitably they ‘tidy up’ the extensive sociomaterial, embodied, political and complex realities of academic engagement and writing. In both cases, what can be observed and recorded is what comes to stand as a proxy for what took place, stripping out aspects of engagement which are private, unseen, relational, or ephemeral. The technologies have a normative force, not only recording practices but also normatively structuring them. The messy realities of lives and practices are also tidied up and packaged, rendering the individuals into documents which are then open to scrutiny.

It might be argued that any form of quantification of human activity is necessarily reductive, but the question then arises as to whether the reduction is necessary. Why do it? In terms of student engagement, although it offers some benefits, the overall case for learning analytics is far from conclusive, as recognised in the literature (e.g. Wilson et al. 2017; Joksimovic et al. 2019). It may be argued that the distorting effect of the process not only fails to recognise important aspects of the experience of study, but also does damage to student epistemological practices through normative distortion. The same might be argued for the h-index, in that its operation as a proxy may lead not only to an impoverished imaginary of knowledge practices and meaning-making, but also to actual harm to the pursuit of knowledge through forces of normativity, standardization, and performativity. For these reasons, I contend that ‘discourses of inevitability’ surrounding the use of technologies of datafication, surveillance, and audit in higher education should be resisted, to keep in check these tendencies towards documentation which may ultimately undermine the richness, variety, complexity, and ephemerality of scholarship itself.