Regular article
Can Google Scholar and Mendeley help to assess the scholarly impacts of dissertations?

https://doi.org/10.1016/j.joi.2019.02.009

Highlights

  • Citations to doctoral dissertations can be semi-automatically extracted from Google Scholar.

  • A fifth of the doctoral dissertations had at least one citation in Google Scholar.

  • There are more Google Scholar citations than Mendeley readers for older dissertations.

  • Mendeley reader counts are higher than Google Scholar citation counts for recently published dissertations.

Abstract

Dissertations can be the single most important scholarly output of junior researchers. Whilst sets of journal articles are often evaluated with the help of citation counts from the Web of Science or Scopus, these do not index dissertations and so their impact is hard to assess. In response, this article introduces a new multistage method to extract Google Scholar citation counts for large collections of dissertations from repositories indexed by Google. The method was used to extract Google Scholar citation counts for 77,884 American doctoral dissertations from 2013 to 2017 via ProQuest, with a precision of over 95%. Some ProQuest dissertations that were dual indexed with other repositories could not be retrieved with ProQuest-specific searches but could be found with Google Scholar searches of the other repositories. The Google Scholar citation counts were then compared with Mendeley reader counts, a known source of scholarly-like impact data. A fifth of the dissertations had at least one citation recorded in Google Scholar and slightly fewer had at least one Mendeley reader. Based on numerical comparisons, Mendeley reader counts seem to be more useful for impact assessment purposes for dissertations that are less than two years old, whilst Google Scholar citations are more useful for older dissertations, especially in the social sciences, arts and humanities. Google Scholar citation counts may reflect a more scholarly type of impact than Mendeley reader counts because dissertations attract a substantial minority of their citations from other dissertations. In summary, the new method makes it possible for research funders, institutions and others to systematically evaluate the impact of dissertations, although additional Google Scholar queries for other online repositories are needed to ensure comprehensive coverage.
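The numerical comparisons summarised above rest on simple proportions, such as the share of dissertations with at least one citation or at least one reader in each publication year. A minimal Python sketch of one way to compute these shares follows, assuming each dissertation is reduced to its publication year, Google Scholar citation count and Mendeley reader count. The `Dissertation` record and the `share_with_impact` function are hypothetical names, the exact statistics used by the authors are not reproduced here, and the records are toy values rather than study data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Dissertation:
    year: int              # publication year of the dissertation
    gs_citations: int      # Google Scholar citation count
    mendeley_readers: int  # Mendeley reader count


def share_with_impact(records):
    """Return {year: (share with >= 1 GS citation, share with >= 1 Mendeley reader)}."""
    totals = defaultdict(int)
    cited = defaultdict(int)
    read = defaultdict(int)
    for r in records:
        totals[r.year] += 1
        cited[r.year] += int(r.gs_citations > 0)
        read[r.year] += int(r.mendeley_readers > 0)
    return {year: (cited[year] / n, read[year] / n) for year, n in sorted(totals.items())}


# Toy records for illustration only; they are not data from the study.
toy = [
    Dissertation(2013, 3, 0),
    Dissertation(2013, 1, 1),
    Dissertation(2013, 0, 0),
    Dissertation(2017, 0, 2),
    Dissertation(2017, 0, 0),
]
for year, (p_cited, p_read) in share_with_impact(toy).items():
    print(year, f"cited={p_cited:.2f}", f"read={p_read:.2f}")
# 2013 cited=0.67 read=0.33
# 2017 cited=0.00 read=0.50
```

Grouping by year matters because citations accumulate more slowly than readers, so pooling old and new dissertations would hide the age effect the abstract describes.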

Introduction

Doctoral dissertations are important single-authored scholarly works written by early career researchers and form a significant minority of the scientific output of universities. For instance, according to the British Library EThOS service (https://ethos.bl.uk/), 1114 doctoral theses were awarded by the University of Cambridge in 2015, in comparison to 8174 Scopus-indexed journal articles. Doctoral theses usually include a comprehensive literature review, detailed original findings, a discussion, or another significant contribution to scholarship based upon three or more years of full-time equivalent research. They are assessed by independent examiners before being published, and so could be described as peer-reviewed. Dissertations may make substantial contributions to scholarship or professional practice in some fields, such as for supporting clinical practice (McLeod & Weisz, 2004). Citations to dissertations may also be important impact evidence for early career researchers since they may have had too little time to have journal articles published (ACUMEN Consortium, 2014). This would be particularly relevant for job applications but may also impact early promotion decisions or grant applications. Moreover, many universities, departments, research committees or funders of doctoral research may wish to monitor the success of their doctoral programs, including their scientific, social, economic, clinical or cultural benefits, requiring alternative metrics for the impact assessment of non-standard academic outputs (Kousha & Thelwall, 2015; Schöpfel & Prost, 2016).

Although peer review is the best method to assess the quality of theses within a doctoral programme, theses may be too long for recruiters, promotion committees or funding agencies to read. Hence, many researchers have examined the publication rates generated from doctoral dissertations as an indication of doctoral program success or productivity (e.g., Echeverria, Stuart, & Blanke, 2015; Lee, 2000; Stewart, Roberts, & Roy, 2007; Thomas & Reeve, 2006). The articles published from a dissertation do not necessarily reflect all of its impact, however (Morse, 2005). Citation analysis could in theory be useful for dissertations, but they are absent from the traditional citation indexes, such as the Web of Science and Scopus. Alternatively, counting citations to publications resulting from dissertations may be used to estimate their scientific impact (Larivière, 2012). Nevertheless, many doctoral students do not produce publications from their dissertations even several years after graduation (e.g., Anwar, 2004; Caan & Cole, 2012; Evans, Amaro, Herbert, Blossom, & Roberts, 2018). For instance, there is large-scale evidence that most doctoral students in the arts and humanities (96%) and in the social sciences (90%) had no Web of Science publications during 2000–2007 (Larivière, 2012). Moreover, it might be difficult to correctly track publications from dissertations several years after graduation and to assign authorship credit for multi-authored publications resulting from dissertations (Hagen, 2010). Given the lack of a comprehensive citation index, cited reference searches in conventional citation indexes have also been used to assess citations to theses or dissertations by searching for related terms (e.g., *thesis* or *dissertation*) in the reference sections of other scholarly publications (Larivière, Zuccala, & Archambault, 2008; Rasuli, Schöpfel, & Prost, 2018).
This method can estimate the total number of citations to dissertations as a group across different fields, institutions or years, but not the citations to individual doctoral dissertations. Moreover, using conventional citation databases for this purpose could be problematic in the social sciences and arts and humanities, where poorly indexed books and publications in languages other than English could be important (Archambault, Vignola-Gagné, Côté, Larivière, & Gingras, 2006; Huang & Chang, 2008).
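The cited reference approach can be sketched as a keyword match over reference strings. The Python example below is an illustration under assumptions: the regular expression, the function name `count_thesis_references` and the sample references are hypothetical, and real Web of Science or Scopus searches use their own query syntax. It shows why the approach yields an aggregate total: a matching reference is counted without being linked to any specific dissertation record.

```python
import re

# Illustrative whole-word, case-insensitive pattern (an assumption, not the query
# used in the cited studies). Words such as "hypothesis" do not match because of \b.
THESIS_PATTERN = re.compile(r"\b(?:thesis|theses|dissertations?)\b", re.IGNORECASE)


def count_thesis_references(reference_lists):
    """Count references that look like citations to theses or dissertations,
    summed over the reference lists of many citing publications."""
    return sum(
        1
        for refs in reference_lists
        for ref in refs
        if THESIS_PATTERN.search(ref)
    )


# Toy reference lists from two hypothetical citing papers.
citing_papers = [
    ["Smith, J. (2015). Soil microbes in arid zones (PhD thesis). University X.",
     "Lee, K. (2016). A journal article. Journal Y, 3, 1-10."],
    ["Doe, A. (2014). Learning in classrooms [Doctoral dissertation]. ProQuest.",
     "Roe, B. (2012). A hypothesis-driven study. Journal Z, 8, 22-30."],
]
print(count_thesis_references(citing_papers))  # 2
```

The total of 2 says nothing about which dissertations were cited, and a keyword match can also miscount, for example when a journal article has "thesis" in its title.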

Google Scholar is not primarily a citation index for dissertations, but it automatically indexes dissertations from many institutional repositories, databases and commercial publisher websites and reports counts of citations to them based on its indexed publications. By November 2018, Google Scholar had indexed the metadata or text of 264,000¹ UK doctoral dissertations from the British Library EThOS service, the UK’s national database of doctoral theses (https://ethos.bl.uk/), and 323,000² French Ph.D. theses from theses.fr. Google Scholar has also indexed many dissertations from university or institutional repositories, making it possible to assess the citation impact of dissertations at the institutional level. For example, the University of Glasgow (http://theses.gla.ac.uk/) and the London School of Economics and Political Science (http://etheses.lse.ac.uk/) have repositories for postgraduate theses that have been mostly indexed by Google Scholar (est. 6,650 of 7,372³ and 3,560 of 3,691⁴, respectively). Moreover, 56% (1,991 of 3,520) of the repositories listed in the Directory of Open Access Repositories (OpenDOAR, www.opendoar.org/), across different countries, included theses or dissertations. Nevertheless, many dissertations might not be searchable by commercial search engines or digital libraries due to copyright, publication embargoes, submission of dissertations in print format, restricted access, or non-searchable websites (Kettler, 2016). For instance, although the DART-Europe E-theses Portal (http://www.dart-europe.eu/) claims to include “805,570 open access research theses from 617 Universities in 28 European countries”, Google Scholar has not directly indexed any theses from this portal.
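To make "mostly indexed" concrete, the estimated coverage of the two institutional repositories can be computed directly from the counts quoted in the paragraph. The helper name `coverage_pct` is illustrative; the inputs are the paragraph's own estimates.

```python
def coverage_pct(indexed, total):
    """Percentage of a repository's records estimated to be indexed by Google Scholar."""
    return 100 * indexed / total


# Estimated (indexed, total) counts quoted in the paragraph.
repositories = {
    "University of Glasgow": (6_650, 7_372),
    "London School of Economics": (3_560, 3_691),
}
for name, (indexed, total) in repositories.items():
    print(f"{name}: {coverage_pct(indexed, total):.1f}%")
# University of Glasgow: 90.2%
# London School of Economics: 96.5%
```

Both figures are ratios of estimates, so they indicate the order of coverage (roughly nine in ten or better) rather than exact values.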

ProQuest Dissertations & Theses is a digital library that indexes and provides full-text access to dissertations and theses. It claims to include “2 million full text dissertations” from more than “3000 schools”, mostly North American universities.⁵ In October 2017, ProQuest announced that the contents of “half a million dissertations” would be indexed by Google Scholar, linking users via their library’s ProQuest subscription. Unsubscribed users can also usually “access the first 24 pages at no charge” (Arbor, 2017). In November 2018, Google Scholar had indexed about 250,000 records from ProQuest Dissertations & Theses.⁶

Although a few studies have manually analysed Google Scholar citations to small numbers of dissertations (see below), there have been no large-scale multidisciplinary citation assessments of doctoral dissertations. This is a regrettable omission, given the potential value of dissertations for early career researchers. The current study fills this gap by testing a method to systematically extract Google Scholar citations to 77,884 ProQuest-indexed doctoral dissertations from 2013 to 2017 across 18 fields. Mendeley reader counts for all doctoral dissertations were also collected for comparison with citation counts.


References in dissertations

Previous studies have investigated references in dissertations mainly to identify the cited publication types, the core serials cited or the citation practices of students in different subject areas (e.g., Barnett-Ellis & Tang, 2016; Haycock, 2013; Yeap & Kiran, 2017). These have all been small-scale studies, but they suggest that few of the references in a dissertation are to other dissertations and that this proportion varies between disciplines.

An analysis of references from 49 management

Research questions

The research goal is to assess the value of Google Scholar for dissertation citation analysis based on its new partnership with ProQuest. Beyond the basic ability to find citations, the types of citations matter for interpreting the nature of dissertation impact, as do the types of people who read dissertations, especially readers who do not cite them. The following questions address these issues.

  • 1

    Can citations to ProQuest doctoral dissertations be semi-automatically extracted from Google Scholar with a

Methods

This section introduces a practical method to systematically identify Google Scholar citations to dissertations in the ProQuest Dissertations & Theses database for large-scale evaluations. Because Google Scholar does not offer an API for automated searching, the new method uses the Publish or Perish software (Harzing & van der Wal, 2008) with predefined queries that limit searches to ProQuest dissertations in a manageable way. All data collection was conducted during November 2018.
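The authors' full multistage procedure is not reproduced in this excerpt, so the following is only a hypothetical Python sketch of the kind of post-processing such searches need. It assumes the search results have already been exported (for example from Publish or Perish) into records with a title, year, citation count and URL. The `ScholarRecord` type, the `proquest.com` host filter, the title normalisation rule and the keep-the-highest-count duplicate rule are all illustrative assumptions, not the published method. Precision is computed as the share of a manually checked sample judged to be correct matches, the measure behind the "over 95%" figure in the abstract.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ScholarRecord:
    title: str  # title as returned by the search
    year: int   # publication year
    cites: int  # Google Scholar citation count
    url: str    # landing-page URL of the record


def normalise(title):
    """Lower-case a title and collapse punctuation and whitespace, for duplicate matching."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def keep_proquest_dissertations(records, host="proquest.com"):
    """Keep records hosted on the assumed ProQuest domain, merging duplicates that
    differ only in title formatting and keeping the highest citation count."""
    best = {}
    for r in records:
        if host not in r.url:
            continue  # not a ProQuest-hosted record
        key = (normalise(r.title), r.year)
        if key not in best or r.cites > best[key].cites:
            best[key] = r
    return list(best.values())


def precision(judgements):
    """Share of a manually checked sample judged to be correct dissertation matches."""
    return sum(judgements) / len(judgements)


# Placeholder records; the URLs are illustrative, not real ProQuest links.
results = [
    ScholarRecord("Essays on Labor Markets", 2015, 4, "https://search.proquest.com/openview/abc"),
    ScholarRecord("Essays on labor markets.", 2015, 2, "https://search.proquest.com/openview/def"),
    ScholarRecord("An unrelated journal article", 2016, 10, "https://example.org/article"),
]
kept = keep_proquest_dissertations(results)
print(len(kept), kept[0].cites)           # 1 4
print(precision([True] * 19 + [False]))  # 0.95
```

Keying duplicates on normalised title plus year is a deliberately simple choice; a real pipeline would probably also compare author names, since distinct dissertations can share generic titles.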

Results

As the methods section shows, large numbers of doctoral dissertations and their citation counts can be extracted semi-automatically from Google Scholar using curated queries and with the aid of Publish or Perish. The above method only located 30% (77,884 of 264,149) of the American doctoral dissertations indexed in ProQuest Dissertations & Theses for 2013–2017, however. It is not clear whether this is because Google Scholar has not yet or will never index all of ProQuest or because the above

Discussion

As discussed above, the results are based on only 30% of the American doctoral dissertations from 2013 to 2017 indexed in ProQuest Dissertations & Theses (see discussion below). If this subset is biased in some way, such as ProQuest indexing a biased subset of all US dissertations or providing a biased subset of dissertations to be indexed by Google Scholar (Arbor, 2017), then this will affect the graphs above. Moreover, the results for other countries may well be different and there

Conclusion

The method introduced by this paper is the first attempt to investigate the citation impact of dissertations on a large scale through Google Scholar, making it more practical for universities, departments, employers or funders of doctoral research to assess the scholarly influence of doctoral research or junior researchers. This will not be useful for most individual researchers since most dissertations are uncited, but it can help large-scale evaluations of doctoral programmes or doctoral funding

Author contributions

Kayvan Kousha: Conceived and designed the analysis, Collected the data, Contributed data or analysis tools, Performed the analysis, Wrote the paper, Other contribution.

Mike Thelwall: Contributed data or analysis tools (Mendeley reader counts), Wrote the paper (Introduction; discussion), Other contribution (Discussing and interpreting the results).

References (67)

  • É. Archambault et al.

    Benchmarking scientific output in the social sciences and humanities: The limits of existing databases

    Scientometrics

    (2006)
  • S. Bangani

    The impact of electronic theses and dissertations: A study of the institutional repository of a university in South Africa

    Scientometrics

    (2018)
  • P. Barnett-Ellis et al.

    User-centered collection development: A citation analysis of graduate biology theses

    Collection Management

    (2016)
  • L. Bennett et al.

    Measuring the impact of digitized theses: A case study from the London School of Economics

    Insights: the UKSG Journal

    (2016)
  • M. Boeker et al.

    Google Scholar as replacement for systematic literature searches: Good relative recall and precision are not enough

    BMC Medical Research Methodology

    (2013)
  • W. Caan et al.

    How much doctoral research on clinical topics is published

    Evidence Based Medicine

    (2012)
  • M. Coates

    Using Google Analytics to explore ETDs use

    Paper presented at the proceedings of the ACM/IEEE joint conference on digital libraries

    (2013)
  • R. Costas et al.

    Do “altmetrics” correlate with citations? Extensive comparison of altmetric indicators with citations from a multidisciplinary perspective

    Journal of the Association for Information Science and Technology

    (2015)
  • M. Echeverria et al.

    Medical theses and derivative articles: Dissemination of contents and publication patterns

    Scientometrics

    (2015)
  • S.C. Evans et al.

    “Are you gonna publish that?” Peer-reviewed publication outcomes of doctoral dissertations in psychology

    PLoS One

    (2018)
  • R. Fairclough et al.

    The influence of time and discipline on the magnitude of correlations between citation counts and quality scores

    Journal of Informetrics

    (2015)
  • T. Ferreras-Fernández et al.

    Open access repositories as channel of publication scientific grey literature

    Paper presented at the ACM international conference proceeding series

    (2015)
  • A. Gohain et al.
    (2014)
  • N. Hagen

    Deconstructing doctoral dissertations: How many papers does it take to make a PhD?

    Scientometrics

    (2010)
  • A.W. Harzing

    Publish or Perish

    (2007)
  • A. Harzing et al.

    Google Scholar as a new source for citation analysis

    Ethics in Science and Environmental Politics

    (2008)
  • S. Haustein et al.

    Mendeley as a source of readership by students and postdocs? Evaluating article usage by academic status

    (2014)
  • S. Haustein et al.

    Tweets vs. Mendeley readers: How do these two social media metrics differ?

    IT-Information Technology

    (2014)
  • L.A. Haycock

    Citation analysis of education dissertations for collection development

    Library Resources & Technical Services

    (2013)
  • V. Henning et al.

    Mendeley: A Last.fm for research?

    IEEE fourth international conference on eScience (eScience’08)

    (2008)
  • M.H. Huang et al.

    Characteristics of research output in social sciences and humanities: From a research evaluation perspective

    Journal of the American Society for Information Science and Technology

    (2008)
  • M. Kettler

    Ways of disseminating, tracking usage and impact of electronic theses and dissertations (ETDs)

    9th Conference on grey literature and repositories

    (2016)
  • K. Kousha et al.

    Sources of Google Scholar citations outside the Science Citation Index: A comparison between four science disciplines

    Scientometrics

    (2008)