1 Introduction

Today’s scholarly communication is a document-centred process and, as such, rather inefficient. Scientists spend considerable time finding, reading, and reproducing research results from PDF files consisting of static text, tables, and figures. The explosion in the number of published articles [14] aggravates this situation further: it gets harder and harder to stay on top of current research, that is, to find relevant works, to compare and reproduce them, and, later on, to make one’s own contribution known for its quality.

Some of the available infrastructures in the research ecosystem already use knowledge graphs (KG) to enhance their services. Academic search engines, for instance, utilise metadata-based graph structures such as the Microsoft Academic Knowledge Graph [38] or the Literature Graph [1], which link research articles based on citations, shared authors, venues, and keywords.

Recently, initiatives have promoted the usage of KGs in science communication, but on a deeper, semantic level [3, 50, 56, 73, 78, 84, 115]. They envision the transformation of the dominant document-centred knowledge exchange into knowledge-based information flows by representing and expressing knowledge through semantically rich, interlinked KGs. Indeed, they argue that a shared structured representation of scientific knowledge has the potential to alleviate some of the current issues in science communication: relevant research could be easier to find, comparison tables could be compiled automatically, and one’s own insights could rapidly be placed in the research ecosystem. Such a data structure could also encourage the interconnection of research artefacts such as datasets and source code far more than current approaches (e.g. Digital Object Identifier (DOI) references), allowing for easier reproducibility and comparison. To come closer to the vision of knowledge-based information flows, research articles should be enriched and interconnected through machine-interpretable semantic content. The usage of Papers With Code [80] in the machine learning community and Jaradeh et al.’s study [56] indicate that authors are also willing to contribute structured descriptions of their research articles.

The work of a researcher is manifold, but current proposals usually focus on a specific use case (e.g. the aforementioned examples focus on enhancing academic search). In this paper, we analyse common literature-related tasks in a scientist’s daily life in detail and examine (a) how they could be supported by an ORKG, (b) what requirements result for the design of (b1) the KG and (b2) the surrounding system, and (c) how different use cases overlap in their requirements and can benefit from each other. Our analysis is led by the following research questions:

  1. Which use cases should be supported by an ORKG?

     (a) Which user interfaces are necessary?

     (b) Which machine interfaces are necessary?

  2. What requirements can be defined for the underlying ontologies to support these use cases?

     (a) Which granularity of information is needed?

     (b) To what degree is domain specialisation needed?

  3. What requirements can be defined for the instance data in the context of the respective use cases?

     (a) What degree of completeness is sufficient for the instance data?

     (b) What degree of correctness is sufficient for the instance data?

     (c) Which approaches (human vs. machine) are suitable to populate the ORKG?

Our analysis concentrates on eliciting use cases, defining quality requirements for the underlying KG to support these use cases, and elaborating construction strategies for the KG. We follow the design science research (DSR) methodology [51]. In this study, we focus on the first phase of DSR and conduct a requirements analysis. The objective is to chart necessary (and desirable) requirements for successful KG-based science communication, and, consequently, provide a map for future research.

Compared to our paper at the 24th International Conference on Theory and Practice of Digital Libraries 2020 [16], this journal paper has been modified and extended as follows: The related work section is updated and extended with the new sections “Quality of knowledge graphs” and “Systematic literature reviews”. The new “Appendix 1” section contains comparative overviews of datasets for research knowledge graph population tasks such as sentence classification, relation extraction, and concept extraction. These comparisons are intended to give a sense of what kind of information can be automatically extracted from scientific texts, and with what accuracy, using current state-of-the-art methods. This is important for suggesting appropriate construction strategies (i.e. manual, semi-automatic, automatic) for the respective use cases based on their data quality requirements. To be consistent with the terminology in related work, we use the term “completeness” instead of “coverage” and “correctness” instead of “quality”. The requirements analysis in Sect. 3 is revised and contains more details and more justifications for the posed requirements and approaches.

The remainder of the paper is organised as follows. Section 2 summarises related work on research KGs, scientific ontologies, KG construction, data quality requirements, and systematic literature reviews. The requirements analysis is presented in Sect. 3, while Sect. 4 discusses implications and possible approaches for ORKG construction. Finally, Sect. 5 concludes the requirements analysis and outlines areas of future work. The “Appendix 1” section contains comparative overviews for the tasks of sentence classification, relation extraction, and concept extraction.

2 Related work

This section gives a brief overview of (a) existing research KGs, (b) ontologies for scholarly knowledge, (c) approaches for KG construction, (d) quality dimensions of KGs, and (e) processes in systematic literature reviews.

2.1 Research knowledge graphs

Academic search engines (e.g. Google Scholar, Microsoft Academic, Semantic Scholar) exploit graph structures such as the Microsoft Academic Knowledge Graph [38], SciGraph [113], the Literature Graph [1], or the Semantic Scholar Open Research Corpus (S2ORC) [70]. These graphs interlink research articles through metadata, e.g. citations, authors, affiliations, grants, journals, or keywords.

To help reproduce research results, initiatives such as Research Graph [2], Research Objects [7], and OpenAIRE [73] interlink research articles with research artefacts such as datasets, source code, software, and video presentations. Scholarly Link Exchange (Scholix) [20] aims to create a standardised ecosystem to collect and exchange links between research artefacts and literature.

Some approaches connect articles at a more semantic level: Papers With Code [80] is a community-driven effort to supplement machine learning articles with tasks, source code, and evaluation results to construct leaderboards. Ammar et al. [1] link entity mentions in abstracts with DBpedia [66] and Unified Medical Language System (UMLS) [11], and Cohan et al. [23] extend the citation graph with citation intents (e.g. citation as background or used method).

Various scholarly applications benefit from semantic content representation, e.g. academic search engines by exploiting general-purpose KGs [112], and graph-based research paper recommendation systems [8] that utilise citation graphs and mentioned entities. However, the coverage of science-specific concepts in general-purpose KGs is rather low [1], e.g. the task “geolocation estimation of photos” from Computer Vision is neither present in Wikipedia nor in the Computer Science Ontology (CSO) [95].

2.2 Scientific ontologies

Various ontologies have been proposed to model metadata such as bibliographic resources and citations [83]. Iniesta and Corcho [93] reviewed ontologies to describe scholarly articles. In the following, we describe some ontologies that conceptualise the semantic content in research articles.

Several ontologies focus on the rhetorical [27, 49, 109] (e.g. Background, Methods, Results, Conclusion), argumentative [69, 105] (e.g. claims, contrastive, and comparative statements about other work), or activity-based [84] (e.g. sequence of research activities) structure of research articles. Others describe scholarly knowledge with linked entities such as problem, method, theory, statement [19, 50], or focus on the main research findings and characteristics of research articles described in surveys with concepts such as problems, approaches, implementations, and evaluations [40, 106].

Various domain-specific ontologies exist, for instance, for mathematics [65] (e.g. definitions, assertions, proofs), machine learning [62, 74] (e.g. dataset, metric, model, experiment), and physics [96] (e.g. formation, model, observation). The EXPeriments Ontology (EXPO) is a core ontology for scientific experiments that conceptualises experimental design, methodology, and results [99], while the Scientific Observation Model (CRMsci) is an ontology of metadata about scientific observations, processed data, and measurements in descriptive and empirical sciences (e.g. biodiversity, geology, geography, archaeology) [34]. Various repositories provide access to several ontologies, such as the Open Biological and Biomedical Ontologies (OBO) Foundry [98] for the domain of life sciences or Linked Open Vocabularies (LOV) [107] for web data.

Taxonomies for domain-specific research areas support the characterisation and exploration of a research field. Salatino et al. [95] give an overview, e.g. Medical Subject Heading (MeSH), Physics Subject Headings (PhySH), Computer Science Ontology (CSO). Gene Ontology [26] and Chemical Entities of Biological Interest (CheBi) [30] are KGs for genes and molecular entities.

2.3 Construction of knowledge graphs

Nickel et al. [77] classify KG construction methods into four groups: (1) curated approaches, i.e. triples created manually by a closed group of experts, (2) collaborative approaches, i.e. triples created manually by an open group of volunteers, (3) automated semi-structured approaches, i.e. triples extracted automatically from semi-structured text via hand-crafted rules, and (4) automated unstructured approaches, i.e. triples extracted automatically from unstructured text.

2.3.1 Manual approaches

WikiData [108] is one of the most popular KGs with semantically structured, encyclopaedic knowledge curated manually by a community. As of January 2021, WikiData comprises 92M entities curated by almost 27,000 active contributors. The community also maintains a taxonomy of categories and “infoboxes” which define common properties of certain entity types. Furthermore, Papers With Code [80] is a community-driven effort to interlink machine learning articles with tasks, source code, and evaluation results. KGs such as the Gene Ontology [26] or WordNet [41] are curated by domain experts. Research article submission portals such as EasyChair (https://www.easychair.org/) require authors to provide machine-readable metadata. Librarians and publishers tag new articles with keywords and subjects [113]. Virtual research environments enable the execution of data analyses on interoperable infrastructure and store the data and results in KGs [101].

2.3.2 Automated approaches

Automatic KG construction from text: Petasis et al. [85] present a review on ontology learning, that is, ontology creation from text, while Lubani et al. [72] review ontology population systems. Pujara and Singh [88] give an overview of the tasks involved in KG population: (a) information extraction, to extract a graph from text with entity extraction and relation extraction, and (b) graph construction, to clean and complete the extracted graph, as it is usually ambiguous, incomplete, and inconsistent. Coreference resolution [17, 71] clusters different mentions of the same entity in text, and entity linking [63] maps mentions in text to entities in the KG. Entity resolution [104] identifies objects in the KG that refer to the same underlying entity. For taxonomy population, Salatino et al. [95] provide an overview of methods based on rule-based natural language processing (NLP), clustering, and statistical methods.
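To illustrate how these tasks fit together, the following toy pipeline (our own sketch, not one of the cited systems; it assumes the spaCy library with its small English model is installed, and the KG identifiers are invented) extracts entity mentions, derives naive co-occurrence relations, and links mentions to an existing KG by exact string matching:

# Toy KG-population pipeline: entity extraction -> relation extraction -> entity linking.
# Deliberately naive; the systems cited above use far more sophisticated models.
import spacy  # assumes: pip install spacy && python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")
text = "BERT was developed by Google and is evaluated on the SQuAD benchmark."
doc = nlp(text)

# (1) Entity extraction: named entity mentions found in the text.
mentions = [(ent.text, ent.label_) for ent in doc.ents]

# (2) Relation extraction (naive): relate entities that co-occur in the same sentence.
triples = []
for sent in doc.sents:
    ents = list(sent.ents)
    for i in range(len(ents)):
        for j in range(i + 1, len(ents)):
            triples.append((ents[i].text, "related_to", ents[j].text))

# (3) Entity linking (naive): map mentions to KG identifiers by exact string match.
kg_entities = {"Google": "Q95", "BERT": "Q61726893"}  # hypothetical identifiers
linked = {mention: kg_entities.get(mention) for mention, _ in mentions}

print(mentions, triples, linked, sep="\n")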

The Computer Science Ontology (CSO) has been automatically populated from research articles [95]. The AI-KG was automatically generated from 333,000 research papers in the artificial intelligence (AI) domain [32]. It contains five entity types (tasks, methods, metrics, materials, others) linked by 27 relation types. Kannan et al. [58] create a multimodal KG for deep learning papers from text and images and the corresponding source code. Brack et al. [17] generate a KG for 10 different science domains with the concept types material, method, process, and data. Zhang et al. [115] suggest a rule-based approach to mine research problems and proposed solutions from research papers.

Information extraction from scientific text: Information extraction is the first step in the automatic KG population pipeline. Nasar et al. [75] survey methods for information extraction from scientific text. Beltagy et al. [9] present benchmarks for several scientific datasets, and Peng et al. [82] do so especially for the biomedical domain. The “Appendix 1” section presents comparative overviews of datasets for the tasks of sentence classification, relation extraction, and concept extraction in research papers.

There are datasets annotated at sentence level for several domains, e.g. biomedical [31, 60], computer graphics [43], computer science [24], chemistry and computational linguistics [105], or algorithmic metadata [94]. They cover either only abstracts [24, 31, 60] or full articles [43, 69, 94, 105]. The datasets differentiate between five and twelve concept classes (e.g. Background, Objective, Results). Machine learning approaches achieve F1 scores ranging from 66 to 92% on datasets consisting of abstracts and from 51 to 78% on datasets with full papers (see Table 2).

More recent corpora, annotated at phrasal level, aim at constructing a fine-grained KG from scholarly abstracts with the tasks of concept extraction [4, 15, 44, 71, 89], binary relation extraction [4, 45, 71], n-ary relation extraction [55, 57, 59], and coreference resolution [17, 25, 71]. They cover several domains, e.g. material sciences [44]; computational linguistics [45, 89]; computer science, material sciences, and physics [4]; machine learning [71]; biomedicine [25, 57, 64]; or a set of ten scientific, technical and medical domains [15, 17, 37]. The datasets differentiate between four and seven concept classes (like task, method, tool) and between two and seven binary relation types (like used-for, part-of, evaluate-for). The extraction of n-ary relations involves the extraction of relations among multiple concepts, such as drug-gene-mutation interactions in medicine [57], experiments on solid oxide fuel cells with the involved materials and measurement conditions in material sciences [44], or task-dataset-metric-score tuples for leaderboard construction for machine learning tasks [59].

Approaches for concept extraction achieve F1 scores ranging from 56.6 to 96.9% (see Table 4), for coreference resolution F1 scores range from 46.0 to 61.4% [17, 25, 71], and for binary relation extraction from 28.0 to 83.6% (see Table 3). The task of n-ary relation extraction, with F1 scores from 28.7 to 56.4% [57, 59], is especially challenging, since such relationships usually span across sentences or even sections and thus machine learning models require an understanding of the whole document. The inter-coder agreement for the task of concept extraction ranges from 0.6 to 0.96 (Table 4), for relation extraction from 0.6 to 0.9 (see also Table 3), while for coreference resolution a value of 0.68 was reported in two different studies [17, 71]. These results suggest that the tasks are in most cases not only difficult for machines but also for humans.

Fig. 1 Activities within a systematic literature review

2.4 Quality of knowledge graphs

KGs may contain billions of machine-readable facts about the world or a certain domain. However, do these KGs also have an appropriate quality? Data quality (DQ) is defined as fitness for use by a data consumer [110]. Thus, to evaluate data quality, it is important to know the needs of the data consumer since, in the end, the consumer judges whether or not a product is fit for use. Wang et al. [110] propose a data quality evaluation framework for information systems consisting of 15 dimensions grouped into four categories:

  1. Intrinsic DQ: accuracy, objectivity, believability, and reputation.

  2. Contextual DQ: value-added, relevancy, timeliness, completeness, and an appropriate amount of data.

  3. Representational DQ: interpretability, ease of understanding, representational consistency, and concise representation.

  4. Accessibility DQ: accessibility and access security.

Bizer [10] and Zaveri [114] propose further dimensions for the Linked Data context, such as consistency, verifiability, offensiveness, licensing, and interlinking. Pipino et al. [87] subdivide completeness into schema completeness, i.e. the extent to which classes and relations needed for a certain use are not missing in the ontology; column completeness (also known as the Partial Closed World Assumption [47]), i.e. the extent to which facts for a given property are not missing; and population completeness, i.e. the extent to which instances of a certain class are not missing. Färber et al. [39] comprehensively evaluate and compare the data quality of popular KGs (e.g. DBpedia, Freebase, WikiData, YAGO) using such dimensions.
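To make these notions concrete, the following minimal Python sketch (our own illustration, not part of the cited frameworks; all entity and relation names are invented) estimates schema, population, and column completeness for a toy set of triples:

# Toy knowledge graph as (subject, predicate, object) triples; names are hypothetical.
triples = [
    ("paper1", "addresses", "task:image_classification"),
    ("paper1", "usesDataset", "dataset:imagenet"),
    ("paper2", "addresses", "task:question_answering"),
    # paper2 has no "usesDataset" fact -> column incompleteness
]

required_relations = {"addresses", "usesDataset", "reportsScore"}  # relations a use case needs
papers_in_corpus = {"paper1", "paper2", "paper3"}                  # paper3 is missing entirely

used_relations = {p for (_, p, _) in triples}
papers_in_kg = {s for (s, _, _) in triples}

# Schema completeness: share of required relations the KG's schema actually covers.
schema_completeness = len(required_relations & used_relations) / len(required_relations)

# Population completeness: share of real-world instances (here: papers) present in the KG.
population_completeness = len(papers_in_kg & papers_in_corpus) / len(papers_in_corpus)

# Column completeness: share of papers in the KG that have a value for a given property.
with_dataset = {s for (s, p, _) in triples if p == "usesDataset"}
column_completeness = len(with_dataset) / len(papers_in_kg)

print(schema_completeness, population_completeness, column_completeness)  # ~0.67 ~0.67 0.5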

To evaluate the correctness of instance data (also known as precision), the facts in the KG have to be compared against a ground truth. For that, humans annotate a set of facts as true or false. YAGO was found to be 95% correct [103]. The automatically populated AI-KG has a precision of 79% [32]. The KG automatically populated by the Never-Ending Language Learner (NELL) has a precision of 74% [21].

To evaluate the completeness of instance data (also known as coverage or recall), small ground-truth collections capturing all knowledge for a certain ontology are necessary, which are usually difficult to obtain [111]. However, some studies estimate the completeness of several KGs. Galárraga et al. [46] suggest a rule mining approach to predict missing facts. In Freebase [12], 71% of people have an unknown place of birth and 75% have an unknown nationality [36]. Suchanek et al. [102] report that 69%-99% of instances in popular KGs (e.g. YAGO, DBpedia) lack at least one property that other instances of the same class have. The AI-KG has a recall of 81.2% [32].
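Stated as standard formulas (our notation; the cited studies estimate these values with various methods), with $F_{KG}$ denoting the set of facts in the KG and $F_{GT}$ a ground-truth fact set:

\[
\text{correctness (precision)} = \frac{|F_{KG} \cap F_{GT}|}{|F_{KG}|}, \qquad
\text{completeness (recall)} = \frac{|F_{KG} \cap F_{GT}|}{|F_{GT}|}.
\]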

2.5 Systematic literature reviews

Literature reviews are one of the main tasks of researchers, since a clear identification of a contribution to the present scholarly knowledge is a crucial step in scientific work [51]. This requires a comprehensive elaboration of the present scholarly knowledge for a certain research question. Furthermore, systematic literature reviews help to identify research gaps and to position new research activities [61].

A literature review can be conducted systematically or in a non-systematic, narrative way. Following Fink’s [42] definition, a systematic literature review is “a systematic, explicit, comprehensive, and reproducible method for identifying, evaluating, and synthesising the existing body of completed and recorded work”. Guidelines for systematic literature reviews have been suggested for several scientific disciplines, e.g. for software engineering [61], for information systems [79], and for health sciences [42]. A systematic literature review typically consists of the activities depicted in Fig. 1, subdivided into the phases plan, conduct, and report. The activities may differ in detail for specific scientific domains [42, 61, 79]. In particular, a data extraction form defines which data have to be extracted from the reviewed papers. Data extraction requirements vary from review to review, so that the form is tailored to the specific research questions investigated in the review.

Fig. 2 UML use case diagram for the main use cases between a researcher, an Open Research Knowledge Graph (ORKG), and external systems

3 Requirements analysis

As the discussion of related work reveals, existing knowledge graphs for research information focus on specific use cases (e.g. improve search engines, help to reproduce research results) and mainly manage metadata and research artefacts about articles. We envision a KG in which research articles are linked through a deep semantic representation of their content to enable further use cases. In the following, we formulate the problem statement and describe our research method. This motivates our use case analysis in Sect. 3.1, from which we derive requirements for an ORKG.

Problem statement: Scholarly knowledge is very heterogeneous and diverse. Therefore, an ontology that comprehensively conceptualises scholarly knowledge does not exist. Besides, due to the complexity of the task, the population of comprehensive ontologies requires domain and ontology experts. Current automatic approaches can only populate rather simple ontologies and achieve moderate accuracy (see Sect. 2.3 and the “Appendix 1” section). On the one hand, we desire an ontology that can comprehensively capture scholarly knowledge, together with instance data of high correctness and completeness. On the other hand, we are faced with a “knowledge acquisition bottleneck”.

Research method: To illuminate the problem statement, we perform a requirements analysis. We follow the design science research (DSR) methodology [18, 53]. The requirements analysis is a central phase in DSR, as it is the basis for design decisions and selection of methods to construct effective solutions systematically [18]. The objective of DSR in general is the innovative, rigorous, and relevant design of information systems for solving important business problems, or the improvement of existing solutions [18, 51].

To elicit requirements, we (a) studied guidelines for systematic literature reviews (see Sect. 2.5), (b) studied data quality requirements for information systems (see Sect. 2.4), and (c) interviewed members of the ORKG and Visual Analytics teams at TIB, who are software engineers and researchers in the fields of computer science and environmental sciences. Based on the requirements, we elaborate possible approaches to construct an ORKG, which were identified through a literature review (see Sect. 2.3). To verify our assumptions on the presented requirements and approaches, ORKG and Visual Analytics team members reviewed them in an iterative refinement process.

3.1 Overview of the use cases

We define functional requirements with use cases, a popular technique in software engineering [13]. A use case describes the interaction between a user and the system from the user’s perspective to achieve a certain goal. Furthermore, a use case introduces a motivating scenario to guide the design of a supporting ontology, and the use case analysis helps to determine which kind of information is necessary [29].

There are many use cases (e.g. literature reviews, plagiarism detection, peer reviewer suggestion) and several stakeholders (e.g. researchers, librarians, peer reviewers, practitioners) that may benefit from an ORKG. Nguyen et al. [76] discuss some research-related tasks of scientists for information foraging at a broader level. In this study, we focus on use cases that support researchers in (a) conducting literature reviews (see also Sect. 2.5), (b) obtaining a deep understanding of a research article, and (c) reproducing research results. A full discussion of all possible use cases of graph-based knowledge management systems in the research environment is far beyond the scope of this article. With the chosen focus, we hope to cover the most frequent, literature-oriented tasks of scientists.

Figure 2 depicts the main identified use cases, which are described briefly in the following. Please note that we focus on how semantic content, rather than additional metadata, can improve these use cases.

Get research field overview: Survey articles provide an overview of a particular research field, e.g. a certain research problem or a family of approaches. The results in such surveys are sometimes summarised in structured and comparative tables (an approach usually followed in domains such as computer science, but not as systematically practised in other fields). However, once survey articles are published, they are no longer updated. Moreover, they usually represent only the perspective of the authors, i.e. of very few researchers in the field. To support researchers in obtaining an up-to-date overview of a research field, the system should maintain such surveys in a structured way and allow for dynamics and evolution. A researcher interested in such an overview should be able to search for or browse the desired research field in a user interface for ORKG access. Then, the system should retrieve related articles and available overviews, e.g. in a table or a leaderboard chart.

While an ORKG user interface should allow for showing tabular leaderboards or other visual representations, the backend should semantically represent information to allow for the exploitation of overlaps in conceptualisations between research problems or fields. Furthermore, faceted drill-down methods based on the properties of semantic descriptions of research approaches could empower researchers to quickly filter and zoom into the most relevant literature.

Find related work: Finding relevant research articles is a daily core activity of researchers. The primary goal of this use case is to find research articles which are relevant to a certain research question. A broad research question is often broken down into smaller, more specific sub-questions which are then converted into search queries [42]. For instance, in this paper, we explored the following sub-questions: (a) Which ontologies exist to represent scholarly knowledge? (b) Which scientific knowledge graphs exist and which information do they contain? (c) Which datasets exist for scientific information extraction? (d) What are the current state-of-the-art methods for scientific information extraction? (e) Which approaches exist to construct a knowledge graph?

An ORKG should support the answering of queries related to such questions, which can be fine-grained or broad search intents. Preferably, the system should support natural language queries as approached by semantic search and question answering engines [6]. The system has to return a set of relevant articles.

Assess relevance: Given a set of relevant articles, the researcher has to assess whether the articles match the criteria of interest. Usually, researchers skim through the title and abstract. Often, the introduction and conclusions also have to be considered, which is cumbersome and time-consuming. If only the most important paragraphs in the article are presented to the researcher in a structured way, this process can be accelerated. Such information snippets might include, for instance, text passages that describe the problem tackled in the research work, the main contributions, the employed methods or materials, or the obtained results.

Fig. 3 An example research question with a corresponding data extraction form, and the extracted text passages from relevant research articles for the respective (data extraction form) fields, presented in tabular form

Extract relevant information: To tackle a particular research question, the researcher has to extract relevant information from research articles. In a systematic literature review, the information to be extracted can be defined through a data extraction form (see Sect. 2.5). Such extracted information is usually compiled in written text or comparison tables in a related work section or in survey articles. For instance, for the question “Which datasets exist for scientific sentence classification?”, a researcher who focuses on a new annotation study could be interested in (a) the domains covered by the dataset and (b) the inter-coder agreement (see Table 2 as an example). Another researcher might investigate the same question but focus on machine learning and thus could be more interested in (c) the evaluation results and (d) the feature types used.

The system should support the researcher with tailored information extraction from a set of research articles: (1) the researcher defines a data extraction form as proposed in systematic literature reviews (e.g. the fields (a)–(d)), and (2) the system presents the extracted information as suggestions for the corresponding data extraction form and articles in a comparative table. Figure 3 illustrates a data extraction form with corresponding fields in the form of questions, and a possible approach to visualise the extracted text passages from the articles for the respective fields in a tabular form.
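As a rough illustration of this interaction (our own sketch; the field questions and suggested values are invented and not taken from Fig. 3), the extraction form and the system’s pre-filled suggestions can be thought of as simple structured records:

# Hypothetical data extraction form for the question
# "Which datasets exist for scientific sentence classification?" (field wording invented).
extraction_form = {
    "research_question": "Which datasets exist for scientific sentence classification?",
    "fields": [
        "Which domains are covered by the dataset?",
        "What is the inter-coder agreement?",
        "What are the evaluation results?",
        "Which feature types are used?",
    ],
}

# Suggestions the system might pre-fill per article (values invented for illustration).
suggestions = {
    "article_A": {
        "Which domains are covered by the dataset?": "biomedical abstracts",
        "What is the inter-coder agreement?": "kappa = 0.85",
    },
    "article_B": {
        "Which domains are covered by the dataset?": "computer science full papers",
        "What are the evaluation results?": "F1 = 0.78",
    },
}

# The comparative table (cf. Fig. 3) has one row per article and one column per field.
for article, values in suggestions.items():
    print(article, [values.get(f, "n/a") for f in extraction_form["fields"]])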

Get recommended articles: When the researcher focuses on a particular article, further related articles could be recommended by the system utilising an ORKG, for instance, articles that address the same research problem or apply similar methods.

Obtain deep understanding: The system should help the researcher to obtain a deep understanding of a research article (e.g. equations, algorithms, diagrams, datasets). For this purpose, the system should connect the article with artefacts such as conference videos, presentations, source code, datasets, etc., and visualise the artefacts appropriately. Also text passages can be linked, e.g. explanations of methods in Wikipedia, source code snippets of an algorithm implementation, or equations described in the article.

Reproduce results: The system should offer researchers links to all necessary artefacts to help to reproduce research results, e.g. datasets, source code, virtual research environments, materials describing the study, etc. Furthermore, the system should maintain semantic descriptions of domain-specific and standardised evaluation protocols and guidelines such as in machine learning reproducibility checklists [86] and bioassays in the medical domain.

3.2 Knowledge graph requirements

As outlined in Sect. 2.4, data quality requirements should be considered within the context of a particular use case (“fitness for use”). In this section, we first describe dimensions we used to define non-functional requirements for an ORKG. Then, we discuss these requirements within the context of our identified use cases.

3.2.1 Dimensions for KG requirements

In the following, we describe the dimensions that we use to define the requirements for ontology design and instance data. We selected these dimensions since we assume that they are the most relevant and also the most challenging ones when constructing an ORKG with appropriate data to support the various use cases.

For ontology design, i.e. how comprehensively should an ontology conceptualise scholarly knowledge to support a certain use case, we use the following dimensions:

  (A) Domain specialisation of the ontology: How domain-specific should the concepts and relation types in the ontology be? An ontology with high domain specialisation targets a specific (sub-)domain and uses domain-specific terms. An ontology with low domain specialisation targets a broad range of domains and uses rather domain-independent terms. For instance, various ontologies (e.g. [15, 84]) propose domain-independent concepts (e.g. process, method, material). In contrast, Klampanos et al. [62] present a very domain-specific ontology for artificial neural networks.

  (B) Granularity of the ontology: Which granularity of the ontology is required to conceptualise scholarly knowledge? An ontology with high granularity conceptualises scholarly knowledge with many classes that have numerous detailed, fine-grained properties and relations. An ontology with low granularity has only a few classes and relation types. For instance, the annotation schemes for scientific corpora (see Sect. 2.3) have a rather low granularity, as they do not have more than 10 classes and 10 relation types. In contrast, various ontologies (e.g. [50, 84]) with 20 to 35 classes and 20 to 70 relations and properties are fine-grained and have a relatively high granularity.

Although there is usually a correlation between the domain specialisation and the granularity of an ontology (e.g. an ontology with high domain specialisation usually also has a high granularity), there exist rather domain-independent ontologies with a high granularity, e.g. the scholarly ontology [84], as well as ontologies with high domain specialisation and low granularity, e.g. the PICO criterion in Evidence-Based Medicine [60, 92], which stands for population (P), intervention (I), comparison (C), and outcome (O). Thus, we use both dimensions independently. Furthermore, a high domain specialisation requirement for a use case implies that each sub-domain requires a separate ontology for the specific use case. These domain-specific ontologies can be organised in a taxonomy.

Table 1 Requirements and approaches for the main use cases

For the instance data, we use the following dimensions:

  (C) Completeness of the instance data: Given an ontology, to which extent do all possible instances (i.e. instances for classes and facts for relation types) in all research articles have to be represented in the KG? Low completeness: it is tolerable for the use case when a considerable amount of instance data is missing for the respective ontology. High completeness: it is mandatory for the use case that hardly any instances for the respective ontology are missing from the instance data. For instance, given an ontology with a class “Task” and a relation type “subTaskOf” to describe a taxonomy of tasks, the instance data for that ontology would be complete if all tasks mentioned in all research articles are present (population completeness) and no “subTaskOf” facts between the tasks are missing (column completeness).

  (D) Correctness of the instance data: Given an ontology, which correctness is necessary for the corresponding instances? Low correctness: it is tolerable for the use case that some instances (e.g. 30%) are incorrect. High correctness: it is mandatory for the use case that the instance data is not wrong, i.e. all instances present in the KG must conform to the ontology and properly reflect the content of the research articles. For instance, an article is correctly assigned to the task addressed in the article, the F1 scores of the evaluation results are correctly extracted, etc.

It should be noted that completeness and correctness of instance data can be evaluated only for a given ontology. For instance, let A be an ontology having the class “Deep Learning Model” without properties, and let B be an ontology that also has a class “Deep Learning Model” and additionally further relation types describing the properties of the deep learning model (e.g. drop-out, loss functions, etc.). In this example, the instance data of ontology A would be considered to have high completeness, if it covers most of the important deep learning models. However, for ontology B, the completeness of the same instance data would be rather low since the properties of the deep learning models are missing. The same holds for correctness: If ontology B has, for instance, a sub-type “Convolutional Neural Network”, then the instance data would have a rather low correctness for ontology B if all “Deep Learning Model” instances are typed only with the generic class “Deep Learning Model”.

3.2.2 Discussion of the KG requirements

Next, we discuss the seven main use cases with regard to the required level of ontology domain specialisation and granularity, as well as completeness and correctness of instance data. Table 1 summarises the requirements for the use cases along the four dimensions on an ordinal scale. Use cases are grouped together when they have (1) similar justifications for the requirements and (2) a high overlap in ontology concepts and instances.

Extract relevant information & get research field overview: The information to be extracted from relevant research articles for a data extraction form within a literature review is very heterogeneous and depends highly on the intent of the researcher and the research questions. Thus, the ontology has to be domain-specific and fine-grained to offer all possible kinds of desirable information. However, missing information for certain questions in the KG may be tolerable for a researcher. Furthermore, it is tolerable for a researcher if some of the extracted suggestions are wrong since the researcher can correct them.

Research field overviews are usually the result of a literature review. The data in such an overview also has to be very domain-specific and fine-grained. Moreover, this information must have high correctness, e.g. an F1 score of an evaluation result must not be wrong. Furthermore, an overview of a particular research field should have appropriate completeness and must not miss any relevant research papers. However, it is acceptable when overviews for some research fields are missing.

Obtain deep understanding & reproduce results: The information required for these use cases has to achieve a high level of correctness (e.g. accurate links to datasets, source code, videos, articles, and research infrastructures). An ontology for the representation of default artefacts can be rather domain-independent (e.g. Scholix [20]). However, the semantic representation of evaluation protocols requires domain-dependent ontologies (e.g. EXPO [99]). Missing information is tolerable for these use cases.

Find related work & get recommended articles: When searching for related work, it is essential not to miss relevant articles. Previous studies revealed that more than half of search queries in academic search engines refer to scientific entities [112]. However, the coverage of scientific entities in general-purpose KGs (e.g. WikiData) is rather low, since the introduction of new concepts in research literature occurs at a faster pace than KG curation [1]. Despite the low completeness, Xiong et al.  [112] could improve the ranking of search results in academic search engines by exploiting general-purpose KGs. Hence, the instance data for the “find related work” use case should have high completeness with fine-grained scientific entities. However, semantic search engines leverage latent representations of KGs and text (e.g. graph and word embeddings) [6]. Since a non-perfect ranking of the search results is tolerable for a researcher, lower correctness of the instance data could be acceptable. Furthermore, due to latent feature representations, the ontology can be kept rather simple and domain-independent. For instance, the STM corpus [15] introduces four domain-independent concepts.

Graph- and content-based research paper recommendation systems [8] have similar requirements since they also leverage latent feature representations and require fine-grained scientific entities. Also, non-perfect recommendations are tolerable for a researcher.

Assess relevance: To help the researcher to assess the relevance of an article according to her needs, the system should highlight the most essential zones in the article to get a quick overview. The completeness and correctness of the presented information must not be too low, as otherwise the user acceptance may suffer. However, it can be suboptimal, since it is acceptable for a researcher when some of the highlighted information is not essential or when some important information is missing. The ontology to represent essential information should be rather domain-specific (i.e. using terms that the researchers understand) and quite simple (cf. ontologies for scientific sentence classification in Sect. 2.3.2).

4 ORKG construction strategies

In this section, we discuss the implications for the design and construction of an ORKG and outline possible approaches, which are mapped to the use cases in Table 1. Based on the discussion in the previous section, we can subdivide the use cases into two groups: (1) requiring high correctness and high domain specialisation with rather low requirements on the completeness (left side in Table 1) and (2) requiring high completeness with rather low requirements on the correctness and domain specialisation (right side in Table 1). The first group requires manual approaches, while the second group could be accomplished with fully automatic approaches. To ensure trustworthiness, data records should contain provenance information, i.e. who or what system curated the data.

Manually curated data can also support use cases with automatic approaches, and vice versa. Furthermore, automatic approaches can complement manual approaches by providing suggestions in user interfaces. Such synergy between humans and algorithms may lead to a “data flywheel” (also known as data network effects, see Fig. 4): users produce data, which enables building a smarter product with better algorithms, so that more users use the product and thus produce more data, and so on.

Fig. 4 The virtuous cycle of data network effects by combining manual and automatic data curation approaches [22]

Fig. 5 Conceptual meta-model in UML for templates and interface design for an external template-based information extractor

4.1 Manual approaches

Ontology design: The first group of use cases requires rather domain-specific and fine-grained ontologies. We suggest developing novel ontologies or reusing existing ones that fit the respective use case and the specific domain (e.g. EXPO [99] for experiments). Moreover, appropriate and simple user interfaces are necessary for efficient and easy population.

However, such ontologies can evolve with the help of the community, as demonstrated by WikiData and Wikipedia with “infoboxes” (see Sect. 2.3). Therefore, the system should enable the maintenance of templates, which are pre-defined and very specific forms consisting of fields with certain types (see Fig. 5). For instance, to automatically generate leaderboards for machine learning tasks, a template would have the fields task, model, dataset, and score, which can then be filled in by a curator for articles providing such results, in a user interface generated from the template. Such an approach is based on meta-modelling [13], as the meta-model for templates enables the definition of concrete templates, which are then instantiated for articles; the sketch below illustrates the idea.
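The following minimal sketch (ours; class and field names are hypothetical, and the actual ORKG template meta-model in Fig. 5 may differ) shows the meta-modelling idea: a template defines typed fields, and a template instance fills them for one article.

# Minimal meta-modelling sketch: Template defines typed fields, TemplateInstance fills them.
from dataclasses import dataclass, field
from typing import Any

@dataclass
class TemplateField:
    name: str
    value_type: type  # expected type of the field value

@dataclass
class Template:
    name: str
    fields: list[TemplateField] = field(default_factory=list)

@dataclass
class TemplateInstance:
    template: Template
    article_doi: str
    values: dict[str, Any] = field(default_factory=dict)

    def validate(self) -> bool:
        """Check that every filled value matches the declared field type."""
        types = {f.name: f.value_type for f in self.template.fields}
        return all(isinstance(v, types[k]) for k, v in self.values.items() if k in types)

# A curator-defined template for machine learning leaderboards ...
leaderboard = Template("Leaderboard entry", [
    TemplateField("task", str), TemplateField("model", str),
    TemplateField("dataset", str), TemplateField("score", float),
])

# ... instantiated for one article (DOI and values invented for illustration).
entry = TemplateInstance(leaderboard, "10.1000/example",
                         {"task": "image classification", "model": "ResNet-50",
                          "dataset": "ImageNet", "score": 76.1})
print(entry.validate())  # True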

Knowledge graph population: Several user interfaces are required to enable manual population: (1) populating semantic content for a research article by (1a) choosing relevant templates or ontologies and (1b) filling in the values; (2) terminology management (e.g. domain-specific research fields); (3) maintaining research field overviews by (3a) assigning relevant research articles to the research field, (3b) defining corresponding templates, and (3c) filling in the templates for the relevant research articles.

Furthermore, the system should also offer Application Programming Interfaces (APIs) to enable population by third-party applications, e.g.:

  • Submission portals such as https://www.easychair.org/ during submission of an article.

  • Authoring tools such as https://www.overleaf.com/ during writing.

  • Virtual research environments [101] to store evaluation results and links to datasets and source code during experimenting and data analysis.

To encourage stakeholders such as researchers, librarians, and crowd workers to contribute content, we see the following options:

  • Top-down enforcement via submission portals and publishers.

  • Incentive models: Researchers want their articles to be cited; semantic content helps other researchers to find, explore, and understand an article. This is also related to the concept of enlightened self-interest, i.e. acting to further the interests of others ultimately serves one’s own interest.

  • Provide public acknowledgements for curators.

  • Bring together experts (e.g. librarians, researchers from different institutions) who curate and organise content for specific research problems or disciplines.

4.2 (Semi-)automatic approaches

Ontology design: The second group of use cases requires high completeness, while relatively low correctness and domain specialisation are acceptable. For these use cases, rather simple or domain-independent ontologies should be developed or reused. Although approaches for automatic ontology learning exist (see Sect. 2.3), the quality of their results is not sufficient to generate a meaningful ORKG with complex conceptual models and relations. Therefore, meaningful ontologies should be designed by human experts.

Knowledge graph population: Various approaches can be used to (semi-)automatically populate an ORKG. Methods for entity and relation extraction (see Sect. 2.3) can help to populate fine-grained KGs with high completeness, and entity linking approaches can link mentions in text with entities in KGs. For cross-modal linking, Singh et al. [97] suggest an approach to automatically detect URLs of datasets in research articles, while the Scientific Software Explorer [52] connects text passages in research articles with code fragments. To extract relevant information at sentence level, approaches for sentence classification in scientific text can be applied (see Sect. 2.3). To support the curator in filling in templates semi-automatically, template-based extraction can (1) suggest relevant templates for a research article and (2) pre-fill fields of templates with appropriate values. For pre-filling, approaches such as n-ary relation extraction [44, 54, 57, 59] or end-to-end question answering [33, 91] could be applied, as sketched below.
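The following sketch (our own illustration, not the ORKG implementation) shows how an off-the-shelf extractive question-answering model could pre-fill template fields; it requires the Hugging Face transformers package, and the model choice and the mapping from fields to questions are assumptions made for illustration.

# Pre-filling template fields with an extractive question-answering model (illustrative).
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

abstract = (
    "We propose a convolutional model for image classification. "
    "On the ImageNet dataset our model achieves 76.1% top-1 accuracy."
)

# One natural-language question per template field (hypothetical field names).
field_questions = {
    "task": "Which task is addressed?",
    "dataset": "Which dataset is used for evaluation?",
    "score": "What score does the model achieve?",
}

suggestions = {
    field: qa(question=question, context=abstract)["answer"]
    for field, question in field_questions.items()
}
print(suggestions)  # e.g. {'task': 'image classification', 'dataset': 'ImageNet', ...}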

Furthermore, the system should allow plugging in external information extractors developed for certain scientific domains to extract specific types of information. For instance, as depicted in Fig. 5, an external template information extractor has to implement an interface with three methods. This enables the system (1) to filter relevant template extractors for an article and (2) to extract field values from an article.
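A rough sketch of such a plugin interface is shown below (the method names are our own guesses; Fig. 5 defines the actual interface): an extractor advertises the template it serves, decides whether it applies to a given article, and extracts suggested field values.

# Hypothetical plugin interface for external template-based information extractors.
from abc import ABC, abstractmethod

class TemplateInformationExtractor(ABC):
    @abstractmethod
    def supported_template(self) -> str:
        """Return the identifier of the template this extractor can fill."""

    @abstractmethod
    def is_applicable(self, article_text: str) -> bool:
        """Decide whether the extractor is relevant for the given article."""

    @abstractmethod
    def extract_field_values(self, article_text: str) -> dict[str, str]:
        """Return suggested values keyed by template field name."""

class LeaderboardExtractor(TemplateInformationExtractor):
    """Toy example: naive keyword-based extractor for a leaderboard template."""
    def supported_template(self) -> str:
        return "Leaderboard entry"

    def is_applicable(self, article_text: str) -> bool:
        return "accuracy" in article_text.lower() or "f1" in article_text.lower()

    def extract_field_values(self, article_text: str) -> dict[str, str]:
        # A real extractor would use n-ary relation extraction or QA models here.
        return {"dataset": "ImageNet"} if "ImageNet" in article_text else {}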

5 Conclusions and future work

In this paper, we have presented a requirements analysis for an Open Research Knowledge Graph (ORKG). An ORKG should represent the content of research articles in a semantic way to enhance or enable a wide range of use cases. We identified literature-related core tasks of a researcher that can be supported by an ORKG and formulated them as use cases. For each use case, we discussed specificities and requirements for the underlying ontology and the instance data. In particular, we identified two groups of use cases: (1) the first group requires instance data with high correctness and rather fine-grained, domain-specific ontologies, but with moderate completeness; (2) the second group requires a high completeness, but the ontologies can be kept rather simple and domain-independent, and a moderate correctness of the instance data is sufficient. Based on the requirements, we have described possible manual and semi-automatic approaches (necessary for the first group), and automatic approaches (appropriate for the second group) for KG construction. In particular, we propose a framework with lightweight ontologies that can evolve by community curation. Furthermore, we have described the interdependence with external systems, user interfaces, and APIs for third-party applications to populate an ORKG.

The results of our work aim to give a holistic view of the requirements for an ORKG and guide further research. The suggested approaches have to be refined, implemented, and evaluated in an iterative and incremental process (see www.orkg.org for the current progress). Users from different scientific domains should be deeply involved in the development process to build proper solutions. Furthermore, since ontologies and instance data will evolve in the ORKG, solutions are required to adequately support this evolution process (e.g. editing, versioning, support to report inconsistencies, etc.). Finally, our analysis can serve as a foundation for a discussion on ORKG requirements with other researchers and practitioners.