Contents and time sensitive document ranking of scientific literature

doi:10.1016/j.joi.2014.04.006

Journal of Informetrics

Volume 8, Issue 3, July 2014, Pages 546-561

https://doi.org/10.1016/j.joi.2014.04.006

Highlights

• A new ranking framework better models the behaviour of readers of scientific papers.
• Dynamic adjustments of random walk parameters in terms of paper's contents and age.
• Scoring a paper with awareness of its diversity of topics and how they evolve.
• More effective than PageRank in ranking papers’ potential scientific utility.
• High practical value in retrieving and placing useful papers at the top of ranking.

Abstract

A new link-based document ranking framework is devised. At its heart is a contents and time sensitive random literature explorer, designed to model the behaviour of readers of scientific documents more accurately. In particular, the framework dynamically adjusts its random walk parameters according to both the contents and the age of the documents encountered, thus incorporating the diversity of topics and how they evolve over time into the score of a scientific publication. Measured against a proxy gold standard based on papers’ potential usefulness in facilitating later research, the resulting ranking of scientific documents is shown to be more effective than PageRank in facilitating literature exploration. One of its strengths is its practical value: it reliably retrieves promising papers and places them at the top of its ranking.
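The idea of a random walk whose parameters depend on the age of the document being read can be illustrated with a small sketch. This is not the RALEX update rule, which this page does not give. The decay form, the `half_life` parameter, the function name `age_sensitive_scores`, and the toy data are all assumptions made for illustration, and the contents-sensitive part of the framework is left out entirely.

```python
def age_sensitive_scores(citations, year, now, base_damping=0.85,
                         half_life=10.0, iters=100):
    """Random-walk scores where the follow probability depends on paper age.

    citations: paper -> list of papers it cites (all must appear in `year`).
    year: paper -> publication year.
    ASSUMPTION (hypothetical rule): a reader at paper p follows one of its
    references with probability base_damping * 0.5 ** (age / half_life),
    and otherwise jumps to a paper chosen uniformly at random.
    """
    papers = sorted(year)
    n = len(papers)
    follow = {p: base_damping * 0.5 ** ((now - year[p]) / half_life)
              for p in papers}
    score = {p: 1.0 / n for p in papers}
    for _ in range(iters):
        new = {p: 0.0 for p in papers}
        for p in papers:
            refs = citations.get(p, [])
            # A paper without references always triggers a random jump.
            d = follow[p] if refs else 0.0
            for q in refs:
                new[q] += d * score[p] / len(refs)
            jump = (1.0 - d) * score[p] / n
            for q in papers:
                new[q] += jump
        score = new
    return score


# Hypothetical toy network: D cites B and C, C cites A and B, B cites A.
toy_citations = {"D": ["B", "C"], "C": ["A", "B"], "B": ["A"], "A": []}
toy_years = {"A": 2000, "B": 2005, "C": 2010, "D": 2013}
age_scores = age_sensitive_scores(toy_citations, toy_years, now=2014)
```

With a per-paper follow probability, the walk is no longer governed by a single global damping factor, which is the kind of dynamic adjustment the abstract describes in general terms.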

Section snippets

Introduction and motivation

The explosive growth of the Internet and the overabundance of data fuel the creation and development of information networks, which constantly poses new challenges for information retrieval. As the searched domains expand, even queries targeted at some niche field retrieve a large volume of potentially relevant information that far exceeds human processing capabilities. Ranking addresses the challenge of information overload by identifying material of the highest “quality” among all “relevant”

Scientific document ranking

Scientific document ranking is a challenging task whose core problem is to quantify the importance of academic publications. Citation count based metrics have a long lineage, tracing back to the pioneering work done by Garfield on citation analysis in the 1970s ([Garfield, 1972], [Garfield, 1979]), and they are still widely used today. However, citation count has been challenged for being a quantitative measure of the popularity of a scientific document that fails to properly capture

Problem statement

We aim at ranking documents in a citation network to help researchers identify papers of high scientific utility in their field, a paper's usefulness being acknowledged in the kind of incoming citations it receives from later work. A scientific citation network has the same abstract structure as any other directed network, but it is distinctively static in nature: the contents of a document and the references it includes are frozen at the time of publication, imposing a strict temporal

Our approach

Given the recent relative success of applying PageRank to scientific document ranking ([King et al., 2013], [Li et al., 2008], [Sayyadi and Getoor, 2009], [Walker et al., 2007]), we propose to use PageRank as a starting point for our link-based ranking framework. But as argued in Section 3, some modifications are required to adapt PageRank and properly deal with the ageing factor and the intricate topic dynamics in a citation network in order to make the resulting ranking more useful. PageRank
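For reference, the standard PageRank baseline that the snippet above takes as its starting point can be sketched on a toy citation graph as follows. The function name `pagerank`, the damping value 0.85, and the four-paper network are illustrative assumptions, not taken from the paper.

```python
def pagerank(citations, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a citation graph.

    citations maps each paper to the list of papers it cites.
    """
    papers = sorted(set(citations) |
                    {q for refs in citations.values() for q in refs})
    n = len(papers)
    score = {p: 1.0 / n for p in papers}
    for _ in range(max_iter):
        # Papers with no references (dangling nodes) spread their score
        # uniformly over all papers.
        dangling = sum(score[p] for p in papers if not citations.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n
               for p in papers}
        for p, refs in citations.items():
            for q in refs:
                new[q] += damping * score[p] / len(refs)
        delta = sum(abs(new[p] - score[p]) for p in papers)
        score = new
        if delta < tol:
            break
    return score


# Hypothetical toy network: D cites B and C, C cites A and B, B cites A.
toy = {"D": ["B", "C"], "C": ["A", "B"], "B": ["A"], "A": []}
scores = pagerank(toy)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because citations only point backwards in time, score flows towards older papers in this baseline, which is one reason an ageing-aware adaptation is of interest.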

A proxy gold standard

Without a notion of gold standard, evaluating ranking results on intrinsic measures, e.g., the potential scientific utility of academic publications, is infeasible ([King et al., 2013], [Walker et al., 2007]). However, the construction of a perfect gold standard would inevitably involve large-scale human evaluations, which have been prohibitively expensive for all studies to date. A common practice to get around this problem that is well accepted in the information retrieval community is the

Conclusion and future work

In this paper we described RALEX, a random-walk based document ranking framework for scientific literature that helps researchers identify publications with a high potential for scientific utility in their domains of interest, a task that is all the more challenging because those domains are at the frontier of science and technology. To the best of our knowledge, this work represents the first attempt to rank scientific papers in decreasing order of potential usefulness taking both topical contents and age

Acknowledgement

The authors would like to thank the anonymous reviewers for their valuable comments and constructive suggestions.

References (30)

M. Bressan et al. Choose the damping, choose the ranking? Journal of Discrete Algorithms (2010)
S. Brin et al. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems (1998)
P. Chen et al. Finding scientific gems with Google's PageRank algorithm. Journal of Informetrics (2007)
N. Ma et al. Bringing PageRank to the citation analysis. Information Processing & Management (2008)
D.M. Blei et al. Latent Dirichlet Allocation. Journal of Machine Learning Research (2003)
P. Boldi et al. PageRank as a function of the damping factor
S. Bonzi. Characteristics of a literature as predictors of relatedness between cited and citing works. Journal of the American Society for Information Science (1982)
B. Brooke. Optimum p% library of scientific periodicals. Nature (1971)
V.P. Diodato. Dictionary of bibliometrics (1994)
L. Egghe et al. Introduction to informetrics: Quantitative methods in library, documentation and information science (1990)
E. Garfield. Citation analysis as a tool in journal evaluation. Science (1972)
E. Garfield. Citation indexing: Its theory and application in science, technology, and humanities. Information sciences series (1979)
D.F. Gleich et al. Tracking the random surfer: Empirically measured teleportation parameters in PageRank (2010)
T.L. Griffiths et al. Finding scientific topics. Proceedings of the National Academy of Sciences (2004)
D. Hall et al. Studying the history of ideas using topic models

