A novel multiple layers name disambiguation framework for digital libraries using dynamic clustering

Zhu, Jia; Wu, Xingcheng; Lin, Xueqin; Huang, Changqin; Fung, Gabriel Pui Cheong; Tang, Yong

doi:10.1007/s11192-017-2611-8

Published: 14 December 2017

Scientometrics, volume 114, pages 781–794 (2018)

Abstract

In many types of databases, such as scientific bibliography databases, the name attribute is the most commonly used identifier for recognizing entities. However, names are frequently ambiguous and not always unique, which causes problems in various fields. Name disambiguation is a data management task that aims to correctly distinguish different entities that share the same name. This is particularly challenging in large databases such as digital libraries, because the information that can be used to identify an author is limited. In digital libraries, ambiguous author names arise when multiple authors share the same name or when the same author appears under different name variations. Most previous work on this issue used hierarchical clustering approaches based on information within citation records, e.g., co-authors and publication titles. In the present study, we propose a multiple layers name disambiguation framework that is applicable to digital libraries and can also be easily extended to other applications. Our framework adopts a dynamic clustering mechanism to minimize clustering errors. We evaluated our approach on real-world corpora, and the favorable experimental results indicate that the proposed framework is feasible.
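As a rough illustration of the kind of hierarchical clustering over citation records that the abstract attributes to previous work (not the framework proposed in this article, whose details are not given here), the following Python sketch groups records that share one ambiguous author name. The record fields, the Jaccard similarity, the feature weights, the average-linkage rule, the stopping threshold, and all names and titles in the sample data are assumptions made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def record_similarity(r1, r2, w_coauthor=0.6, w_title=0.4):
    """Weighted similarity over co-author names and title words.

    The weights are illustrative assumptions, not values from the article.
    """
    coauthors = jaccard(set(r1["coauthors"]), set(r2["coauthors"]))
    titles = jaccard(set(r1["title"].lower().split()),
                     set(r2["title"].lower().split()))
    return w_coauthor * coauthors + w_title * titles


def cluster_records(records, threshold=0.2):
    """Average-linkage agglomerative clustering of record indices.

    Repeatedly merges the most similar pair of clusters and stops once
    no pair reaches the (assumed) similarity threshold.
    """
    clusters = [[i] for i in range(len(records))]

    def linkage(c1, c2):
        sims = [record_similarity(records[i], records[j])
                for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        (a, b), best = max(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a always holds, so index a stays valid
    return [sorted(c) for c in clusters]


# Hypothetical citation records that all carry the same ambiguous author name.
records = [
    {"title": "Spectral clustering for citation matching",
     "coauthors": ["A. Kumar", "B. Chen"]},
    {"title": "Citation matching with spectral methods",
     "coauthors": ["B. Chen", "C. Diaz"]},
    {"title": "Protein folding dynamics in yeast",
     "coauthors": ["E. Novak"]},
    {"title": "Yeast protein folding pathways",
     "coauthors": ["E. Novak", "F. Rossi"]},
]

print(cluster_records(records))  # [[0, 1], [2, 3]]
```

Each resulting cluster is read as one presumed real-world author. A fixed threshold like this one is a common source of clustering errors, since a single value rarely suits every ambiguous name.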

Acknowledgements

This work was supported by the National Science Foundation of China (No. 61772211, 61370229, 61750110516), the Natural Science Foundation of Guangdong Province, China (No. 2015A030310509), the S&T Projects of Guangdong Province, China (No. 2016A030303055, 2016B030305004, 2016B010109008), and the Science and Technology Projects of Guangzhou Municipality, China (No. 201604010003, 201604016019).

Author information

Authors and Affiliations

School of Computer Science, South China Normal University, Guangzhou, China
Jia Zhu, Xingcheng Wu, Xueqin Lin, Changqin Huang & Yong Tang
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China
Gabriel Pui Cheong Fung

Corresponding author

Correspondence to Jia Zhu.

About this article

Cite this article

Zhu, J., Wu, X., Lin, X. et al. A novel multiple layers name disambiguation framework for digital libraries using dynamic clustering. Scientometrics 114, 781–794 (2018). https://doi.org/10.1007/s11192-017-2611-8

Received: 04 April 2016
Published: 14 December 2017
Issue Date: March 2018
DOI: https://doi.org/10.1007/s11192-017-2611-8
