Abstract
In this paper we propose a new algorithm for person name disambiguation among authors of scientific publications. The algorithm is effective, flexible, and tailored to a scientific knowledge base. Besides the common properties of a publication, namely title, venue, and author and co-author names, it also exploits references. One reason is that we decided to enrich the University Knowledge Base with connections between publications, rather than storing references only as text (i.e. author’s name, title, etc.). Our algorithm uses an unsupervised approach, which does not require creating a training set, a time- and resource-consuming task. However, we want to leverage additional information available from crowdsourcing or authorised users that confirms authorship and citation relations between papers. Using this information, the default parameters of the unsupervised algorithm can be optimised for a given case by means of a genetic algorithm in order to increase accuracy. The proposed method can be applied to three tasks: assigning a publication to a specific researcher, indicating that a new author is not yet known to the database, and clustering a set of publications so that each cluster contains the papers of one researcher. Validation results confirm the high accuracy of the new algorithm and its usefulness in the process of populating a scientific knowledge base.
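The three tasks named in the abstract can be illustrated with a minimal Python sketch. It is not the algorithm of the paper, whose details the abstract does not give. The Jaccard similarity, the max-over-profile scoring, the greedy clustering, and the names `Publication`, `WEIGHTS` and `THRESHOLD` with their values are all assumptions made for illustration. The weights and threshold stand in for the kind of default parameters that could be tuned, for example by a genetic algorithm, once confirmed authorship data is available.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Publication:
    title: str
    venue: str
    coauthors: frozenset   # co-author names
    references: frozenset  # identifiers of cited publications

# Hypothetical default parameters (assumed values, not taken from the paper).
WEIGHTS = {"title": 0.2, "venue": 0.2, "coauthors": 0.3, "references": 0.3}
THRESHOLD = 0.25

def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def similarity(p, q, weights=WEIGHTS):
    """Weighted sum of per-property similarities between two publications."""
    parts = {
        "title": jaccard(p.title.lower().split(), q.title.lower().split()),
        "venue": 1.0 if p.venue == q.venue else 0.0,
        "coauthors": jaccard(p.coauthors, q.coauthors),
        "references": jaccard(p.references, q.references),
    }
    return sum(weights[k] * parts[k] for k in weights)

def profile_score(pub, profile, weights=WEIGHTS):
    """Score a publication against a researcher's known papers (best match)."""
    return max((similarity(pub, q, weights) for q in profile), default=0.0)

def assign(pub, profiles, weights=WEIGHTS, threshold=THRESHOLD):
    """Tasks 1 and 2: return the best-matching researcher id, or None
    when no profile reaches the threshold (author unknown to the database)."""
    best_id, best = None, threshold
    for rid, profile in profiles.items():
        score = profile_score(pub, profile, weights)
        if score >= best:
            best_id, best = rid, score
    return best_id

def cluster(pubs, weights=WEIGHTS, threshold=THRESHOLD):
    """Task 3: greedy single-pass clustering; each cluster is treated
    as the profile of one (possibly unnamed) researcher."""
    clusters = []
    for pub in pubs:
        idx = assign(pub, dict(enumerate(clusters)), weights, threshold)
        if idx is None:
            clusters.append([pub])
        else:
            clusters[idx].append(pub)
    return clusters
```

In this sketch a publication that shares a venue, co-authors and cited papers with a researcher's known work is assigned to that researcher, while one with no overlap yields `None` and, in `cluster`, starts a new cluster.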
This research was supported by the National Centre for Research and Development under grant No. SP/I/1/77065/10 and by the Institute of Computer Science, Warsaw University of Technology, under grant No. II/2015/DS/1.
© 2016 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Andruszkiewicz, P., Szepietowski, S. (2016). Person Name Disambiguation for Building University Knowledge Base. In: Nguyen, N.T., Trawiński, B., Fujita, H., Hong, TP. (eds) Intelligent Information and Database Systems. ACIIDS 2016. Lecture Notes in Computer Science, vol 9621. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49381-6_26
DOI: https://doi.org/10.1007/978-3-662-49381-6_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49380-9
Online ISBN: 978-3-662-49381-6
eBook Packages: Computer Science, Computer Science (R0)