Doc2vec-based link prediction approach using SAO structures: application to patent network

Byungun Yoon¹, Songhee Kim¹, Sunhye Kim¹ & Hyeonju Seol²


Abstract

As the number of documents has exploded in the Internet era, many researchers have tried to understand the relationships between documents and to predict links between similar but unconnected documents. However, existing link prediction techniques that rely on the predefined links of documents might provide incorrect results because of the generic problem of citation analysis. Moreover, they may fail to reflect the important content of documents in the link prediction process. Thus, we propose a new link prediction approach that employs the Doc2vec algorithm, a document-embedding method, to predict potential links between documents by reflecting the functional context of technological words. First, we collected both the citation information and the documents of the patents of interest, and generated a patent network from the citation relationships between patents. Second, we identified unconnected pairs of nodes and transformed the patent documents into document vectors with the Doc2vec algorithm. In particular, since patent documents describe functions that are useful for solving technological problems, the proposed approach extracts subject-action-object (SAO) structures, which we used to generate the document vectors. We then calculated the similarity between the patents in each unconnected pair of the patent network and predicted potential links from that similarity. Third, we validated the results of the proposed approach by comparing them with those of the Adamic–Adar technique, one of the traditional link prediction techniques, and with those of word vector-based link prediction. We applied the Doc2vec-based link prediction approach to a real case, the unmanned aerial vehicle (UAV) technology field, and found that it achieves better prediction performance than the Adamic–Adar technique and the word vector approach. Our results can help analysts accurately forecast future relationships between nodes in a network, and give R&D managers insightful information on the future direction of technological development by using a patent network.
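The scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the patent identifiers, the two-dimensional vectors and the helper names (`cosine`, `unconnected_pairs`, `adamic_adar`, `rank_candidates`) are hypothetical, and the vectors are simply assumed to come from a document-embedding model such as Doc2vec trained on SAO structures. Citations are treated as undirected links for simplicity. The sketch ranks unconnected patent pairs by cosine similarity and, for comparison, computes the Adamic–Adar score, which sums 1/log(degree) over the common neighbors of a pair.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_neighbors(nodes, edges):
    """Adjacency sets, treating each citation as an undirected link."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def unconnected_pairs(nodes, edges):
    """All node pairs that are not directly linked in the network."""
    linked = {frozenset(e) for e in edges}
    return [(a, b) for a, b in combinations(sorted(nodes), 2)
            if frozenset((a, b)) not in linked]


def adamic_adar(a, b, neighbors):
    """Adamic-Adar score: sum of 1/log(degree) over common neighbors.

    A common neighbor of two distinct nodes has degree >= 2, so the
    logarithm is always positive here.
    """
    common = neighbors[a] & neighbors[b]
    return sum(1.0 / math.log(len(neighbors[z])) for z in common)


def rank_candidates(vectors, edges):
    """Rank unconnected pairs by the similarity of their document vectors."""
    pairs = unconnected_pairs(vectors.keys(), edges)
    scored = [((a, b), cosine(vectors[a], vectors[b])) for a, b in pairs]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: four patents with made-up document vectors.
vectors = {
    "P1": [1.0, 0.0],
    "P2": [0.9, 0.1],
    "P3": [0.0, 1.0],
    "P4": [0.1, 0.9],
}
edges = [("P1", "P3"), ("P2", "P3"), ("P3", "P4")]

neighbors = build_neighbors(vectors.keys(), edges)
ranked = rank_candidates(vectors, edges)
for (a, b), sim in ranked:
    print(a, b, round(sim, 3), round(adamic_adar(a, b, neighbors), 3))
```

In this toy network every unconnected pair shares the same single common neighbor, so the Adamic–Adar scores tie, while the vector similarities still separate the pairs. This only illustrates how a content-based score can complement a purely topological one; it says nothing about the performance reported in the article.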

Acknowledgements

This work was supported by the Basic Science Research Program of the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT under Grant NRF-2017R1D1A1B03036213.

Author information

Authors and Affiliations

Department of Industrial & Systems Engineering, Dongguk University, Seoul, 04620, South Korea
Byungun Yoon, Songhee Kim & Sunhye Kim
School of Integrated National Security, Chungnam National University, Daejeon, 34134, South Korea
Hyeonju Seol

Corresponding author

Correspondence to Hyeonju Seol.

Appendices

Appendix 1: Code for Doc2vec

Appendix 2: Searching query for UAV technology

About this article

Cite this article

Yoon, B., Kim, S., Kim, S. et al. Doc2vec-based link prediction approach using SAO structures: application to patent network. Scientometrics 127, 5385–5414 (2022). https://doi.org/10.1007/s11192-021-04187-4

Received: 27 February 2021
Accepted: 11 October 2021
Published: 13 November 2021
Issue Date: September 2022
DOI: https://doi.org/10.1007/s11192-021-04187-4