research-article

Identification of topic evolution: network analytics with piecewise linear representation and word embedding

Published: 01 September 2022

Abstract

Understanding the evolutionary relationships among scientific topics, and the process by which innovations evolve, is crucial for strategic decision makers in governments, firms, and funding agencies when they carry out forward-looking research activities. However, traditional co-word network analysis for topic identification cannot effectively capture semantic relationships from context, and fixed time-window methods do not faithfully reflect how topics evolve. This study proposes a framework for identifying topic evolutionary pathways based on network analytics. First, keyword networks are constructed: a piecewise linear representation method divides the time span into periods, and a Word2Vec model captures semantics from the context of titles and abstracts. Second, a community detection algorithm identifies topics in the networks. Finally, evolutionary relationships between topics are represented by measuring topic similarity between adjacent time periods, and topic evolutionary pathways are then identified and visualized. An empirical study on information science demonstrates the reliability of the methodology, with subsequent empirical validations.
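
The final step of the framework, linking topics across adjacent time periods by similarity, can be illustrated with a minimal sketch. The abstract does not say which similarity measure the authors use, so the Jaccard overlap of keyword sets below is an assumption, as are the function names, the 0.3 threshold, and the toy topics.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_topics(periods, threshold=0.3):
    """Link topics in adjacent periods whose similarity reaches the threshold.

    `periods` is a list of dicts, one per time period in chronological order,
    mapping a topic label to its set of keywords. Returns tuples of
    (earlier_period_index, earlier_topic, later_topic, similarity).
    The threshold is an illustrative value, not one taken from the paper.
    """
    links = []
    for t in range(len(periods) - 1):
        earlier, later = periods[t], periods[t + 1]
        for (name_a, kw_a), (name_b, kw_b) in product(earlier.items(), later.items()):
            sim = jaccard(kw_a, kw_b)
            if sim >= threshold:
                links.append((t, name_a, name_b, round(sim, 3)))
    return links


# Hypothetical topics for two adjacent periods (made-up keywords).
periods = [
    {
        "A1": {"citation", "impact", "journal"},
        "B1": {"retrieval", "query", "ranking"},
    },
    {
        "A2": {"citation", "impact", "altmetrics"},
        "B2": {"retrieval", "query", "embedding"},
        "C2": {"privacy", "policy"},
    },
]

links = link_topics(periods)
for period, src, dst, sim in links:
    print(f"period {period} -> {period + 1}: {src} -> {dst} (similarity {sim})")
```

In this toy input, topic C2 receives no incoming link, which a pathway visualization could render as a newly emerging topic. Replacing `jaccard` with a cosine similarity over averaged word vectors would be one way to bring the Word2Vec semantics into this step, though that substitution is likewise an assumption rather than the paper's stated method.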


Published In

Scientometrics, Volume 127, Issue 9
Sep 2022
491 pages

Publisher

Springer-Verlag
Berlin, Heidelberg

Publication History

Published: 01 September 2022
Accepted: 12 January 2022
Received: 13 March 2021

Author Tags

1. Bibliometrics
2. Topic analysis
3. Network analytics
4. Topic evolution

Cited By

• (2024) Revealing the technology development of natural language processing. Information Processing and Management: an International Journal 61:1. doi:10.1016/j.ipm.2023.103574. Online publication date: 1-Jan-2024
• (2024) Exploring technology fusion by combining latent Dirichlet allocation with Doc2vec: a case of digital medicine and machine learning. Scientometrics 129:7 (4043-4070). doi:10.1007/s11192-024-05069-1. Online publication date: 1-Jul-2024
• (2024) Measuring the evolving stage of temporal distribution of research topic keyword in scientific literature by research heat curve. Scientometrics 129:11 (7287-7328). doi:10.1007/s11192-024-04937-0. Online publication date: 1-Nov-2024
• (2024) Detecting technological recombination using semantic analysis and dynamic network analysis. Scientometrics 129:11 (7385-7416). doi:10.1007/s11192-023-04812-4. Online publication date: 1-Nov-2024
• (2024) Exploring Technology Evolution Pathways Based on Link Prediction on Multiplex Network: Illustrated as CRISPR. Wisdom, Well-Being, Win-Win (105-121). doi:10.1007/978-3-031-57860-1_8. Online publication date: 15-Apr-2024
• (2023) Evolution analysis of cross-domain collaborative research topic: a case study of cognitive-based product conceptual design. Scientometrics 128:12 (6695-6718). doi:10.1007/s11192-023-04865-5. Online publication date: 1-Dec-2023
