CN112507707A - Correlation degree analysis and judgment method for innovative technologies in different fields of power internet of things - Google Patents
- Publication number
- CN112507707A (application CN202011408521.2A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention relates to a method for analyzing and judging the degree of correlation between innovative technologies in different fields of the power Internet of Things, and belongs to the technical field of data processing methods for management. The power Internet of Things is divided into 8 sub-fields; documents are obtained by literature retrieval, and their titles, abstracts, keywords and publication years are extracted as document data. Sentences in the abstracts that contain the documents' keywords are used as input to the spaCy tool to train an entity recognition model, which is then applied to every sentence of every abstract to obtain the key technical terms of the power Internet of Things. A word embedding model maps the Chinese and English document data into a Chinese-English bilingual word embedding matrix; a co-occurrence matrix of key technical terms and sub-fields is constructed; the two-dimensional mutual information of any two sub-fields is calculated; and finally the strength of association between the innovative technologies of any two sub-fields is judged from that two-dimensional mutual information. The method provides a reliable data source for judging the degree of association between power Internet of Things innovative technologies in different fields.
Description
Technical Field
The invention relates to a method for analyzing and judging the collaborative relationship between innovative technologies in different sub-fields of the power Internet of Things, and belongs to the technical field of data processing methods suitable for management.
Background
The power Internet of Things is a cyber-physical system, and building it is a process of applying Internet of Things technologies innovatively within the power system. Studying the technical coupling points and the collaborative innovation relationship between Internet of Things technologies and the power system helps identify key technical breakthrough points of the power Internet of Things and develop efficient innovation paths.
At present, research on the coupled, collaborative innovation of power system and Internet of Things technologies focuses on the development of Internet of Things technology. However, because the power Internet of Things is a cyber-physical system whose technical innovation involves both the power system and the Internet of Things, existing research cannot provide an effective and reliable analytical basis for judging the development direction of power Internet of Things innovative technologies.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an effective and reliable data basis for judging the development direction of power Internet of Things innovative technologies.
The technical scheme provided by the invention for solving the technical problems is as follows: a correlation degree analysis and judgment method for innovative technologies in different fields of the power Internet of things comprises the following steps:
dividing the power internet of things into 8 sub-fields: power source end, network end, load end, storage end, internet of things sensing layer, network layer, computing layer and application layer; constructing Chinese and English literature search formulas for each sub-field according to its definition, each search formula comprising a plurality of search terms; retrieving documents on each technical sub-field from CNKI (China National Knowledge Infrastructure) and the Web of Science Core Collection according to the search formulas to obtain Chinese documents and English documents, respectively; and extracting the title, abstract, keywords and publication year of each Chinese and English document as Chinese document data and English document data, which together form the Chinese and English document data;
step 2.1, extracting, from the abstract of each document, the sentences that contain that document's keywords, and using the extracted sentences as training input to the spaCy tool to obtain an entity recognition model;
step 2.2, using the entity recognition model to perform entity recognition on every sentence of every abstract in the Chinese and English document data; if a recognized entity occurs in the same sentence as a keyword of that document, taking the entity as a key technical term of the power internet of things; and counting the occurrences of all power internet of things key technical terms in the Chinese and English document data;
step 3.1, defining a custom Chinese-English translation anchor file that specifies one-to-one Chinese-English translation pairs for common words; calling Baidu Translate to translate into English the search terms in the Chinese search formulas and the Chinese key technical terms obtained in step 2.2, thereby obtaining additional Chinese-English pairs, and adding them to the anchor file;
step 3.2, segmenting the Chinese and English document data into words with Chinese and English word segmentation tools, respectively, to obtain a Chinese word sequence from the Chinese document data and an English word sequence from the English document data; training word2vec separately on the two word sequences to obtain a word vector for each word; the word vectors form a Chinese literature word embedding matrix and an English literature word embedding matrix, respectively, each of size (number of words in the corresponding Chinese or English document data) × d, where d is the common word vector dimension;
step 3.3, constructing a Chinese-to-English bilingual word vector mapping model, as shown in formula (1):

$$W^{*} = \arg\min_{W \in M_d(\mathbb{R})} \lVert WS - T \rVert_F \qquad (1)$$
where d denotes the word vector dimension; $M_d(\mathbb{R})$ denotes the set of d × d real matrices over the real number field $\mathbb{R}$; S and T denote the Chinese literature word embedding matrix and the English literature word embedding matrix, respectively; W is a weight matrix; argmin denotes minimizing the distance $\lVert WS - T \rVert_F$ between the Chinese literature word embedding matrix S, after mapping by W, and the English literature word embedding matrix T; and $\lVert \cdot \rVert_F$ denotes the Frobenius norm. The solution gives the optimal weight matrix $W^{*}$, and the Chinese-English bilingual word embedding matrix is obtained through this bilingual word vector mapping model.
step 4.1, dividing the search terms into 8 classes according to the 8 sub-fields, and taking the vector of each search term in the Chinese-English literature bilingual word embedding matrix obtained in step 3 as the word vector v of that search term;
step 4.2, selecting the vector of each key technical term from the Chinese-English literature bilingual word embedding matrix obtained in step 3 as the word vector u of that key technical term; calculating the cosine similarity D(u, v) between u and the word vector v of each search term; and, with the similarity threshold set to 0.3, assigning the key technical term to the sub-field of search term v whenever D(u, v) > 0.3;
step 4.3, obtaining the membership of key technical terms in sub-fields from step 4.2; taking the total number of occurrences in the document data of all key technical terms belonging to a sub-field as the co-occurrence count of key technical terms and that sub-field; and constructing the co-occurrence matrix of key technical terms and sub-fields, divided by the publication year of the documents;
step 5, calculating the mutual information of any two sub-fields, specifically as follows:
step 5.1, for any two sub-fields $x_1$ and $x_2$, calculating their one-dimensional information entropies $H(x_1)$ and $H(x_2)$ according to formula (2),
where x is a sub-field and $c_i$ is the co-occurrence count of key technical terms in the i-th sub-field ($i = 1, 2, \ldots, 8$);
step 5.2, calculating the two-dimensional information entropy $H(x_1, x_2)$ of the two sub-fields $x_1$ and $x_2$ according to formula (3),
where $c_1$ and $c_2$ are the co-occurrence counts of key technical terms of the two sub-fields $x_1$ and $x_2$, respectively;
the two-dimensional mutual information of the two sub-fields $x_1$ and $x_2$ is then calculated by formula (4):

$$I(x_1, x_2) = H(x_1) + H(x_2) - H(x_1, x_2) \qquad (4)$$

judging the degree of correlation between the innovative technologies of any two sub-fields $x_1$ and $x_2$ according to their two-dimensional mutual information.
The beneficial effects of the invention are as follows. The power Internet of Things spans sub-fields of both the power system and the Internet of Things, yet most existing research on power Internet of Things innovative technologies based on scientific literature relies on bibliometric methods, and therefore lacks analysis of the key technologies described in the content of the literature and of how those technologies relate to the sub-fields. Starting from the text of power Internet of Things literature, the invention mines the key technical terms contained in sub-field documents, establishes the membership of these terms in the sub-fields, and counts the co-occurrences of key technical terms and sub-fields, thereby providing a more reliable data source for judging the degree of collaborative association between power Internet of Things innovative technologies in different fields.
Drawings
The method for analyzing and judging the association degree of innovative technologies in different fields of the power internet of things is further described with reference to the accompanying drawings.
Fig. 1 shows the distribution of the Chinese literature word embedding matrix on a two-dimensional plane.
Fig. 2 shows the distribution of the English literature word embedding matrix on a two-dimensional plane.
Fig. 3 shows the distribution of the Chinese-English bilingual word embedding matrix on a two-dimensional plane.
FIG. 4 shows the mutual information between three pairs of sub-fields: source-load, source-storage and network-storage.
Detailed Description
Examples
The relevance degree analysis and judgment method for the innovative technologies in different fields of the power internet of things comprises the following steps:
dividing the power internet of things into 8 sub-fields: power source end, network end, load end, storage end, internet of things sensing layer, network layer, computing layer and application layer; constructing Chinese and English literature search formulas for each sub-field according to its definition, each search formula comprising a plurality of search terms; retrieving documents on each technical sub-field from CNKI (China National Knowledge Infrastructure) and the Web of Science Core Collection according to the search formulas to obtain Chinese and English documents, respectively; and extracting the title, abstract, keywords and publication year of each document (Chinese and English) as document data (Chinese document data and English document data). Part of the Chinese and English literature search formulas used in this embodiment is shown in Table 1.
TABLE 1
The number of documents retrieved and acquired in this embodiment is shown in table 2.
TABLE 2
step 2.1, extracting, from the abstract of each document, the sentences that contain that document's keywords, and using the extracted sentences as training input to the spaCy tool to obtain an entity recognition model; spaCy is an open-source natural language processing (NLP) library for tasks such as tokenization, entity recognition and part-of-speech tagging, and it supports custom training of entity recognition models;
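The training input described in step 2.1 can be prepared roughly as follows. This is a minimal sketch, not the inventors' code: it assumes keywords are located by exact substring match, that sentences end with Chinese or English terminal punctuation, and it uses a hypothetical entity label `TECH`. It emits `(text, {"entities": [(start, end, label), ...]})` pairs, the annotation shape that spaCy v3 accepts through `Example.from_dict` when training a custom entity recognizer.

```python
import re

def split_sentences(text):
    """Split an abstract after Chinese or English sentence-ending punctuation.
    Simplification: a period inside a number such as "3.5" also splits."""
    parts = re.split(r"(?<=[。！？.!?])\s*", text)
    return [p.strip() for p in parts if p.strip()]

def build_ner_examples(abstract, keywords, label="TECH"):
    """Keep only sentences containing at least one document keyword and
    annotate every keyword occurrence as an entity span (label is hypothetical)."""
    examples = []
    for sent in split_sentences(abstract):
        spans = []
        for kw in filter(None, keywords):  # skip empty keywords
            start = sent.find(kw)
            while start != -1:
                spans.append((start, start + len(kw), label))
                start = sent.find(kw, start + len(kw))
        # spaCy rejects overlapping entities: keep the earliest span
        # (longest among equal starts) and drop any span overlapping it.
        spans.sort(key=lambda s: (s[0], -(s[1] - s[0])))
        kept = []
        for span in spans:
            if not kept or span[0] >= kept[-1][1]:
                kept.append(span)
        if kept:
            examples.append((sent, {"entities": kept}))
    return examples
```

For example, an abstract with the keywords "edge computing" and "smart grid" yields one training pair per keyword-bearing sentence, while sentences without any keyword are dropped.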
step 2.2, using the entity recognition model to perform entity recognition on every sentence of every abstract in the document data; if a recognized entity occurs in the same sentence as a keyword of that document, taking the entity as a key technical term of the power internet of things; and counting the occurrences of all power internet of things key technical terms in the document data. The most frequent key technical terms are shown in Table 3.
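The two parts of step 2.2 (deciding which recognized entities are key terms, then counting them) might look like the sketch below. Assumptions: the trained spaCy model is replaced by a hypothetical `recognize` callable that returns entity strings for a sentence, here a toy dictionary matcher; keywords are matched by substring; and once a term qualifies as a key term, every recognized occurrence of it in any sentence is counted.

```python
from collections import Counter

def make_dictionary_recognizer(vocabulary):
    """Hypothetical stand-in for the trained entity recognition model."""
    def recognize(sentence):
        return [term for term in vocabulary if term in sentence]
    return recognize

def count_key_terms(documents, recognize):
    """documents: iterable of (sentences, keywords) pairs, one per document."""
    documents = list(documents)
    # Pass 1: an entity becomes a key technical term if it is recognized
    # in a sentence that also contains one of its document's keywords.
    key_terms = set()
    for sentences, keywords in documents:
        for sent in sentences:
            if any(kw in sent for kw in keywords):
                key_terms.update(recognize(sent))
    # Pass 2: count every recognized occurrence of a key term in all sentences.
    counts = Counter()
    for sentences, _ in documents:
        for sent in sentences:
            counts.update(e for e in recognize(sent) if e in key_terms)
    return counts
```

Separating the two passes means an entity that is only ever seen away from keywords (for instance a generic phrase) never enters the counts, which matches the same-sentence condition in step 2.2.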
TABLE 3
Step 3, uniformly vectorizing the Chinese and English document data of the power internet of things and mapping them into a Chinese-English bilingual word embedding matrix with a word embedding model. To prevent differences between the Chinese and English document data from affecting the assignment of power internet of things key technical terms to sub-fields, a multilingual word embedding technique from natural language processing is used to vectorize the Chinese and English document data, yielding Chinese and English literature word embedding matrices in the same vector space; this makes it convenient to establish the membership relationship between the key technical terms and the sub-fields. The specific steps are as follows:
step 3.1, defining a custom Chinese-English translation anchor file that specifies one-to-one Chinese-English translation pairs for common words; calling Baidu Translate to translate into English the search terms in the Chinese search formulas and the Chinese key technical terms obtained in step 2.2, thereby obtaining additional Chinese-English pairs, and adding them to the anchor file;
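A minimal sketch of step 3.1, assuming the anchor file is plain text with one tab-separated `Chinese<TAB>English` pair per line and `#` comment lines (the patent does not specify a file format). The `translate` argument is a hypothetical placeholder for a call to Baidu Translate; no real Baidu API is shown. A new pair is skipped when either side is already in use, so the file stays one-to-one.

```python
def load_anchors(lines):
    """Parse 'chinese<TAB>english' lines into a dict; blank and '#' lines are ignored."""
    anchors = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        zh, en = line.split("\t", 1)
        anchors[zh.strip()] = en.strip().lower()
    return anchors

def extend_anchors(anchors, chinese_terms, translate):
    """Add machine-translated pairs for new Chinese terms (search terms and
    key technical terms) while keeping the mapping one-to-one."""
    used_english = set(anchors.values())
    for term in chinese_terms:
        if term in anchors:
            continue
        en = translate(term).strip().lower()
        if en and en not in used_english:
            anchors[term] = en
            used_english.add(en)
    return anchors
```

Lower-casing the English side is a design choice for matching against word2vec vocabularies trained on lower-cased text; drop it if the English corpus keeps case.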
step 3.2, segmenting the document data (Chinese and English document data) into words with Chinese and English word segmentation tools (e.g. jieba and nltk), respectively, to obtain Chinese and English word sequences, and training word2vec separately on the Chinese and English word sequences to obtain a word vector for each word (word2vec represents words as multidimensional vectors, so that text can be mapped to a word embedding matrix composed of these vectors). The word vectors form a Chinese literature word embedding matrix and an English literature word embedding matrix, respectively, each of size (number of words in the corresponding Chinese or English document data) × d, where d is the word vector dimension. A word vector represents a word as a vector, and its dimension is the number of elements in that vector; in this embodiment the word vector dimension is set to 300. Fig. 1 and Fig. 2 show the distributions of the Chinese literature word embedding matrix and the English literature word embedding matrix on a two-dimensional plane, respectively.
Step 3.3, constructing a Chinese-to-English bilingual word vector mapping model, as shown in formula (1):

$$W^{*} = \arg\min_{W \in M_d(\mathbb{R})} \lVert WS - T \rVert_F \qquad (1)$$
where d denotes the word vector dimension; $M_d(\mathbb{R})$ denotes the set of d × d real matrices over the real number field $\mathbb{R}$; S and T denote the Chinese literature word embedding matrix and the English literature word embedding matrix, respectively; W is a weight matrix; argmin denotes minimizing the distance $\lVert WS - T \rVert_F$ between the Chinese literature word embedding matrix S, after mapping by W, and the English literature word embedding matrix T; and $\lVert \cdot \rVert_F$ denotes the Frobenius norm. The solution gives the optimal weight matrix $W^{*}$, and the Chinese-English bilingual word embedding matrix is obtained through this bilingual word vector mapping model.
The optimization goal of the model is to find a weight matrix W that minimizes the distance $\lVert WS - T \rVert_F$ between the Chinese literature word embedding matrix and the English literature word embedding matrix, thereby unifying the vector spaces in which the two matrices lie. The model can be converted into a Procrustes problem and solved iteratively with singular value decomposition and gradient descent to obtain the optimal $W^{*}$. The translation anchor file provides one-to-one correspondences for a subset of Chinese and English reference words, so the distance between any Chinese word vector and any English word vector in the embedding matrices can be calculated indirectly from the distance between each word vector and the reference words of its own language. The weight matrix $W^{*}$ therefore maps the Chinese literature word embedding matrix into the same vector space as the English literature word embedding matrix, so that word vectors from both can be compared directly. Together they form the Chinese-English literature bilingual word embedding matrix, which contains the word vectors of all word sequences in the Chinese and English document data; its distribution on a two-dimensional plane is shown in Fig. 3.
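The closed-form orthogonal Procrustes solution is one way to realize step 3.3, sketched below with NumPy. Note the assumptions: formula (1) as written allows any $W \in M_d(\mathbb{R})$, whereas this sketch constrains $W$ to be orthogonal (the classical Procrustes setting the description refers to) and uses a single SVD instead of the iterative SVD plus gradient-descent procedure described. Columns of `S` and `T` are the Chinese and English vectors of the anchor pairs; the helper names and the toy vectors in the check are hypothetical.

```python
import numpy as np

def aligned_matrices(anchors, zh_vectors, en_vectors):
    """Stack anchor pairs column-wise so that column j of S (Chinese) and
    column j of T (English) are translations of each other; both are d x m.
    Pairs missing from either vocabulary are skipped."""
    pairs = [(zh, en) for zh, en in anchors.items()
             if zh in zh_vectors and en in en_vectors]
    S = np.stack([zh_vectors[zh] for zh, _ in pairs], axis=1)
    T = np.stack([en_vectors[en] for _, en in pairs], axis=1)
    return S, T

def procrustes(S, T):
    """Orthogonal W minimizing ||W S - T||_F: with U Σ V^T = svd(T S^T), W = U V^T."""
    U, _, Vt = np.linalg.svd(T @ S.T)
    return U @ Vt
```

Once $W^{*}$ is found, every Chinese word vector s (not only the anchors) is mapped into the English space as $W^{*} s$, so the mapped Chinese vectors and the English vectors can be compared by cosine similarity. The orthogonality constraint preserves lengths and angles within the Chinese space, which is why it is a common choice for this kind of alignment.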
step 4.1, extracting the search terms from the search formulas in Table 1, dividing them into 8 classes according to the 8 sub-fields, and taking the vector of each search term in the Chinese-English literature bilingual word embedding matrix obtained in step 3 as the word vector v of that search term;
step 4.2, selecting the vector of each key technical term from the Chinese-English literature bilingual word embedding matrix obtained in step 3 as the word vector u of that key technical term; calculating the cosine similarity D(u, v) between u and the word vector v of each search term; and, with the similarity threshold set to 0.3, assigning the key technical term to the sub-field of search term v whenever D(u, v) > 0.3;
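Step 4.2 reduces to a cosine-similarity test against every search term of every sub-field. The sketch below assumes vectors are NumPy arrays and that a term may belong to several sub-fields if it clears the 0.3 threshold for search terms in more than one of them; the patent does not say how such multiple matches are resolved, and the sub-field names and 2-D vectors in the check are hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity D(u, v)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def assign_subfields(term_vectors, search_word_vectors, threshold=0.3):
    """term_vectors: {term: u}; search_word_vectors: {sub_field: [v, ...]}.
    Returns {term: set of sub-fields having some search term with D(u, v) > threshold}."""
    return {
        term: {field for field, vs in search_word_vectors.items()
               if any(cosine(u, v) > threshold for v in vs)}
        for term, u in term_vectors.items()
    }
```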
step 4.3, obtaining the membership of each key technical term in the 8 sub-fields from the calculation in step 4.2; taking the total number of occurrences in the document data of all key technical terms belonging to each of the 8 sub-fields as the co-occurrence count of key technical terms and that sub-field (term-field co-occurrence for short); and, dividing by the publication year of the document in which each key technical term appears, constructing the co-occurrence matrix of key technical terms and the 8 sub-fields, as shown in Table 4.
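The aggregation in step 4.3 can be sketched as follows, assuming per-year term counts are already available and that membership comes from step 4.2 (a term belonging to several sub-fields is counted in each of them). The short sub-field identifiers are hypothetical labels for the 8 sub-fields listed in step 1.

```python
SUBFIELDS = ["source", "network_end", "load", "storage",
             "sensing", "network_layer", "computing", "application"]

def cooccurrence_matrix(counts_by_year, membership, subfields=SUBFIELDS):
    """counts_by_year: {year: {term: occurrences}}; membership: {term: set of sub-fields}.
    Returns (years, rows) where rows[k][j] sums the occurrences in year k of all
    key terms that belong to sub-field j; terms without membership are ignored."""
    index = {name: j for j, name in enumerate(subfields)}
    years = sorted(counts_by_year)
    rows = []
    for year in years:
        row = [0] * len(subfields)
        for term, n in counts_by_year[year].items():
            for field in membership.get(term, ()):
                row[index[field]] += n
        rows.append(row)
    return years, rows
```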
TABLE 4
Step 5, calculating the mutual information of any two sub-fields:
Step 5.1, for any two sub-fields $x_1$ and $x_2$, calculating their one-dimensional information entropies $H(x_1)$ and $H(x_2)$ according to formula (2),
where x is a sub-field and $c_i$ is the co-occurrence count of key technical terms in the i-th sub-field ($i = 1, 2, \ldots, 8$); for example, the one-dimensional entropy of the power source end sub-field in 2010 is
Step 5.2, calculating the two-dimensional information entropy $H(x_1, x_2)$ of the two sub-fields $x_1$ and $x_2$ according to formula (3),
where $c_1$ and $c_2$ are the co-occurrence counts of key technical terms of the two sub-fields $x_1$ and $x_2$, respectively;
the two-dimensional mutual information of the two sub-fields $x_1$ and $x_2$ is then calculated by formula (4):

$$I(x_1, x_2) = H(x_1) + H(x_2) - H(x_1, x_2) \qquad (4)$$
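Formulas (2) and (3) are not reproduced in this text, so the sketch below does not claim to implement them. It only illustrates identity (4) with standard Shannon entropies in bits, computed from a hypothetical joint count table of two discrete variables; treat it as a generic mutual-information sketch, not the patent's exact estimator.

```python
import math

def entropy(counts):
    """Shannon entropy in bits of the distribution proportional to counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def mutual_information(joint):
    """joint[i][j]: number of observations with x1 in state i and x2 in state j.
    Returns H(x1) + H(x2) - H(x1, x2), the form of formula (4)."""
    h1 = entropy([sum(row) for row in joint])          # marginal of x1
    h2 = entropy([sum(col) for col in zip(*joint)])    # marginal of x2
    h12 = entropy([c for row in joint for c in row])   # joint entropy
    return h1 + h2 - h12
```

An independent table such as [[1, 1], [1, 1]] gives 0 bits, while a perfectly dependent table such as [[2, 0], [0, 2]] gives 1 bit, which is the intuition behind reading larger mutual information as a stronger association.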
FIG. 4 shows the calculated mutual information between three pairs of sub-fields: source-load, source-storage and network-storage. Table 5 shows the average two-dimensional mutual information of the 8 sub-fields, obtained by averaging the two-dimensional mutual information of each pair of sub-fields over 2010-2019.
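The averaging behind Table 5 can be sketched as below. It assumes the yearly values are in bits and that "mbit" in Table 5 means millibits (10^-3 bit); both the input layout and the unit reading are assumptions, and the numbers in the check are made up.

```python
from itertools import combinations

def average_mutual_information(mi_by_year, subfields):
    """mi_by_year: {year: {frozenset({a, b}): mutual information in bits}}.
    Returns {frozenset({a, b}): mean over the years in which the pair has a value,
    expressed in millibits}."""
    averages = {}
    for a, b in combinations(subfields, 2):
        pair = frozenset((a, b))
        values = [by_pair[pair] for by_pair in mi_by_year.values() if pair in by_pair]
        if values:
            averages[pair] = 1000 * sum(values) / len(values)
    return averages
```

Keying pairs by `frozenset` makes the source-load and load-source lookups identical, since mutual information is symmetric in $x_1$ and $x_2$.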
TABLE 5 (unit: mbit)
From the two-dimensional mutual information calculated above, the degree of correlation between the innovative technologies of any two sub-fields $x_1$ and $x_2$ can be judged.
The above description is only a preferred embodiment of the present invention, and the present invention is not limited thereto. All equivalents and modifications of the inventive concept and its technical solutions are intended to be included within the scope of the present invention.
Claims (1)
1. A correlation degree analysis and judgment method for innovative technologies in different fields of the power Internet of things is characterized by comprising the following steps:
step 1, dividing and collecting document data in the field of power internet of things, specifically comprising the following steps:
dividing the power internet of things into 8 sub-fields: power source end, network end, load end, storage end, internet of things sensing layer, network layer, computing layer and application layer; constructing Chinese and English literature search formulas for each sub-field according to its definition, each search formula comprising a plurality of search terms; retrieving documents on each technical sub-field from CNKI (China National Knowledge Infrastructure) and the Web of Science Core Collection according to the search formulas to obtain Chinese documents and English documents, respectively; and extracting the title, abstract, keywords and publication year of each Chinese and English document as Chinese document data and English document data, which together form the Chinese and English document data;
step 2, obtaining key technical terms of the power internet of things, specifically as follows:
step 2.1, extracting, from the abstract of each document, the sentences that contain that document's keywords, and using the extracted sentences as training input to the spaCy tool to obtain an entity recognition model;
step 2.2, using the entity recognition model to perform entity recognition on every sentence of every abstract in the Chinese and English document data, and, if a recognized entity occurs in the same sentence as a keyword of that document, taking the entity as a key technical term of the power internet of things;
step 3, performing unified vectorization processing on the Chinese and English literature data of the power internet of things, and mapping the Chinese and English literature data to a Chinese and English bilingual word embedding matrix by using a word embedding model, wherein the steps are as follows:
step 3.1, defining a custom Chinese-English translation anchor file that specifies one-to-one Chinese-English translation pairs for common words; calling Baidu Translate to translate into English the search terms in the Chinese search formulas and the Chinese key technical terms obtained in step 2.2, thereby obtaining additional Chinese-English pairs, and adding them to the anchor file;
step 3.2, segmenting the Chinese and English document data into words with Chinese and English word segmentation tools, respectively, to obtain a Chinese word sequence from the Chinese document data and an English word sequence from the English document data; training word2vec separately on the two word sequences to obtain a word vector for each word; the word vectors form a Chinese literature word embedding matrix and an English literature word embedding matrix, respectively, each of size (number of words in the corresponding Chinese or English document data) × d, where d is the common word vector dimension;
step 3.3, constructing a Chinese-to-English bilingual word vector mapping model, as shown in formula (1):

$$W^{*} = \arg\min_{W \in M_d(\mathbb{R})} \lVert WS - T \rVert_F \qquad (1)$$
where d denotes the word vector dimension; $M_d(\mathbb{R})$ denotes the set of d × d real matrices over the real number field $\mathbb{R}$; S and T denote the Chinese literature word embedding matrix and the English literature word embedding matrix, respectively; W is a weight matrix; argmin denotes minimizing the distance $\lVert WS - T \rVert_F$ between the Chinese literature word embedding matrix S, after mapping by W, and the English literature word embedding matrix T; and $\lVert \cdot \rVert_F$ denotes the Frobenius norm. The solution gives the optimal weight matrix $W^{*}$, and the Chinese-English bilingual word embedding matrix is obtained through this bilingual word vector mapping model.
Step 4, constructing a co-occurrence matrix of the key technical terms and the sub-fields, which comprises the following specific steps:
step 4.1, dividing the search terms into 8 classes according to the 8 sub-fields, and taking the vector of each search term in the Chinese-English literature bilingual word embedding matrix obtained in step 3 as the word vector v of that search term;
step 4.2, selecting the vector of each key technical term from the Chinese-English literature bilingual word embedding matrix obtained in step 3 as the word vector u of that key technical term; calculating the cosine similarity D(u, v) between u and the word vector v of each search term; and, with the similarity threshold set to 0.3, assigning the key technical term to the sub-field of search term v whenever D(u, v) > 0.3;
step 4.3, using the membership of key technical terms in sub-fields obtained in step 4.2, taking the total number of occurrences in the literature data of all key technical terms belonging to a sub-field as the co-occurrence count of that sub-field, and constructing the co-occurrence matrix of key technical terms and sub-fields with the literature divided by publication year;
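A minimal sketch of step 4.3, producing one row per publication year and one column per sub-field. It assumes each key technical term appears as a single token after word segmentation and reuses a term-to-sub-field membership of the kind produced in step 4.2; the documents, terms, and sub-field names are illustrative only.

```python
from collections import Counter


def cooccurrence_matrix(documents, membership, subfields):
    """Build {year: [count per sub-field]} as described in step 4.3.

    documents:  list of (publication_year, token_list)
    membership: {key_term: set of sub-fields} from step 4.2
    subfields:  ordered list of sub-field names (eight in the text)
    A sub-field's count is the total number of occurrences of all key terms
    assigned to it, summed over the documents published in that year.
    """
    index = {field: i for i, field in enumerate(subfields)}
    matrix = {}
    for year, tokens in documents:
        row = matrix.setdefault(year, [0] * len(subfields))
        counts = Counter(tokens)
        for term, fields in membership.items():
            n = counts.get(term, 0)
            for field in fields:
                row[index[field]] += n
    return dict(sorted(matrix.items()))


subfields = ["sensing", "communication"]
membership = {"rfid": {"sensing"}, "lidar": {"sensing"}, "5g": {"communication"}}
documents = [
    (2019, ["rfid", "rfid", "grid"]),
    (2020, ["5g", "5g", "lidar"]),
    (2019, ["5g"]),
]
print(cooccurrence_matrix(documents, membership, subfields))
# {2019: [2, 1], 2020: [1, 2]}
```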
step 5, calculating the mutual information of any two sub-fields, specifically as follows:
step 5.1, for any two sub-fields $x_1$ and $x_2$, calculating their one-dimensional information entropies $H(x_1)$ and $H(x_2)$ according to formula (2).
Where x is a sub-field and $c_i$ ($i = 1, 2, \ldots, 8$) is the co-occurrence count of the key technical terms of the i-th sub-field;
step 5.2, calculating the two-dimensional information entropy $H(x_1, x_2)$ of the two sub-fields $x_1$ and $x_2$ according to formula (3),
where $c_1$ and $c_2$ are the co-occurrence counts of the key technical terms of the two sub-fields $x_1$ and $x_2$, respectively.
The two-dimensional mutual information of the two sub-fields $x_1$ and $x_2$ is then calculated by formula (4):

$$I(x_1;x_2)=H(x_1)+H(x_2)-H(x_1,x_2) \qquad (4)$$
The degree of correlation between the innovative technologies of any two sub-fields $x_1$ and $x_2$ is then judged from this two-dimensional mutual information.
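Formulas (2) and (3) are not reproduced in this text, so the sketch below does not claim to implement them. It evaluates formula (4) with the standard plug-in (empirical) entropy estimates instead, under an explicit assumption about the observations: each document is one observation, and $x_k$ is 1 if the document mentions any key technical term of sub-field k, else 0. Mutual information is 0 when the two indicators are statistically independent and grows as one becomes more predictable from the other. The indicator data are made up.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of an empirical distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def mutual_information(x1, x2):
    """I(x1; x2) = H(x1) + H(x2) - H(x1, x2), formula (4), with plug-in estimates.

    x1, x2: equal-length sequences of per-document observations
            (assumed here: 1 if the document mentions a key term of the sub-field, else 0).
    """
    if len(x1) != len(x2):
        raise ValueError("observation sequences must have equal length")
    h1 = entropy(list(Counter(x1).values()))
    h2 = entropy(list(Counter(x2).values()))
    h12 = entropy(list(Counter(zip(x1, x2)).values()))  # joint entropy from paired observations
    return h1 + h2 - h12


# Made-up indicators for eight documents.
sensing = [1, 1, 0, 0, 1, 1, 0, 0]
communication = [1, 1, 0, 0, 1, 1, 0, 0]  # same pattern as sensing
blockchain = [1, 0, 1, 0, 1, 0, 1, 0]     # pattern independent of sensing
print(round(mutual_information(sensing, communication), 3))  # 1.0
print(round(mutual_information(sensing, blockchain), 3))     # 0.0
```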
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011408521.2A CN112507707A (en) | 2020-12-04 | 2020-12-04 | Correlation degree analysis and judgment method for innovative technologies in different fields of power internet of things |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112507707A (en) | 2021-03-16 |
Family ID: 74971709
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011408521.2A (Pending), CN112507707A (en) | 2020-12-04 | 2020-12-04 | Correlation degree analysis and judgment method for innovative technologies in different fields of power internet of things |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112507707A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113177420A (en) * | 2021-04-29 | 2021-07-27 | 同方知网(北京)技术有限公司 | Chinese-English bilingual dictionary construction method based on academic literature |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6510406B1 (en) * | 1999-03-23 | 2003-01-21 | Mathsoft, Inc. | Inverse inference engine for high performance web search |
CN101860978A (en) * | 2010-05-14 | 2010-10-13 | 南京邮电大学 | Internet of things system structure |
US20150356243A1 (en) * | 2013-01-11 | 2015-12-10 | Oslo Universitetssykehus Hf | Systems and methods for identifying polymorphisms |
CN106997341A (en) * | 2017-03-22 | 2017-08-01 | 山东大学 | A kind of innovation scheme matching process, device, server and system |
CN110852089A (en) * | 2019-10-25 | 2020-02-28 | 国家电网有限公司 | Operation and maintenance project management method based on intelligent word segmentation and deep learning |
CN111163057A (en) * | 2019-12-09 | 2020-05-15 | 中国科学院信息工程研究所 | User identification system and method based on heterogeneous information network embedding algorithm |
CN111753067A (en) * | 2020-03-19 | 2020-10-09 | 北京信聚知识产权有限公司 | Innovative assessment method, device and equipment for technical background text |
CN111931485A (en) * | 2020-08-12 | 2020-11-13 | 北京建筑大学 | Multi-mode heterogeneous associated entity identification method based on cross-network representation learning |
2020-12-04: CN202011408521.2A (CN112507707A), CN, active, pending
Non-Patent Citations (3)
Title |
---|
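周肖云 (Zhou Xiaoyun) et al.: "Research on the Development Trend of Library Internet of Things Technology Based on Patent Metrics", 图书馆杂志 (Library Journal), no. 02, 15 February 2015, pages 82-87 * |
王其清 (Wang Qiqing) et al.: "Research on Collaborative Innovation of Power Internet of Things Technologies Based on Natural Language Processing and Mutual Information", 华北电力大学学报(自然科学版) (Journal of North China Electric Power University, Natural Science Edition), 30 May 2021, pages 72-80 * |
陈良坤 (Chen Liangkun); 梁胜涛 (Liang Shengtao); 贺凯旋 (He Kaixuan): "Analysis of Key Technologies and Applications of the Internet of Things Perception Layer Structure", 电力设备管理 (Electric Power Equipment Management), no. 06, 25 June 2020, pages 192-194 * |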
Similar Documents
Publication | Title |
---|---|
CN112069408B (en) | Recommendation system and method for fusion relation extraction | |
CN108319583B (en) | Method and system for extracting knowledge from Chinese language material library | |
CN113268569B (en) | Semantic-based related word searching method and device, electronic equipment and storage medium | |
Zhang et al. | Exploiting parallel news streams for unsupervised event extraction | |
US20220207240A1 (en) | System and method for analyzing similarity of natural language data | |
CN115563313A (en) | Knowledge graph-based document book semantic retrieval system | |
CN111666766A (en) | Data processing method, device and equipment | |
CN111241410A (en) | Industry news recommendation method and terminal | |
CN111581943A (en) | Chinese-over-bilingual multi-document news viewpoint sentence identification method based on sentence association graph | |
Alian et al. | Arabic sentence similarity based on similarity features and machine learning | |
CN115309915A (en) | Knowledge graph construction method, device, equipment and storage medium | |
CN112507707A (en) | Correlation degree analysis and judgment method for innovative technologies in different fields of power internet of things | |
Ayetiran | An index-based joint multilingual/cross-lingual text categorization using topic expansion via BabelNet | |
Mohnot et al. | Hybrid approach for Part of Speech Tagger for Hindi language | |
Chang et al. | Incorporating word embedding into cross-lingual topic modeling | |
CN112597273A (en) | Power distribution automation chart generation method based on NL2SQL technology | |
Aejas et al. | Named entity recognition for cultural heritage preservation | |
Abimbola et al. | A noun-centric keyphrase extraction model: Graph-based approach | |
Kumari et al. | An Extractive Approach for Automated Summarization of Indian Languages using Clustering Techniques. | |
Wei et al. | Integrating visual word embeddings into translation language model for keyword spotting on historical Mongolian document images | |
CN113222119A (en) | Argument extraction method for multi-view encoder by using topological dependency relationship | |
CN107402914B (en) | Deep learning system and method for natural language | |
O'Keefe et al. | Dependency Based Bilingual word Embeddings without word alignment | |
Alqaisi et al. | Dependency Based Bilingual word Embeddings without word alignment | |
CN110728148B (en) | Entity relation extraction method and device |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |