CN111125373B - Concept node generation method and device and related products - Google Patents

Info

Publication number
CN111125373B
Authority
CN
China
Prior art keywords
value
value range
concept
range data
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911302866.7A
Other languages
Chinese (zh)
Other versions
CN111125373A (en
Inventor
马忠义
崔朝辉
赵立军
张霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp
Priority to CN201911302866.7A
Publication of CN111125373A
Application granted
Publication of CN111125373B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method and a device for generating concept nodes, and related products. In the method, the value range nodes corresponding to concept nodes in a concept graph are divided into two types of data, value range data and unit data, yielding an original value range data set and unit concept nodes; this removes the coupling between numerical values and units. The adjacent value range formed by any two adjacent value range data in the original value range data set is divided to obtain new value range data, and a numerical concept node is obtained from the new value range data together with the value range data in the original value range data set. New concept nodes are generated from the numerical concept node and the unit concept nodes. Because the numerical value is no longer bound to a particular concept, the same numerical concept node can also be used to generate other new concept nodes. This improves the reusability of the value range data. The new concept nodes can be used to generate a new concept graph, and because the value range data are reusable, the new concept graph saves data storage resources and computing resources.

Description

Concept node generation method and device and related products
Technical Field
The present invention relates to the field of data storage and application, and in particular, to a method and apparatus for generating concept nodes, and related products.
Background
With the rapid development of internet technology and information technology, the volume of data content is growing explosively. Data content that is large in scale, heterogeneous, diverse, and loosely organized makes it very challenging for people to acquire information and knowledge effectively. With its strong semantic processing capability and open organization capability, the knowledge graph lays a foundation for knowledge organization and intelligent applications in the Internet age.
The concept graph is a high-level knowledge graph, and has better application performance in various fields because of the existence of concept nodes, value domain nodes and term nodes and the connection among various types of nodes.
However, in the existing concept graph at present, a strict binding relationship exists between the numerical value and the concept, which results in very poor reusability of the value range data and causes waste of data storage resources and computing resources.
Disclosure of Invention
Based on the above problems, the present application provides a method, an apparatus, and a related product for generating concept nodes, which generate concept nodes in a new manner, and release the binding relationship between values and concepts, so as to improve the reusability of value range data, and save data storage resources and data computing resources.
The embodiment of the application discloses the following technical scheme:
in a first aspect, the present application provides a method for generating a concept node, including:
dividing value domain nodes corresponding to a plurality of concept nodes in an original concept map according to two types of value domain data and unit data to obtain an original value domain data set and unit concept nodes; the value domain node is a triplet comprising a lower bound, an upper bound and a unit; in the original value range data set, each value range data is arranged according to an ascending order;
dividing adjacent value fields formed by any two adjacent value field data in the original value field data set to obtain new value field data, and obtaining a numerical concept node by using the new value field data and the value field data in the original value field data set;
and generating a new concept node by using the numerical concept node and the unit concept node.
Optionally, for the value range node corresponding to each of the plurality of concept nodes in the original concept graph, dividing according to two types of value range data and unit data to obtain an original value range data set and unit concept nodes, which specifically includes:
obtaining the original value range data set by using non-repeated value range data in the lower bound and the upper bound in all the triples; the unit concept node is obtained using units in all of the triples.
Optionally, dividing an adjacent value range formed by any two adjacent value range data in the original value range data set to obtain new value range data, which specifically includes:
determining a function to be used according to the distribution condition of the value range data in the original value range data set;
dividing the adjacent value domain according to a preset granularity to obtain a dividing boundary value corresponding to the adjacent value domain, and adding the dividing boundary value into the adjacent value domain;
obtaining likelihood function values corresponding to the division boundary values by using the functions to be used, the lower bound and the upper bound of the adjacent value range, the division boundary values and the likelihood functions;
determining a partition boundary value that maximizes the likelihood function value and adding the partition boundary value as the new value range data to the original value range data set.
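The selection step described above (generate candidate division boundary values at a preset granularity, score each one, keep the maximizer) can be sketched as follows. This is a minimal illustration only: the likelihood function itself is not given in this passage, so it is passed in as a callable, and `toy_likelihood` is a hypothetical stand-in that simply prefers the midpoint. The names `candidate_boundaries` and `best_boundary` are likewise assumptions.

```python
def candidate_boundaries(lower, upper, granularity):
    """Boundary values strictly inside (lower, upper), spaced by granularity."""
    values = []
    b = lower + granularity
    while b < upper:
        values.append(b)
        b += granularity
    return values


def best_boundary(lower, upper, granularity, likelihood):
    """Return the candidate that maximizes likelihood(lower, upper, boundary)."""
    candidates = candidate_boundaries(lower, upper, granularity)
    if not candidates:
        return None  # the range is narrower than the granularity
    return max(candidates, key=lambda b: likelihood(lower, upper, b))


def toy_likelihood(lower, upper, boundary):
    """Hypothetical stand-in score, NOT the likelihood function of the source."""
    return -abs(boundary - (lower + upper) / 2)


chosen = best_boundary(200, 1200, 100, toy_likelihood)
print(chosen)
```

Keeping the likelihood as a parameter means the same selection loop works with whichever function to be used is determined for the data set.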
Optionally, determining the function to be used according to the distribution condition of the value range data in the original value range data set specifically includes:
judging whether value domain data in an original value domain data set is matched with probability distribution functions other than a Gaussian function, if so, determining the probability distribution functions as functions to be used; if not, determining a Gaussian function as the function to be used.
Optionally, after said adding the partition boundary value as said new value range data to said original value range data set, the method further comprises:
judging whether all adjacent value fields in the original value field data set meet a preset iteration ending condition, and ending iteration if so;
the obtaining a numeric concept node by using the new value range data and the value range data in the original value range data set specifically includes:
and obtaining the numerical concept node by using the value domain data in the original value domain data set after the iteration is finished.
Optionally, the above method further comprises: and generating a new concept graph by using the new concept nodes.
In a second aspect, the present application provides a generating apparatus for a concept node, including:
the first dividing module is used for dividing the value range nodes corresponding to the concept nodes in the original concept graph according to the two types of the value range data and the unit data to obtain an original value range data set and the unit concept nodes; the value domain node is a triplet comprising a lower bound, an upper bound and a unit; in the original value range data set, each value range data is arranged according to an ascending order;
The second dividing module is used for dividing an adjacent value range formed by any two adjacent value range data in the original value range data set to obtain new value range data, and obtaining a numerical concept node by using the new value range data and the value range data in the original value range data set;
and the concept node generation module is used for generating new concept nodes by utilizing the numerical value concept nodes and the unit concept nodes.
Optionally, the first dividing module specifically includes:
the first acquisition unit is used for acquiring the original value range data set by utilizing the value range data which are not repeated in the lower bound and the upper bound in all the triples;
and the second acquisition unit is used for acquiring the unit concept node by using units in all the triples.
Optionally, the second dividing module specifically includes:
the function determining unit is used for determining a function to be used according to the distribution condition of the value range data in the original value range data set;
the dividing unit is used for dividing the adjacent value domain according to a preset granularity, obtaining a dividing boundary value corresponding to the adjacent value domain, and adding the dividing boundary value into the adjacent value domain;
The computing unit is used for obtaining likelihood function values corresponding to the division boundary values by utilizing the functions to be used, the lower bound and the upper bound of the adjacent value range, the division boundary values and the likelihood functions;
and a value range data adding unit for determining a division boundary value that maximizes the likelihood function value, and adding the division boundary value as the new value range data to the original value range data set.
Optionally, the function determining unit is specifically configured to:
judging whether value domain data in an original value domain data set is matched with probability distribution functions other than a Gaussian function, if so, determining the probability distribution functions as functions to be used; if not, determining a Gaussian function as the function to be used.
Optionally, the apparatus further comprises: the judging module is used for judging whether all adjacent value fields in the original value field data set meet the preset iteration ending condition, and if so, ending the iteration;
the second dividing module is specifically configured to obtain the value concept node by using value domain data in the original value domain data set after the iteration is ended.
Optionally, the apparatus further comprises: and the map generation module is used for generating a new concept map by utilizing the new concept nodes.
In a third aspect, the present application provides a computer readable storage medium having stored therein a computer program which, when executed by a processor, implements a method of generating a concept node as provided in the first aspect.
In a fourth aspect, the present application provides a processor for executing a computer program, which when run performs the method of generating concept nodes as provided in the first aspect.
Compared with the prior art, the application has the following beneficial effects:
the value domain node in the original conceptual diagram is a triplet including a lower bound, an upper bound, and a unit. According to the method, value domain nodes corresponding to a plurality of concept nodes in the map are divided according to two types of value domain data and unit data, and an original value domain data set and unit concept nodes are obtained. By dividing, the value domain data and the unit data of the value domain node are stripped, and the conceptual coupling relation between the numerical value and the unit is relieved. And dividing adjacent value fields formed by any two adjacent value field data in the original value field data set to obtain new value field data, wherein the new value field data enriches the original value field data in the original data set. The value concept node can be obtained by using the new value range data and the value range data in the original value range data set, and the value concept node has the characteristic of easy identification due to the action of the new value range data. And because the original value range data set and the unit concept node realize decoupling of the numerical value and the unit, the numerical value concept node and the unit concept node obtained by using the original value range data set have no coupling relation.
In the application, the new concept node is generated by using the numerical concept node and the unit concept node, and because no coupling relationship exists between the numerical concept node obtained by the original value domain data set and the unit concept node, the numerical value and the concept are in an unbinding relationship, that is, the numerical concept node can also be used for generating other new concept nodes. For example, a numeric concept node may be used to generate a weight value concept node, and may also be used to generate a height value concept node. Therefore, the method, the device and the related equipment for generating the concept node generate the concept node in a new mode, the binding relation between the numerical value and the concept is relieved, and the reusability of the value range data is improved. The new concept node generated by adopting the technical scheme can be used for generating a new concept graph, and because the value domain data are reusable, the new concept graph can effectively save data storage resources and calculation resources.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an original concept graph;
FIG. 2 is a flowchart of a method for generating a concept node according to an embodiment of the present application;
FIG. 3 is a schematic diagram of generating new concept nodes according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another concept node generation provided by an embodiment of the present application;
FIG. 5 is a flowchart of another method for generating a concept node according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a new concept graph according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a generating device of a concept node according to an embodiment of the present application;
FIG. 8 is a hardware configuration diagram of a generating device of a concept node according to the present embodiment.
Detailed Description
The types of nodes typically included in a concept graph are concept nodes, value range nodes, and term nodes. Value range nodes and term nodes belong to the same class of node. Value range nodes stand in a "represents" (representation) relationship to concept nodes; that is, a number of value range nodes represent a certain concept node. Term nodes stand in the same "represents" relationship to concept nodes; that is, a plurality of term nodes represent a certain concept node. Concept nodes, by contrast, express the concepts themselves. For ease of understanding, the node relationships in the original concept graph are described below in conjunction with the accompanying drawings.
Referring to FIG. 1, a schematic diagram of an original concept graph is shown.
As shown in FIG. 1, the graph includes node O, node A, and node B. Node O is a node expressing the concept of a certain weight value, referred to as the weight value concept node; node A is a node expressing the concept of weight, referred to as the weight concept node; node B is a node expressing the concept of overweight, referred to as the overweight concept node. The path shown in FIG. 1 from node O to node A to node B is an inference path that infers the concept of "overweight".
Nodes O1, ..., On are value range nodes, representing the actual values of node O. For example, node O1 is ("unit" = "kg", "min_value" = "60", "max_value" = "80"), and the synonymous node O2 is ("unit" = "t", "min_value" = "0.06", "max_value" = "0.08"). It can be seen that the value range data and the unit data in a value range node are coupled to each other. Since value range nodes express concept nodes, values are bound to concepts.
Nodes A1, ..., An are weight term nodes, representing the actual values of node A. For example, node A1 may be expressed by the term "body weight" and node A2 by the term "weight".
Nodes B1, ..., Bn are overweight term nodes, representing the actual values of node B. For example, node B1 may be expressed by the term "overweight" and node B2 by the term "overweight".
As can be seen from FIG. 1, the actual values of each of the concept nodes O, A and B are related to that concept node by the "represents" (representation) relationship; that is, the actual values express the concept node. For example, the value ranges denoted by the value range nodes O1, ..., On do not exceed the scope of the concept expressed by concept node O, so the value range nodes O1, ..., On jointly express concept node O; the term meanings of the weight term nodes A1, ..., An do not exceed the scope of the weight concept expressed by concept node A, so the weight term nodes A1, ..., An jointly express concept node A; and the term meanings of the overweight term nodes B1, ..., Bn do not exceed the scope of the overweight concept expressed by concept node B, so the overweight term nodes B1, ..., Bn jointly express concept node B.
As the concept node O and the value range node O1 in FIG. 1 illustrate, values in the original concept graph are currently bound to concepts; that is, the value range data contained in a value range node cannot express other concept nodes. As a result, other concept nodes need to be expressed by other value range nodes. For example, the value range nodes O1, ..., On express concept node O, and the value range nodes Y1, ..., Yn express concept node Y. In practice, there may be intersecting or overlapping values between the value range nodes O1, ..., On and Y1, ..., Yn, but because values are bound to concepts, the value range data of the value range nodes in the original concept graph are difficult to reuse, which easily leads to a great waste of data storage resources and computing resources.
In order to solve the above problems, the present application provides a method, an apparatus, and a related product for generating concept nodes, which implement decoupling of a value and a unit by separating value range nodes corresponding to each of a plurality of concept nodes in an original concept graph according to types of value range data and unit data. In the present application, a numerical concept node and a unit concept node are formed, respectively, and finally, a new concept node is generated by using the numerical concept node and the unit concept node. In the method, the independent existence of the numerical concept nodes releases the binding relation between the numerical value and the original concept nodes, improves the reusability of the value range data, and further saves data storage resources and computing resources.
In order to make the present invention better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Method embodiment
Referring to fig. 2, the diagram is a flowchart of a method for generating a concept node according to an embodiment of the present application.
As shown in fig. 2, the method for generating a concept node provided in this embodiment includes:
Step 201: Divide the value range nodes corresponding to the concept nodes in the original concept graph into two types of data, value range data and unit data, to obtain an original value range data set D and a unit concept node U.
In this embodiment of the present application, the original concept graph includes at least concept nodes (see node O shown in FIG. 1) and the value range nodes corresponding to the concept nodes (see nodes O1, ..., On shown in FIG. 1). It should be noted that the plurality of concept nodes described here may include not only the concept node O shown in FIG. 1 but also other concept nodes not shown in FIG. 1, such as a height value concept node, a blood pressure value concept node, or a heart rate value concept node.
Generally, a value range node is a triplet that includes a lower bound, an upper bound, and a unit. For example, the value range node O1 is ("unit" = "kg", "min_value" = "60", "max_value" = "80"), where "kg" is a unit of weight and thus is unit data; "60" is the lower bound of value range node O1, "80" is the upper bound of value range node O1, and "60" and "80" are value range data.
By performing this step, the value range data is divided into the original value range data set D. As an alternative implementation, the individual value range data may be arranged in ascending order. Table 1 illustratively presents value range nodes.
Table 1: Example value range nodes
Value range node    Lower bound    Upper bound    Unit
<100,200,g>         100            200            g
<1,2,kg>            1              2              kg
<200,1200,g>        200            1200           g
<3,4,kg>            3              4              kg
<4,6,kg>            4              6              kg
<1500,2000,g>       1500           2000           g
After the value range nodes provided in Table 1 are divided according to data type, the obtained original value range data set is D = [-∞, 1, 2, 3, 4, 6, 100, 200, 1200, 1500, 2000, +∞], where the ±∞ at both ends serve as the boundaries of the set D. In practical applications, before the original value range data set D is obtained, the value range data obtained by the division can be de-duplicated; that is, duplicate value range data are removed and the non-duplicate value range data are retained as the elements of the original value range data set D.
The unit data divided by the data type, for example, kg and g, etc., can be used to obtain the unit concept node U.
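As a rough illustration of step 201, the following sketch splits the triples of Table 1 into a sorted, de-duplicated list of bounds and a set of units. The function name `split_value_range_nodes` and the use of a plain Python set for the units are assumptions made for illustration; the source does not prescribe a data structure.

```python
import math

# Value range nodes from Table 1, as (lower bound, upper bound, unit) triples.
value_range_nodes = [
    (100, 200, "g"),
    (1, 2, "kg"),
    (200, 1200, "g"),
    (3, 4, "kg"),
    (4, 6, "kg"),
    (1500, 2000, "g"),
]


def split_value_range_nodes(nodes):
    """Separate triples into an ascending, de-duplicated bound list and a unit set."""
    bounds = set()
    units = set()
    for lower, upper, unit in nodes:
        bounds.update((lower, upper))  # the set drops repeated bounds such as 200 and 4
        units.add(unit)
    # -inf and +inf act as the outer boundaries of the set D.
    data_set = [-math.inf, *sorted(bounds), math.inf]
    return data_set, units


D, U = split_value_range_nodes(value_range_nodes)
print(D)
print(sorted(U))
```

With the Table 1 input, `D` reproduces the example set given in the text, and `U` holds the two units `g` and `kg`.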
Step 202: dividing adjacent value fields formed by any two adjacent value field data in the original value field data set D to obtain new value field data, and obtaining a numerical concept node E by using the new value field data and the value field data in the original value field data set D.
The example original value range data set from the previous step, D = [-∞, 1, 2, 3, 4, 6, 100, 200, 1200, 1500, 2000, +∞], contains multiple pairs of adjacent value range data. For example, 1 and 2 are two adjacent value range data, as are 1200 and 1500; they form the adjacent value ranges [1, 2] and [1200, 1500], respectively. For ease of description, any adjacent value range is denoted [V_i, V_{i+1}] in the embodiments of the present application, where V_{i+1} is the value range data that is adjacent to V_i and greater than V_i, V_i is the lower bound of the adjacent value range [V_i, V_{i+1}], and V_{i+1} is its upper bound. Assuming that the original value range data set D has m value range data (m is a positive integer greater than 1), i takes any integer value from 1 to m-1, V_i denotes the i-th value range data in ascending order, and V_{i+1} denotes the (i+1)-th value range data in ascending order.
In this step, any adjacent value range [V_i, V_{i+1}] is divided to obtain new value range data whose value lies between V_i and V_{i+1}. In practical applications, there are various ways to divide an adjacent value range; for example, it may be divided according to a set granularity or into a set number of segments. Accordingly, the specific implementation of dividing the adjacent value range is not limited here.
By dividing, new value range data are obtained, which can be combined with the value range data in the original value range data set D to obtain the numerical concept node E. As an example, the numerical concept node E may be expressed as [-∞, 1, 2, 3, 4, 6, 100, 200, 400, 600, 800, 1000, 1200, 1500, 2000, +∞].
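One of the options mentioned above, dividing by a set granularity, might look like the sketch below. The example E in the text differs from D only by the values 400, 600, 800 and 1000, which is consistent with dividing the adjacent value range [200, 1200] at a step of 200; the choice of that range and step, and the helper name `divide_adjacent_range`, are assumptions for illustration.

```python
import math


def divide_adjacent_range(lower, upper, step):
    """Return the interior boundary values lower+step, lower+2*step, ... below upper."""
    if step <= 0:
        raise ValueError("step must be positive")
    if math.isinf(lower) or math.isinf(upper):
        return []  # unbounded ranges are left undivided in this sketch
    count = math.ceil((upper - lower) / step) - 1
    return [lower + k * step for k in range(1, count + 1)]


D = [-math.inf, 1, 2, 3, 4, 6, 100, 200, 1200, 1500, 2000, math.inf]

# Divide only the adjacent value range [200, 1200], as in the example E.
new_values = divide_adjacent_range(200, 1200, 200)

# Merge the new value range data into the original data, keeping ascending order.
E = sorted(set(D) | set(new_values))
print(new_values)
print(E)
```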
Step 203: a new concept node is generated using the numeric concept node E and the unit concept node U.
Referring to FIG. 3, a schematic diagram of generating a new concept node according to an embodiment of the present application is shown. The unit term nodes U1, ..., Un may be the individual units obtained by the division in step 201. In FIG. 3, the unit term nodes U1, ..., Un point to the unit concept node U, indicating that they represent (representation) the unit concept node U. In the example shown in FIG. 3, the unit concept node U may also be referred to as a weight unit concept node U. In one possible implementation, the weight unit concept node U includes multiple types of concept node instances, such as U001 (not shown in FIG. 3) and U002 (not shown in FIG. 3), each of which is expressed by synonymous unit term nodes. For example, the unit terms "kg" and "kilogram" refer to the concept node instance U001, and the unit terms "gram" and "g" refer to the concept node instance U002.
In FIG. 3, the value range nodes E1, ..., En may be value ranges formed from the original value range data in the original value range data set D and the new value range data, such as [200, 400], [400, 600], etc. Note that the value range nodes E1, ..., En differ from the value range nodes in the original concept graph, because their coupling with units has been removed. In FIG. 3, the value range nodes E1, ..., En point to the numerical concept node E, indicating that they represent (representation) the numerical concept node E.
In FIG. 3, the unit concept node U and the numerical concept node E both point to a concept node O, which may also be referred to as a weight value concept node in the embodiments of the present application. It should be noted that this concept node O differs from the weight value concept node O shown in FIG. 1: in FIG. 3, the concept node O is generated from a unit concept node U and a numerical concept node E, whereas in FIG. 1, the concept node O corresponds to the value range nodes O1, ..., On.
Referring to fig. 4, a schematic diagram of yet another concept node generation provided in an embodiment of the present application is shown. As shown in fig. 4, since the method provided in this embodiment decouples the numerical value from the unit, the numerical value concept node E may participate in the link of generating the weight value concept node O or the height value concept node C. As shown in fig. 4, K represents a height unit conceptual node obtained in a similar manner to U, and the height unit conceptual node K and the numerical conceptual node E generate a height value conceptual node C.
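The reuse shown in FIG. 3 and FIG. 4 can be pictured with a small data-structure sketch in which one numerical concept node is shared by two value concept nodes. The class names, the field layout, and the height unit terms "cm" and "m" are hypothetical and chosen only for illustration; the source does not specify how nodes are stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumericConceptNode:
    boundaries: tuple  # ascending value range data, independent of any unit


@dataclass(frozen=True)
class UnitConceptNode:
    name: str
    unit_terms: frozenset  # unit term nodes that represent this concept


@dataclass(frozen=True)
class ValueConceptNode:
    name: str
    numeric: NumericConceptNode
    unit: UnitConceptNode


# One numerical concept node E, built once.
E = NumericConceptNode(boundaries=(1, 2, 3, 4, 6, 100, 200))

# Unit concept nodes U (weight) and K (height); the height terms are assumed.
U = UnitConceptNode("weight unit", frozenset({"kg", "kilogram", "g", "gram"}))
K = UnitConceptNode("height unit", frozenset({"cm", "m"}))

# Both value concept nodes reference the same E instead of copying its data.
O = ValueConceptNode("weight value", E, U)
C = ValueConceptNode("height value", E, K)

print(O.numeric is C.numeric)
```

Because both nodes hold a reference to the same `E`, the value range data is stored once, which is the storage saving the text attributes to decoupling values from units.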
The above is the method for generating a concept node provided by the embodiment of the present application. In this method, the value range nodes corresponding to a plurality of concept nodes in the graph are divided into two types of data, value range data and unit data, to obtain an original value range data set D and a unit concept node U. This division separates the value range data of each value range node from its unit data and removes the coupling between numerical values and units. The adjacent value range formed by any two adjacent value range data in the original value range data set D is then divided to obtain new value range data, which enrich the original value range data in the set D. The numerical concept node E is obtained from the new value range data together with the value range data in the original value range data set D, and the new value range data make the numerical concept node E easier to identify. In addition, because the original value range data set D and the unit concept node U decouple numerical values from units, there is no coupling between the unit concept node U and the numerical concept node E obtained from the original value range data set D.
In the present application, the new concept node is generated by using the numeric concept node E and the unit concept node U, and since there is no coupling relationship between the numeric concept node E obtained by the original value range data set D and the unit concept node U, the numeric and concept are in an unbound relationship, that is, the numeric concept node E may also be used to generate other new concept nodes. For example, the numerical concept node E may be used to generate a weight value concept node, or may be used to generate a height value concept node. Therefore, the concept node is generated in a new mode by the method for generating the concept node, the binding relation between the numerical value and the concept is relieved, and the reusability of the value range data is improved. The new concept node generated by adopting the technical scheme can be used for generating a new concept graph, and because the value domain data are reusable, the new concept graph can effectively save data storage resources and calculation resources.
As described above, in practical applications, a variety of alternative implementations may be used to complete the division of the contiguous value ranges and obtain new value range data. The following detailed description refers to the accompanying drawings and examples.
Referring to fig. 5, a flowchart of another method for generating a concept node according to an embodiment of the present application is shown.
As shown in fig. 5, the method for generating a concept node provided in this embodiment includes:
Step 501: Obtain the original value range data set D from the non-repeated value range data among the lower bounds and upper bounds of all value range node triples in the original concept graph; obtain the unit concept node U from the units in all triples.
The implementation of this step can be referred to the previous embodiments.
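Step 501 can be illustrated with a minimal Python sketch. It assumes each value range node is a plain (lower bound, upper bound, unit) tuple; the function name `split_triples` and the sample data are hypothetical and do not come from the patent.

```python
from typing import Iterable, List, Set, Tuple

# A value range node as described in the text: (lower bound, upper bound, unit).
Triple = Tuple[float, float, str]


def split_triples(triples: Iterable[Triple]) -> Tuple[List[float], Set[str]]:
    """Return (D, U): the distinct bounds in ascending order and the set of units."""
    bounds: Set[float] = set()
    units: Set[str] = set()
    for lower, upper, unit in triples:
        bounds.update((lower, upper))  # repeated bounds collapse in the set
        units.add(unit)
    return sorted(bounds), units


# Illustrative data only; these triples are not taken from the patent.
D, U = split_triples([(40.0, 60.0, "kg"), (60.0, 80.0, "kg"), (150.0, 180.0, "cm")])
print(D, U)
```

The bound 60.0 appears in two triples but only once in D, which is the "non-repeated" requirement of step 501, and D is kept in ascending order so that adjacent elements form the adjacent value ranges used in the later steps.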
Step 502: Determine the function to be used according to the distribution of the value range data in the original value range data set D.
In this embodiment, a likelihood function is applied to determine the new value range data. Applying the likelihood function requires integrating an integrand, and this integrand is called the function to be used in this embodiment. The function to be used may therefore differ from case to case. The method of determining the function to be used is described in detail below.
In practical applications, the distribution of the data in the original value range data set D may or may not conform to a known probability distribution function in statistics. To determine the function to be used, this embodiment judges whether the value range data in the original value range data set D match a probability distribution function other than the Gaussian function. If so, the matched probability distribution function is determined as the function to be used; if not, the Gaussian function is determined as the function to be used.
The expression of the function f(x) to be used is as follows:

$$
f(x) =
\begin{cases}
f_P(D), & \text{if the data in } D \text{ match a probability distribution function other than the Gaussian function} \\
\dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, & \text{otherwise}
\end{cases}
\tag{1}
$$

In formula (1), f_P(D) denotes the probability distribution function matched by the value range data in the original value range data set D. μ is the expected value of the Gaussian distribution and σ is its standard deviation; μ determines the position of the Gaussian distribution and σ determines its spread.
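The Gaussian fallback can be sketched with Python's standard `statistics.NormalDist`. The text does not say how μ and σ are estimated, so fitting them as the sample mean and sample standard deviation of D is an assumption of this sketch, as is the sample data.

```python
from statistics import NormalDist

# Illustrative value range data set D (ascending); not taken from the patent.
D = [40.0, 60.0, 80.0, 150.0, 180.0]

# Assumption: mu and sigma are the sample mean and sample standard deviation of D.
gaussian = NormalDist.from_samples(D)


def f(x: float) -> float:
    """Gaussian density, used here as the 'function to be used'."""
    return gaussian.pdf(x)


print(gaussian.mean, gaussian.stdev, f(gaussian.mean))
```

The density peaks at μ and falls off at a rate governed by σ, which is the sense in which μ fixes the position and σ the spread of the distribution.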
Step 503: Divide the adjacent value range [V_i, V_{i+1}] according to a preset granularity q to obtain the division boundary values corresponding to [V_i, V_{i+1}], and add the division boundary values to the adjacent value range [V_i, V_{i+1}].
The granularity q can be set according to actual requirements, and its specific value is not limited. Dividing any one adjacent value range [V_i, V_{i+1}] yields the corresponding division boundary values q_w = V_i + q, V_i + 2q, V_i + 3q, .... Each division boundary value corresponding to the adjacent value range [V_i, V_{i+1}] is greater than the lower bound and less than the upper bound of that range.
Adding the division boundary values to the adjacent value range [V_i, V_{i+1}] gives [V_i, V_i + q, V_i + 2q, V_i + 3q, ..., V_{i+1}].
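A short sketch of step 503, assuming a positive granularity q; the function name `division_boundaries` and the sample numbers are hypothetical.

```python
from typing import List


def division_boundaries(v_lo: float, v_hi: float, q: float) -> List[float]:
    """Return V_i + q, V_i + 2q, ... that lie strictly inside (V_i, V_{i+1})."""
    if q <= 0:
        raise ValueError("granularity q must be positive")
    values = []
    k = 1
    while v_lo + k * q < v_hi:  # strictly below the upper bound
        values.append(v_lo + k * q)
        k += 1
    return values


candidates = division_boundaries(60.0, 80.0, 5.0)
expanded = [60.0, *candidates, 80.0]  # boundary values added to the range
print(expanded)
```

Multiplying `k * q` instead of repeatedly adding q avoids accumulating floating-point error over many steps.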
Step 504: Obtain the likelihood function value corresponding to each division boundary value by using the function f(x) to be used, the lower bound V_i and the upper bound V_{i+1} of the adjacent value range [V_i, V_{i+1}], the division boundary value q_w, and the likelihood function.
The likelihood function is defined by formula (2).
For each division boundary value q_w in [V_i, V_i + q, V_i + 2q, V_i + 3q, ..., V_{i+1}], the corresponding likelihood function value is obtained according to formula (2).
Step 505: Determine the division boundary value that maximizes the likelihood function value, and add it to the original value range data set D as new value range data.
The likelihood function values corresponding to the division boundary values q_w in [V_i, V_i + q, V_i + 2q, V_i + 3q, ..., V_{i+1}] may differ in size. In this embodiment, max(L(q_w)) denotes the maximum likelihood function value, and the division boundary value corresponding to max(L(q_w)) is added to the original value range data set D as new value range data.
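The selection in step 505 is an arg-max over the candidate boundary values. The sketch below takes the likelihood as a caller-supplied function because formula (2) is not reproduced in this text; the `stand_in_likelihood` shown is an assumption made only so the example runs, and it is not the patent's likelihood function. The distribution parameters and all names are likewise hypothetical.

```python
from statistics import NormalDist
from typing import Callable, Sequence


def best_boundary(candidates: Sequence[float],
                  likelihood: Callable[[float], float]) -> float:
    """Return the candidate q_w with the largest likelihood value."""
    if not candidates:
        raise ValueError("no division boundary values to choose from")
    return max(candidates, key=likelihood)


# Stand-in likelihood (an assumption, NOT formula (2) of the source):
# the product of the probability mass of f(x) on each side of q_w
# within the adjacent value range [v_lo, v_hi].
dist = NormalDist(mu=70.0, sigma=10.0)  # hypothetical function to be used
v_lo, v_hi = 60.0, 80.0


def stand_in_likelihood(q_w: float) -> float:
    left = dist.cdf(q_w) - dist.cdf(v_lo)   # integral of f over [v_lo, q_w]
    right = dist.cdf(v_hi) - dist.cdf(q_w)  # integral of f over [q_w, v_hi]
    return left * right


new_value = best_boundary([65.0, 70.0, 75.0], stand_in_likelihood)
print(new_value)
```

Keeping the likelihood as a parameter means the same selection code works whichever function to be used was chosen in step 502.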
Step 506: Judge whether all adjacent value ranges in the original value range data set D satisfy a preset iteration end condition. If so, end the iteration and execute step 507; if not, determine the adjacent value ranges that do not satisfy the preset iteration end condition and execute step 503 for them.
In practical applications, the preset iteration end condition may be set as V_{i+1} - V_i < s, where s is a limit on the difference between the upper bound and the lower bound of the adjacent value range [V_i, V_{i+1}]. When the iteration end condition is satisfied, the adjacent value range [V_i, V_{i+1}] covers a narrow range and does not need to be divided further. When an adjacent value range [V_i, V_{i+1}] does not satisfy the iteration end condition, the range it covers is still wide. If it is not divided further, then when the value range data in the concept graph change, for example when new value range data falling between the two existing values are added, the ability to identify the new value range data is easily impaired, so the division must continue.
It will be appreciated that continued partitioning of the contiguous range of values will result in new range data being added to the original range data set D. That is, the iterative process is a process of continually adding new elements to the original value range data set D.
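The iteration of steps 503 to 506 can be sketched as a loop that keeps inserting values until every adjacent range satisfies V_{i+1} - V_i < s. The `choose` callback stands for the likelihood-based choice of step 505; the midpoint chooser used in the example is only a placeholder, and all names here are hypothetical.

```python
from typing import Callable, List, Optional

# choose(v_lo, v_hi) returns a new value strictly between the bounds,
# or None when the range cannot be divided at the chosen granularity.
Chooser = Callable[[float, float], Optional[float]]


def refine(D: List[float], s: float, choose: Chooser) -> List[float]:
    """Insert new values until every adjacent range satisfies V_{i+1} - V_i < s."""
    values = sorted(set(D))
    i = 0
    while i < len(values) - 1:
        lo, hi = values[i], values[i + 1]
        new_value = None if hi - lo < s else choose(lo, hi)
        if new_value is None or not lo < new_value < hi:
            i += 1  # range is narrow enough (or indivisible): move on
        else:
            values.insert(i + 1, new_value)  # re-check [lo, new_value] next
    return values


# Midpoint chooser: a placeholder for the likelihood-based choice of step 505.
E_values = refine([60.0, 80.0], s=6.0, choose=lambda lo, hi: (lo + hi) / 2)
print(E_values)
```

A range that the chooser cannot divide is skipped rather than retried, which guards against looping forever when the granularity q is larger than the remaining gap.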
Step 507: Obtain the numeric concept node E from the value range data in the original value range data set D after the iteration ends.
Step 508: Generate a new concept node from the numeric concept node E and the unit concept node U.
For the implementation of generating the new concept node from the numeric concept node E and the unit concept node U, refer to fig. 3 and fig. 4 and to the description of step 203 in the foregoing embodiment; the details are not repeated here.
Step 509: Generate a new concept graph from the new concept nodes.
Referring to fig. 6, a schematic diagram of a new concept graph provided by an embodiment of the present application is shown. Comparing fig. 6 with fig. 1 shows that the way the weight value concept node O is generated has changed: the binding between concept and numeric value is released, and the numeric concept node E becomes reusable, for example for generating a weight value concept node, a blood pressure value concept node, a heart rate value concept node, and so on. In this way, data storage resources and computing resources are saved.
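The reuse described above can be pictured as several value concept nodes holding a reference to one shared numeric concept node. The class names and fields below are hypothetical; the actual composition shown in fig. 3 and fig. 4 is not reproduced in this text, so this is only a sketch of the sharing idea.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class NumericConceptNode:
    values: Tuple[float, ...]  # value range data after refinement (E)


@dataclass(frozen=True)
class UnitConceptNode:
    units: FrozenSet[str]      # units collected from all triples (U)


@dataclass(frozen=True)
class ValueConceptNode:
    name: str
    numeric: NumericConceptNode  # shared reference, not a copy
    unit: str


E = NumericConceptNode((40.0, 60.0, 80.0, 150.0, 180.0))
U = UnitConceptNode(frozenset({"kg", "cm"}))

weight = ValueConceptNode("weight value", E, "kg")
height = ValueConceptNode("height value", E, "cm")
print(weight.numeric is height.numeric)
```

Because both nodes point at the same E, the value range data are stored once however many value concept nodes are built on them.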
As can be seen from the description of the above embodiment, the original value range data set D is enriched by dividing the adjacent value ranges and obtaining new value range data, so that the actual value of the numeric concept node E is more diversified. Therefore, when the application scene of the concept graph changes, even if the value range data changes, the value concept nodes in the concept graph have various possible value-taking modes, so that the changed (or newly added) value range data can be accurately identified. Therefore, the concept graph is guaranteed to have strong applicability in various fields (such as medical field, industrial field and the like), and the inference logic (or inference judgment) function of the concept graph has high accuracy.
Based on the method for generating the concept node provided by the foregoing embodiment, correspondingly, the present application further provides a device for generating the concept node. The following description is made with reference to the examples and the accompanying drawings.
Device embodiment
Referring to fig. 7, a schematic structural diagram of a concept node generating device according to an embodiment of the present application is shown.
As shown in fig. 7, the apparatus includes:
the first dividing module 701 is configured to divide, for value range nodes corresponding to each of the plurality of concept nodes in the original concept graph, according to two types of value range data and unit data, to obtain an original value range data set D and a unit concept node U; the value domain node is a triplet comprising a lower bound, an upper bound and a unit; in the original value range data set D, all value range data are arranged according to ascending order;
a second dividing module 702, configured to divide an adjacent value range formed by any two adjacent value range data in the original value range data set D to obtain new value range data, and obtain a numeric concept node E by using the new value range data and the value range data in the original value range data set D;
a concept node generating module 703, configured to generate a new concept node by using the numeric concept node E and the unit concept node U.
The value range node in the original concept graph is a triplet comprising a lower bound, an upper bound, and a unit. The device splits the value range nodes corresponding to the plurality of concept nodes in the graph into two types of data, value range data and unit data, yielding an original value range data set D and a unit concept node U. This split separates the value range data from the unit data of each value range node and removes the conceptual coupling between numeric values and units. Each adjacent value range formed by two adjacent value range data in the original value range data set D is then divided to obtain new value range data, which enrich the original value range data set D. The numeric concept node E is obtained from the new value range data together with the value range data in D, and the new value range data make E easy to identify. In addition, because D and U decouple numeric values from units, the numeric concept node E obtained from D has no coupling relationship with the unit concept node U.
In the present application, the new concept node is generated from the numeric concept node E and the unit concept node U. Because the numeric concept node E obtained from the original value range data set D has no coupling relationship with the unit concept node U, numeric values are no longer bound to a particular concept; that is, the numeric concept node E can also be used to generate other new concept nodes. For example, E may be used to generate a weight value concept node and also a height value concept node. The concept node generating device therefore generates concept nodes in a new way, releases the binding between numeric values and concepts, and improves the reusability of the value range data. Because the value range data are reusable, in many application scenarios a large amount of value range data need not be stored and calculated repeatedly, which effectively saves data storage resources and computing resources.
Optionally, the first dividing module 701 specifically includes:
the first obtaining unit is used for obtaining the original value range data set D by utilizing the value range data which are not repeated in the lower bound and the upper bound in all the triples;
and the second acquisition unit is used for acquiring the unit concept node U by using units in all the triples.
Optionally, the second dividing module 702 specifically includes:
the function determining unit is used for determining a function to be used according to the distribution condition of the value range data in the original value range data set D;
a dividing unit, configured to divide the adjacent value range [V_i, V_{i+1}] according to a preset granularity q, obtain the division boundary values corresponding to the adjacent value range [V_i, V_{i+1}], and add the division boundary values to the adjacent value range [V_i, V_{i+1}];
a computing unit, configured to obtain the likelihood function value corresponding to each division boundary value by using the function to be used, the lower bound and the upper bound of the adjacent value range [V_i, V_{i+1}], the division boundary value, and the likelihood function;
and a value range data adding unit for determining a division boundary value that maximizes the likelihood function value, and adding the division boundary value as the new value range data to the original value range data set D.
By dividing adjacent value fields and obtaining new value field data, the original value field data set D is enriched, and the actual value of the numerical concept node E is more diversified. Therefore, when the application scene of the concept graph changes, even if the value range data changes, the value concept nodes in the concept graph have various possible value-taking modes, so that the changed (or newly added) value range data can be accurately identified.
Optionally, the function determining unit is specifically configured to:
judging whether the value range data in the original value range data set D is matched with a probability distribution function other than a Gaussian function, if so, determining the probability distribution function as the function to be used; if not, determining a Gaussian function as the function to be used.
Optionally, the apparatus further comprises: the judging module is used for judging whether all adjacent value fields in the original value field data set D meet the preset iteration ending condition, and if so, ending the iteration;
the second dividing module 702 is specifically configured to obtain the numeric concept node E by using the value range data in the original value range data set D after the iteration is ended.
Optionally, the apparatus further comprises: a graph generation module, configured to generate a new concept graph from the new concept nodes.
The device provided by the embodiment ensures that the concept graph has strong applicability in various fields (such as medical field, industrial field and the like), and the function of reasoning logic (or reasoning judgment) provided by the device has high accuracy.
Based on the method and the device for generating the concept node provided by the foregoing embodiments, the embodiments of the present application further provide a computer readable storage medium.
The storage medium stores a program that, when executed by a processor, implements some or all of the steps in the method for generating a concept node that is protected by the foregoing method embodiments of the present application.
The storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
Based on the method, the device and the storage medium for generating the concept node provided by the foregoing embodiments, the embodiment of the application provides a processor. The processor is configured to execute a program, where when the program runs, part or all of the steps in the method for generating the concept node protected by the foregoing method embodiment are executed.
Based on the storage medium and the processor provided in the foregoing embodiments, the present application further provides a generating device for a concept node.
Referring to fig. 8, the hardware configuration diagram of the generating device of the concept node provided in the present embodiment is shown.
As shown in fig. 8, the concept node generating apparatus includes: a memory 801, a processor 802, a communication bus 803, and a communication interface 804.
The memory 801 stores a program that can be run on the processor; when executed, the program implements some or all of the steps of the method for generating a concept node provided in the foregoing method embodiments of the present application. The memory 801 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
In this device, the processor 802 and the memory 801 transmit signaling, logic instructions, and the like through a communication bus. The device is capable of communicating with other devices via the communication interface 804.
In the present application, the new concept node is generated from the numeric concept node E and the unit concept node U. Because the numeric concept node E obtained from the original value range data set D has no coupling relationship with the unit concept node U, numeric values are no longer bound to a particular concept; that is, the numeric concept node E can also be used to generate other new concept nodes. For example, E may be used to generate a weight value concept node and also a height value concept node. The concept node generating equipment therefore releases the binding between numeric values and concepts and improves the reusability of the value range data. Because the value range data are reusable, in many application scenarios a large amount of value range data need not be stored and calculated repeatedly, which effectively saves data storage resources and computing resources.
It should be noted that, in the present specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment is mainly described in a different point from other embodiments. In particular, for the apparatus and system embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, with reference to the description of the method embodiments in part. The above-described apparatus and system embodiments are merely illustrative, in which elements illustrated as separate elements may or may not be physically separate, and elements illustrated as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The foregoing is merely one specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (7)

1. A method for generating a concept node, comprising:
dividing value domain nodes corresponding to a plurality of concept nodes in an original concept map according to two types of value domain data and unit data to obtain an original value domain data set and unit concept nodes; the value domain node is a triplet comprising a lower bound, an upper bound and a unit; in the original value range data set, each value range data is arranged according to an ascending order;
dividing adjacent value fields formed by any two adjacent value field data in the original value field data set to obtain new value field data, and obtaining a numerical concept node by using the new value field data and the value field data in the original value field data set;
generating new concept nodes by using the numerical concept nodes and the unit concept nodes;
wherein dividing the value range nodes corresponding to the concept nodes in the original concept graph according to the two types of value range data and unit data to obtain the original value range data set and the unit concept node specifically comprises:
obtaining the original value range data set by using non-repeated value range data in the lower bound and the upper bound in all the triples; obtaining the unit concept node by using units in all the triples;
wherein dividing the adjacent value range formed by any two adjacent value range data in the original value range data set to obtain the new value range data specifically comprises:
determining a function to be used according to the distribution condition of the value range data in the original value range data set;
dividing the adjacent value domain according to a preset granularity to obtain a dividing boundary value corresponding to the adjacent value domain, and adding the dividing boundary value into the adjacent value domain;
obtaining likelihood function values corresponding to the division boundary values by using the functions to be used, the lower bound and the upper bound of the adjacent value range, the division boundary values and the likelihood functions;
determining a partition boundary value that maximizes the likelihood function value and adding the partition boundary value as the new value range data to the original value range data set.
2. The method according to claim 1, wherein the determining the function to be used according to the distribution of the value range data in the original value range data set specifically comprises:
judging whether value domain data in an original value domain data set is matched with probability distribution functions other than a Gaussian function, if so, determining the probability distribution functions as functions to be used; if not, determining a Gaussian function as the function to be used.
3. The method according to claim 1 or 2, wherein after said adding the partition boundary value as said new value range data to said original value range data set, said method further comprises:
judging whether all adjacent value fields in the original value field data set meet a preset iteration ending condition, and ending iteration if so;
the obtaining a numeric concept node by using the new value range data and the value range data in the original value range data set specifically includes:
and obtaining the numerical concept node by using the value domain data in the original value domain data set after the iteration is finished.
4. The method according to claim 1 or 2, further comprising: and generating a new concept graph by using the new concept nodes.
5. A concept node generating apparatus, comprising:
the first dividing module is used for dividing the value range nodes corresponding to the concept nodes in the original concept graph according to the two types of the value range data and the unit data to obtain an original value range data set and the unit concept nodes; the value domain node is a triplet comprising a lower bound, an upper bound and a unit; in the original value range data set, each value range data is arranged according to an ascending order;
the second dividing module is used for dividing an adjacent value range formed by any two adjacent value range data in the original value range data set to obtain new value range data, and obtaining a numerical concept node by using the new value range data and the value range data in the original value range data set;
a concept node generating module, configured to generate a new concept node using the numeric concept node and the unit concept node;
the first dividing module specifically includes:
the first acquisition unit is used for acquiring the original value range data set by utilizing the value range data which are not repeated in the lower bound and the upper bound in all the triples;
a second obtaining unit, configured to obtain the unit concept node by using units in all the triples;
the second dividing module specifically includes:
the function determining unit is used for determining a function to be used according to the distribution condition of the value range data in the original value range data set;
the dividing unit is used for dividing the adjacent value domain according to a preset granularity, obtaining a dividing boundary value corresponding to the adjacent value domain, and adding the dividing boundary value into the adjacent value domain;
the computing unit is used for obtaining likelihood function values corresponding to the division boundary values by utilizing the functions to be used, the lower bound and the upper bound of the adjacent value range, the division boundary values and the likelihood functions;
and a value range data adding unit for determining a division boundary value that maximizes the likelihood function value, and adding the division boundary value as the new value range data to the original value range data set.
6. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when being executed by a processor, implements the method of generating concept nodes according to any of claims 1-4.
7. A processor, configured to run a computer program, the program when run performing the method of generating concept nodes according to any of claims 1-4.
CN201911302866.7A 2019-12-17 2019-12-17 Concept node generation method and device and related products Active CN111125373B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911302866.7A CN111125373B (en) 2019-12-17 2019-12-17 Concept node generation method and device and related products

Publications (2)

Publication Number Publication Date
CN111125373A CN111125373A (en) 2020-05-08
CN111125373B true CN111125373B (en) 2023-08-08

Family

ID=70499338

Country Status (1)

Country Link
CN (1) CN111125373B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant