
CN112001188B - Method and device for rapidly realizing NL2SQL based on vectorization semantic rule - Google Patents


Info

Publication number
CN112001188B
Authority
CN
China
Prior art keywords
semantic
entity
rule template
sentence
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011184694.0A
Other languages
Chinese (zh)
Other versions
CN112001188A (en
Inventor
肖超峰
李智
钱泓锦
刘占亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhiyuan Artificial Intelligence Research Institute
Original Assignee
Beijing Zhiyuan Artificial Intelligence Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhiyuan Artificial Intelligence Research Institute filed Critical Beijing Zhiyuan Artificial Intelligence Research Institute
Priority to CN202011184694.0A priority Critical patent/CN112001188B/en
Publication of CN112001188A publication Critical patent/CN112001188A/en
Application granted granted Critical
Publication of CN112001188B publication Critical patent/CN112001188B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and a device for quickly implementing NL2SQL based on vectorized semantic rules. The method comprises: performing word segmentation and entity recognition on a first sentence in natural language; replacing each recognized entity in the first sentence with its preset entity type to obtain a second sentence; recognizing the second sentence according to a preset semantic rule template to obtain semantic fragments; matching the semantic fragments to obtain table and field information of a business database; and generating an SQL statement from that table and field information. NL2SQL can thus be implemented quickly without depending on a complex system or database. Because the semantic fragments in natural sentences are identified using vectorized semantic rules, the accuracy and generalization capability of semantic search are improved and the recall rate is high.

Description

Method and device for rapidly realizing NL2SQL based on vectorization semantic rule
Technical Field
The invention relates to the technical field of natural language processing, in particular to a method and a device for quickly realizing NL2SQL based on vectorized semantic rules.
Background
In the field of semantic search, freely querying target data in a database through natural language has become an emerging research hotspot in the industry. Converting natural language into a standard semantic representation that a computer can understand and execute is a subtask of semantic analysis. NL2SQL (Natural Language to SQL) is a technology that converts a user's natural-language sentence into an SQL statement that a computer can execute.
In actual professional application scenarios, sufficient labeled corpora are often unavailable or missing for the professional field, so a corresponding model cannot be trained. Quickly implementing NL2SQL in combination with a business data model therefore remains a difficult problem. In addition, the field attributes parsed from the natural sentence lack an accurate mapping to the fields in the business database, so the whole processing pipeline is complex and executable SQL cannot be generated correctly.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides the following technical scheme.
One aspect of the present invention provides a method for rapidly implementing NL2SQL based on vectorized semantic rules, comprising:
performing word segmentation processing and entity recognition on a first sentence based on a natural language;
replacing the corresponding entity in the first statement by using a preset entity type to obtain a second statement;
identifying the second sentence according to a preset semantic rule template to obtain a semantic segment;
obtaining table and field information of a service database according to the semantic fragment matching;
and generating SQL sentences according to the table and field information of the service database.
Preferably, the performing word segmentation processing and entity recognition on the first sentence based on the natural language comprises:
performing word segmentation processing and entity recognition on the first sentence by using a predefined entity rule template and a conventional dictionary to obtain a preset entity type corresponding to an entity in the first sentence; the entity rule template includes a custom professional domain dictionary.
Preferably, the entity rule template further comprises a third party model interface for invoking a third party entity recognition model.
Preferably, the recognizing the second sentence according to a preset semantic rule template to obtain a semantic segment includes:
matching each participle in the second sentence with a word in the semantic rule template to obtain a semantic rule;
and identifying elements of the semantic fragments according to the semantic rules.
Preferably, matching each participle in the second sentence with a word in the semantic rule template comprises:
converting the participles into participle vectors, and calculating the similarity between the participle vectors and the vectors of the words in the semantic rule template; and if the similarity reaches a threshold value, replacing the participle with a word in the semantic rule template.
Preferably, the obtaining of the table and field information of the service database according to the semantic segment matching includes:
and matching the elements of the semantic fragments with the information description rules of a preset business database table to obtain the information, the field information and the associated information between the tables of the corresponding table.
Preferably, the generating an SQL statement according to the table and field information of the service database includes:
obtaining the structural elements of the SQL statement according to the information and the field information of the corresponding tables and the correlation information among the tables;
and filling the structural elements into a rule template of the SQL statement to generate the SQL statement.
The second aspect of the present invention provides a device for quickly implementing NL2SQL based on vectorized semantic rules, including:
the entity recognition module is used for performing word segmentation processing and entity recognition on the first sentence based on the natural language;
the entity type replacing module is used for replacing the corresponding entity in the first statement by using a preset entity type to obtain a second statement;
the semantic segment recognition module is used for recognizing the second sentence according to a preset semantic rule template to obtain a semantic segment;
the service database table matching module is used for obtaining the table and field information of the service database according to the semantic segment matching;
and the SQL statement generating module is used for generating the SQL statement according to the table and field information of the business database.
A third aspect of the invention provides a memory storing a plurality of instructions for implementing the method described above.
A fourth aspect of the present invention provides an electronic device, comprising a processor and a memory connected to the processor, wherein the memory stores a plurality of instructions, and the instructions are loaded and executed by the processor, so that the processor can execute the method.
The beneficial effects of the invention are as follows. For application scenarios in professional fields, the technical solution combines structured description information of the business data with matching files that are configured in advance, in a simple way, according to rule templates. Using the configured files, word segmentation is performed on the natural sentence, semantic fragments are identified according to the semantic rules with the assistance of the recognized entities, business database tables are matched according to the field information in the semantic fragments, and the SQL statement is finally generated. NL2SQL can be implemented quickly with only a simple, easily understood file template configuration and without depending on a complex system or database. Because the semantic fragments in natural sentences are identified using vectorized semantic rules, good accuracy and generalization capability are ensured and the recall rate is high.
Drawings
FIG. 1 is a schematic flow chart of a method for rapidly implementing NL2SQL based on vectorized semantic rules according to the present invention;
FIG. 2 is a schematic structural diagram of the device for rapidly implementing NL2SQL based on vectorized semantic rules according to the present invention.
Detailed Description
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
The method provided by the invention can be implemented in the following terminal environment, and the terminal can comprise one or more of the following components: a processor, a memory, and a display screen. Wherein the memory has stored therein at least one instruction that is loaded and executed by the processor to implement the methods described in the embodiments described below.
A processor may include one or more processing cores. The processor connects the various parts of the terminal using various interfaces and lines, and performs the functions of the terminal and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory and by calling data stored in the memory.
The memory may include Random Access Memory (RAM) or Read-Only Memory (ROM). The memory may be used to store instructions, programs, code, code sets, or instruction sets.
The display screen is used for displaying user interfaces of all the application programs.
In addition, those skilled in the art will appreciate that the above-described terminal configurations are not intended to be limiting, and that the terminal may include more or fewer components, or some components may be combined, or a different arrangement of components. For example, the terminal further includes a radio frequency circuit, an input unit, a sensor, an audio circuit, a power supply, and other components, which are not described herein again.
Embodiment One
As shown in fig. 1, an embodiment of the present invention provides a method for quickly implementing NL2SQL based on vectorized semantic rules, including:
s101, performing word segmentation processing and entity recognition on a first sentence based on a natural language;
s102, replacing a corresponding entity in the first statement by using a preset entity type to obtain a second statement;
s103, identifying the second sentence according to a preset semantic rule template to obtain a semantic segment;
s104, obtaining table and field information of a service database according to the semantic segment matching;
and S105, generating an SQL statement according to the table and the field information of the business database.
Executing step S101, specifically including: performing word segmentation processing and entity recognition on the first sentence by using a predefined entity rule template and a conventional dictionary to obtain a preset entity type corresponding to an entity in the first sentence; the entity rule template includes a custom professional domain dictionary.
A specific professional business field contains many specialized terms and newly coined words, and a conventional dictionary often cannot segment natural sentences from that field accurately. To solve this problem, the embodiment of the invention uses a custom professional domain dictionary. In this dictionary, a user can define specialized terms and/or newly coined words, as well as synonyms, stop words, keywords, entities, and the like. Because the custom professional domain dictionary matches professional sentences better, using it for word segmentation and entity recognition improves precision and fits the business scenario more closely.
In the actual application process, an entity rule template can be predefined, and a customized professional domain dictionary is configured in the entity rule template. The content format of the entity rule template is entity type and dictionary name, and a plurality of words are separated by commas, which is exemplified as follows:
#{person}=http://ip:port/person?q=  // interface address of the third-party model for person-name entities
Lixia, Lijian, Liwen, ...  // custom dictionary
# stock name
Digital video signal
#{date}=http://ip:port/date?q=  // time entity
#{place}=http://ip:port/place?q=  // location entity
#end
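A template in this layout could be read with a few lines of Python. This is only a sketch under stated assumptions: the function name `parse_entity_template`, the `?q=` query suffix on the interface addresses, and the rule that a `//` comment is preceded by whitespace are all illustrative choices, not details confirmed by the patent.

```python
import re

def parse_entity_template(text):
    """Parse an entity rule template into {entity_type: {"interface", "words"}}."""
    rules, current = {}, None
    for raw in text.splitlines():
        # Drop a trailing "// comment"; requiring whitespace before "//"
        # keeps the "//" inside "http://" intact.
        line = re.split(r"\s+//", raw, maxsplit=1)[0].strip()
        if not line:
            continue
        if line == "#end":
            break
        if line.startswith("#"):
            # Header line: "#{type}=interface address" (the address is optional).
            name, _, url = line[1:].partition("=")
            current = name.strip().strip("{}").strip()
            rules[current] = {"interface": url.strip() or None, "words": []}
        elif current is not None:
            # Body line: comma-separated custom dictionary words.
            words = [w.strip() for w in line.split(",")]
            rules[current]["words"] += [w for w in words if w and w != "..."]
    return rules

SAMPLE = """\
#{person}=http://ip:port/person?q=  // third-party model interface
Lixia, Lijian, Liwen  // custom dictionary
#{place}=http://ip:port/place?q=
#end
"""

rules = parse_entity_template(SAMPLE)
```

Here `rules["person"]` holds both the interface address and the custom words, so a caller can consult the dictionary first and fall back to the interface.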
In a specific implementation, a Trie (a dictionary tree built from the conventional dictionary) may be combined with the entity rule template, and the forward and reverse maximum matching algorithms may be used to segment the input natural-language first sentence and thereby recognize its entities. For example,
the first sentence: men of blood type B, taller than 175, who have been to Beijing in the last 3 months.
Word segmentation and entity recognition result: last 3 months/date  been to/vq  Beijing/place  (particle)/uj  height/v  greater than/d  175/x  blood type/n  B/x  (particle)/uj  man/n.
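The Trie plus forward and reverse maximum matching described above can be sketched as follows. The patent does not say how the two directions are reconciled; the tie-break used here (fewer tokens, then fewer single-character tokens, otherwise the reverse result) is a common heuristic and an assumption, as are all names in the code.

```python
END = ""  # marker key; can never collide with a single character

class Trie:
    """Minimal prefix tree holding dictionary words."""

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True

    def longest_match(self, text, start):
        """Length of the longest dictionary word starting at text[start], or 0."""
        node, best = self.root, 0
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if END in node:
                best = i - start + 1
        return best

def forward_mm(text, trie):
    """Forward maximum matching: greedily take the longest word from the left."""
    tokens, i = [], 0
    while i < len(text):
        n = trie.longest_match(text, i) or 1  # unknown character becomes a token
        tokens.append(text[i:i + n])
        i += n
    return tokens

def backward_mm(text, reversed_trie):
    """Reverse maximum matching, done as forward matching on reversed text."""
    tokens = forward_mm(text[::-1], reversed_trie)
    return [t[::-1] for t in reversed(tokens)]

def segment(text, words):
    fwd = forward_mm(text, Trie(words))
    bwd = backward_mm(text, Trie(w[::-1] for w in words))

    def cost(tokens):
        return (len(tokens), sum(len(t) == 1 for t in tokens))

    return fwd if cost(fwd) < cost(bwd) else bwd

print(segment("abcd", {"ab", "abc", "cd", "d"}))
```

With the toy dictionary, forward matching yields `abc` + `d` while reverse matching yields `ab` + `cd`; the second has no single-character token and is preferred.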
In a preferred embodiment of the present invention, the entity rule template further comprises a third-party model interface for invoking a third-party entity recognition model, as shown in the example above. The third-party entity recognition model can be called directly through the configured interface address. An example of the parameter form is #{date}=http://ip:port/date?q=, and examples of the standard data return format are:
{"type":"date","entity":"2013-06-08"},
{"type":"person","entity":"Liu Xiaohua"}
Because the customized professional field dictionary comprises clear professional field vocabularies, the customized professional field dictionary is preferentially adopted in the process of word segmentation and entity recognition, and the accuracy is higher compared with the third-party entity recognition model.
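The priority rule can be illustrated with a small sketch that parses a response in the return format shown above and merges it with dictionary hits. No network call is made; the function names and the idea of merging into a single mapping are assumptions for illustration only.

```python
import json

def parse_entity_response(body):
    """Turn a response in the type/entity return format into (entity, type) pairs."""
    data = json.loads(body)
    records = data if isinstance(data, list) else [data]
    return [(r["entity"], r["type"]) for r in records]

def merge_entities(dictionary_hits, model_hits):
    """Combine both sources; the custom dictionary wins on conflicts."""
    merged = dict(model_hits)
    merged.update(dictionary_hits)  # dictionary entries override model entries
    return merged

model_hits = parse_entity_response(
    '[{"type": "date", "entity": "2013-06-08"},'
    ' {"type": "place", "entity": "Liu Xiaohua"}]'
)
entities = merge_entities({"Liu Xiaohua": "person"}, model_hits)
```

In this example the model mislabels "Liu Xiaohua" as a place, and the dictionary entry corrects it to a person.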
Step S102 is executed: each entity in the first sentence is replaced with its entity type to obtain the second sentence. For example, given the segmentation result:
last 3 months/date  been to/vq  Beijing/place  (particle)/uj  height/v  greater than/d  175/x  blood type/n  B/x  (particle)/uj  man/n
"last 3 months" is the date entity {date}, and "Beijing" is the place entity {place}. After the entities are replaced with their entity types, the second sentence is:
{date} been to {place} height greater than 175 blood type B man (that is, men of blood type B, taller than 175, who have been to {place} in {date}).
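The replacement step can be sketched on a list of (word, tag) pairs. The pair representation and the function name are assumptions; the patent only specifies that recognized entities are replaced by their entity types.

```python
def replace_entities(tagged_tokens, entity_types):
    """Replace each token tagged with an entity type by a {type} placeholder.

    Returns the new token list and a map from placeholder to original text,
    which later steps can use to recover the concrete value.
    """
    tokens, originals = [], {}
    for word, tag in tagged_tokens:
        if tag in entity_types:
            placeholder = "{" + tag + "}"
            tokens.append(placeholder)
            originals[placeholder] = word
        else:
            tokens.append(word)
    return tokens, originals

tagged = [
    ("last 3 months", "date"), ("been to", "vq"), ("Beijing", "place"),
    ("height", "v"), ("greater than", "d"), ("175", "x"),
    ("blood type", "n"), ("B", "x"), ("man", "n"),
]
tokens, originals = replace_entities(tagged, {"date", "place", "person"})
second_sentence = " ".join(tokens)
```

Keeping the `originals` map is a design choice of this sketch: it lets a rule that matches `{place}` report "Beijing" as the extracted value.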
And S103, identifying the second sentence according to a preset semantic rule template to obtain a semantic segment. The method specifically comprises the following steps: matching each participle in the second sentence with a word in the semantic rule template to obtain a semantic rule; and identifying elements of the semantic fragments according to the semantic rules.
The semantic rule template is provided with words, semantic rules and element fields. Examples are as follows:
#height  // semantic fragment rule template
vector=[height, ...]  // in use, each word is vectorized with word2vector into a 128-dimensional representation
frag=[height(greater than|less than|...)\d{3}(cm|m)], [...]  // semantic rules; multiple rules are separated by #; {slot} stands for the different expressions of the words above
operate=[greater than: >], [less than: <], ...  // obtained on the basis of frag
value=[\d{2,3}], [...]  // field value obtained on the basis of frag
mapvalue=  // mapping to the actual database value
islike=0|1  // whether to enable fuzzy search
#gender
vector=[male, female, ...]  // each word is vectorized into a 128-dimensional representation
frag=[(male|female)], ...
operate=[]
value=[(male|female)]
mapvalue=[male:1], [female:0]  // mapping to the actual database value
islike=1
#blood type
vector=[blood type, AB, O, ...]  // each slot word is vectorized into a 128-dimensional representation
frag=[blood type(A|B|AB|O), (A|B|AB|O) type, ...]
operate=[]
value=[(A|B|AB|O)]
islike=0
#destination
vector=[been to, ...]  // in use, each word is vectorized into a 128-dimensional representation
frag=[been to {place}, to {place}, ...]
operate=[]
value=[{place}]
islike=0
#time
slot=[recently, these days, ...]
frag=[{date}, ...]
operate=[]
value=[{date}, \d{1,3}]
islike=0
#end
As an example, suppose the second sentence is:
{date} been to {place} height greater than 175 blood type B man.
Matching the participle "been to" against the frag word "been to" in the semantic rule template yields the semantic rule "been to {place}". Applying this rule identifies the semantic fragment "been to Beijing" in the first sentence, and the element information extracted from the fragment is: #destination, with operate "=" and value "Beijing".
As another example, matching the participle "height" against the frag word "height" in the semantic rule template yields the semantic rule "height(greater than|less than|...)\d{3}(cm|m)". Applying this rule identifies the semantic fragment "height greater than 175" in the first sentence, and the element information extracted from the fragment is: #height, with operate "greater than" and value "175".
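The two matches above can be reproduced with ordinary regular expressions. This is a sketch, not the patent's matcher: the English patterns, the named groups `op` and `value`, the default operator "=", and the `entities` map used to turn `{place}` back into "Beijing" are all assumptions made for illustration.

```python
import re

RULES = {
    "height": {
        "frag": re.compile(
            r"height (?P<op>greater than|less than) (?P<value>\d{2,3})(?: ?(?:cm|m)\b)?"
        ),
        "operate": {"greater than": ">", "less than": "<"},
    },
    "destination": {
        "frag": re.compile(r"been to (?P<value>\{place\})"),
        "operate": {},
    },
}

def extract_elements(sentence, entities, rules=RULES):
    """Apply each semantic rule and collect entType, frag, operate and value."""
    elements = []
    for ent_type, rule in rules.items():
        match = rule["frag"].search(sentence)
        if match is None:
            continue
        groups = match.groupdict()
        value, frag = groups["value"], match.group(0)
        # Map a placeholder such as {place} back to the original entity text.
        if value in entities:
            frag = frag.replace(value, entities[value])
            value = entities[value]
        elements.append({
            "entType": ent_type,
            "frag": frag,
            # Rules without an operator word fall back to "=".
            "operate": rule["operate"].get(groups.get("op"), "="),
            "value": value,
        })
    return elements

second = "{date} been to {place} height greater than 175 blood type B man"
elements = extract_elements(second, {"{place}": "Beijing", "{date}": "last 3 months"})
```

Each returned dictionary carries the same keys that appear later in the JSON output of step S104, which keeps the hand-off between steps simple.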
In a preferred embodiment of the present invention, when each participle in the second sentence is matched with a word in the semantic rule template, the participle is converted into a participle vector, and the similarity between the participle vector and the vector of the word in the semantic rule template is calculated; and if the similarity reaches a threshold value, replacing the participle with a word in the semantic rule template, and then matching, so that the recall rate and generalization capability of semantic recognition can be greatly improved.
In the specific implementation process, the above-mentioned word segmentation conversion and replacement may be performed before matching under all circumstances, or may be performed when the Frag word cannot be matched by using the original word segmentation. The specific setting can be carried out according to the actual situation.
As an example, suppose the sentence is:
"{date} went to {place} height greater than 175 blood type B man".
The participle "went to" does not match the frag word "been to" in the semantic rule template, so no semantic rule can be obtained directly. The participle "went to" is therefore converted into a vector and its similarity to the vector of "been to" is calculated; if the similarity reaches the threshold, "went to" is replaced with "been to" and matching is performed again.
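The fallback can be sketched with cosine similarity over word vectors. The patent speaks of 128-dimensional word2vector embeddings and "a threshold" without fixing the similarity measure or the threshold value; the cosine measure, the 0.8 threshold, and the 3-dimensional toy vectors below are assumptions chosen only to keep the example readable.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def normalize_token(token, template_vectors, embed, threshold=0.8):
    """Return the template word closest to `token` if similarity >= threshold."""
    if token in template_vectors:
        return token  # exact match, nothing to do
    vec = embed(token)
    if vec is None:
        return token  # no embedding available for this token
    best_word, best_sim = token, threshold
    for word, word_vec in template_vectors.items():
        sim = cosine(vec, word_vec)
        if sim >= best_sim:
            best_word, best_sim = word, sim
    return best_word

# Toy 3-dimensional embeddings standing in for real 128-dimensional vectors.
TOY = {
    "been to": [1.0, 0.1, 0.0],
    "went to": [0.9, 0.2, 0.0],
    "height": [0.0, 1.0, 0.0],
    "banana": [0.0, 0.0, 1.0],
}
TEMPLATE_VECTORS = {w: TOY[w] for w in ("been to", "height")}

print(normalize_token("went to", TEMPLATE_VECTORS, TOY.get))
```

A token with no sufficiently similar template word is returned unchanged, so ordinary rule matching simply fails for it as before.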
Executing step S104, obtaining the table and field information of the service database according to the semantic segment matching, specifically comprising:
and matching the elements of the semantic fragments with the information description rules of a preset business database table to obtain the information, the field information and the associated information between the tables of the corresponding table.
The information description rules of the business database table can be configured in a template. They describe the concrete table structure and the table associations of the database, so that an SQL query can finally be generated to retrieve data from the database. For example, in the configuration file below, the section beginning with #table describes the table associations, and the section beginning with #fields describes the business fields. Each entry has the form table name=[field description:field name:type], and multiple fields are separated by commas.
#table
[person.id=behavior.person_id],[...]
#fields
person=[height:height:int, gender:gender:string, blood type:bloodtype:string, birth place:csd:string, ...]
behavior=[destination:ddd:string, arrival time:ddsj:date, ...]
#end
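A configuration in this layout could be loaded as sketched below. The sketch assumes that every field item has exactly three colon-separated parts (description, field name, type) and that association entries are wrapped in square brackets; the function name and the returned structures are illustrative.

```python
def parse_table_config(text):
    """Return (joins, fields) from a #table / #fields description file.

    joins  : list of association conditions such as "person.id=behavior.person_id"
    fields : {field description: {"table", "field", "fieldType"}}
    """
    joins, fields, section = [], {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            section = line[1:].strip()  # "table", "fields" or "end"
            continue
        if section == "table":
            for part in line.split("],"):
                join = part.strip(" []")
                if join:
                    joins.append(join)
        elif section == "fields":
            table, _, body = line.partition("=")
            for item in body.strip(" []").split(","):
                desc, name, ftype = (p.strip() for p in item.split(":"))
                fields[desc] = {"table": table.strip(), "field": name, "fieldType": ftype}
    return joins, fields

CONFIG = """\
#table
[person.id=behavior.person_id]
#fields
person=[height:height:int, gender:gender:string, blood type:bloodtype:string, birth place:csd:string]
behavior=[destination:ddd:string, arrival time:ddsj:date]
#end
"""

joins, fields = parse_table_config(CONFIG)
```

Keying `fields` by the field description makes the later lookup from a semantic fragment type (for example "blood type") to its table and column a single dictionary access.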
In the following example, the recognition results of the semantic fragments obtained in step S103 (#time, #destination, #gender, #blood type, #height, and so on) are matched against the information description rules of the business database table (the person and behavior entries of the configuration above). This yields the table information, the field information, and the association information between tables, such as the field name, field type, and operator of each field involved. The table and field information can be output in JSON format, with the following result:
[
{
"entType":"height",
"frag":"height 175cm",
"operate":">",
"value":"175",
"table":"person",
"field":"height",
"fieldType":"double"
},
{
"entType":"gender",
"frag":"man",
"operate":"=",
"value":"male",
"table":"person",
"field":"gender",
"fieldType":"string"
},
{
"entType":"blood type",
"frag":"blood type B",
"operate":"=",
"value":"B",
"table":"person",
"field":"bloodtype",
"fieldType":"string"
},
{
"entType":"arrival time",
"frag":"last 3 months",
"operate":"",
"value":"3",
"table":"behavior",
"field":"ddsj",
"fieldType":"date"
},
{
"entType":"destination",
"frag":"been to Beijing",
"operate":"=",
"value":"Beijing",
"table":"behavior",
"field":"ddd",
"fieldType":"string"
}
]
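Producing records of this shape amounts to joining each extracted element with the field description that shares its type. The sketch below assumes the elements and the field map are already available as Python dictionaries; the names `FIELDS` and `bind_fields` and the extra "hobby" element are hypothetical.

```python
import json

# Field descriptions keyed by element type (hypothetical, hand-written here).
FIELDS = {
    "height": {"table": "person", "field": "height", "fieldType": "double"},
    "destination": {"table": "behavior", "field": "ddd", "fieldType": "string"},
}

def bind_fields(elements, fields=FIELDS):
    """Attach table, field and fieldType to every element that has a description."""
    bound = []
    for element in elements:
        info = fields.get(element["entType"])
        if info is not None:  # skip elements that no table describes
            bound.append({**element, **info})
    return bound

elements = [
    {"entType": "height", "frag": "height greater than 175", "operate": ">", "value": "175"},
    {"entType": "destination", "frag": "been to Beijing", "operate": "=", "value": "Beijing"},
    {"entType": "hobby", "frag": "likes chess", "operate": "=", "value": "chess"},
]
records = bind_fields(elements)
print(json.dumps(records, indent=2))
```

Elements without a matching description are dropped rather than guessed at, so the SQL step only ever sees fields that exist in the configured tables.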
Step S105 is executed: an SQL statement is generated according to the table and field information of the business database, which specifically includes:
obtaining the structural elements of the SQL statement according to the information and the field information of the corresponding tables and the correlation information among the tables;
and filling the structural elements into a rule template of the SQL statement to generate the SQL statement.
As an example, the rule template of the SQL statement is: select A from B where C and D, where A, B, C, and D are the constituent elements of the SQL statement.
Following the example above, the values of "table" and "field" are taken from the table and field information of the business database obtained in step S104. After splicing, constituent element A is: person.height, person.gender, person.bloodtype, behavior.ddsj, behavior.ddd; constituent element B is: person person, behavior behavior; constituent element C is: person.id = behavior.person_id; and constituent element D consists of conditions of the form table.field + operate + value. Multiple conditions are joined with "and" by default, and multiple conditions on the same field are joined with "or". The result is: person.gender = 'male' and person.bloodtype = 'B' and person.height > 175 and behavior.ddd = 'Beijing' and behavior.ddsj >= 'xxx'.
Filling constituent elements A, B, C, and D into the rule template of the SQL statement generates the following SQL statement:
select person.height, person.gender, person.bloodtype, behavior.ddsj, behavior.ddd from person person, behavior behavior where person.id = behavior.person_id and person.gender = 'male' and person.bloodtype = 'B' and person.height > 175 and behavior.ddd = 'Beijing' and behavior.ddsj >= 'xxx'
The method provided by the invention has the following beneficial effects:
First, semantic fragments in natural sentences are identified based on vectorized semantic rules, and the relevant field information in the semantic fragments is extracted.
Second, a professional domain dictionary can be configured conveniently, and a third-party entity recognition model can be referenced flexibly by configuring an interface address. Combining the two gives good extensibility and usability.
Third, the business data table structure, the professional domain dictionary, and the semantic recognition rules can all be configured through a simple, easily understood file template format, and configured elements can be referenced across templates. Operation is simple, clear, and convenient, does not depend on a complex system or database, and is easy to extend.
Fourth, fuzzy search on fields can be enabled through configuration, mapping between recognized attribute values and database field values is supported, and multi-table association queries are also supported, which broadens the range of application.
Embodiment Two
As shown in fig. 2, another aspect of the present invention further includes a functional module architecture completely corresponding to and consistent with the foregoing method flow, that is, an embodiment of the present invention further provides an apparatus for quickly implementing NL2SQL based on vectorized semantic rules, including:
an entity recognition module 201, configured to perform word segmentation processing and entity recognition on a first sentence based on a natural language;
an entity type replacing module 202, configured to replace a corresponding entity in the first statement with a preset entity type, to obtain a second statement;
the semantic segment recognition module 203 is configured to recognize the second sentence according to a preset semantic rule template to obtain a semantic segment;
a service database table matching module 204, configured to obtain the table and field information of the service database according to the semantic segment matching;
the SQL statement generating module 205 is configured to generate an SQL statement according to the table and field information of the service database.
Further, the performing word segmentation processing and entity recognition on the first sentence based on the natural language comprises:
performing word segmentation processing and entity recognition on the first sentence by using a predefined entity rule template and a conventional dictionary to obtain a preset entity type corresponding to an entity in the first sentence; the entity rule template includes a custom professional domain dictionary.
Still further, the entity rule template further includes a third party model interface for invoking a third party entity recognition model.
Further, the recognizing the second sentence according to a preset semantic rule template to obtain a semantic segment includes:
matching each participle in the second sentence with a word in the semantic rule template to obtain a semantic rule;
and identifying elements of the semantic fragments according to the semantic rules.
Further, matching each participle in the second sentence with a term in the semantic rule template comprises:
converting the participles into participle vectors, and calculating the similarity between the participle vectors and the vectors of the words in the semantic rule template; and if the similarity reaches a threshold value, replacing the participle with a word in the semantic rule template.
Further, the obtaining of the table and field information of the service database according to the semantic segment matching includes:
and matching the elements of the semantic fragments with the information description rules of a preset business database table to obtain the information, the field information and the associated information between the tables of the corresponding table.
Further, the generating an SQL statement according to the table and field information of the service database includes:
obtaining the structural elements of the SQL statement according to the information and the field information of the corresponding tables and the correlation information among the tables;
and filling the structural elements into a rule template of the SQL statement to generate the SQL statement.
The device can be implemented by the method provided in the first embodiment, and the specific implementation method can be referred to the description in the first embodiment, which is not described herein again.
The invention also provides a memory storing a plurality of instructions for implementing the method according to the first embodiment.
The invention also provides an electronic device comprising a processor and a memory connected to the processor, wherein the memory stores a plurality of instructions, and the instructions can be loaded and executed by the processor to enable the processor to execute the method according to the first embodiment.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (3)

1. A method for rapidly realizing NL2SQL based on vectorized semantic rules, characterized by comprising the following steps:
performing word segmentation and entity recognition on a first sentence in natural language, comprising:
performing word segmentation and entity recognition on the first sentence by using a predefined entity rule template and a conventional dictionary to obtain the preset entity type corresponding to each entity in the first sentence; the entity rule template comprises a custom professional-domain dictionary; the entity rule template further comprises a third-party model interface for calling a third-party entity recognition model;
replacing each corresponding entity in the first sentence with its preset entity type to obtain a second sentence;
recognizing the second sentence according to a preset semantic rule template to obtain semantic fragments, comprising:
matching each segmented word in the second sentence with a word in the semantic rule template to obtain a semantic rule;
identifying the elements of the semantic fragments according to the semantic rule;
wherein the semantic rule template is provided with words, semantic rules, and element fields;
obtaining the table and field information of the business database by matching according to the semantic fragments, comprising:
matching the elements of the semantic fragments against the information description rules of preset business database tables to obtain the corresponding table information, field information, and association information between tables;
wherein the table and field information is output in JSON format;
generating an SQL statement according to the table and field information of the business database, comprising: obtaining the constituent elements of the SQL statement from the corresponding table information, field information, and association information between tables;
filling the constituent elements into a rule template of the SQL statement to generate the SQL statement;
wherein matching each segmented word in the second sentence with a word in the semantic rule template comprises:
converting the segmented word into a word vector, and calculating the similarity between the word vector and the vectors of the words in the semantic rule template; and if the similarity reaches a threshold, replacing the segmented word with the word in the semantic rule template.
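For illustration only, the vector-similarity matching in the final step of claim 1 might look like the sketch below. The claim names neither the embedding model nor the similarity measure, so cosine similarity, the toy three-dimensional vectors in `EMBEDDINGS`, the template vocabulary, and the threshold value of 0.95 are all assumptions.

```python
import math

# Toy 3-dimensional vectors standing in for real word embeddings (assumed).
EMBEDDINGS = {
    "total": (0.9, 0.1, 0.0),
    "sum": (0.88, 0.15, 0.05),
    "newest": (0.1, 0.9, 0.2),
    "latest": (0.12, 0.85, 0.25),
    "banana": (0.0, 0.1, 0.95),
}
TEMPLATE_WORDS = ["sum", "latest"]  # words present in the semantic rule template
THRESHOLD = 0.95                    # assumed similarity threshold


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def normalize(tokens):
    """Replace each token by its closest template word if similar enough."""
    out = []
    for token in tokens:
        vec = EMBEDDINGS.get(token)
        best_word, best_score = token, 0.0
        if vec is not None:
            for word in TEMPLATE_WORDS:
                score = cosine(vec, EMBEDDINGS[word])
                if score > best_score:
                    best_word, best_score = word, score
        # Tokens without a vector, or below the threshold, are left unchanged.
        out.append(best_word if best_score >= THRESHOLD else token)
    return out


normalized = normalize(["total", "of", "newest", "banana"])
print(normalized)
```

With these toy vectors, "total" and "newest" are close enough to the template words "sum" and "latest" to be replaced, while "of" (no vector) and "banana" (low similarity) pass through unchanged.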
2. A memory storing a plurality of instructions for implementing the method of claim 1.
3. An electronic device comprising a processor and a memory coupled to the processor, the memory storing a plurality of instructions that are loadable and executable by the processor to enable the processor to perform the method of claim 1.
CN202011184694.0A 2020-10-30 2020-10-30 Method and device for rapidly realizing NL2SQL based on vectorization semantic rule Active CN112001188B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011184694.0A CN112001188B (en) 2020-10-30 2020-10-30 Method and device for rapidly realizing NL2SQL based on vectorization semantic rule

Publications (2)

Publication Number Publication Date
CN112001188A CN112001188A (en) 2020-11-27
CN112001188B CN112001188B (en) 2021-03-16

Family

ID=73475294

Country Status (1)

Country Link
CN (1) CN112001188B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant