CN115034209A - Text analysis method and device, electronic equipment and storage medium - Google Patents
Text analysis method and device, electronic equipment and storage medium
- Publication number
- CN115034209A (CN202210828882.5A)
- Authority
- CN
- China
- Prior art keywords
- text
- analyzed
- target
- result
- determining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
Abstract
The disclosure provides a text analysis method, a text analysis device, an electronic device, a storage medium and a program product, and relates to the technical field of artificial intelligence, in particular to the technical field of deep learning and the technical field of natural language processing. The specific implementation scheme is as follows: carrying out syntactic analysis on the text to be analyzed to obtain a syntactic tree of the text to be analyzed; determining a syntactic analysis result of the text to be analyzed based on the syntactic tree; carrying out error correction analysis on the text to be analyzed to obtain an error correction result of the text to be analyzed; and determining a target analysis result based on the syntax analysis result and the error correction result.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, in particular to the fields of deep learning and natural language processing, and more specifically to a text analysis method, an apparatus, an electronic device, a storage medium, and a program product.
Background
With the rise of artificial intelligence, natural language processing technology has become an important branch of the field of artificial intelligence. Basic tasks of natural language processing may include syntactic analysis, semantic analysis, expression error analysis, and the like. Natural language processing technology is widely applied, for example, in human-computer interaction, text translation, and search scenarios.
Disclosure of Invention
The disclosure provides a text analysis method, a text analysis device, an electronic device, a storage medium and a program product.
According to an aspect of the present disclosure, there is provided a text analysis method including: carrying out syntactic analysis on a text to be analyzed to obtain a syntactic tree of the text to be analyzed; determining a syntactic analysis result of the text to be analyzed based on the syntactic tree; performing error correction analysis on the text to be analyzed to obtain an error correction result of the text to be analyzed; and determining a target analysis result based on the syntax analysis result and the error correction result.
According to another aspect of the present disclosure, there is provided a text analysis apparatus including: the first analysis module is used for carrying out syntactic analysis on a text to be analyzed to obtain a syntactic tree of the text to be analyzed; a first determining module, configured to determine a parsing result of the text to be analyzed based on the syntax tree; the second analysis module is used for carrying out error correction analysis on the text to be analyzed to obtain an error correction result of the text to be analyzed; and a second determining module for determining a target analysis result based on the syntax analysis result and the error correction result.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform a method according to the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method as disclosed herein.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which the text analysis method and apparatus may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a text analysis method according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a block diagram of a syntactic analysis model according to an embodiment of the present disclosure;
FIG. 4 schematically shows a diagram of a syntax tree according to an embodiment of the present disclosure;
FIG. 5 schematically shows a flow diagram of error correction analysis according to an embodiment of the disclosure;
FIG. 6 schematically illustrates an application diagram of a text analysis method according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of a text analysis apparatus according to an embodiment of the disclosure; and
FIG. 8 schematically illustrates a block diagram of an electronic device suitable for implementing a text analysis method in accordance with an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The disclosure provides a text analysis method, a text analysis device, an electronic device, a storage medium and a program product.
According to an embodiment of the present disclosure, a text analysis method includes: carrying out syntactic analysis on the text to be analyzed to obtain a syntactic tree of the text to be analyzed; determining a syntactic analysis result of the text to be analyzed based on the syntactic tree; carrying out error correction analysis on the text to be analyzed to obtain an error correction result of the text to be analyzed; and determining a target analysis result based on the syntax analysis result and the error correction result.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, application and other handling of users' personal information all comply with relevant laws and regulations, necessary confidentiality measures are taken, and public order and good customs are not violated.
In the technical scheme of the disclosure, before the personal information of the user is acquired or collected, the authorization or the consent of the user is acquired.
Fig. 1 schematically illustrates an exemplary system architecture to which the text analysis method and apparatus may be applied, according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. For example, in another embodiment, an exemplary system architecture to which the text analysis method and apparatus may be applied may include a terminal device, but the terminal device may implement the text analysis method and apparatus provided in the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
A user may use terminal devices 101, 102, 103 to interact with a server 105 over a network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a knowledge reading application, a web browser application, a search application, a translation application, a sentence analysis application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having display screens and supporting web browsing, including but not limited to smart phones, tablets, laptop and desktop computers, electronic pens, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) that supports content browsed by the user on the terminal devices 101, 102, 103. The background management server may analyze and otherwise process received text, such as a user request, and feed the processing result (e.g., a web page, information, or data obtained or generated according to the user request) back to the terminal device for display.
It should be noted that the text analysis method provided by the embodiment of the present disclosure may be generally executed by the terminal device 101, 102, or 103. Accordingly, the text analysis apparatus provided in the embodiment of the present disclosure may also be disposed in the terminal device 101, 102, or 103.
Alternatively, the text analysis method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the text analysis apparatus provided by the embodiments of the present disclosure may be generally disposed in the server 105. The text analysis method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the text analysis apparatus provided in the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
For example, a user may capture an image of a text to be analyzed with the terminal devices 101, 102, 103, recognize the text content in the image, and send that content to the server 105. The server 105 performs syntactic analysis and error correction analysis on the text to be analyzed to obtain a syntax tree and an error correction result of the text to be analyzed, determines a syntax analysis result of the text to be analyzed based on the syntax tree, and determines a target analysis result based on the syntax analysis result and the error correction result. Alternatively, the text may be analyzed, and the target analysis result finally determined, by a server or server cluster capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
It should be noted that the sequence numbers of the respective operations in the following methods are merely used as representations of the operations for description, and should not be construed as representing the execution order of the respective operations. The method need not be performed in the exact order shown, unless explicitly stated.
FIG. 2 schematically shows a flow diagram of a text analysis method according to an embodiment of the disclosure.
As shown in fig. 2, the method includes operations S210 to S240.
In operation S210, the text to be analyzed is parsed, and a syntax tree of the text to be analyzed is obtained.
In operation S220, a parsing result of the text to be analyzed is determined based on the syntax tree.
In operation S230, the text to be analyzed is subjected to error correction analysis, and an error correction result of the text to be analyzed is obtained.
In operation S240, a target analysis result is determined based on the syntax analysis result and the error correction result.
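The flow of operations S210 to S240 can be sketched as a small pipeline. This is only an illustrative skeleton: the function names, the result container, and the toy logic inside each step are assumptions made for the example and are not taken from the disclosure, which would use trained models for S210 and S230.

```python
from dataclasses import dataclass


@dataclass
class TargetAnalysisResult:
    syntax_result: dict
    error_correction_result: list


def parse_to_syntax_tree(text):
    # S210 (hypothetical stand-in): a real system would run a parsing model.
    return {"tokens": text.split(), "edges": []}


def determine_syntax_result(tree):
    # S220 (hypothetical stand-in): derive a result from the syntax tree.
    return {"token_count": len(tree["tokens"])}


def error_correction_analysis(text):
    # S230 (hypothetical toy rule): flag a lowercase standalone "i".
    return [
        {"token": tok, "suggestion": "I", "type": "spelling"}
        for tok in text.split()
        if tok == "i"
    ]


def analyze(text):
    tree = parse_to_syntax_tree(text)              # S210
    syntax_result = determine_syntax_result(tree)  # S220
    errors = error_correction_analysis(text)       # S230
    return TargetAnalysisResult(syntax_result, errors)  # S240


result = analyze("i send a present to her .")
```

Because S230 depends only on the input text, it could equally run before or in parallel with S210 and S220, which matches the note that the operation numbers do not fix an execution order.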
According to the embodiment of the disclosure, the text to be analyzed may be sentence content input by a user, or may be obtained from an image uploaded by the user by converting the image through OCR (Optical Character Recognition) technology. The language of the text to be analyzed is not limited, and may be, for example, Chinese, English, or another language.
According to the embodiment of the disclosure, the syntactic analysis may include syntactic parsing or lexical analysis, and may be performed on the text to be analyzed to obtain a syntax tree of the text to be analyzed. The syntax tree may include a plurality of nodes and the association relationships among them. The plurality of nodes may characterize a plurality of tokens in the text to be analyzed. The type of the syntax tree may be determined according to the type of the association relationship. For example, the syntax tree may be determined to be a dependency syntax tree or a component syntax tree based on the type of association relationship of the syntax tree.
According to the embodiment of the disclosure, the syntax tree is processed to determine the syntax analysis result of the text to be analyzed. The syntax analysis result may include one or more of a lexical analysis result, a syntactic analysis result, and a grammatical analysis result.
According to embodiments of the present disclosure, the error correction analysis may include primary error type analysis, covering, for example, spelling errors, grammatical errors, and formatting errors. The error correction analysis may also include secondary error type analysis, covering, for example, subject-predicate agreement errors, preposition errors, and noun errors.
According to the embodiment of the disclosure, error correction analysis can be performed on the text to be analyzed to obtain the error correction result of the text to be analyzed. The error correction result may include one or more of a primary error type identification result and a secondary error type identification result, but is not limited thereto. The error correction result may also include the correct token corresponding to an erroneous token.
According to an embodiment of the present disclosure, determining the target analysis result based on the syntax analysis result and the error correction result may refer to taking the syntax analysis result and the error correction result together as the target analysis result, but is not limited thereto. It may also refer to updating the syntax analysis result based on the error correction result, and taking the updated syntax analysis result and the error correction result as the target analysis result. It may further refer to taking the syntax analysis result (or the updated syntax analysis result), a translation result of the text to be analyzed, and the error correction result as the target analysis result.
With the text analysis method provided by the embodiment of the disclosure, error correction and syntax analysis are combined, so that the target analysis result includes both a judgment of whether the text to be analyzed is expressed correctly and a syntax analysis result covering lexical, syntactic and grammatical analysis of the text. The target analysis result is therefore comprehensive, the processing is effective, and the user experience is improved.
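The three ways of forming the target analysis result described above can be sketched in one function. The function name, the dictionary layout, and the `update_fn` callback are hypothetical; the disclosure does not specify how the update step works, so it is left as a caller-supplied function here.

```python
def determine_target_result(syntax_result, error_result,
                            update_fn=None, translation=None):
    """Combine the analysis outputs into one target analysis result.

    update_fn (hypothetical): optional callable that returns a syntax
    analysis result updated in light of the error correction result.
    translation: optional translation of the text to be analyzed.
    """
    if update_fn is not None:
        syntax_result = update_fn(syntax_result, error_result)
    target = {"syntax": syntax_result, "errors": error_result}
    if translation is not None:
        target["translation"] = translation
    return target


# Variant 1: both results are returned together, unchanged.
plain = determine_target_result({"predicate": "send"}, [])

# Variant 2: the syntax result is updated first (toy update rule).
updated = determine_target_result(
    {"predicate": "send"},
    [{"token": "send", "suggestion": "sent"}],
    update_fn=lambda s, e: {**s, "has_errors": bool(e)},
)

# Variant 3: a translation result is included as well.
with_translation = determine_target_result({"predicate": "send"}, [],
                                           translation="(translated text)")
```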
According to another embodiment of the present disclosure, performing syntactic analysis on the text to be analyzed in operation S210 to obtain the syntax tree of the text to be analyzed may further include: processing the text to be analyzed to obtain a first matrix and a second matrix related to the text to be analyzed; and parsing the first matrix and the second matrix to obtain the syntax tree.
According to the embodiment of the disclosure, the first matrix can be used for characterizing whether the plurality of nodes have dependency relationships with each other, and the second matrix is used for characterizing the dependency relationship categories of the plurality of nodes with each other.
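To make the roles of the two matrices concrete, the sketch below fills them in for one illustrative sentence. The row/column convention (row = dependent token, column = head token), the 0/1 encoding, and the use of label strings are assumptions for illustration; a model would typically output scores rather than hard values.

```python
# Tokens of an illustrative sentence, with a virtual ROOT at index 0.
tokens = ["ROOT", "I", "send", "a", "present", "to", "her", "."]

# Assumed analysis: dependent index -> (head index, relation label).
heads = {
    1: (2, "nsubj"),
    2: (0, "ROOT"),
    3: (4, "det"),
    4: (2, "obj"),
    5: (6, "case"),
    6: (2, "obl"),
    7: (2, "punct"),
}


def build_matrices(n, heads):
    # first[d][h] == 1 if token h is the head of token d, else 0.
    first = [[0] * n for _ in range(n)]
    # second[d][h] holds the relation category for that pair, else None.
    second = [[None] * n for _ in range(n)]
    for dep, (head, label) in heads.items():
        first[dep][head] = 1
        second[dep][head] = label
    return first, second


first_matrix, second_matrix = build_matrices(len(tokens), heads)
```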
According to the embodiment of the disclosure, the text to be analyzed can be processed by utilizing a syntactic analysis model, and a first matrix and a second matrix related to the text to be analyzed are obtained. The network structure of the syntactic analysis model is not limited, and the network structure of the syntactic analysis model shown in fig. 3 may be used.
Fig. 3 schematically illustrates a block diagram of a syntactic analysis model according to an embodiment of the present disclosure.
As shown in fig. 3, the syntactic analysis model may include, in order, an encoder 320, a plurality of parallel multi-layer perceptrons (MLP) 330, and a plurality of parallel biaffine functions (Bi-affine) 340.
As shown in fig. 3, the text 310 to be analyzed may be input into the encoder 320 to obtain an encoded vector. The encoded vector is input into four parallel multi-layer perceptrons 330 to obtain four feature vectors for each token. The four feature vectors may include a feature vector characterizing the dependency arc head, a feature vector characterizing the dependency arc tail, a feature vector characterizing the relationship corresponding to the dependency arc head, and a feature vector characterizing the relationship corresponding to the dependency arc tail. It is understood that the dependency arc head may refer to the modified word in the grammatical relationship, also called the dominant word, and the dependency arc tail may refer to the modifying word in the grammatical relationship, also called the dependent word.
According to the embodiment of the disclosure, a multi-layer BiLSTM (Bi-directional Long Short-Term Memory network) can be used as the encoder, for example, a 3-layer BiLSTM, but the encoder is not limited thereto. Gated recurrent units may also be employed as the encoder. The choice can be made according to actual conditions.
As shown in fig. 3, the four feature vectors are input into two parallel biaffine functions 340 to obtain a first matrix 350 and a second matrix 360.
As shown in fig. 3, the first matrix 350 and the second matrix 360 are input to a syntax parser 370, which parses the first matrix 350 and the second matrix 360 to obtain a syntax tree 380.
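As a rough illustration of how a score matrix can be produced from per-token feature vectors, the sketch below computes a simplified biaffine score, score[i][j] = tail_i^T W head_j + b, in plain Python. The exact form used in the disclosure is not given, so the weight layout, the scalar bias, and the omission of extra linear terms are assumptions. A label matrix would need one such weight matrix per relation category.

```python
def biaffine_scores(tail_vecs, head_vecs, weight, bias):
    """Return scores[i][j] = tail_vecs[i]^T * weight * head_vecs[j] + bias.

    tail_vecs: list of arc-tail (dependent) feature vectors, one per token.
    head_vecs: list of arc-head feature vectors, one per token.
    weight:    matrix of shape len(tail vector) x len(head vector).
    """
    scores = []
    for t in tail_vecs:
        # Row vector t^T W, of the same length as a head vector.
        t_w = [
            sum(t[k] * weight[k][m] for k in range(len(t)))
            for m in range(len(weight[0]))
        ]
        scores.append([
            sum(t_w[m] * h[m] for m in range(len(h))) + bias
            for h in head_vecs
        ])
    return scores


# Tiny example with 2 tokens and 2-dimensional feature vectors.
example = biaffine_scores(
    tail_vecs=[[1, 0], [0, 1]],
    head_vecs=[[1, 0], [0, 1]],
    weight=[[2, 0], [0, 3]],
    bias=1,
)
```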
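One simple way to turn the two matrices into dependency edges is to pick, for every token, the highest-scoring head and read the label for that pair. The sketch below does exactly that. It is a greedy decoder chosen for brevity, not the Eisner method or any other parser named in the disclosure, and it does not guarantee that the output is a well-formed tree. The matrix conventions (row = dependent, column = head, index 0 = virtual ROOT) are assumptions.

```python
def greedy_decode(arc_scores, labels):
    """Pick the best head for each non-root token.

    arc_scores[d][h]: score that token h is the head of token d.
    labels[d][h]:     relation category for that (dependent, head) pair.
    Returns a list of (head, dependent, label) triples.
    """
    n = len(arc_scores)
    edges = []
    for dep in range(1, n):  # index 0 is the virtual ROOT, which has no head
        candidates = [h for h in range(n) if h != dep]
        head = max(candidates, key=lambda h: arc_scores[dep][h])
        edges.append((head, dep, labels[dep][head]))
    return edges


# Toy input for the tokens ["ROOT", "I", "send"].
toy_scores = [
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.9],  # "I" most likely depends on "send"
    [0.8, 0.1, 0.0],  # "send" most likely depends on ROOT
]
toy_labels = [
    [None, None, None],
    [None, None, "nsubj"],
    ["ROOT", None, None],
]
decoded = greedy_decode(toy_scores, toy_labels)
```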
In accordance with embodiments of the present disclosure, a syntax tree, also referred to as a multi-way tree, may include a dependency syntax tree or a component syntax tree. The syntax parser may be determined according to a syntax tree type. For example, for the dependency syntax tree, the dependency syntax result may be parsed by one or more of an Eisner method (a lexical probability model), a tney syntax parser, a constrained dependency grammar, a decision analysis model, and the like, and then the dependency syntax tree may be constructed.
Fig. 4 schematically shows a schematic diagram of a syntax tree according to an embodiment of the present disclosure.
For the text to be analyzed "I send a present to her.", the syntax tree shown in fig. 4 is obtained through syntactic analysis. The syntax tree includes nodes 410 characterizing the tokens in the text to be analyzed, for example, nodes for the tokens "I", "send", "a", "present", "to", and "her", but is not limited thereto. It also includes a node for the punctuation mark "." and a virtual root node, such as "ROOT", which characterizes the beginning or end of the sentence. In addition, the nodes in the syntax tree include token information 430, which may include part-of-speech information of the token. As shown in fig. 4, PRP, VBD, DT, JJ, TO, PRP$ and the like are part-of-speech tags used to represent the part-of-speech information of the tokens "I", "send", "a", "present", "to", and "her", respectively.
As shown in FIG. 4, the syntax tree also includes a plurality of dependency edges 420, which may also be referred to as dependency arcs, to characterize the dependency relationship between two nodes: for example, a dependency edge between "ROOT" and "send", between "I" and "send", between "send" and "present", between "send" and "her", between "send" and ".", between "to" and "her", and between "a" and "present". Each dependency edge carries an arrow at one end, and the master-slave relationship between the two tokens can be determined from the arrow: the node at the end without the arrow is the dependency arc head node, and the node at the end with the arrow is the dependency arc tail node. In addition, each dependency edge is labeled with a dependency label 440: the label ROOT for the dependency between "ROOT" and "send", nsubj for the dependency between "I" and "send", det for the dependency between "a" and "present", obj for the dependency between "send" and "present", case for the dependency between "to" and "her", obl for the dependency between "send" and "her", and punct for the dependency between "send" and ".". The dependency type between two nodes may be determined from the label 440 on the dependency edge.
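The tree of fig. 4 can be held in a small data structure such as the one below. The class and field names are hypothetical. The part-of-speech tags are copied from the description as listed, and the tags of "ROOT" and "." are left as None because the text does not state them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    index: int
    token: str
    pos: Optional[str]  # part-of-speech tag, None where the text gives none


@dataclass(frozen=True)
class DependencyEdge:
    head: int   # arc head node: the end without the arrow
    tail: int   # arc tail node: the end with the arrow
    label: str  # dependency label on the edge


nodes = [
    Node(0, "ROOT", None), Node(1, "I", "PRP"), Node(2, "send", "VBD"),
    Node(3, "a", "DT"), Node(4, "present", "JJ"), Node(5, "to", "TO"),
    Node(6, "her", "PRP$"), Node(7, ".", None),
]

edges = [
    DependencyEdge(0, 2, "ROOT"), DependencyEdge(2, 1, "nsubj"),
    DependencyEdge(4, 3, "det"), DependencyEdge(2, 4, "obj"),
    DependencyEdge(6, 5, "case"), DependencyEdge(2, 6, "obl"),
    DependencyEdge(2, 7, "punct"),
]


def dependents_of(head_index):
    # List (token, label) for every edge whose arc head is head_index.
    return [(nodes[e.tail].token, e.label) for e in edges if e.head == head_index]


send_dependents = dependents_of(2)
```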
According to the embodiment of the disclosure, the syntax tree is a syntax-related analysis result. Once the syntax tree is obtained, a grammar parser can analyze it further to obtain more specific analysis results about lexis, syntax, grammatical structure and the like. For example, where the text to be analyzed is in English, corresponding grammar parsers may be designed for different points of English grammar. The grammar parser processes the output of the syntax parser, e.g., the syntax tree, to perform functions such as clause identification, clause category identification, special structure identification, verb identification, predicate and non-predicate verb identification, sentence type identification, morphism identification, and tense identification.
According to another embodiment of the present disclosure, determining the syntax analysis result of the text to be analyzed based on the syntax tree in operation S220 may further include: determining, from the plurality of dependency edges, a target dependency edge that conforms to a target dependency relationship; determining, from the plurality of nodes and based on the target dependency edge, a first target node characterizing a predetermined part of speech; and determining the syntax analysis result of the text to be analyzed based on the first target node.
According to an embodiment of the present disclosure, the syntax analysis result may include token components, e.g., the part of speech of a token. The syntax analysis result may include general parts of speech such as verb, adjective and noun, and may also include more specific part-of-speech results: for example, within verbs, linking verbs, auxiliary verbs, modal verbs and notional verbs, and, within notional verbs, predicate verbs and non-predicate verbs.
According to the embodiment of the disclosure, the general part-of-speech result of each of the plurality of nodes, such as verb, adjective or noun, can be determined based on the part-of-speech information indicated by each node in the syntax tree. A first target node characterizing the predetermined part of speech may also be determined from the plurality of nodes based on the target dependency relationship of the target dependency edge and the part-of-speech information corresponding to the target dependency edge. The first target node corresponds to a more specific part-of-speech result, such as a predicate verb.
Taking as an example a syntax analysis result that includes the predicate verb part of speech, consider the text to be analyzed "I send a present to her." The target dependency edges can be determined from the dependency relationships among the nodes marked in the syntax tree, for example, the dependency edge between "I" and "send" and the dependency edge between "send" and "present". The dependency relationship corresponding to the edge between "I" and "send" is "nsubj", which characterizes the subject-predicate relationship. The dependency relationship between "send" and "present" is "obj", which characterizes the verb-object relationship. Based on the dependency relationships of the target dependency edges, a first target node characterizing the predetermined part of speech, e.g., the predicate verb "send", may be determined from the plurality of nodes.
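The selection of the predicate verb from target dependency edges can be sketched as follows. The set of target relations, the set of verb tags, and the edge layout (head, dependent, label) are assumptions for illustration; the disclosure only states that nsubj and obj edges are used in this example.

```python
# Relations treated as target dependency relationships (assumed set).
TARGET_RELATIONS = {"nsubj", "obj"}
# Part-of-speech tags treated as verbs (assumed, Penn Treebank style).
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

tokens = ["ROOT", "I", "send", "a", "present", "to", "her", "."]
pos_tags = [None, "PRP", "VBD", "DT", "JJ", "TO", "PRP$", None]
# Edges as (head index, dependent index, label).
edges = [
    (0, 2, "ROOT"), (2, 1, "nsubj"), (4, 3, "det"), (2, 4, "obj"),
    (6, 5, "case"), (2, 6, "obl"), (2, 7, "punct"),
]


def find_predicate_verbs(tokens, pos_tags, edges):
    found = []
    for head, _dep, label in edges:
        # A target edge whose head is tagged as a verb marks a candidate.
        if label in TARGET_RELATIONS and pos_tags[head] in VERB_TAGS:
            if head not in found:
                found.append(head)
    return [tokens[i] for i in found]


predicate_verbs = find_predicate_verbs(tokens, pos_tags, edges)
```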
According to another embodiment of the present disclosure, the predetermined part of speech may refer to a generalized part of speech, and in a case where a plurality of nodes for characterizing the predetermined part of speech are determined based on the target dependency edges, the plurality of nodes for characterizing the predetermined part of speech may be regarded as a plurality of initial first target nodes. The following operations are further performed to determine a first target node from the plurality of nodes.
For example, a plurality of initial first target nodes characterizing the predetermined part of speech are determined from the plurality of nodes based on the target dependency edges. For each initial first target node in the plurality of initial first target nodes, a grammatical relation between the initial first target node and its adjacent node is determined, so that a plurality of grammatical relations are obtained. The adjacent node is a node adjacent to the initial first target node. The first target node is determined from the plurality of initial first target nodes based on the plurality of grammatical relations.
According to embodiments of the present disclosure, the grammatical relation may refer to a grammatical structure relation, a fixed collocation relation, and the like. Based on the grammatical relation, a verb that appears inside a fixed collocation may be treated as a pseudo-predicate verb. Nodes corresponding to pseudo-predicate verbs may be discarded, and the node corresponding to the actual predicate verb may be taken as the first target node.
According to an embodiment of the present disclosure, the text "I'm going to leave at the end of this month" is analyzed. According to the syntax tree, the plurality of initial first target nodes characterizing the predetermined part of speech, here the verb part of speech, are determined to include "going" and "leave". In this case, the determination may be made in conjunction with nodes of the text to be analyzed other than the plurality of initial first target nodes. For example, based on the grammatical relation between the initial first target node "going" and the adjacent node "'m", and the grammatical relation between "going" and "to", "be going to" is determined to be the grammatical structure of a fixed collocation. Further, based on the grammatical relation between "to" and "leave", the part of speech of "leave" may be determined to be the verb part of speech "VB". Then, based on the plurality of grammatical relations, "leave" is determined to be the first target node characterizing the predicate verb part of speech.
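The filtering of pseudo-predicate verbs described above can be sketched as follows. This is an illustrative approximation only: it recognises a single fixed collocation ("be going to") by looking at the immediate neighbours of each candidate, and the names `BE_FORMS`, `is_pseudo_predicate` and `select_first_target_nodes` are assumptions rather than terms from the disclosure.

```python
from typing import List

# Assumption: surface forms of "be" that can open the collocation "be going to".
BE_FORMS = {"am", "'m", "is", "'s", "are", "'re", "was", "were"}


def is_pseudo_predicate(tokens: List[str], index: int) -> bool:
    """True if tokens[index] is the 'going' of a 'be going to' collocation."""
    has_be_before = index > 0 and tokens[index - 1].lower() in BE_FORMS
    has_to_after = index + 1 < len(tokens) and tokens[index + 1].lower() == "to"
    return tokens[index].lower() == "going" and has_be_before and has_to_after


def select_first_target_nodes(tokens: List[str], candidate_indices: List[int]) -> List[str]:
    """Drop pseudo-predicate candidates and keep the remaining verbs."""
    return [tokens[i] for i in candidate_indices if not is_pseudo_predicate(tokens, i)]


tokens = ["I", "'m", "going", "to", "leave", "at", "the", "end", "of", "this", "month"]
# Indices 2 ("going") and 4 ("leave") are the initial first target nodes.
print(select_first_target_nodes(tokens, [2, 4]))  # ['leave']
```

A fuller version would consult a table of fixed collocations and the dependency labels of the syntax tree instead of adjacent tokens.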
With the manner of determining the grammar analysis result provided by the embodiment of the disclosure, the initial result can be cleaned by post-processing with multiple checks, so that invalid or wrong grammar analysis results are avoided and the accuracy of the grammar analysis result is further improved.
According to other embodiments of the present disclosure, a fixed structure formed by multiple nodes, for example the fixed collocation "be going to", may be determined from the parts of speech of the nodes in the text to be analyzed together with their dependency relationships and grammatical relations, and the fixed structure may be taken as a sub-result of the grammar analysis result.
According to an embodiment of the present disclosure, the grammar analysis result may further include at least one of: temporal results, morphic results, statement type results. Temporal results refer to tense, such as present, past and future. Morphic results refer to voice, such as active voice and passive voice. Statement type results refer to the type of sentence, such as interrogative sentences and exclamatory sentences.
A grammar analysis result of the above types can be determined in the following manner. For example, where the first target node is determined to characterize the predicate verb, a suffix form of the first target node is determined based on the first target node. In a case where it is determined that the suffix form of the first target node coincides with a predetermined suffix form, a second target node related to the first target node is determined from a plurality of second nodes. The plurality of second nodes includes nodes of the plurality of nodes other than the first target node. The grammar analysis result of the text to be analyzed is determined based on the second target node and the first target node.
According to an embodiment of the present disclosure, the suffix forms may include a past-tense suffix form, such as an added "ed"; a third-person-singular suffix form, such as an added "es" or "s"; and a progressive suffix form, such as an added "ing".
According to an embodiment of the present disclosure, the second target node related to the first target node may include a node that, together with the suffix form, helps express the semantics or the syntactic structure of the text to be analyzed.
Taking the grammar analysis result being a temporal result as an example, the suffix form of the first target node may be determined when the first target node in the text to be analyzed "She is doing her homework" is determined to characterize the predicate verb. For example, the suffix form of "doing" is the "ing" form, and it can be determined that this suffix form coincides with the predetermined suffix form of the past progressive or present progressive tense. In this case, a second target node related to the first target node is determined from the plurality of second nodes, for example, the second target node is "is". The grammar analysis result of the text to be analyzed with respect to the temporal result may be determined based on the first target node and the second target node, for example, the temporal result is the present progressive tense.
Taking the grammar analysis result being a morphic result as an example, the suffix form of the first target node may be determined when the first target node in the text to be analyzed "It is added to the bag." is determined to characterize the predicate verb. For example, where the suffix form of "added" is the "ed" form, it may be determined that the suffix form of the first target node coincides with the predetermined suffix form of the passive voice. In this case, a second target node related to the first target node may be determined from the plurality of second nodes. For example, the second target node is "is". The grammar analysis result of the text to be analyzed with respect to the morphic result may be determined based on the first target node and the second target node, for example, the morphic result is the passive voice.
Taking the grammar analysis result being a statement type result as an example, the suffix form of the first target node may be determined when the first target node in the text to be analyzed "Could I drink it?" is determined to characterize the predicate verb. For example, where the suffix form of "drink" is a null form and it is determined that the suffix form of the first target node coincides with the predetermined suffix form, the second target node related to the first target node may be determined from the plurality of second nodes. For example, the second target nodes are "could" and "?". The grammar analysis result of the text to be analyzed with respect to the statement type result may be determined based on the first target node and the second target node, for example, the statement type result is a general question.
According to the embodiment of the present disclosure, the grammar analysis result of the text to be analyzed may be determined based on the first target node characterizing the predicate verb and the second target node related to the first target node. The second target node matches the type of the grammar analysis result. The grammar analysis result of the text to be analyzed may be determined based on one or more of the first target node, the second target node, the position information of the nodes, the part-of-speech information and the punctuation. The more types and pieces of reference information are used, the more accurate the grammar analysis result becomes.
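The three examples above share one pattern: inspect the suffix form of the predicate verb, then look for a related second node. A minimal sketch of that pattern follows. The suffix list, the `BE_FORMS` and `MODALS` sets, the result keys and the function names are illustrative assumptions, and the rules cover only the three example sentences.

```python
from typing import Dict, List

BE_FORMS = {"am", "is", "are", "was", "were"}
PAST_BE_FORMS = {"was", "were"}
MODALS = {"can", "could", "may", "might", "shall", "should", "will", "would", "must"}


def suffix_form(word: str) -> str:
    """Return the first matching suffix of the predicate verb, or '' for a null form."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.lower().endswith(suffix):
            return suffix
    return ""


def analyse(tokens: List[str], predicate_index: int) -> Dict[str, str]:
    """Derive temporal, morphic and statement type results from the predicate verb."""
    suffix = suffix_form(tokens[predicate_index])
    others = [t.lower() for i, t in enumerate(tokens) if i != predicate_index]
    be_form = next((t for t in others if t in BE_FORMS), None)
    result: Dict[str, str] = {}
    if suffix == "ing" and be_form is not None:
        tense = "past progressive" if be_form in PAST_BE_FORMS else "present progressive"
        result["temporal"] = tense
    if suffix == "ed" and be_form is not None:
        result["morphic"] = "passive"
    if suffix == "" and tokens[-1] == "?" and tokens[0].lower() in MODALS:
        result["statement_type"] = "general question"
    return result


print(analyse(["She", "is", "doing", "her", "homework", "."], 2))
print(analyse(["It", "is", "added", "to", "the", "bag", "."], 2))
print(analyse(["Could", "I", "drink", "it", "?"], 2))
```

Real text would need lemmatisation and irregular verb handling; matching on the raw suffix is only adequate for regular forms such as those in the examples.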
By utilizing the determining mode of the syntactic analysis result provided by the embodiment of the disclosure, various different types of syntactic analysis results can be determined, and the results are various and accurate.
According to an embodiment of the present disclosure, the syntax analysis result may further include a sentence analysis result.
According to an embodiment of the present disclosure, determining the grammar analysis result of the text to be analyzed based on the syntax tree in operation S220 may further include the following operations. Based on the plurality of nodes in the syntax tree, a third target node characterizing a lead word is determined from the plurality of nodes. A target sentence associated with the third target node is determined from the text to be analyzed based on the third target node. The grammar analysis result is determined based on the third target node and the target sentence.
According to embodiments of the present disclosure, the lead word may refer to a word that introduces a clause and characterizes the clause relationship. The third target nodes characterizing lead words may include nodes such as "that", "what", "which", "where", or "no more … than".
According to an embodiment of the present disclosure, the target sentence associated with the third target node may refer to a clause. Taking a text to be analyzed in English as an example, one sentence may include a plurality of nested sentences, such as a main clause and a plurality of subordinate clauses. The target sentence refers to a clause in the text to be analyzed.
For example, the text to be analyzed is "The heart is no more intelligent than the stomach, for they are both controlled by the brain". The plurality of nodes in the syntax tree may be traversed to determine whether the current node is a third target node characterizing a lead word. It should be noted that the third target node is not necessarily a single byte, and a text may contain more than one lead word. Thus, where it is determined that there is a node characterizing a lead word, traversal of the other nodes may continue to determine whether further nodes characterize lead words. Here, the third target nodes are determined to be "no more … than" and "for". The target sentence "they are both controlled by the brain" associated with the third target node "for" may be determined based on that node. In this text, what "no more … than" introduces is not a complete sentence and cannot serve as a target sentence, so the lead word node "no more … than" can be treated as an invalid result and discarded.
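A rough sketch of this traversal is given below, using the sentence "The heart is no more intelligent than the stomach, for they are both controlled by the brain." as illustrative input. It assumes the tokens already carry Penn Treebank style part-of-speech tags (written by hand here), treats a span as a complete sentence only if it contains a finite verb, and uses hypothetical names throughout; a real system would work on the syntax tree rather than on a flat token list.

```python
from typing import List, Tuple

# Assumption: an illustrative, non-exhaustive set of single-token lead words.
SINGLE_LEAD_WORDS = {"that", "what", "which", "where", "for"}
FINITE_VERB_TAGS = {"VBD", "VBP", "VBZ", "MD"}
PUNCTUATION_TAGS = {",", "."}


def find_target_sentences(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (lead word, clause) pairs; lead words without a full clause are discarded."""
    words = [word.lower() for word, _ in tagged]
    candidates = []
    for i, word in enumerate(words):
        if word in SINGLE_LEAD_WORDS:
            candidates.append((word, i + 1))
        elif words[i:i + 2] == ["no", "more"] and "than" in words[i + 2:]:
            # Discontinuous lead word: the clause candidate starts after "than".
            candidates.append(("no more ... than", words.index("than", i + 2) + 1))
    results = []
    for lead_word, start in candidates:
        span = []
        for word, tag in tagged[start:]:
            if tag in PUNCTUATION_TAGS:
                break
            span.append((word, tag))
        if any(tag in FINITE_VERB_TAGS for _, tag in span):
            results.append((lead_word, " ".join(word for word, _ in span)))
    return results


tagged = [
    ("The", "DT"), ("heart", "NN"), ("is", "VBZ"), ("no", "RB"), ("more", "RBR"),
    ("intelligent", "JJ"), ("than", "IN"), ("the", "DT"), ("stomach", "NN"), (",", ","),
    ("for", "IN"), ("they", "PRP"), ("are", "VBP"), ("both", "DT"),
    ("controlled", "VBN"), ("by", "IN"), ("the", "DT"), ("brain", "NN"), (".", "."),
]
print(find_target_sentences(tagged))
# [('for', 'they are both controlled by the brain')]
```

The span after "no more … than" ("the stomach") contains no finite verb, so that lead word is dropped, which mirrors the discarding of invalid results described above.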
According to the embodiment of the present disclosure, the text analysis method can determine not only whether a clause exists in the text to be analyzed but also, in a case where a clause exists, a grammar analysis result regarding the clause category of the target sentence.
For example, in the case where it is determined that the target sentence exists in the text to be analyzed, clause category recognition processing is performed on the text to be analyzed, and a grammar analysis result regarding the clause category of the target sentence is obtained.
According to the embodiment of the disclosure, in the case where the target sentence exists in the text to be analyzed, the text to be analyzed can be input into a clause classification model, which outputs the clause category result of the target sentence.
In accordance with embodiments of the present disclosure, clause category results may include subject clauses, predicative clauses, object clauses, conditional clauses, and the like.
In accordance with embodiments of the present disclosure, the clause classification model may include a pre-trained deep learning model, for example, one or more of a convolutional neural network model, a graph neural network model, a recurrent neural network model, and the like.
According to an embodiment of the present disclosure, performing error correction analysis on the text to be analyzed in operation S230 to obtain an error correction result of the text to be analyzed may further include the following operations. The text to be analyzed is recognized to obtain an error correction recognition result. In a case where it is determined, based on the error correction recognition result, that an error byte exists in the text to be analyzed, the error byte is corrected based on the error correction recognition result to obtain a correct byte corresponding to the error byte. The error correction result is determined based on the correct byte and the error correction recognition result.
According to the embodiment of the disclosure, the error correction recognition result of the text to be analyzed can be determined by comparing a predetermined grammar rule, or the suffix form of predetermined bytes, with the plurality of bytes in the text to be analyzed. For example, in the case that the grammar of the text to be analyzed conforms to the predetermined grammar rule, it is determined that there is no grammar-related error in the text to be analyzed; otherwise, it is determined that there is an error in the text to be analyzed. Similarly, in the case where the suffix forms of the plurality of bytes of the text to be analyzed coincide with the suffix forms of the predetermined bytes, it is determined that there is no spelling or vocabulary error in the text to be analyzed; otherwise, it is determined that there is an error in the text to be analyzed.
According to an alternative embodiment of the present disclosure, the text to be analyzed may be processed by using an error correction analysis model to obtain an error correction recognition result for the text to be analyzed. The error correction analysis model may include a part-of-speech tagging (POS tagging) model, for example, one or more of a Hidden Markov Model (HMM) and a Conditional Random Field (CRF). The error correction analysis operation shown in fig. 5 may be performed in combination with the error correction analysis model to obtain the error correction recognition result.
Fig. 5 schematically shows a flow diagram of error correction analysis according to an embodiment of the present disclosure.
As shown in fig. 5, a text 510 to be analyzed may be input into a feature extraction model 520, resulting in a text feature vector. The text feature vector is input into the error correction analysis model 530, and an error correction recognition result 540 is obtained. In the case where it is determined that there is an erroneous byte in the text to be analyzed based on the error correction recognition result 540, the erroneous byte is corrected based on the error correction recognition result 540, resulting in a correct byte 550 corresponding to the erroneous byte. Based on the correct byte 550 and the error correction recognition result 540, an error correction result 560 is determined.
According to the embodiment of the present disclosure, the feature extraction model may include a convolutional neural network model, but is not limited thereto, and may also include ERNIE (Enhanced Language Representation with Informative Entities) or another feature extraction model for extracting or encoding features of the text to be analyzed.
According to an embodiment of the present disclosure, the error correction recognition result may be an error category label. For example, the category labels may include an agreement error category label, a preposition error category label, a noun possessive error category label, a format error category label, a vocabulary error category label, and the like, and may also include a no-error category label. According to the number of bytes in the text to be analyzed, a plurality of error category labels in one-to-one correspondence with the plurality of bytes may be assigned. Labelling each byte individually makes the error correction recognition result more accurate.
According to the embodiment of the disclosure, when the plurality of error category labels in one-to-one correspondence with the plurality of bytes are all no-error category labels, it is determined that there is no error byte in the text to be analyzed. Conversely, when it is determined that there is an error byte in the text to be analyzed, the error byte may be corrected based on the error correction recognition result, for example the error category label, to obtain a correct byte corresponding to the error byte.
According to the embodiment of the present disclosure, taking the text to be analyzed "this cause problem" as an example, the text to be analyzed includes 3 bytes. After processing by the feature extraction model and the error correction analysis model, an error category label corresponding to each byte is obtained. For example, the error category label of the byte "this" is "capital", which characterizes a missing initial capital among format errors. The error category label of the byte "cause" is "verb_vb_vbz", which characterizes a third-person-singular verb error among grammar errors. The error category label of the byte "problem" is "plural", which characterizes a singular/plural error among grammar errors.
According to the embodiment of the disclosure, the error bytes can be corrected according to the error correction recognition result for the error bytes in the text to be analyzed and a predetermined transformation strategy, so as to obtain the correct bytes corresponding to the error bytes. For example, "this" may be corrected by capitalizing its initial letter to obtain the correct byte "This". "cause" may be corrected to the third-person-singular form by adding "s" to obtain the correct byte "causes". "problem" may be corrected to the plural form to obtain the correct byte "problems". The correct text "This causes problems" is thereby obtained.
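The label-driven correction can be sketched as a lookup from error category label to transformation. The labels "capital", "verb_vb_vbz" and "plural" follow the example in the text; the `no_error` label name, the simple pluralisation rule and the function names are assumptions made for illustration.

```python
from typing import Callable, Dict, List

NO_ERROR = "no_error"  # hypothetical name for the no-error category label


def capitalise(word: str) -> str:
    """Upper-case the initial letter and leave the rest unchanged."""
    return word[:1].upper() + word[1:]


def add_s(word: str) -> str:
    """Naive third-person-singular / plural formation for regular words."""
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    return word + "s"


# Predetermined transformation strategy: one transformation per error category label.
TRANSFORMS: Dict[str, Callable[[str], str]] = {
    "capital": capitalise,
    "verb_vb_vbz": add_s,
    "plural": add_s,
}


def has_errors(labels: List[str]) -> bool:
    """True if at least one byte carries a label other than the no-error label."""
    return any(label != NO_ERROR for label in labels)


def correct_text(tokens: List[str], labels: List[str]) -> str:
    """Apply the transformation selected by each label; unlabelled bytes are kept."""
    if len(tokens) != len(labels):
        raise ValueError("expected exactly one label per byte")
    corrected = [
        token if label == NO_ERROR else TRANSFORMS[label](token)
        for token, label in zip(tokens, labels)
    ]
    return " ".join(corrected)


tokens = ["this", "cause", "problem"]
labels = ["capital", "verb_vb_vbz", "plural"]
print(has_errors(labels), correct_text(tokens, labels))  # True This causes problems
```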
According to another embodiment of the present disclosure, determining the target analysis result based on the grammar analysis result and the error correction recognition result in operation S240 may further include the following operations. In a case where it is determined, based on the error correction recognition result, that an error byte exists in the text to be analyzed, the error byte is corrected based on the error correction recognition result to obtain a correct byte corresponding to the error byte. The grammar analysis result is updated by using the correct byte to obtain an updated grammar analysis result. The target analysis result is determined based on the updated grammar analysis result, the correct byte and the error correction recognition result.
According to an embodiment of the present disclosure, before the operation of updating the grammar analysis result with the correct byte is performed, a request from the user for updating the grammar analysis result may be received. In response to the request, the operation of updating the grammar analysis result with the correct byte is performed to obtain the updated grammar analysis result.
According to the embodiment of the disclosure, the target analysis result is determined based on the updated grammatical analysis result, the correct byte and the error correction recognition result, so that the comprehensiveness and the accuracy of the target analysis result are improved.
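The update-on-request flow can be sketched as follows. The `parse` callable stands in for the whole grammar analysis step, and `toy_parse`, the dictionary keys and the `update_requested` flag are hypothetical names used only for this illustration.

```python
from typing import Any, Callable, Dict, List


def determine_target_analysis(
    text: str,
    corrected_text: str,
    error_labels: List[str],
    parse: Callable[[str], Any],
    update_requested: bool,
) -> Dict[str, Any]:
    """Combine the grammar analysis result with the error correction output."""
    grammar_result = parse(text)
    # Re-run the grammar analysis on the corrected text only when the user asked for it.
    if update_requested and corrected_text != text:
        grammar_result = parse(corrected_text)
    return {
        "grammar_result": grammar_result,
        "corrected_text": corrected_text,
        "error_labels": error_labels,
    }


def toy_parse(text: str) -> Dict[str, List[str]]:
    """Placeholder for the real grammar analysis: it only splits on whitespace."""
    return {"tokens": text.split()}


labels = ["capital", "verb_vb_vbz", "plural"]
updated = determine_target_analysis(
    "this cause problem", "This causes problems", labels, toy_parse, update_requested=True
)
print(updated["grammar_result"])  # {'tokens': ['This', 'causes', 'problems']}
```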
According to an embodiment of the present disclosure, an operation of receiving the text to be analyzed may be performed before the syntactic analysis and the error correction analysis in the text analysis method. Receiving the text to be analyzed may refer to receiving text typed in by the user, but is not limited thereto; an image containing the text to be analyzed may also be received from the user. OCR technology can be used to perform text recognition on the received image to obtain the text to be analyzed. By supplying an image, the user can enter the text to be analyzed quickly and without typing.
According to the embodiment of the disclosure, the terminal device or the server can receive different types of information, such as text or images, and determine the text to be analyzed from the information, so that the user experience is improved.
According to the embodiment of the present disclosure, the length of the text to be analyzed is not limited; for example, the text to be analyzed may include one sentence or a plurality of sentences. In case the text to be analyzed includes a plurality of sentences, sentence segmentation can be performed automatically.
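Automatic sentence segmentation could be approximated as below. The disclosure does not specify a segmentation method, so this regular-expression split on sentence-final punctuation is purely an assumption; it does not handle abbreviations such as "e.g." or quoted speech.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


text = "She is doing her homework. Could I drink it? It is added to the bag."
for sentence in split_sentences(text):
    print(sentence)
```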
Fig. 6 schematically shows an application diagram of a text analysis method according to an embodiment of the present disclosure.
As shown in fig. 6, the text to be analyzed 620 may be displayed on the display interface 610, together with controls through which the user can request analysis of the text, such as a control 631 for requesting grammar analysis or a control 632 for requesting error correction analysis. When the user clicks a control to issue a request for analyzing the text to be analyzed, the terminal device or the server may perform the operations of the text analysis method in response to the request to obtain a target analysis result of the text to be analyzed. The target analysis result may be displayed on the display interface 610.
According to an embodiment of the present disclosure, the display manner may include one or more of highlighting, changing colors, bolding, labeling, bracketing, underlining, and the like.
According to the embodiment of the disclosure, the display interface can be divided into a plurality of display areas for displaying different types of target analysis results.
As shown in fig. 6, the visualized core component 641 in the target analysis result is displayed. For the part-of-speech result, core bytes such as the subject, predicate and object may be marked directly on the text to be analyzed, for example by changing color and highlighting according to different display rules, or by italics, underlining and the like. For the sentence analysis result, for example, a clause in the text to be analyzed is marked with parentheses. Fixed collocations in the text to be analyzed are marked in bold.
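One way to realise such marking is to wrap character spans of the text in markers. In the sketch below, parentheses mark a clause as described above, while the use of HTML `<b>` tags for bold, the span format and the function names are assumptions for illustration; overlapping or nested spans are not handled.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, kind)

# Assumption: parentheses for clauses, HTML bold tags for fixed collocations.
MARKERS: Dict[str, Tuple[str, str]] = {
    "clause": ("(", ")"),
    "collocation": ("<b>", "</b>"),
}


def span_of(text: str, phrase: str, kind: str) -> Span:
    """Locate the first occurrence of a phrase and return it as a span."""
    start = text.index(phrase)
    return (start, start + len(phrase), kind)


def mark_spans(text: str, spans: List[Span]) -> str:
    """Wrap each non-overlapping span in the markers registered for its kind."""
    pieces = []
    cursor = 0
    for start, end, kind in sorted(spans):
        opening, closing = MARKERS[kind]
        pieces.append(text[cursor:start])
        pieces.append(opening + text[start:end] + closing)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "I am going to say that he left."
spans = [
    span_of(text, "that he left", "clause"),
    span_of(text, "am going to", "collocation"),
]
print(mark_spans(text, spans))  # I <b>am going to</b> say (that he left).
```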
According to the embodiment of the disclosure, for the part of speech result, sentence pattern classification result, grammar result and the like in the grammar analysis result, the user can know the components in the text to be analyzed at a glance in a visualization mode.
As shown in fig. 6, the machine translation result 642 of the target analysis result is displayed. The machine translation result of the text to be analyzed is displayed directly below the text to be analyzed. By comparing the Chinese machine translation with the English original, the user can intuitively grasp the Chinese meaning of the text to be analyzed, such as an English sentence, and can further understand its semantics with the help of the visualized core component marking.
According to the embodiment of the disclosure, when the user clicks directly on an unfamiliar word, the dictionary definition, the British and American phonetic transcriptions and the pronunciation of the word can be given in real time in response to the user's request for translation or reading aloud.
As shown in fig. 6, a grammar analysis result 643 among the target analysis results is displayed. The grammar analysis result of the text to be analyzed is displayed in detail. For example, a statement type result of a general question, a temporal result of the simple present tense, and a morphic result of the passive voice are displayed for the text to be analyzed. As another example, the recognition results of the clause categories and the positions of the clauses are displayed. As a further example, the part of speech of each word in the text to be analyzed and/or the components of the text to be analyzed are displayed. The part of speech and the component of each word can be listed in order of appearance, which makes it easy for the user to look them up.
According to an embodiment of the present disclosure, the intelligent error correction in the target analysis result is displayed. In the case that errors such as grammar or spelling errors exist in the text to be analyzed, correction can be performed and the error correction recognition result is given. For example, the erroneous place is colored so that the user can open the error correction details page to view the details. In response to the user's request for intelligent error correction, the grammar analysis result of the text to be analyzed is updated according to the correct byte, the page is refreshed, and the updated grammar analysis result is displayed, which improves the accuracy of the grammar analysis result.
As shown in fig. 6, the relevant core vocabulary 644 in the target analysis result is displayed. Key core words in the text to be analyzed can be intelligently identified and listed with their definitions, so that the user can quickly look up and collect words and steadily expand their vocabulary.
According to the embodiment of the disclosure, the text to be analyzed can be English, and the text analysis method can be used to form a one-stop English grammar learning tool that provides deep, full-stack English grammar parsing. The method supports analysis of the lexical and syntactic rules in the text to be analyzed and covers substantially all knowledge points of English grammar. The competitiveness of the text analysis method is thereby improved, and the English learning needs of users are better satisfied.
According to other embodiments of the present disclosure, machine translation, error correction and grammar analysis can be combined so that each contributes its full value. By intelligently detecting errors in the text to be analyzed, prompting the user in time and supporting one-click correction, the method avoids erroneous grammar analysis results caused by analyzing text that contains grammatical errors. In addition, machine translation and a dictionary supplement the Chinese definitions of the words in the text to be analyzed, so that the user can readily understand and master the sentence, and the situation in which the user knows the grammar but cannot understand the sentence because the meaning of a word is unknown is avoided.
Fig. 7 schematically shows a block diagram of a text analysis apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the text analysis device 700 includes: a first analysis module 710, a first determination module 720, a second analysis module 730, and a second determination module 740.
The first analysis module 710 is configured to perform syntactic analysis on the text to be analyzed to obtain a syntax tree of the text to be analyzed.
The first determining module 720 is configured to determine a grammar analysis result of the text to be analyzed based on the syntax tree.
The second analysis module 730 is configured to perform error correction analysis on the text to be analyzed to obtain an error correction result of the text to be analyzed.
The second determining module 740 is configured to determine a target analysis result based on the grammar analysis result and the error correction result.
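The composition of the four modules can be sketched as a small class whose fields are callables. The field names mirror modules 710 to 740, but the class, its method and the toy lambdas used to exercise it are illustrative assumptions rather than the disclosed apparatus.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TextAnalysisDevice:
    first_analysis: Callable[[str], Any]             # 710: text -> syntax tree
    first_determination: Callable[[Any], Any]        # 720: syntax tree -> grammar analysis result
    second_analysis: Callable[[str], Any]            # 730: text -> error correction result
    second_determination: Callable[[Any, Any], Any]  # 740: both results -> target analysis result

    def analyse(self, text: str) -> Any:
        syntax_tree = self.first_analysis(text)
        grammar_result = self.first_determination(syntax_tree)
        correction_result = self.second_analysis(text)
        return self.second_determination(grammar_result, correction_result)


# Toy stand-ins so that the wiring can be exercised end to end.
device = TextAnalysisDevice(
    first_analysis=lambda text: text.split(),
    first_determination=lambda tree: {"token_count": len(tree)},
    second_analysis=lambda text: {"has_error": text[:1].islower()},
    second_determination=lambda grammar, correction: {
        "grammar": grammar,
        "error_correction": correction,
    },
)
print(device.analyse("this cause problem"))
```

Passing the modules in as callables keeps the sketch independent of any particular parser or error correction model.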
According to the embodiment of the disclosure, the syntax tree includes a plurality of nodes and a plurality of dependency edges, wherein the nodes represent byte information in the text to be analyzed, and each dependency edge represents the dependency relationship between two nodes.
According to an embodiment of the present disclosure, the first determining module includes: a first determination submodule, a second determination submodule, and a third determination submodule.
The first determination submodule is configured to determine, from the plurality of dependency edges, a target dependency edge that conforms to the target dependency relationship.
The second determination submodule is configured to determine, based on the target dependency edge, a first target node from the plurality of nodes, the first target node being used for characterizing the predetermined part of speech.
The third determination submodule is configured to determine the grammar analysis result of the text to be analyzed based on the first target node.
According to an embodiment of the present disclosure, the grammar analysis result includes at least one of: temporal results, morphic results, statement type results. The first target node is a node characterizing the predicate verb part of speech.
According to an embodiment of the present disclosure, the third determination submodule includes: a first determination unit, a second determination unit, and a third determination unit.
The first determining unit is configured to determine the suffix form of the first target node based on the first target node.
The second determining unit is configured to determine a second target node related to the first target node from a plurality of second nodes in a case where it is determined that the suffix form of the first target node coincides with the predetermined suffix form, wherein the plurality of second nodes includes nodes other than the first target node among the plurality of nodes.
The third determining unit is configured to determine the grammar analysis result of the text to be analyzed based on the second target node and the first target node.
According to an embodiment of the present disclosure, the syntax analysis result includes a sentence analysis result.
According to an embodiment of the present disclosure, the first analysis module includes: a fourth determination submodule, a fifth determination submodule, and a sixth determination submodule.
The fourth determination submodule is configured to determine, based on the plurality of nodes in the syntax tree, a third target node characterizing a lead word from the plurality of nodes.
The fifth determination submodule is configured to determine, based on the third target node, a target sentence associated with the third target node from the text to be analyzed.
The sixth determination submodule is configured to determine the grammar analysis result based on the third target node and the target sentence.
According to an embodiment of the present disclosure, the parsing result includes a sentence type result.
According to an embodiment of the present disclosure, the first analysis module further includes a seventh determination submodule.
The seventh determination submodule is configured to perform clause category recognition processing on the text to be analyzed in a case where the target sentence exists in the text to be analyzed, so as to obtain a grammar analysis result regarding the clause category of the target sentence.
According to an embodiment of the present disclosure, the second determining submodule includes a fourth determining unit, a fifth determining unit, and a sixth determining unit.
The fourth determining unit is configured to determine, based on the target dependency edges, a plurality of initial first target nodes for characterizing the predetermined part of speech from the plurality of nodes.
The fifth determining unit is configured to determine, for each initial first target node in the plurality of initial first target nodes, a grammatical relationship between the initial first target node and an adjacent node, to obtain a plurality of grammatical relationships, wherein the adjacent node is a node adjacent to the initial first target node.
The sixth determining unit is configured to determine the first target node from the plurality of initial first target nodes based on the plurality of grammatical relationships.
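One possible reading of this filtering step is sketched below. Several points are assumptions rather than statements of the disclosure: "adjacent" is taken to mean adjacent in word order, the target dependency relationships are passed in as a set of labels, and a candidate is rejected when its relationship to a neighbor is `amod` or `det`, which suggests that the word is not acting as a predicate.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass
class Node:
    index: int   # position in the sentence
    text: str
    pos: str     # part-of-speech tag
    head: int    # index of the governing node, -1 for the root
    deprel: str  # label of the dependency edge to the head


def relation(a: Node, b: Node) -> str:
    """Label of the dependency edge linking a and b, or 'none' if they are unlinked."""
    if b.head == a.index:
        return b.deprel
    if a.head == b.index:
        return a.deprel
    return "none"


def initial_candidates(nodes: List[Node], target_relations: Set[str], pos: str = "VERB") -> List[Node]:
    """Nodes of the predetermined part of speech that sit on a target dependency edge."""
    return [n for n in nodes if n.deprel in target_relations and n.pos == pos]


def select_first_target(
    nodes: List[Node],
    target_relations: Set[str],
    rejected: FrozenSet[str] = frozenset({"amod", "det"}),  # hypothetical rule
) -> Optional[Node]:
    """Keep the first candidate whose relationships to its neighbors are all acceptable."""
    for candidate in initial_candidates(nodes, target_relations):
        neighbors = [n for n in nodes if abs(n.index - candidate.index) == 1]
        relations = [relation(candidate, n) for n in neighbors]
        if not any(r in rejected for r in relations):
            return candidate
    return None


sentence = [
    Node(0, "Running", "VERB", 1, "amod"),
    Node(1, "water", "NOUN", 2, "nsubj"),
    Node(2, "flows", "VERB", -1, "root"),
    Node(3, "downhill", "ADV", 2, "advmod"),
]
chosen = select_first_target(sentence, {"root", "amod"})
print(chosen.text if chosen else None)  # flows
```

Here "Running" is an initial candidate because it carries a verb tag, but it is discarded because it modifies the adjacent noun.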
According to an embodiment of the present disclosure, the second analysis module includes an error correction recognition submodule, a correction submodule, and an error correction determining submodule.
The error correction recognition submodule is configured to perform recognition on the text to be analyzed to obtain an error correction recognition result.
The correction submodule is configured to, in a case where it is determined based on the error correction recognition result that error bytes exist in the text to be analyzed, correct the error bytes based on the error correction recognition result to obtain correct bytes corresponding to the error bytes.
The error correction determining submodule is configured to determine the error correction result based on the correct bytes and the error correction recognition result.
According to an embodiment of the present disclosure, the second determining module includes a correction submodule, an updating submodule, and an eighth determining submodule.
The correction submodule is configured to, in a case where it is determined based on the error correction recognition result that error bytes exist in the text to be analyzed, correct the error bytes based on the error correction recognition result to obtain correct bytes corresponding to the error bytes.
The updating submodule is configured to update the grammar analysis result using the correct bytes to obtain an updated grammar analysis result.
The eighth determining submodule is configured to determine the target analysis result based on the updated grammar analysis result, the correct bytes, and the error correction result.
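The data flow between recognition, correction, and the update of the grammar analysis result can be sketched as follows. The lookup table `CORRECTIONS` stands in for whatever recognition model an implementation would use, and the dictionary layouts are hypothetical; none of these details come from the disclosure.

```python
from typing import Dict, List, Tuple

# Toy stand-in for an error recognition model (assumption for illustration).
CORRECTIONS = {"recieve": "receive", "teh": "the"}


def recognize_errors(tokens: List[str]) -> List[Tuple[int, str]]:
    """Error correction recognition result: positions and forms of suspected errors."""
    return [(i, t) for i, t in enumerate(tokens) if t.lower() in CORRECTIONS]


def correct(errors: List[Tuple[int, str]]) -> Dict[int, str]:
    """Map each erroneous position to its corrected form."""
    return {i: CORRECTIONS[t.lower()] for i, t in errors}


def error_correction_result(tokens: List[str]) -> dict:
    """Run correction only when recognition reports at least one error."""
    errors = recognize_errors(tokens)
    fixes = correct(errors) if errors else {}
    return {"errors": errors, "fixes": fixes}


def update_grammar_result(grammar_result: dict, fixes: Dict[int, str]) -> dict:
    """Replace erroneous tokens inside the grammar analysis result."""
    updated = dict(grammar_result)
    updated["tokens"] = [fixes.get(i, t) for i, t in enumerate(grammar_result["tokens"])]
    return updated


def target_analysis_result(grammar_result: dict, correction: dict) -> dict:
    """Combine the updated grammar analysis result with the error correction result."""
    return {
        "grammar": update_grammar_result(grammar_result, correction["fixes"]),
        "correction": correction,
    }


tokens = ["I", "recieve", "teh", "letter"]
grammar = {"tokens": tokens, "tense": "simple present"}
correction = error_correction_result(tokens)
result = target_analysis_result(grammar, correction)
print(result["grammar"]["tokens"])  # ['I', 'receive', 'the', 'letter']
```

Updating a copy keeps the original grammar analysis result intact, so both versions remain available when the target analysis result is displayed.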
According to an embodiment of the present disclosure, the first analysis module includes: a processing sub-module and a parsing sub-module.
The processing submodule is used for processing the text to be analyzed to obtain a first matrix and a second matrix related to the text to be analyzed, wherein the first matrix is used for representing whether the dependency relationship exists among the nodes, and the second matrix is used for representing the dependency relationship category among the nodes.
The parsing submodule is configured to perform syntactic analysis on the first matrix and the second matrix to obtain the syntax tree.
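As an illustration of how two such matrices could be decoded into dependency edges, the sketch below picks, for every word, the head with the highest score in the first matrix and then the highest scoring category in the second matrix. The greedy strategy, the artificial `ROOT` token at index 0, and all scores are assumptions; greedy selection does not guarantee a well-formed tree in general, and a real parser would more likely use a maximum spanning tree algorithm.

```python
from typing import List, Tuple


def decode(
    tokens: List[str],
    arc: List[List[float]],        # arc[d][h]: score that token h governs token d
    rel: List[List[List[float]]],  # rel[d][h][k]: score of category k for edge h -> d
    labels: List[str],
) -> List[Tuple[str, str, str]]:
    """Greedy decoding into (head, dependent, category) triples."""
    edges = []
    for d in range(1, len(tokens)):  # index 0 is the artificial ROOT
        h = max((i for i in range(len(tokens)) if i != d), key=lambda i: arc[d][i])
        k = max(range(len(labels)), key=lambda j: rel[d][h][j])
        edges.append((tokens[h], tokens[d], labels[k]))
    return edges


tokens = ["ROOT", "She", "reads", "books"]
labels = ["root", "nsubj", "obj"]

# First matrix: whether a dependency exists between two nodes (made-up scores).
arc = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.8, 0.1],
    [0.9, 0.05, 0.0, 0.05],
    [0.1, 0.1, 0.8, 0.0],
]

# Second matrix: category scores for each (dependent, head) pair.
rel = [[[0.0] * len(labels) for _ in tokens] for _ in tokens]
rel[1][2] = [0.1, 0.8, 0.1]
rel[2][0] = [0.9, 0.05, 0.05]
rel[3][2] = [0.1, 0.1, 0.8]

for edge in decode(tokens, arc, rel, labels):
    print(edge)
```

Keeping edge existence and edge category in separate matrices lets the head be chosen first, so that category scores are consulted only for the selected edge.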
According to an embodiment of the present disclosure, the text analysis apparatus further includes a recognition module, a first display module, and a second display module.
The recognition module is configured to perform target recognition on a received image to obtain the text to be analyzed.
The first display module is configured to display the text to be analyzed on a display interface.
The second display module is configured to display the target analysis result on the display interface in response to a request for analyzing the text to be analyzed.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform a method according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, a computer program product includes a computer program which, when executed by a processor, implements a method according to an embodiment of the present disclosure.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. The RAM 803 can also store various programs and data necessary for the operation of the device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.
Claims (23)
1. A text analysis method, comprising:
performing syntactic analysis on a text to be analyzed to obtain a syntax tree of the text to be analyzed;
determining a grammar analysis result of the text to be analyzed based on the syntax tree;
performing error correction analysis on the text to be analyzed to obtain an error correction result of the text to be analyzed; and
determining a target analysis result based on the grammar analysis result and the error correction result.
2. The method according to claim 1, wherein the syntax tree includes a plurality of nodes for characterizing byte information in the text to be analyzed and a plurality of dependency edges for characterizing dependency relationships between two of the nodes;
the determining a grammar analysis result of the text to be analyzed based on the syntax tree includes:
determining, from the plurality of dependency edges, a target dependency edge that conforms to a target dependency relationship;
determining, based on the target dependency edge, a first target node for characterizing a predetermined part of speech from the plurality of nodes; and
determining the grammar analysis result of the text to be analyzed based on the first target node.
3. The method of claim 2, wherein the grammar analysis result includes at least one of: a temporal result, a morpheme result, and a statement type result; the first target node is a node for representing the lexical property of a predicate;
the determining the grammar analysis result of the text to be analyzed based on the first target node includes:
determining a suffix form of the first target node based on the first target node;
determining, from a plurality of second nodes, a second target node related to the first target node in a case where the suffix form of the first target node is determined to be consistent with a predetermined suffix form, wherein the plurality of second nodes include the nodes among the plurality of nodes other than the first target node; and
determining the grammar analysis result of the text to be analyzed based on the second target node and the first target node.
4. The method of claim 1, wherein the grammar analysis result includes a sentence analysis result;
the determining a grammar analysis result of the text to be analyzed based on the syntax tree includes:
determining, based on a plurality of nodes in the syntax tree, a third target node for characterizing a lead word from the plurality of nodes;
determining, based on the third target node, a target sentence associated with the third target node from the text to be analyzed; and
determining the grammar analysis result based on the third target node and the target sentence.
5. The method of claim 4, wherein the grammar analysis result includes a sentence classification result;
the determining a grammar analysis result of the text to be analyzed based on the syntax tree further includes:
in a case where the target sentence exists in the text to be analyzed, performing clause category recognition on the text to be analyzed to obtain the grammar analysis result including the sentence classification result of the target sentence.
6. The method of claim 2, wherein the determining, based on the target dependency edge, a first target node for characterizing a predetermined part of speech from the plurality of nodes includes:
determining, based on the target dependency edge, a plurality of initial first target nodes for characterizing the predetermined part of speech from the plurality of nodes;
determining, for each initial first target node in the plurality of initial first target nodes, a grammatical relationship between the initial first target node and an adjacent node, to obtain a plurality of grammatical relationships, wherein the adjacent node is a node adjacent to the initial first target node; and
determining the first target node from the plurality of initial first target nodes based on the plurality of grammatical relationships.
7. The method according to any one of claims 1 to 6, wherein the performing error correction analysis on the text to be analyzed to obtain an error correction result of the text to be analyzed includes:
performing recognition on the text to be analyzed to obtain an error correction recognition result;
in a case where it is determined based on the error correction recognition result that error bytes exist in the text to be analyzed, correcting the error bytes based on the error correction recognition result to obtain correct bytes corresponding to the error bytes; and
determining the error correction result based on the correct bytes and the error correction recognition result.
8. The method of claim 7, wherein the determining a target analysis result based on the grammar analysis result and the error correction result includes:
in a case where it is determined based on the error correction recognition result that error bytes exist in the text to be analyzed, correcting the error bytes based on the error correction recognition result to obtain correct bytes corresponding to the error bytes;
updating the grammar analysis result using the correct bytes to obtain an updated grammar analysis result; and
determining the target analysis result based on the updated grammar analysis result, the correct bytes, and the error correction recognition result.
9. The method of claim 1, wherein the performing syntactic analysis on a text to be analyzed to obtain a syntax tree of the text to be analyzed includes:
processing the text to be analyzed to obtain a first matrix and a second matrix related to the text to be analyzed, wherein the first matrix is used for representing whether the dependency relationship exists among the nodes, and the second matrix is used for representing the dependency relationship category among the nodes; and
performing syntactic analysis on the first matrix and the second matrix to obtain the syntax tree.
10. The method of claim 1, further comprising:
performing target recognition on a received image to obtain the text to be analyzed;
displaying the text to be analyzed on a display interface; and
displaying the target analysis result on the display interface in response to a request for analyzing the text to be analyzed.
11. A text analysis apparatus comprising:
a first analysis module, configured to perform syntactic analysis on a text to be analyzed to obtain a syntax tree of the text to be analyzed;
a first determining module, configured to determine a grammar analysis result of the text to be analyzed based on the syntax tree;
a second analysis module, configured to perform error correction analysis on the text to be analyzed to obtain an error correction result of the text to be analyzed; and
a second determining module, configured to determine a target analysis result based on the grammar analysis result and the error correction result.
12. The apparatus according to claim 11, wherein the syntax tree includes a plurality of nodes for characterizing byte information in the text to be analyzed and a plurality of dependency edges for characterizing dependency relationships between two of the nodes;
the first determining module includes:
a first determining submodule, configured to determine, from the plurality of dependency edges, a target dependency edge that conforms to a target dependency relationship;
a second determining submodule, configured to determine, based on the target dependency edge, a first target node for characterizing a predetermined part of speech from the plurality of nodes; and
a third determining submodule, configured to determine the grammar analysis result of the text to be analyzed based on the first target node.
13. The apparatus of claim 12, wherein the grammar analysis result includes at least one of: a temporal result, a morpheme result, and a statement type result; the first target node is a node for representing the lexical property of a predicate;
the third determining submodule includes:
a first determining unit, configured to determine a suffix form of the first target node based on the first target node;
a second determining unit, configured to determine, from a plurality of second nodes, a second target node related to the first target node in a case where the suffix form of the first target node is determined to be consistent with a predetermined suffix form, wherein the plurality of second nodes include the nodes among the plurality of nodes other than the first target node; and
a third determining unit, configured to determine the grammar analysis result of the text to be analyzed based on the second target node and the first target node.
14. The apparatus of claim 11, wherein the grammar analysis result includes a sentence analysis result;
the first analysis module includes:
a fourth determining submodule, configured to determine, based on a plurality of nodes in the syntax tree, a third target node for characterizing a lead word from the plurality of nodes;
a fifth determining submodule, configured to determine, based on the third target node, a target sentence associated with the third target node from the text to be analyzed; and
a sixth determining submodule, configured to determine the grammar analysis result based on the third target node and the target sentence.
15. The apparatus of claim 14, wherein the grammar analysis result includes a sentence classification result;
the first analysis module further includes:
a seventh determining submodule, configured to, in a case where the target sentence exists in the text to be analyzed, perform clause category recognition on the text to be analyzed to obtain the grammar analysis result including the sentence classification result of the target sentence.
16. The apparatus of claim 12, wherein the second determining submodule includes:
a fourth determining unit, configured to determine, based on the target dependency edge, a plurality of initial first target nodes for characterizing the predetermined part of speech from the plurality of nodes;
a fifth determining unit, configured to determine, for each initial first target node in the plurality of initial first target nodes, a grammatical relationship between the initial first target node and an adjacent node, to obtain a plurality of grammatical relationships, wherein the adjacent node is a node adjacent to the initial first target node; and
a sixth determining unit, configured to determine the first target node from the plurality of initial first target nodes based on the plurality of grammatical relationships.
17. The apparatus of any of claims 11 to 16, wherein the second analysis module comprises:
an error correction recognition submodule, configured to perform recognition on the text to be analyzed to obtain an error correction recognition result;
a correction submodule, configured to, in a case where it is determined based on the error correction recognition result that error bytes exist in the text to be analyzed, correct the error bytes based on the error correction recognition result to obtain correct bytes corresponding to the error bytes; and
an error correction determining submodule, configured to determine the error correction result based on the correct bytes and the error correction recognition result.
18. The apparatus of claim 17, wherein the second determining module includes:
a correction submodule, configured to, in a case where it is determined based on the error correction recognition result that error bytes exist in the text to be analyzed, correct the error bytes based on the error correction recognition result to obtain correct bytes corresponding to the error bytes;
an updating submodule, configured to update the grammar analysis result using the correct bytes to obtain an updated grammar analysis result; and
an eighth determining submodule, configured to determine the target analysis result based on the updated grammar analysis result, the correct bytes, and the error correction result.
19. The apparatus of claim 11, wherein the first analysis module comprises:
a processing submodule, configured to process the text to be analyzed to obtain a first matrix and a second matrix related to the text to be analyzed, wherein the first matrix is used for representing whether the dependency relationship exists among a plurality of nodes, and the second matrix is used for representing the dependency relationship category among the nodes; and
a parsing submodule, configured to perform syntactic analysis on the first matrix and the second matrix to obtain the syntax tree.
20. The apparatus of claim 11, further comprising:
a recognition module, configured to perform target recognition on a received image to obtain the text to be analyzed;
a first display module, configured to display the text to be analyzed on a display interface; and
a second display module, configured to display the target analysis result on the display interface in response to a request for analyzing the text to be analyzed.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 10.
22. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 10.
23. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210828882.5A CN115034209A (en) | 2022-07-13 | 2022-07-13 | Text analysis method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210828882.5A CN115034209A (en) | 2022-07-13 | 2022-07-13 | Text analysis method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115034209A (en) | 2022-09-09 |
Family
ID=83129734
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210828882.5A Withdrawn CN115034209A (en) | 2022-07-13 | 2022-07-13 | Text analysis method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115034209A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115879446A (en) * | 2022-12-30 | 2023-03-31 | 北京百度网讯科技有限公司 | Text processing method, deep learning model training method, device and equipment |
CN115879446B (en) * | 2022-12-30 | 2024-01-12 | 北京百度网讯科技有限公司 | Text processing method, deep learning model training method, device and equipment |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US20230142217A1 (en) | Model Training Method, Electronic Device, And Storage Medium | |
US8762130B1 (en) | Systems and methods for natural language processing including morphological analysis, lemmatizing, spell checking and grammar checking | |
CN113220836B (en) | Training method and device for sequence annotation model, electronic equipment and storage medium | |
CN109460552B (en) | Method and equipment for automatically detecting Chinese language diseases based on rules and corpus | |
Na | Conditional random fields for Korean morpheme segmentation and POS tagging | |
Jabbar et al. | An improved Urdu stemming algorithm for text mining based on multi-step hybrid approach | |
JP2020190970A (en) | Document processing device, method therefor, and program | |
CN111382571A (en) | Information extraction method, system, server and storage medium | |
Wong et al. | iSentenizer‐μ: Multilingual Sentence Boundary Detection Model | |
Zhang et al. | Design and implementation of Chinese Common Braille translation system integrating Braille word segmentation and concatenation rules | |
Varshini et al. | A recognizer and parser for basic sentences in telugu using cyk algorithm | |
Aliero et al. | Systematic review on text normalization techniques and its approach to non-standard words | |
Uchimoto et al. | Morphological analysis of the Corpus of Spontaneous Japanese | |
CN114970516A (en) | Data enhancement method and device, storage medium and electronic equipment | |
JP2010244385A (en) | Machine translation device, machine translation method, and program | |
CN115034209A (en) | Text analysis method and device, electronic equipment and storage medium | |
Singha et al. | Part of speech tagging in Manipuri with hidden markov model | |
Khoufi et al. | Statistical-based system for morphological annotation of Arabic texts | |
Hládek et al. | Online natural language processing of the Slovak language | |
Makwana et al. | Survey: Natural language parsing for Indian languages | |
CN114676699A (en) | Entity emotion analysis method and device, computer equipment and storage medium | |
Tachicart et al. | Towards automatic normalization of the Moroccan dialectal Arabic user generated text | |
US20210149995A1 (en) | System and Method for Negation Aware Sentiment Detection | |
Haq et al. | Correction of whitespace and word segmentation in noisy Pashto text using CRF | |
Ouersighni | Robust rule-based approach in Arabic processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WW01 | Invention patent application withdrawn after publication | Application publication date: 20220909 |