CN104346480A - Information mining method and device - Google Patents
- Publication number: CN104346480A
- Application number: CN201410710424.7A
- Authority: CN (China)
- Prior art keywords: description information; message; feature description; message content; information
- Legal status: Granted (as listed by Google Patents; the listed status is an assumption, not a legal conclusion)
Classifications
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/24537—Query rewriting; Transformation of operators
- G06F16/24575—Query processing with adaptation to user needs using context
- G06F16/2477—Temporal data queries
Abstract
The embodiments of the invention provide an information mining method and device. The method comprises: monitoring messages published in an instant messaging software application; parsing the monitored messages to obtain message content; matching the message content against keywords in a pre-established feature recognition dictionary; and, when the matching succeeds, capturing the message content, or the message content together with its related content, as feature description information and storing the feature description information. Messages published in an instant messaging application are clearly categorized and highly specialized, so matching the parsed message content against the dictionary keywords and capturing the successfully matched content (optionally with its related content) automatically yields feature description information about a specific object. This reduces labor cost and makes the captured feature description information more specialized and accurate.
Description
Technical Field
The embodiments of the invention relate to the field of information technology, and in particular to an information mining method and device.
Background
In the prior art, information related to objects such as products or services (for example, product defect descriptions that help improve a product) is usually captured manually from forums or web pages in the relevant field, which is inefficient and often inaccurate.
Disclosure of Invention
The embodiments of the invention provide an information mining method and device that automatically capture feature information about a specific object, reducing labor cost and improving the accuracy of the captured feature information.
In a first aspect, an embodiment of the present invention provides an information mining method, including:
monitoring messages published in an instant messaging software application;
parsing the monitored messages to obtain message content;
matching the message content against keywords in a pre-established feature recognition dictionary; and
when the matching succeeds, capturing the message content, or the message content together with its related content, as feature description information, and storing the feature description information.
In a second aspect, an embodiment of the present invention further provides an information mining apparatus, including:
a message monitoring module, configured to monitor messages published in an instant messaging software application;
a message parsing module, configured to parse the monitored messages to obtain message content;
a matching module, configured to match the message content against keywords in a pre-established feature recognition dictionary; and
a feature description information processing module, configured to, when the matching succeeds, capture the message content, or the message content together with its related content, as feature description information, and store the feature description information.
With the information mining method and device provided by the embodiments of the invention, messages published in an instant messaging software application are monitored and parsed. Because such messages are clearly categorized and highly specialized, matching the parsed message content against the keywords in the pre-established feature recognition dictionary, and capturing the successfully matched message content (alone or together with its related content), automatically yields feature description information about a specific object. This saves labor cost, makes the obtained feature description information more specialized and accurate, and makes it easier to improve the object based on that information.
Drawings
Fig. 1 is a flowchart of an information mining method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of an information mining method according to a second embodiment of the present invention;
Fig. 3a is a flowchart of an information mining method according to a third embodiment of the present invention;
Fig. 3b is a flowchart of another information mining method according to the third embodiment of the present invention;
Fig. 3c is a flowchart of a further information mining method according to the third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an information mining apparatus according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Embodiment One
Fig. 1 is a flowchart of an information mining method according to the first embodiment of the present invention. The method may be performed by an information mining device, which may be implemented in hardware and/or software and is typically configured in a server that provides data mining services.
The method comprises operations 110 to 140.
110. Monitor messages published in the instant messaging software application.
Typically, each enterprise has an instant messaging software application associated with its products or departments, which makes it easy to distribute messages within the enterprise among the people responsible for developing, operating, or maintaining each product.
For example, Baidu Hi, released by Baidu, is an instant messaging software application that integrates text messaging, voice and video calls, file transfer, and other functions. Groups corresponding to products such as "Baidu Maps" or "Baidu Translate" are set up in Baidu Hi, so that Baidu staff responsible for the research and development or the operation and maintenance of each product can conveniently publish messages.
Messages may be published in various forms, such as text, voice, video, or pictures; this embodiment places no limit on the form, as long as the instant messaging software application supports it.
Specifically, this operation monitors the text messages published in groups related to enterprise products or enterprise departments in the instant messaging software application.
120. Parse the monitored message to obtain the message content.
Specifically, in this operation the monitored message is decoded according to the communication protocol of the instant messaging software application, so that the original data corresponding to the message, that is, a readable character string, is correctly restored.
130. Match the message content against keywords in a pre-established feature recognition dictionary.
Specifically, this operation uses a keyword matching technique to determine whether the message content contains any keyword in the pre-established feature recognition dictionary.
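A minimal sketch of the keyword check in operation 130, assuming the dictionary is a plain set of strings and that "containing a keyword" means case-insensitive substring containment (which also suits unsegmented Chinese text). The function name `match_keywords` is hypothetical; a large dictionary would more likely use a multi-pattern algorithm such as Aho-Corasick.

```python
from __future__ import annotations


def match_keywords(message_content: str, dictionary: set[str]) -> list[str]:
    """Return, sorted, the dictionary keywords that occur in the message content.

    Matching is case-insensitive substring containment; an empty result
    means the match failed.
    """
    lowered = message_content.lower()
    return sorted(k for k in dictionary if k.lower() in lowered)


print(match_keywords("The login permission has a problem", {"problem", "defect"}))
# ['problem']
```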
It should be noted that each object in the enterprise corresponds to different groups that publish messages, so the parsed message content differs from group to group. These groups are clearly categorized, highly specialized, and have distinctive language characteristics (for example, the members of a group all belong to the same category or work on the same product, and share the same or similar professional backgrounds), so the messages published by different groups reflect information about the enterprise's objects.
The object may be a specific product or a macro-level object such as enterprise management.
For example, the group corresponding to the "Baidu Maps" product consists of Baidu staff responsible for developing, operating, or maintaining "Baidu Maps", and the messages its members publish include information on the product's strengths and weaknesses or on planned improvements.
As another example, the messages published by members of the debugging group for the "Baidu Browser" product include bugs or suspected problems found while debugging the product.
Therefore, a corresponding feature recognition dictionary can be established for each group associated with a different object of the enterprise, in order to obtain feature description information (for example, the strengths and weaknesses of different products, or problems in enterprise management) for different objects (for example, different products, or enterprise management). For different groups associated with the same object, separate feature recognition dictionaries are preferably established, so that feature description information about the same object is obtained at different levels.
For example, an R&D-related feature recognition dictionary is established for the research and development group of the "Baidu Maps" product, whose keywords may include "research and development", "progress", "trend", "cost", "competitor", and the like; a debugging-related feature recognition dictionary is established for the debugging group of "Baidu Maps", whose keywords may include "debugging error", "debugging period", "bug", "defect", and the like; and a release-related feature recognition dictionary is established for the release group of "Baidu Maps", whose keywords may include "release", "publisher", "release schedule", "release date", and the like.
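One way to hold a separate dictionary per group is a mapping keyed by (product, group type), sketched below with the example keywords above. The key structure, the `FEATURE_DICTIONARIES` name and the `matched_keywords` helper are illustrative assumptions, not part of the described method.

```python
from __future__ import annotations

# Hypothetical per-group feature recognition dictionaries (keywords from the example above).
FEATURE_DICTIONARIES: dict[tuple[str, str], frozenset[str]] = {
    ("Baidu Maps", "R&D"): frozenset(
        {"research and development", "progress", "trend", "cost", "competitor"}),
    ("Baidu Maps", "debugging"): frozenset(
        {"debugging error", "debugging period", "bug", "defect"}),
    ("Baidu Maps", "release"): frozenset(
        {"release", "publisher", "release schedule", "release date"}),
}


def matched_keywords(product: str, group_type: str, content: str) -> set[str]:
    """Match content only against the dictionary of the group it came from.

    An unknown group has no dictionary, so nothing matches.
    """
    dictionary = FEATURE_DICTIONARIES.get((product, group_type), frozenset())
    lowered = content.lower()
    return {k for k in dictionary if k in lowered}
```

Keying by group means the same word (for example "cost") only counts as a feature where that group's dictionary says it matters.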
140. When the matching succeeds, capture the message content, or the message content together with its related content, as feature description information, and store the feature description information.
This operation can be implemented in two ways: in one, when the matching succeeds, the message content alone is captured as feature description information and stored; in the other, when the matching succeeds, the message content and its related content are captured together as feature description information and stored.
Preferably, the message content and its related content are captured together as feature description information and stored; compared with capturing the message content alone, this helps obtain complete feature description information about the object.
A capture time interval and/or a maximum number of captured messages may be set for capturing the related content of a successfully matched message; for example, the capture interval is set to 15 s and the number of captured messages to 5.
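The interval and count limits can be applied as a simple window over the group's time-ordered messages. This is a sketch under assumptions: the `Message` type and `capture_related` name are hypothetical, only later messages are treated as related content (earlier ones could equally be included), and the count of 5 is read as five messages in addition to the matched one.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    timestamp: float  # seconds
    sender: str
    text: str


def capture_related(messages: list[Message], matched_index: int,
                    interval_s: float = 15.0, max_count: int = 5) -> list[Message]:
    """Return the matched message plus up to max_count later messages sent
    within interval_s seconds of it. messages must be in time order."""
    anchor = messages[matched_index]
    captured = [anchor]
    for msg in messages[matched_index + 1:]:
        # Stop at whichever limit is reached first: count or time window.
        if len(captured) - 1 >= max_count or msg.timestamp - anchor.timestamp > interval_s:
            break
        captured.append(msg)
    return captured
```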
Further, the related content of the message content may include: context messages of the message content; and/or supplementary content returned by the user who published the message content, after a session is established with that user and a request to supplement the message content is sent to them.
Example 1
The following description takes the "Baidu Browser" product as the object. Messages published in a group for this product contain many comments on and discussions of problems with the product. For example, one designer of the product publishes the message "The login permission has a problem when logging in to Baidu Browser" in the development group, and another designer then publishes "That's true, the reason is A". After the matching operation, the first message is found to match the keyword "problem" in the feature recognition dictionary. Capturing that message content yields feature description information about a defect of the product, and capturing its context message "That's true, the reason is A" yields feature description information about the cause of the defect, which enriches the feature description information about the product.
It should be noted that the above description uses, as an example, capturing feature description information about a product defect and about the cause of that defect. Besides the cause, other feature description information, such as a solution to the defect, may also be captured as part of the complete information about the product defect, and it may be stored in a structured format (for example, [product name, defect content, cause]); this embodiment places no limit on this.
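The structured storage mentioned above could look like the following record, serialized as JSON. The `DefectRecord` name, its field names and the optional `solution` field are assumptions modeled on the [product name, defect content, cause] example; the storage backend is left out.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class DefectRecord:
    """Hypothetical structured form of one captured product defect."""
    product_name: str
    defect_content: str
    cause: str | None = None      # e.g. taken from a context message
    solution: str | None = None   # optional additional dimension


def to_json(record: DefectRecord) -> str:
    # ensure_ascii=False keeps Chinese text readable in the stored JSON.
    return json.dumps(asdict(record), ensure_ascii=False)


record = DefectRecord("Baidu Browser",
                      "The login permission has a problem when logging in",
                      cause="A")
```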
Example 2
After a session is established with the user who published the message content, a message content supplement request is sent to the user in the form of a heuristic question, asking for a complete description of the product defect. Capturing can then be performed on a per-session basis: when the defect description has many dimensions (for example, defect type and cause of the defect), a longer capture time (for example, one minute) is set, and the supplementary content returned by the user within that time is captured. If no supplementary description arrives within that time, either only the basic information is recorded, or a failure is returned because the necessary information is incomplete.
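The capture window for supplementary replies can be sketched with a thread-safe queue that the session feeds with the user's replies. Assumptions: sending the heuristic question is not shown, `capture_supplement` and the returned dictionary layout are hypothetical, and `supplement_required` selects between the two fallbacks the text describes (record basic info only, or return failure).

```python
from __future__ import annotations

import queue
import time


def capture_supplement(basic_info: str, replies: queue.Queue,
                       capture_time_s: float = 60.0,
                       supplement_required: bool = False) -> dict:
    """Collect every reply that arrives within capture_time_s seconds."""
    deadline = time.monotonic() + capture_time_s
    supplements: list[str] = []
    while (remaining := deadline - time.monotonic()) > 0:
        try:
            supplements.append(replies.get(timeout=remaining))
        except queue.Empty:
            break  # capture time elapsed with no further replies
    if supplements:
        return {"status": "ok", "basic": basic_info, "supplement": supplements}
    if supplement_required:
        return {"status": "failed", "reason": "necessary information incomplete"}
    return {"status": "ok", "basic": basic_info, "supplement": []}
```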
According to the technical solution of this embodiment, messages published in the instant messaging software application are monitored and parsed. Because such messages are clearly categorized and highly specialized, matching the parsed message content against keywords in the pre-established feature recognition dictionary, and capturing the successfully matched message content (alone or together with its related content), automatically yields feature description information about a specific object. This saves labor cost, makes the obtained information more specialized and accurate, and makes it easier to improve the object based on that information.
In this embodiment, establishing the feature recognition dictionary may specifically include:
receiving manually configured keywords for the feature recognition dictionary; or
searching the chat history of the instant messaging software for manually recorded typical sentences, mining keywords that express the corresponding features according to the contextual co-occurrence relations of those typical sentences, and adding the keywords to the feature recognition dictionary.
In other words, each keyword in the feature recognition dictionary may be configured manually; for example, keywords such as "problem", "defect", or "improvement" may be configured in the dictionary.
Alternatively, some typical sentences are recorded manually, and, according to the contextual co-occurrence relations of these sentences in the chat history, words that reach a certain co-occurrence frequency and express the feature are taken as keywords and added to the feature recognition dictionary; semantic templates that express the feature may also be mined.
For example, in the development group of the "Baidu Browser" product in Baidu Hi, one person says "The search query is xxx, the matching is wrong, can someone take a look?", and another answers "It's not a mistake, it's a problem, and a defect has been recorded". If the two sentences "the matching is wrong" and "a defect has been recorded" appear together in the group's messages many times, the pair is considered to have a co-occurrence relation, indicating that a wrong match is a defect to be recorded. On this basis, the semantic template "[any word] matching is wrong", which expresses a defect, can be mined.
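The co-occurrence mining can be sketched as counting the words in messages adjacent to any message that contains a typical sentence. This is one possible reading, not the patented procedure: `mine_keywords`, the neighbour window, the `min_count` threshold and the stop-word list are assumptions, and whitespace tokenization stands in for the word segmentation that Chinese chat text would need.

```python
from __future__ import annotations

from collections import Counter


def mine_keywords(history: list[str], typical: list[str], window: int = 1,
                  min_count: int = 2,
                  stopwords: frozenset[str] = frozenset()) -> set[str]:
    """Return words found within `window` messages of a message containing a
    typical sentence, in at least `min_count` such neighbouring messages."""
    counts: Counter[str] = Counter()
    for i, message in enumerate(history):
        if not any(t in message for t in typical):
            continue
        for j in range(max(0, i - window), min(len(history), i + window + 1)):
            if j == i:
                continue
            # Each neighbouring message counts a word at most once.
            for word in set(history[j].lower().split()) - stopwords:
                counts[word] += 1
    return {word for word, c in counts.items() if c >= min_count}
```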
The information mining method provided by this embodiment can be applied in various scenarios. For example, defect description information about a product can be obtained with a feature recognition dictionary established for product defects; debugging problem description information about a product can be obtained with a dictionary established for product debugging; and description information about enterprise management events, such as collected management opinions, can be obtained with a dictionary established for enterprise management. This embodiment places no limit on the scenario.
Specifically, when the method is used to capture defect description information about a product, the keywords in the feature recognition dictionary include keywords that reflect product defects, and the feature description information is information describing a product defect. This implementation is fully automatic, from mining product group messages to capturing defect-related content and finally storing it in a designated space, and it can cover the important product groups of all of an enterprise's product lines.
Embodiment Two
Fig. 2 is a flowchart of an information mining method according to the second embodiment of the present invention. Building on the above embodiment, this embodiment provides a preferred solution for the steps performed before monitoring messages published in the instant messaging software application. The preferred method comprises operations 210 to 220.
210. After access rights to the server corresponding to the instant messaging software application are obtained, establish a connection with the server.
For example, access rights to the server corresponding to the instant messaging software application "Baidu Hi" are obtained, and a connection is established with the server.
220. Send the server a join request for a group account or a personal user account in the instant messaging software application.
For example, a join request for the group account "Baidu Browser R&D group" is sent to the server corresponding to "Baidu Hi", so that the newly joined group member can publish messages about the product "Baidu Browser" in the group.
As another example, a request to add a personal user account is sent to the server corresponding to "Baidu Hi". The newly added personal account can then chat with other personal accounts of the same product that have already been added to the application, thereby producing published messages; it may also apply to join a group account already present in the application, so that it can publish messages in that group as a group member.
According to the technical solution of this embodiment, before messages published in the instant messaging software application are monitored, a connection is established with the corresponding server and a join request is exchanged with it, so that the added account can publish messages in the application.
It should be noted that, after a join request for a group account or a personal user account is sent to the server, monitoring messages published in the instant messaging software application specifically includes: after receiving a response from the server agreeing to the join, monitoring the messages published by users in the joined group or by the joined personal users.
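The ordering of operations 210 and 220 and the rule "monitor only after the server agrees" can be sketched against an abstract client. `ImClient` is a hypothetical interface, not the API of Baidu Hi or any real SDK, and the `"agree"` response value is an assumption.

```python
from __future__ import annotations

from typing import Callable, Iterable, Protocol


class ImClient(Protocol):
    """Hypothetical IM client interface (illustration only)."""
    def connect(self) -> None: ...
    def send_join_request(self, account: str) -> str: ...
    def messages(self, account: str) -> Iterable[str]: ...


def join_and_listen(client: ImClient, account: str,
                    handle: Callable[[str], None]) -> bool:
    """Connect, request to join, and monitor only if the join is accepted."""
    client.connect()                               # 210: connect once access is granted
    response = client.send_join_request(account)   # 220: join request for the account
    if response != "agree":                        # assumed acceptance value
        return False
    for message in client.messages(account):       # monitor the joined group or user
        handle(message)
    return True
```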
Embodiment Three
Please refer to Fig. 3a, which is a flowchart of an information mining method according to the third embodiment of the present invention. Building on the foregoing embodiments, this embodiment provides a preferred solution for the steps performed after the message content, or the message content together with its related content, has been captured as feature description information and before the feature description information is stored.
The preferred method comprises operations 310 to 360.
310. Monitor messages published in the instant messaging software application.
320. Parse the monitored message to obtain the message content.
330. Match the message content against keywords in a pre-established feature recognition dictionary.
340. When the matching succeeds, capture the message content, or the message content together with its related content, as feature description information.
350. Match the feature description information against keywords in a pre-established category recognition dictionary, and determine the category of the feature description information according to the matching result.
As described above, the information mining method provided by the embodiments of the invention can be applied in various scenarios, so a category recognition dictionary covering the various application requirements can be established according to actual needs.
The keywords in the category recognition dictionary may be configured manually and may include, for example, "Baidu Maps R&D defect", "Baidu Browser debugging", and "Baidu Translate R&D improvement"; this embodiment places no limit on them.
360. Store the determined category in association with the feature description information.
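Operations 350 and 360 can be sketched as "pick the category whose keywords occur most often, then file the information under it". The dictionary contents, the most-hits rule, the `"other"` fallback and the in-memory `STORE` are assumptions for illustration; the text only specifies matching against a category recognition dictionary and storing the category with the information.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical category recognition dictionary: category -> keywords.
CATEGORY_DICTIONARY: dict[str, list[str]] = {
    "Baidu Maps R&D defect": ["baidu maps", "defect"],
    "Baidu Browser debugging": ["baidu browser", "debug"],
}


def determine_category(info: str, categories: dict[str, list[str]],
                       default: str = "other") -> str:
    """Operation 350: the category with the most keyword hits wins."""
    text = info.lower()
    best, best_hits = default, 0
    for category, keywords in categories.items():
        hits = sum(1 for k in keywords if k in text)
        if hits > best_hits:
            best, best_hits = category, hits
    return best


# Operation 360: category -> associated feature description information.
STORE: defaultdict[str, list[str]] = defaultdict(list)


def store_with_category(info: str) -> str:
    category = determine_category(info, CATEGORY_DICTIONARY)
    STORE[category].append(info)
    return category
```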
According to the technical solution of this embodiment, messages published in the instant messaging software application are monitored and parsed. Because such messages are clearly categorized and highly specialized, matching the parsed message content against keywords in the pre-established feature recognition dictionary, and capturing the successfully matched message content (alone or together with its related content), automatically yields feature description information about a specific object, which saves labor cost, makes the information more specialized and accurate, and makes it easier to improve the object. In addition, after the feature description information is captured, its category is determined and stored in association with it. This makes it easy to bind the information to the responsible group for that category, so that the group can promptly obtain valuable feedback based on the specialized feature description information about the specific object.
It should be noted that operation 350 is only one way of determining the category of the feature description information. The category may also be determined with a natural language processing (NLP) model (operation 351 shown in Fig. 3b).
Specifically, the category of the feature description information can be determined with a semantic similarity model and/or a click similarity model.
Semantic similarity uses a model trained with a supervised method on a natural language processing cloud backend to analyze how similar two texts are; the larger the value, the more similar the texts. The online semantic similarity service provides a function for computing this value. For example, the semantic similarity between the inputs "notebook computer" and "notebook" is 2.08478.
Click similarity can be used when the semantic similarity does not reach a threshold (for example, 1.8). It analyzes the click similarity of two texts (for example, a search query and the titles of its search results) by computing the cosine similarity of their trained embedding vectors; the value lies in [-1, 1], and a larger value indicates a stronger click similarity. For example, the click similarity between "Hello, Baidu" and "Hello, Zhou Hongyi" is -0.121407, whereas that between "Hello, Baidu" and "Hello, Li Yanhong" is 0.218664, so the latter pair is more click-similar than the former.
In practice, the semantic similarity between the feature description information and each of several preset categories is computed first, and the category with the highest semantic similarity is returned if it reaches the threshold. If none of the semantic similarities reaches the threshold, the click similarity between the feature description information and the preset categories is computed next; if the click similarity reaches its threshold, the corresponding category is returned, and otherwise a default category (such as "other") is returned. The thresholds are continually refitted to historical data to maintain high accuracy.
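The fallback order above (semantic similarity first, then click similarity, then a default category) can be sketched as follows. The two similarity models are passed in as callables because the actual trained models and service are not described; `classify_by_similarity` is a hypothetical name, 1.8 is the example semantic threshold from the text, and the click threshold of 0.2 is a placeholder. `cosine` shows how click similarity could be computed from embedding vectors.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def classify_by_similarity(info: str, categories: list[str],
                           semantic_sim: Callable[[str, str], float],
                           click_sim: Callable[[str, str], float],
                           semantic_threshold: float = 1.8,
                           click_threshold: float = 0.2,  # placeholder value
                           default: str = "other") -> str:
    """Best semantic match if it clears its threshold, else best click match
    if that clears its threshold, else the default category."""
    best = max(categories, key=lambda c: semantic_sim(info, c))
    if semantic_sim(info, best) >= semantic_threshold:
        return best
    best = max(categories, key=lambda c: click_sim(info, c))
    if click_sim(info, best) >= click_threshold:
        return best
    return default
```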
It should further be noted that the category of the feature description information may also be determined with a probability model trained in advance on feature description texts labeled with category information; the model takes a feature description text as input and outputs the probability that it belongs to a given category (operation 352 shown in Fig. 3c). Specifically, the probability model is trained in advance on the labeled texts; the feature description information is input into the model, which outputs a category A and the corresponding probability, and if that probability meets a certain threshold, the category of the feature description information is determined to be A. For example, a model of P(category | feature description information) can be trained on category labels assigned manually to description texts from the chat records. The training method can be chosen flexibly according to the characteristics of the system's business domain; a typical choice is naive Bayes. In application, if the probability that a user's problem description belongs to a certain problem category meets the threshold, the description can be considered to belong to that category.
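Since naive Bayes is named as a typical choice, here is a minimal multinomial naive Bayes estimate of P(category | description) with Laplace smoothing and a probability threshold. The class name, whitespace tokenization, smoothing constant and default threshold are assumptions; a real deployment would more likely use an existing library and proper word segmentation.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict


class NaiveBayesCategorizer:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, texts: list[str], labels: list[str]) -> NaiveBayesCategorizer:
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()  # assumption: whitespace tokens
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict_proba(self, text: str) -> dict[str, float]:
        words = text.lower().split()
        n_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(count / n_docs)  # log prior
            for w in words:
                score += math.log((counts[w] + self.alpha) / denom)
            log_scores[label] = score
        top = max(log_scores.values())  # subtract the max for numerical stability
        exp = {k: math.exp(s - top) for k, s in log_scores.items()}
        total = sum(exp.values())
        return {k: v / total for k, v in exp.items()}

    def categorize(self, text: str, threshold: float = 0.5,
                   default: str = "other") -> str:
        """Return the most probable category if it meets the threshold."""
        probs = self.predict_proba(text)
        label = max(probs, key=probs.get)
        return label if probs[label] >= threshold else default
```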
On the basis of this embodiment, after determining the category corresponding to the feature description information, the following operations may be further included:
determining the information of a receiver of the feature description information according to the category;
and sending the feature description information to the receiver according to the information of the receiver.
The information of the receiver may be the address of a designated website, or the SMS number, mailbox address, or instant messaging software account of a designated receiving user.
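The category-to-receiver step could look like the sketch below. The routing table, the `Receiver` type, the channel names, and the `senders` callbacks are all hypothetical; actual delivery to a website, SMS number, mailbox, or instant messaging account is left to whatever transport the system uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Receiver:
    channel: str  # "website", "sms", "email" or "im": the four receiver kinds listed above
    address: str  # website address, SMS number, mailbox address or IM account


def dispatch(category: str,
             description: str,
             routing: Dict[str, Receiver],
             senders: Dict[str, Callable[[str, str], None]]) -> Optional[Receiver]:
    """Look up the receiver for `category` and send `description` over its channel.

    Returns the receiver used, or None when no receiver is configured for the category.
    """
    receiver = routing.get(category)
    if receiver is None:
        return None
    senders[receiver.channel](receiver.address, description)
    return receiver
```

Keeping the routing table separate from the senders lets a responsible group change its contact channel without touching the classification logic.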
This embodiment provides a way for a receiver to obtain the feature description information of an object once that information has been captured and its category determined. The receiver can be the group responsible for the corresponding category, so that each responsible group receives the professional feature description information about the object that falls within its remit and can obtain valuable feedback on that specific object in time.
Example four
Fig. 4 is a schematic structural diagram of an information mining apparatus according to a fourth embodiment of the present invention. The apparatus includes: a message monitoring module 410, a message parsing module 420, a matching module 430 and a feature description information processing module 440.
The message monitoring module 410 is configured to monitor messages published in an instant messaging software application; the message parsing module 420 is configured to parse each monitored message to obtain its message content; the matching module 430 is configured to match the message content against the keywords in a pre-established feature recognition dictionary; and the feature description information processing module 440 is configured, when the matching succeeds, to capture the message content, or the message content together with its related content, as feature description information, and to store the feature description information.
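The four modules can be read as a simple pipeline. In this sketch the JSON message format, the function names, and the case-insensitive substring match are assumptions made for illustration; a real deployment would receive messages from the instant messaging server rather than from an in-memory list, and dictionary keywords are assumed to be stored in lower case.

```python
import json
from typing import Iterable, List, Set


def monitor(raw_stream: Iterable[str]) -> Iterable[str]:
    """Module 410 stand-in: yield raw messages as they are published."""
    yield from raw_stream


def parse(raw: str) -> str:
    """Module 420 stand-in: assumes a JSON payload with a "content" field."""
    return json.loads(raw).get("content", "")


def matches(content: str, dictionary: Set[str]) -> bool:
    """Module 430: substring match against the feature recognition dictionary."""
    text = content.lower()
    return any(keyword in text for keyword in dictionary)


def mine(raw_stream: Iterable[str], dictionary: Set[str], store: List[str]) -> List[str]:
    """Module 440: capture and store every message whose content matches."""
    for raw in monitor(raw_stream):
        content = parse(raw)
        if matches(content, dictionary):
            store.append(content)
    return store
```

Substring matching suits unsegmented Chinese text, at the cost of occasional false positives inside longer words.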
In the technical solution of this embodiment, messages published in an instant messaging software application are monitored and parsed. Because such messages tend to fall into clearly defined categories and to be highly specialized, matching the parsed message content against the keywords in a pre-established feature recognition dictionary, and capturing the successfully matched message content (optionally together with its related content), allows the feature description information of a specific object to be captured automatically. This saves labor, improves the specialization and accuracy of the captured feature description information, and makes it easier to improve the specific object on the basis of that information.
In the foregoing solution, the apparatus may further include a connection establishing module and a request sending module.
The connection establishing module is configured to establish a connection with the server corresponding to the instant messaging software application after obtaining access permission for that server, before messages published in the application are monitored; the request sending module is configured to send to the server a request to join a group account or a personal user account in the instant messaging software application; and the message monitoring module 410 is specifically configured to monitor, after receiving from the server a response message agreeing to the join request, the messages published by users in the joined group or by the joined individual user.
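The ordering constraints in this paragraph (connect only with access permission, send the join request, start monitoring only after an approving response) can be modeled as a small state machine. The `ImListener` class and its methods are hypothetical and no real instant messaging API is implied; the transport is injected as a callback.

```python
from typing import Callable, Optional, Set


class ImListener:
    """Hypothetical client: listens only to groups/users whose join request was approved."""

    def __init__(self, send_join_request: Callable[[str], None]):
        self.send_join_request = send_join_request  # transport to the IM server (assumed)
        self.connected = False
        self.pending: Set[str] = set()
        self.joined: Set[str] = set()

    def connect(self, has_access_permission: bool) -> bool:
        # A connection is established only once access permission has been obtained.
        self.connected = has_access_permission
        return self.connected

    def request_join(self, account_id: str) -> None:
        if not self.connected:
            raise RuntimeError("connect to the server before sending join requests")
        self.send_join_request(account_id)
        self.pending.add(account_id)

    def on_server_response(self, account_id: str, approved: bool) -> None:
        # Only a pending request can be answered; an approval starts monitoring.
        if account_id in self.pending:
            self.pending.discard(account_id)
            if approved:
                self.joined.add(account_id)

    def on_message(self, account_id: str, content: str) -> Optional[str]:
        # Forward the message for parsing only if it comes from a joined group or user.
        return content if account_id in self.joined else None
```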
In the foregoing solution, the apparatus may further include a feature recognition dictionary establishing module, configured to receive manually configured keywords for the feature recognition dictionary; or,
configured to search the chat history of the instant messaging software for typical sentences that have been manually recorded, mine keywords expressing the corresponding features according to the context co-occurrence relations of those typical sentences, and add the keywords to the feature recognition dictionary.
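One possible reading of the co-occurrence mining step is sketched below: for every manually recorded typical sentence found in the chat history, the neighboring messages are treated as its context, and tokens that appear in at least `min_support` of those contexts become candidate keywords. The window size, support threshold, stop-word handling, and whitespace tokenization are assumptions, not details given in the text.

```python
from collections import Counter
from typing import AbstractSet, Iterable, List, Set


def mine_keywords(history: List[str],
                  typical_sentences: Iterable[str],
                  window: int = 1,
                  min_support: int = 2,
                  stopwords: AbstractSet[str] = frozenset()) -> Set[str]:
    """Return tokens that co-occur with at least `min_support` typical-sentence occurrences.

    history: chat messages in chronological order
    window:  number of neighbouring messages on each side treated as context
    """
    typical = set(typical_sentences)
    support: Counter = Counter()
    for i, message in enumerate(history):
        if message not in typical:
            continue
        # Collect each distinct token once per context window.
        lo, hi = max(0, i - window), min(len(history), i + window + 1)
        tokens = set()
        for j in range(lo, hi):
            tokens.update(history[j].lower().split())
        support.update(tokens)
    return {t for t, n in support.items() if n >= min_support and t not in stopwords}
```

The mined set can then be merged into the dictionary, for example with `dictionary |= mine_keywords(history, typical)`.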
In the foregoing solution, the apparatus may further include a first category determining module, a second category determining module, or a third category determining module.
The first category determining module is configured to match the feature description information against the keywords in a pre-established category recognition dictionary and determine the category corresponding to the feature description information according to the matching result; the second category determining module is configured to determine the category corresponding to the feature description information through a natural language processing (NLP) model; the third category determining module is configured to determine the category corresponding to the feature description information by using a probability model trained in advance on feature description texts labeled with category information. Each of these modules operates after the message content, or the message content together with its related content, has been captured as feature description information and before the feature description information is stored. Accordingly, the feature description information processing module 440 is specifically configured to store the determined category in association with the feature description information.
The second category determining module is specifically configured to determine the category corresponding to the feature description information by using a semantic similarity algorithm model and/or a click similarity algorithm model.
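For the first category determining module, "determining the category according to the matching result" could be as simple as counting keyword hits per category. The sketch below assumes that rule (most hits wins, ties go to the first listed category, no hits yields a default category); the dictionary layout and function name are hypothetical.

```python
from typing import Dict, Iterable


def categorize_by_dictionary(text: str,
                             category_dictionary: Dict[str, Iterable[str]],
                             default: str = "others") -> str:
    """Pick the category whose keywords occur most often in `text`.

    Ties go to the category listed first; no hits at all yields `default`.
    """
    lowered = text.lower()
    hits = {cat: sum(1 for kw in keywords if kw in lowered)
            for cat, keywords in category_dictionary.items()}
    best = max(hits, key=hits.get, default=None)
    return best if best is not None and hits[best] > 0 else default
```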
Further, the apparatus may further include a receiver information determining module and a feature description information sending module.
The receiver information determining module is configured to determine, after the category corresponding to the feature description information has been determined, the information of the receiver of the feature description information according to that category; the feature description information sending module is configured to send the feature description information to the receiver according to the information of the receiver.
The information of the receiver may be the address of a designated website, or the SMS number, mailbox address, or instant messaging software account of a designated receiving user.
The related content of the message content may include: context messages of the message content; and/or supplementary content returned by the user who published the message content, after a session has been established with that user and a request to supplement the message content has been sent.
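The two kinds of related content can be gathered into a single feature description record, as in the sketch below. The context window sizes, the record layout, and the function names are assumptions; the supplementary reply is simply passed in, since the session with the poster would be handled by the instant messaging client.

```python
from typing import Dict, List, Optional


def context_messages(history: List[str], index: int, before: int = 2, after: int = 2) -> List[str]:
    """Messages surrounding history[index], excluding the message itself."""
    lo = max(0, index - before)
    return history[lo:index] + history[index + 1:index + 1 + after]


def feature_description(history: List[str], index: int,
                        supplement: Optional[str] = None,
                        before: int = 2, after: int = 2) -> Dict[str, object]:
    """Bundle the matched message with its related content."""
    record: Dict[str, object] = {
        "message": history[index],
        "context": context_messages(history, index, before, after),
    }
    if supplement is not None:
        # Reply obtained by opening a session with the poster and asking for more detail.
        record["supplement"] = supplement
    return record
```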
In the above solution, the keywords in the feature recognition dictionary may include keywords reflecting product defects, and accordingly, the feature description information may be information describing product defects.
The information mining apparatus provided by this embodiment of the invention can execute the information mining method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to that method.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
Claims (18)
1. An information mining method, comprising:
monitoring messages issued in the instant messaging software application;
analyzing the monitored message to obtain message content;
matching the message content with keywords in a pre-established feature recognition dictionary;
and when the matching succeeds, capturing the message content, or the message content together with the related content of the message content, as feature description information, and storing the feature description information.
2. The method of claim 1, further comprising, before monitoring messages published in the instant messaging software application:
establishing a connection with a server corresponding to the instant messaging software application after obtaining access permission for the server;
sending to the server a request to join a group account or a personal user account in the instant messaging software application;
wherein monitoring the messages published in the instant messaging software application specifically comprises:
after receiving from the server a response message agreeing to the join request, monitoring messages published by users in the joined group or by the joined individual user.
3. The method of claim 1, wherein establishing the feature recognition dictionary specifically comprises:
receiving keywords of a manually configured feature recognition dictionary; or,
searching the chat history of the instant messaging software for manually recorded typical sentences, mining keywords expressing corresponding features according to the context co-occurrence relations of the typical sentences, and adding the keywords to the feature recognition dictionary.
4. The method of claim 1, further comprising, after capturing the message content, or the message content together with the content related to the message content, as the feature description information and before storing the feature description information:
matching the feature description information with keywords in a pre-established category recognition dictionary, and determining a category corresponding to the feature description information according to a matching result; or, determining the category corresponding to the feature description information through a Natural Language Processing (NLP) model; or, determining the category corresponding to the feature description information by adopting a probability model trained in advance according to the feature description text with labeled category information;
wherein storing the feature description information comprises: storing the determined category in association with the feature description information.
5. The method according to claim 4, wherein determining the category corresponding to the feature description information through a natural language processing NLP model specifically includes:
determining the category corresponding to the feature description information by using a semantic similarity algorithm model and/or a click similarity algorithm model.
6. The method of claim 4, wherein after determining the category to which the feature description information corresponds, further comprising:
determining the information of a receiver of the feature description information according to the category;
and sending the feature description information to the receiver according to the information of the receiver.
7. The method of claim 6, wherein the information of the receiving party is the address of a designated website, or the SMS number, mailbox address, or instant messaging software account of a designated receiving user.
8. The method of claim 1, wherein the content related to the message content comprises: a context message of the message content; and/or the supplementary content returned by the user after establishing a session with the user who issues the message content and sending a message content supplementary request to the user.
9. The method according to any one of claims 1 to 8, wherein the keywords in the feature recognition dictionary include keywords reflecting product defects, and the feature description information is information describing product defects.
10. An information mining apparatus, comprising:
the message monitoring module is used for monitoring messages issued in the instant messaging software application;
the message analysis module is used for analyzing the monitored message to obtain message content;
the matching module is used for matching the message content with keywords in a pre-established feature recognition dictionary;
and the feature description information processing module, configured to, when the matching succeeds, capture the message content, or the message content together with the related content of the message content, as feature description information, and store the feature description information.
11. The apparatus of claim 10, wherein the apparatus further comprises:
the connection establishing module is used for establishing connection with a server after acquiring the access authority of the server corresponding to the instant messaging software application before monitoring the message issued in the instant messaging software application;
a request sending module, configured to send a join request for a group account or a personal user account in the instant messaging software application to the server;
wherein the message monitoring module is specifically configured to: after receiving from the server a response message agreeing to the join request, monitor messages published by users in the joined group or by the joined individual user.
12. The apparatus of claim 10, wherein the apparatus further comprises a feature recognition dictionary establishing module, configured to receive keywords of a manually configured feature recognition dictionary; or,
configured to search the chat history of the instant messaging software for manually recorded typical sentences, mine keywords expressing corresponding features according to the context co-occurrence relations of the typical sentences, and add the keywords to the feature recognition dictionary.
13. The apparatus of claim 10, wherein the apparatus further comprises:
a first category determining module, configured to match the feature description information against keywords in a pre-established category recognition dictionary after capturing the message content, or the message content together with the related content of the message content, as feature description information and before storing the feature description information, and to determine the category corresponding to the feature description information according to the matching result; or
A second category determining module, configured to determine, after capturing the message content, or after the message content and the content related to the message content are used as feature description information and before storing the feature description information, a category corresponding to the feature description information through a natural language processing NLP model; or
A third category determining module, configured to determine, after capturing the message content, or after the message content and the content related to the message content are used as feature description information and before storing the feature description information, a category corresponding to the feature description information by using a probability model trained in advance according to a feature description text labeled with category information;
wherein the feature description information processing module is specifically configured to store the determined category in association with the feature description information.
14. The apparatus of claim 13, wherein the second category determining module is specifically configured to determine the category corresponding to the feature description information by using a semantic similarity algorithm model and/or a click similarity algorithm model.
15. The apparatus of claim 13, wherein the apparatus further comprises:
the receiver information determining module is used for determining the information of the receiver of the feature description information according to the category after determining the category corresponding to the feature description information;
and the feature description information sending module, configured to send the feature description information to the receiver according to the information of the receiver.
16. The apparatus of claim 15, wherein the information of the receiving party is the address of a designated website, or the SMS number, mailbox address, or instant messaging software account of a designated receiving user.
17. The apparatus of claim 10, wherein the content related to the message content comprises: a context message of the message content; and/or the supplementary content returned by the user after establishing a session with the user who issues the message content and sending a message content supplementary request to the user.
18. The apparatus according to any one of claims 10 to 17, wherein the keywords in the feature recognition dictionary include keywords reflecting product defects, and the feature description information is information describing product defects.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410710424.7A CN104346480B (en) | 2014-11-27 | 2014-11-27 | information mining method and device |
PCT/CN2015/086095 WO2016082575A1 (en) | 2014-11-27 | 2015-08-05 | Information mining method and apparatus, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410710424.7A CN104346480B (en) | 2014-11-27 | 2014-11-27 | information mining method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104346480A | 2015-02-11 |
CN104346480B | 2018-06-26 |
Family ID: 52502071
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410710424.7A Active CN104346480B (en) | 2014-11-27 | 2014-11-27 | information mining method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN104346480B (en) |
WO (1) | WO2016082575A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11587095B2 (en) * | 2019-10-15 | 2023-02-21 | Microsoft Technology Licensing, Llc | Semantic sweeping of metadata enriched service data |
CN113051476B (en) * | 2021-03-25 | 2023-06-13 | 北京百度网讯科技有限公司 | Method and device for sending message |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020133477A1 (en) * | 2001-03-05 | 2002-09-19 | Glenn Abel | Method for profile-based notice and broadcast of multimedia content |
CN101166160A (en) * | 2006-10-20 | 2008-04-23 | 阿里巴巴公司 | A method and system for filtering instant communication rubbish information |
CN102323933A (en) * | 2011-08-31 | 2012-01-18 | 张潇 | Information embedding and interaction system facing real-time communication and method |
CN102419778A (en) * | 2012-01-09 | 2012-04-18 | 中国科学院软件研究所 | Information searching method for mining and clustering sub-topics of query sentences |
CN102970210A (en) * | 2012-11-02 | 2013-03-13 | 北京百度网讯科技有限公司 | Method and device for reminding group messages in instant chat tool |
CN103577416A (en) * | 2012-07-20 | 2014-02-12 | 阿里巴巴集团控股有限公司 | Query expansion method and system |
CN103605690A (en) * | 2013-11-04 | 2014-02-26 | 北京奇虎科技有限公司 | Device and method for recognizing advertising messages in instant messaging |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1987852A (en) * | 2005-12-21 | 2007-06-27 | 腾讯科技(深圳)有限公司 | Method and device for determining communication object attribute according to news content |
CN104346480B (en) * | 2014-11-27 | 2018-06-26 | 百度在线网络技术(北京)有限公司 | information mining method and device |
- 2014-11-27: CN201410710424.7A, granted as CN104346480B (active)
- 2015-08-05: PCT/CN2015/086095, published as WO2016082575A1 (application filing)
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016082575A1 (en) * | 2014-11-27 | 2016-06-02 | 百度在线网络技术(北京)有限公司 | Information mining method and apparatus, and storage medium |
CN105282012A (en) * | 2015-10-23 | 2016-01-27 | 广东小天才科技有限公司 | Method and system for strengthening information reminding during group chat |
CN106649404B (en) * | 2015-11-04 | 2019-12-27 | 陈包容 | Method and device for creating session scene database |
CN106649404A (en) * | 2015-11-04 | 2017-05-10 | 陈包容 | Session scene database creation method and apparatus |
CN108345582A (en) * | 2017-01-23 | 2018-07-31 | 腾讯科技(深圳)有限公司 | A kind of method and device that identification social group is done business |
CN108345582B (en) * | 2017-01-23 | 2021-08-24 | 腾讯科技(深圳)有限公司 | Method and device for identifying social group engaged business |
CN107491493A (en) * | 2017-07-22 | 2017-12-19 | 长沙兔子代跑网络科技有限公司 | A kind of intelligence obtains the method and device for running chat record in generation |
CN107526779A (en) * | 2017-07-22 | 2017-12-29 | 长沙兔子代跑网络科技有限公司 | A kind of method and device for excavating generation race client |
CN109063029A (en) * | 2018-07-10 | 2018-12-21 | 苏奇 | A kind of information filing management method based on instant communication software |
CN109582719B (en) * | 2018-10-19 | 2021-08-24 | 国电南瑞科技股份有限公司 | Method and system for automatically linking SCD file of intelligent substation to virtual terminal |
CN109582719A (en) * | 2018-10-19 | 2019-04-05 | 国电南瑞科技股份有限公司 | A kind of method and system of intelligent substation SCD file AutoLink virtual terminator |
CN113765767A (en) * | 2020-06-02 | 2021-12-07 | 上海回声网络科技有限公司 | Enterprise WeChat supervision method and system |
CN118690265A (en) * | 2024-08-28 | 2024-09-24 | 浙江吉利控股集团有限公司 | Accident vehicle message management method and device, electronic equipment and medium |
CN118690265B (en) * | 2024-08-28 | 2024-11-15 | 浙江吉利控股集团有限公司 | Accident vehicle message management method and device, electronic equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
WO2016082575A1 (en) | 2016-06-02 |
CN104346480B (en) | 2018-06-26 |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |