WO2016082575A1 - Information mining method and apparatus, and storage medium - Google Patents
- Publication number
- WO2016082575A1 (PCT/CN2015/086095)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- message
- content
- description information
- feature description
- information
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24537—Query rewriting; Transformation of operators
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24575—Query processing with adaptation to user needs using context
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2477—Temporal data queries
Definitions
- the embodiments of the present invention relate to the field of information technologies, and in particular, to an information mining method, apparatus, and storage medium.
- the embodiment of the invention provides an information mining method, device and storage medium, so as to automatically capture feature information of a specific object, save labor cost, and improve the accuracy of the captured feature information of the specific object.
- an embodiment of the present invention provides an information mining method, including:
- if the matching is successful, capturing the successfully matched message content, or capturing the successfully matched message content together with its related content, as feature description information, and saving the feature description information.
- an embodiment of the present invention further provides an information mining apparatus, including:
- a message parsing module configured to parse the intercepted message to obtain message content;
- a matching module configured to match the message content against the keywords in a pre-established feature recognition dictionary;
- a feature description information processing module configured to: when the matching is successful, capture the successfully matched message content, or capture the successfully matched message content together with its related content, as feature description information, and save the feature description information.
- an embodiment of the present invention further provides a non-volatile computer storage medium, where the computer storage medium stores one or more modules which, when executed by a device that performs an information mining method, cause the device to perform the following operations:
- if the matching is successful, capturing the successfully matched message content, or capturing the successfully matched message content together with its related content, as feature description information, and saving the feature description information.
- the information mining method, device and storage medium provided by the embodiments of the present invention listen to and parse messages published in an instant messaging software application. Because such messages have both clearly defined categories and a high degree of professionalism, matching the parsed message content against the keywords in a pre-established feature recognition dictionary, and capturing the successfully matched message content (or that content together with its related content), makes it possible to automatically capture the feature description information of a specific object. This saves labor cost, improves the professionalism and accuracy of the obtained feature description information, and helps improve the specific object according to that information.
- FIG. 1 is a flowchart of an information mining method according to Embodiment 1 of the present invention.
- FIG. 2 is a flowchart of an information mining method according to Embodiment 2 of the present invention.
- FIG. 3a is a flowchart of an information mining method according to Embodiment 3 of the present invention.
- FIG. 3b is a flowchart of another information mining method according to Embodiment 3 of the present invention.
- FIG. 3c is a flowchart of still another information mining method according to Embodiment 3 of the present invention.
- FIG. 4 is a schematic structural diagram of an information mining apparatus according to Embodiment 4 of the present invention.
- FIG. 5 is a schematic structural diagram of hardware of an apparatus for performing an information mining method according to Embodiment 6 of the present invention.
- FIG. 1 is a flowchart of an information mining method according to Embodiment 1 of the present invention.
- the method of an embodiment of the present invention may be performed by an information mining device configured to be implemented in hardware and/or software, which is typically configured in a server capable of providing data mining services.
- the method includes operations 110 through 140.
- each enterprise has an instant messaging software application related to the enterprise product or department, so as to facilitate the release of messages in the enterprise responsible for each product development group or responsible for operation and maintenance.
- Baidu Hi, launched by Baidu, is an instant messaging application that integrates text messaging, voice and video calls, and file transfer.
- the operation may specifically be to listen to a text message published in a group related to the enterprise product or a group related to the enterprise department in the instant messaging software application. If the message is published in other forms, it is necessary to identify the content through a related identification technology (for example, voice recognition technology or picture recognition technology) to obtain a corresponding text message.
- the intercepted message is parsed so that the original data corresponding to the monitored message is correctly restored, that is, a readable string is recovered.
- the operation specifically uses a keyword matching technology to determine whether the keyword in the feature recognition dictionary is included in the message content according to a pre-established feature recognition dictionary.
- the groups that publish messages differ for each object in the enterprise, and so does the content of the parsed messages.
- groups are clearly defined, carry highly professional information and have distinctive language features (for example, the members of each group are a category of people who work on the same product or share a similar professional background). Therefore, messages published by different groups can reflect information about different enterprise objects.
- for example, the group corresponding to the "Baidu Map" product is the group at Baidu responsible for developing or operating and maintaining "Baidu Map".
- the messages published by the members of that group usually contain the advantages and disadvantages of the product, or information on subsequent improvements to the product.
- the messages posted by members of the debugging group corresponding to the "Baidu Browser" product usually contain bugs or suspected problems found while debugging the product.
- a corresponding feature recognition dictionary can be established for the groups corresponding to different objects of the enterprise, so as to obtain feature description information (such as the advantages and disadvantages of different products, or problems in enterprise management) for different objects (such as different products, or enterprise management). For different groups concerned with the same object of the enterprise, it is preferable to establish a corresponding feature recognition dictionary for each group, so as to obtain feature description information at different levels related to the same object.
- for the R&D group of the "Baidu Map" product, a feature recognition dictionary related to R&D is established, and the keywords in the dictionary may include "R&D", "Progress", "Trend", "Cost" and "Entity";
- for the debugging group of the "Baidu Map" product, a feature recognition dictionary related to debugging is established, and the keywords in the dictionary may include "debug error", "debug cycle", "bug", "vulnerability" and "defect";
- for the release group of the "Baidu Map" product, a feature recognition dictionary related to release is established, and the keywords in the dictionary may include "release", "release conference", "release schedule" and "release date".
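The per-group dictionaries and the keyword matching described above can be illustrated with a minimal Python sketch. The keyword lists are the examples given in the text; the data layout, the function names `match_keywords` and `capture`, and the use of case-insensitive substring matching are assumptions made for illustration, not the method fixed by the embodiment.

```python
# Hypothetical per-group feature recognition dictionaries. The keywords are
# the examples listed in the text for the "Baidu Map" groups.
FEATURE_DICTIONARIES = {
    "R&D": ["R&D", "Progress", "Trend", "Cost", "Entity"],
    "debugging": ["debug error", "debug cycle", "bug", "vulnerability", "defect"],
    "release": ["release", "release conference", "release schedule", "release date"],
}


def match_keywords(message_content, group):
    """Return the keywords of the group's dictionary found in the message.

    Assumption: matching is a case-insensitive substring test.
    """
    text = message_content.lower()
    return [kw for kw in FEATURE_DICTIONARIES.get(group, []) if kw.lower() in text]


def capture(message_content, group):
    """Capture the message as feature description information on a match."""
    hits = match_keywords(message_content, group)
    if not hits:
        return None  # matching failed: nothing is captured
    return {"group": group, "keywords": hits, "content": message_content}
```

Plain substring matching is deliberately naive: a message containing "debug cycle" also matches "bug". A real deployment would probably tokenize the message first.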
- if the matching is successful, the successfully matched message content is captured, or the successfully matched message content together with its related content is captured, as the feature description information, and the feature description information is saved.
- in this operation there may be two implementations. In the first, when the matching is successful, the successfully matched message content is captured as the feature description information and saved. In the second, when the matching is successful, the successfully matched message content and its related content are captured together as the feature description information and saved.
- capturing both the successfully matched message content and its related content as the feature description information is preferred over capturing only the successfully matched message content, because it facilitates obtaining complete feature description information for the object.
- the related content of the successfully matched message content may include: the context messages of the successfully matched message content; and/or the supplementary content returned by the user after a session is established with the user who published the message content and a message content supplement request is sent to that user.
- taking feature description information for a product defect and for the cause of that defect as an example: in addition to the feature description information corresponding to the cause of the defect, other feature description information, such as the solution corresponding to the defect, may also be captured as complete information about the product defect and stored in a formatted manner (for example, [product name, defect content, cause of occurrence]), which is not limited in this embodiment.
- the session may be crawled. That is, where the defect description has many dimensions (for example, the defect type, the cause of the defect, etc.), a longer crawl time (for example, one minute) is set, and the supplementary content returned by the user during this time is captured. If no supplementary description arrives within this time, only the basic information is recorded, or a failure is returned because the necessary information is incomplete.
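The timed wait for supplementary content can be sketched as follows. This is a minimal illustration that assumes the user's replies arrive on a standard-library `queue.Queue`; the function name `collect_defect_record` and the `fail_if_incomplete` flag are hypothetical. The record layout follows the [product name, defect content, cause of occurrence] format mentioned in the text.

```python
import queue


def collect_defect_record(product, defect, replies, crawl_time_s=60.0,
                          fail_if_incomplete=False):
    """Wait up to crawl_time_s for the user's supplementary reply.

    Returns [product name, defect content, cause of occurrence]. If no reply
    arrives in time, the cause is None (only basic information is recorded),
    or None is returned when fail_if_incomplete is set.
    """
    try:
        cause = replies.get(timeout=crawl_time_s)  # blocks until reply or timeout
    except queue.Empty:
        cause = None
    if cause is None and fail_if_incomplete:
        return None  # necessary information is incomplete
    return [product, defect, cause]
```

The one-minute default mirrors the example crawl time in the text; a shorter value is convenient when trying the function out.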
- in the technical solution of this embodiment, messages published in the instant messaging software application are listened to and parsed. Because such messages have both clearly defined categories and a high degree of professionalism, matching the parsed message content against the keywords in the pre-established feature recognition dictionary, and capturing the successfully matched message content (or that content together with its related content), makes it possible to automatically capture the feature description information of a specific object. This saves labor cost, improves the professionalism and accuracy of the obtained feature description information, and facilitates improvement of the specific object according to that information.
- establishing the feature recognition dictionary may specifically include:
- searching the chat history of the instant messaging software for manually collected typical sentences, mining keywords corresponding to the relevant features according to the contextual co-occurrence relationships of the typical sentences, and adding those keywords to the feature recognition dictionary.
- alternatively, each keyword in the feature recognition dictionary can be manually configured; for example, keywords such as "problem", "defect" or "improvement" are configured in the feature recognition dictionary.
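The text does not specify how the contextual co-occurrence relationship is computed, so the following Python sketch is only one possible reading. It assumes that a word is a candidate keyword when it appears at least `min_count` times in the messages surrounding occurrences of a typical sentence. The function name, the stopword list, the window size and the threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical stopword list; a real system would use a proper one.
STOPWORDS = {"the", "a", "is", "in", "on", "to", "and", "of", "it", "this", "we"}


def mine_keywords(chat_history, typical_sentences, window=1, min_count=2):
    """Mine candidate keywords that co-occur with typical sentences.

    chat_history is a list of message strings in chronological order.
    For every message containing a typical sentence, the words of that
    message and of its `window` neighbours on each side are counted.
    """
    counts = Counter()
    for i, message in enumerate(chat_history):
        if not any(s.lower() in message.lower() for s in typical_sentences):
            continue
        lo, hi = max(0, i - window), min(len(chat_history), i + window + 1)
        for neighbor in chat_history[lo:hi]:
            words = {w.strip(".,!?").lower() for w in neighbor.split()}
            counts.update(w for w in words if w and w not in STOPWORDS)
    return sorted(w for w, c in counts.items() if c >= min_count)
```

The candidates returned by such a function would still need manual review before being added to the feature recognition dictionary.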
- the information mining method provided in this embodiment can be applied to various scenarios.
- for example, when the object is a product, defect description information of the product can be obtained; likewise, description information about the debugging problems of a product can be obtained according to a feature recognition dictionary established for product debugging.
- description information such as collected management opinions on enterprise management matters can also be obtained, which is not limited in this embodiment.
- for the case of capturing defect description information where the object is a product:
- the keywords in the feature recognition dictionary include keywords reflecting a defect of the product;
- the feature description information is information describing a defect of the product.
- FIG. 2 is a flowchart of an information mining method according to Embodiment 2 of the present invention.
- On the basis of the above embodiments, this embodiment provides a preferred solution for the operations performed before listening to messages published in an instant messaging software application.
- the preferred method includes operations 210 through 220.
- for example, after the access right to the server corresponding to the instant messaging software application "Baidu Hi" is obtained, a connection is established with the server.
- a join request for the group account "Baidu Browser-R&D Group" is sent to the server corresponding to the instant messaging software application "Baidu Hi", so that the newly joined group member can publish messages related to the product "Baidu Browser" in the group.
- a personal user account join request is sent to the server corresponding to the instant messaging software application "Baidu Hi"; the newly added personal account can chat with personal accounts that have already joined the application, thereby forming published messages; the newly added personal account can also apply to join a group account that has already joined the application, so that the newly joined group member can post messages in the group.
- listening to the messages published in the instant messaging software application may include: after receiving a response message returned by the server agreeing to the join, listening to messages posted by users in the joined group or by the joined individual user.
- FIG. 3a is a flowchart of an information mining method according to Embodiment 3 of the present invention.
- on the basis of the above embodiments, the present embodiment provides a preferred solution for the operations performed after the successfully matched message content, or the successfully matched message content together with its related content, is captured as the feature description information.
- the preferred method includes operations 310 through 360.
- if the matching is successful, the successfully matched message content is captured, or the successfully matched message content together with its related content is captured, as the feature description information.
- the information mining method provided by the embodiment of the present invention can be applied to various scenarios, and therefore, a category recognition dictionary including a plurality of application requirements can be established according to actual application requirements.
- Keywords in the category recognition dictionary can be manually configured.
- the keyword in the category recognition dictionary may include: Baidu map development defect, Baidu browser debugging defect, and Baidu translation development improvement, etc., which is not limited in this embodiment.
- in the technical solution of this embodiment, messages published in the instant messaging software application are listened to and parsed. Because such messages have both clearly defined categories and a high degree of professionalism, matching the parsed message content against the keywords in the pre-established feature recognition dictionary, and capturing the successfully matched message content (or that content together with its related content), makes it possible to automatically capture the feature description information of a specific object, thereby saving labor cost.
- determining the category corresponding to the feature description information may also be: determining the category corresponding to the feature description information by a Natural Language Processing (NLP) model (operation 351 shown in FIG. 3b).
- the semantic similarity algorithm model and/or the click similarity algorithm model may be used to determine the category corresponding to the feature description information.
- semantic similarity uses a model trained with a supervised method in the natural language processing cloud backend to analyze the similarity of two pieces of text; the larger the value, the more similar the texts. The semantic similarity computation is provided as a networked service. For example, for the inputs "laptop" and "notebook", the semantic similarity is 2.08478.
- click similarity can be used when the semantic similarity does not reach the threshold (such as 1.8). It analyzes the click similarity of two texts (such as a search query and the titles in the search results) by computing the cosine similarity of trained embedding vectors. The value lies in the range [-1, 1]; the greater the value, the stronger the click similarity.
- for example, the click similarity between "Baidu Hello" and "Zhou Hongyi Hello" is -0.121407.
- the click similarity between "Baidu Hello" and "Li Yanhong Hello" is 0.218664; the latter click similarity is higher than the former.
- the feature description information is first compared with the preset categories by semantic similarity, and the category with the highest semantic similarity that reaches the threshold is returned. If the semantic similarity between the feature description information and every preset category fails to reach the threshold, the click similarity between the feature description information and the preset categories is evaluated next: if the click similarity reaches its threshold, the corresponding category is returned; otherwise the default category (e.g., "other") is returned. The thresholds are continually fitted to historical data to maintain a high degree of accuracy.
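The two-stage decision above can be sketched in Python as follows. The trained semantic similarity model and the embedding vectors are not available here, so the two scorers are passed in as callables (`semantic_sim` and `click_sim` are hypothetical names). The semantic threshold of 1.8 is the example from the text, while the click threshold of 0.2 is an assumed value. `cosine_similarity` shows the computation that click similarity is described as using.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; the value lies in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def determine_category(text, categories, semantic_sim, click_sim,
                       semantic_threshold=1.8, click_threshold=0.2,
                       default="other"):
    """Pick a category by semantic similarity, falling back to click similarity."""
    # Stage 1: highest semantic similarity, accepted only if it reaches the threshold.
    best = max(categories, key=lambda c: semantic_sim(text, c))
    if semantic_sim(text, best) >= semantic_threshold:
        return best
    # Stage 2: highest click similarity, accepted only if it reaches its threshold.
    best = max(categories, key=lambda c: click_sim(text, c))
    if click_sim(text, best) >= click_threshold:
        return best
    # Neither similarity reached its threshold: return the default category.
    return default
```

Passing the scorers in as parameters keeps the decision logic independent of whichever similarity service is actually deployed.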
- determining the category corresponding to the feature description information may further be: determining the category by using a probability model trained in advance on feature description text labeled with category information, where the input of the probability model is the feature description text and the output is the probability value of belonging to a set category (operation 352 shown in FIG. 3c).
- the probability model is trained in advance on feature description text labeled with category information. The feature description information is input into the probability model to obtain a category A output by the model together with the probability value corresponding to category A; if the probability value satisfies a certain threshold, the category corresponding to the feature description information is determined to be category A.
- a probabilistic model of P (type
- if the probability that the user problem description belongs to a certain problem classification meets a certain threshold, the description can be considered as belonging to that classification.
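The text does not name a specific probability model, so the sketch below substitutes a small multinomial naive Bayes classifier with add-one smoothing as a stand-in. It is trained on category-labeled feature description text and returns a probability per category, which is then compared with a threshold. The function names, the whitespace tokenization, the threshold of 0.6 and the fallback to a default category are assumptions.

```python
import math
from collections import Counter, defaultdict


def train(labeled_texts):
    """Train on an iterable of (feature description text, category) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, category in labeled_texts:
        tokens = text.lower().split()
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocab.update(tokens)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def predict_proba(model, text):
    """Return {category: P(category | text)} under naive Bayes with add-one smoothing."""
    tokens = text.lower().split()
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    log_scores = {}
    for category, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][category]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        log_scores[category] = score
    peak = max(log_scores.values())  # subtract the peak for numerical stability
    exp = {c: math.exp(s - peak) for c, s in log_scores.items()}
    norm = sum(exp.values())
    return {c: v / norm for c, v in exp.items()}


def classify(model, text, threshold=0.6, default="other"):
    """Return (category, probability); fall back to default below the threshold."""
    probs = predict_proba(model, text)
    category, p = max(probs.items(), key=lambda kv: kv[1])
    return (category, p) if p >= threshold else (default, p)
```

Any classifier that outputs calibrated category probabilities could take the place of this stand-in; only the final threshold comparison is taken from the text.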
- the receiver information may be the address of a designated website, the SMS number or email address of a designated receiving user, or the instant messaging software account of a designated receiving user.
- the receiver to be notified of the feature description information of the object may be the responsible group corresponding to the category. Professional feature description information about the object is thus exchanged with the responsible group, so that the responsible group can obtain valuable feedback about the object in a timely manner.
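Routing by category can be sketched as a simple lookup. The category names are the examples given earlier for the category recognition dictionary; the receiver entries (channels and `example.com` addresses), the `RECEIVERS` table and the `notify` function are hypothetical placeholders. The actual delivery mechanism is passed in as a callable, because the text allows several kinds of receiver (website address, SMS number, email address, instant messaging account).

```python
# Hypothetical mapping from category to receiver information.
RECEIVERS = {
    "Baidu map development defect": {"channel": "email", "address": "map-dev@example.com"},
    "Baidu browser debugging defect": {"channel": "im", "address": "browser-debug-group"},
}


def notify(category, feature_description, send, receivers=RECEIVERS):
    """Send the feature description to the receiver responsible for the category.

    send(receiver, feature_description) performs the actual delivery.
    Returns False when no receiver is configured for the category.
    """
    receiver = receivers.get(category)
    if receiver is None:
        return False
    send(receiver, feature_description)
    return True
```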
- FIG. 4 is a schematic structural diagram of an information mining apparatus according to Embodiment 4 of the present invention.
- the device includes a message monitoring module 410, a message parsing module 420, a matching module 430, and a feature description information processing module 440.
- the message monitoring module 410 is configured to listen to the messages published in the instant messaging software application; the message parsing module 420 is configured to parse the monitored message to obtain the message content; the matching module 430 is configured to match the message content against the keywords in the pre-established feature recognition dictionary; the feature description information processing module 440 is configured to, when the matching is successful, capture the successfully matched message content, or capture the successfully matched message content together with its related content, as the feature description information, and save the feature description information.
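How the four modules hand data to one another can be illustrated with a compact Python sketch. It assumes that monitored messages arrive as UTF-8 encoded bytes and that matching is a case-insensitive substring test. Neither detail is stated in the text, and the function names are hypothetical.

```python
def parse_message(raw, encoding="utf-8"):
    """Message parsing module: restore a readable string from the raw bytes."""
    return raw.decode(encoding, errors="replace").strip()


def mine(raw_messages, keywords, store):
    """Run monitored messages through parsing, matching and capture."""
    for raw in raw_messages:              # stand-in for message monitoring module 410
        content = parse_message(raw)      # message parsing module 420
        hits = [k for k in keywords if k.lower() in content.lower()]  # matching module 430
        if hits:                          # feature description information processing module 440
            store.append({"content": content, "keywords": hits})
    return store
```

Here `store` is any list-like container; a real device would persist the captured feature description information instead.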
- in the technical solution of this embodiment, messages published in the instant messaging software application are listened to and parsed. Because such messages have both clearly defined categories and a high degree of professionalism, matching the parsed message content against the keywords in the pre-established feature recognition dictionary, and capturing the successfully matched message content (or that content together with its related content), makes it possible to automatically capture the feature description information of a specific object. This saves labor cost, improves the professionalism and accuracy of the obtained feature description information, and facilitates improvement of the specific object according to that information.
- the apparatus may further include: a connection establishment module and a request transmission module.
- the connection establishing module is configured to, before the messages published in the instant messaging software application are listened to, establish a connection with the server corresponding to the instant messaging software application after obtaining the access right to that server;
- the request sending module is configured to send to the server a request to join a group account or a personal user account in the instant messaging software application;
- the message monitoring module 410 is specifically configured to: after receiving a response message returned by the server agreeing to the join, listen to messages posted by users in the joined group or by the joined individual user.
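- the device may further include a feature recognition dictionary establishing module, configured to receive manually configured keywords for the feature recognition dictionary; or to search the chat history of the instant messaging software for manually collected typical sentences, mine keywords corresponding to the relevant features according to the contextual co-occurrence relationships of the typical sentences, and add them to the feature recognition dictionary.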
- the apparatus may further include: a first category determining module, or a second category determining module, or a third category determining module.
- the first category determining module is configured to: after the successfully matched message content, or the successfully matched message content together with its related content, is captured as the feature description information, and before the feature description information is saved, match the feature description information against the keywords in a pre-established category recognition dictionary and determine the category corresponding to the feature description information according to the matching result;
- the second category determining module is configured to: after the successfully matched message content, or the successfully matched message content together with its related content, is captured as the feature description information, and before the feature description information is saved, determine the category corresponding to the feature description information by a natural language processing (NLP) model;
- the third category determining module is configured to: after the successfully matched message content, or the successfully matched message content together with its related content, is captured as the feature description information, and before the feature description information is saved, determine the category corresponding to the feature description information by using a probability model trained in advance on feature description text labeled with category information; the feature description information processing module 440 may be specifically configured to: save the determined category in association with the feature description information.
- the second category determining module is specifically configured to: determine a category corresponding to the feature description information by using a semantic similarity algorithm model and/or a click similarity algorithm model.
- the apparatus may further include: a receiver information determining module and a feature description information sending module.
- the receiver information determining module is configured to determine, after the category corresponding to the feature description information is determined, the information of the receiver of the feature description information according to the category; and the feature description information sending module is configured to send the feature description information to the receiver according to the receiver information.
- the receiver information may be the address of a designated website, the SMS number or email address of a designated receiving user, or the instant messaging software account of a designated receiving user.
- the related content of the successfully matched message content may include: the context messages of the successfully matched message content; and/or the supplementary content returned by the user after a session is established with the user who published the message content and a message content supplement request is sent to that user.
- the keyword in the feature recognition dictionary may include a keyword reflecting a defect of the product, and correspondingly, the feature description information may be information describing a defect of the product.
- the information mining device provided by the embodiment of the present invention can execute the information mining method provided by any embodiment of the present invention, and has the corresponding functional modules and beneficial effects of the execution method.
- the embodiment provides a non-volatile computer storage medium storing one or more modules which, when executed by a device performing an information mining method, cause the device to perform the following operations:
- if the matching is successful, capturing the successfully matched message content, or capturing the successfully matched message content together with its related content, as the feature description information, and saving the feature description information.
- the method may further include:
- listening to the messages published in the instant messaging software application may specifically include:
- listening to messages posted by users in the joined group or by the joined individual user.
- establishing the feature recognition dictionary may specifically include:
- searching the chat history of the instant messaging software for manually collected typical sentences, mining keywords corresponding to the relevant features according to the contextual co-occurrence relationships of the typical sentences, and adding those keywords to the feature recognition dictionary.
- the method may further include:
- Saving the feature description information may include: associating the determined category with the feature description information.
- determining a category corresponding to the feature description information by using a natural language processing NLP model may specifically include:
- the semantic similarity algorithm model and/or the click similarity algorithm model are used to determine the category corresponding to the feature description information.
- the method may further include:
- the receiver information may be the address of a designated website, the SMS number or email address of a designated receiving user, or the instant messaging software account of a designated receiving user.
- the related content of the successfully matched message content may include: the context messages of the successfully matched message content; and/or the supplementary content returned by the user after a session is established with the user who published the message content and a message content supplement request is sent to that user.
- the keyword in the feature recognition dictionary may include a keyword reflecting a defect of the product, and the feature description information is information describing a defect of the product.
- FIG. 5 is a schematic diagram of the hardware structure of a device for performing an information mining method according to Embodiment 6 of the present invention.
- the device includes:
- one or more processors 510 (FIG. 5 takes one processor 510 as an example);
- a memory 520; and one or more modules.
- the device may also include an input device 530 and an output device 540.
- the processor 510, the memory 520, the input device 530, and the output device 540 in the device may be connected by a bus or by other means; FIG. 5 takes a bus connection as an example.
- the memory 520, as a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the information mining method in the embodiments of the present invention (for example, the information mining apparatus shown in FIG. 4).
- by running the software programs, instructions, and modules stored in the memory 520, the processor 510 executes the various functional applications and data processing of the server, thereby implementing the information mining method of the foregoing method embodiments.
- the memory 520 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application required by at least one function, and the data storage area may store data created through use of the terminal device, and the like. Further, the memory 520 may include high-speed random access memory, and may also include non-volatile memory such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 520 may further include memory located remotely from the processor 510, which may be connected to the terminal device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
- the input device 530 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the terminal.
- the output device 540 can include a display device such as a display screen.
- the one or more modules are stored in the memory 520, and when executed by the one or more processors 510, perform the following operations:
- when the matching succeeds, capturing the successfully matched message content, or the successfully matched message content together with its related content, as feature description information, and saving the feature description information.
- the method may further include:
- monitoring the messages posted in the instant messaging software application specifically includes:
- monitoring messages posted by users in a joined group or by a joined individual user.
- establishing the feature recognition dictionary may specifically include:
- the method may further include:
- Saving the feature description information may include: associating the determined category with the feature description information.
- determining the category corresponding to the feature description information by using a natural language processing (NLP) model may specifically include:
- determining the category corresponding to the feature description information by using a semantic similarity algorithm model and/or a click similarity algorithm model.
- the method further includes:
- the information of the receiving party may be the address of a set website, or the short message number, email address, or instant messaging software account of a set receiving user.
- the related content of the successfully matched message content may include: context messages of the successfully matched message content; and/or supplementary content returned by the user who posted the message content, after a session is established with that user and a message content supplement request is sent to that user.
- the keywords in the feature recognition dictionary may include keywords reflecting product defects, and the feature description information is information describing product defects.
- the present invention can be implemented by software plus the necessary general-purpose hardware, and can also be implemented by hardware alone, but in many cases the former is the better implementation.
- the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product.
- the software product can be stored in a computer readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk or optical disk, etc.
- the software product includes a number of instructions that cause a computer device (which may be a personal computer, server, or network device, etc.) to perform the methods described in the embodiments of the present invention.
- the units and modules included in the information mining apparatus are only divided according to functional logic, but are not limited to the foregoing division, as long as the corresponding functions can be implemented;
- the specific names of the functional units are also for convenience of distinguishing from each other and are not intended to limit the scope of the present invention.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Physics (AREA)
- Fuzzy Systems (AREA)
Abstract
Provided are an information mining method and apparatus, and a storage medium. The method includes: monitoring a message issued in an instant communication software application (110); parsing the monitored message to obtain message content (120); matching the message content with a keyword in a pre-established feature identification dictionary (130); and when the matching succeeds, capturing the message content or the message content and related content of the message content as feature description information, and saving the feature description information (140). The message issued in the instant communication software application has high category definition and high information specialism, such that the feature description information of a specific object can be automatically captured by matching the parsed message content with the keyword in the feature identification dictionary and capturing the successfully matched message content or capturing the successfully matched message content and the related content of the message content, thereby reducing the labour costs, and improving the specialism and accuracy of the obtained feature description information of the specific object.
Description
This patent application claims priority to Chinese patent application No. 201410710424.7, filed on November 27, 2014 by Baidu Online Network Technology (Beijing) Co., Ltd. and entitled "Information Mining Method and Apparatus", the entire contents of which are incorporated herein by reference.
The embodiments of the present invention relate to the field of information technology, and in particular to an information mining method, apparatus, and storage medium.
In the prior art, information related to objects such as products or services, for example product defect descriptions that help improve a product, is usually captured manually from forums or web pages in the relevant field, which is inefficient and not very accurate.
Summary of the Invention
The embodiments of the present invention provide an information mining method, apparatus, and storage medium for automatically capturing feature information of a specific object, saving labor costs, and improving the accuracy of the captured feature information.
In a first aspect, an embodiment of the present invention provides an information mining method, including:
monitoring messages posted in an instant messaging software application;
parsing the monitored messages to obtain message content;
matching the message content against keywords in a pre-established feature recognition dictionary; and
when the matching succeeds, capturing the successfully matched message content, or the successfully matched message content together with its related content, as feature description information, and saving the feature description information.
In a second aspect, an embodiment of the present invention further provides an information mining apparatus, including:
a message monitoring module, configured to monitor messages posted in an instant messaging software application;
a message parsing module, configured to parse the monitored messages to obtain message content;
a matching module, configured to match the message content against keywords in a pre-established feature recognition dictionary; and
a feature description information processing module, configured to, when the matching succeeds, capture the successfully matched message content, or the successfully matched message content together with its related content, as feature description information, and to save the feature description information.
In a third aspect, an embodiment of the present invention further provides a non-volatile computer storage medium storing one or more modules that, when executed by a device that performs an information mining method, cause the device to perform the following operations:
monitoring messages posted in an instant messaging software application;
parsing the monitored messages to obtain message content;
matching the message content against keywords in a pre-established feature recognition dictionary; and
when the matching succeeds, capturing the successfully matched message content, or the successfully matched message content together with its related content, as feature description information, and saving the feature description information.
The information mining method, apparatus, and storage medium provided by the embodiments of the present invention monitor and parse messages posted in an instant messaging software application. Because messages posted in such an application have clear categories and are highly specialized, matching the parsed message content against keywords in a pre-established feature recognition dictionary, and capturing the successfully matched message content, or that content together with its related content, makes it possible to capture the feature description information of a specific object automatically. This saves labor costs, improves the specialization and accuracy of the obtained feature description information, and makes it easier to improve the specific object based on that information.
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may modify and replace them without creative effort.
FIG. 1 is a flowchart of an information mining method according to Embodiment 1 of the present invention;
FIG. 2 is a flowchart of an information mining method according to Embodiment 2 of the present invention;
FIG. 3a is a flowchart of an information mining method according to Embodiment 3 of the present invention;
FIG. 3b is a flowchart of another information mining method according to Embodiment 3 of the present invention;
FIG. 3c is a flowchart of yet another information mining method according to Embodiment 3 of the present invention;
FIG. 4 is a schematic structural diagram of an information mining apparatus according to Embodiment 4 of the present invention;
FIG. 5 is a schematic diagram of the hardware structure of a device for performing an information mining method according to Embodiment 6 of the present invention.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some rather than all of the embodiments of the present invention; they are intended to explain the principles of the invention, not to limit the invention to these specific embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment 1
FIG. 1 is a flowchart of an information mining method according to Embodiment 1 of the present invention. The method of this embodiment may be performed by an information mining apparatus implemented in hardware and/or software, which is typically deployed in a server capable of providing data mining services.
The method includes operations 110 to 140.
110. Monitor messages posted in an instant messaging software application.
Typically, an enterprise has an internal instant messaging software application associated with its products or departments, so that the groups responsible for developing each product, or for operations and maintenance, can post messages.
For example, Baidu Hi, launched by Baidu, is an instant messaging software application that combines text messaging, voice and video calls, and file transfer. In Baidu Hi, groups corresponding to products such as "Baidu Map" or "Baidu Translation" can be created so that the Baidu staff responsible for developing or operating and maintaining each product can post messages.
Messages can be posted in many ways: as text, or in other forms such as voice, video, or pictures. This embodiment places no restriction on this, as long as the instant messaging software application supports the form.
Specifically, this operation may monitor text messages posted in groups related to the enterprise's products or departments in the instant messaging software application. Messages posted in other forms need to have their content recognized by a suitable recognition technology (for example, speech recognition or image recognition) to obtain the corresponding text message.
120. Parse the monitored messages to obtain message content.
In this operation, the monitored message is decoded according to the communication protocol of the instant messaging software application, so that the original data corresponding to the monitored message is correctly restored, that is, a readable character string is recovered.
130. Match the message content against keywords in a pre-established feature recognition dictionary.
Specifically, this operation uses keyword matching to determine, based on the pre-established feature recognition dictionary, whether the message content contains any keyword in the dictionary.
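The matching in operation 130 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes plain case-insensitive substring containment, and the function name `match_keywords` and the sample dictionary are hypothetical.

```python
from typing import Iterable, List


def match_keywords(message_content: str, dictionary: Iterable[str]) -> List[str]:
    """Return the dictionary keywords that occur in the message content."""
    text = message_content.casefold()
    # Plain substring containment; sorted so the result is deterministic.
    return sorted(kw for kw in dictionary if kw.casefold() in text)


# Illustrative dictionary (an assumption for this sketch).
defect_dictionary = {"problem", "defect", "improvement"}

hits = match_keywords("There is a problem with login permissions", defect_dictionary)
print(hits)  # ['problem']
```

Substring containment is the crudest form of keyword matching (for instance, a keyword "bug" would also hit "debugging"). A real system would probably tokenize the message first, especially for Chinese text.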
It should be noted that the groups corresponding to different objects within an enterprise post different messages, so the parsed message content differs. Such groups have clear categories, highly specialized information, and distinctive language features (for example, the members of each group belong to one category or work on the same product, and share the same or similar professional backgrounds), so the messages posted by different groups reflect information about the enterprise's objects.
An object may be a specific product, or a macro-level object such as enterprise management.
For example, the group corresponding to the "Baidu Map" product consists of the Baidu staff responsible for developing or operating and maintaining "Baidu Map". Messages posted by its members usually contain information about the product's strengths and weaknesses, or about its subsequent improvement.
As another example, messages posted by members of the debugging group for the "Baidu Browser" product usually contain bugs or suspected problems that arose while debugging the product.
Therefore, a corresponding feature recognition dictionary can be established for the group corresponding to each object of the enterprise, so as to obtain the feature description information of different objects (for example, different products, or enterprise management), such as the strengths and weaknesses of different products or problems in enterprise management. For different groups concerned with the same object, it is preferable to establish a separate feature recognition dictionary for each group, so as to obtain feature description information about the same object at different levels.
For example, for the R&D group of the "Baidu Map" product, a feature recognition dictionary related to R&D is established, whose keywords may include "R&D", "progress", "trend", "cost", and "competitor"; for the debugging group of the "Baidu Map" product, a feature recognition dictionary related to debugging is established, whose keywords may include "debugging error", "debugging cycle", "bug", "vulnerability", and "defect"; for the release group of the "Baidu Map" product, a feature recognition dictionary related to releases is established, whose keywords may include "release", "launch event", "release schedule", and "release date".
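One way to picture the per-group dictionaries is a lookup keyed by product and group. The sketch below is an assumption-laden illustration: the table `GROUP_DICTIONARIES`, the function `matched_keywords`, and the English keywords (adapted from the example above) are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical mapping from (product, group) to that group's feature
# recognition dictionary; keywords are adapted from the example above.
GROUP_DICTIONARIES: Dict[Tuple[str, str], Set[str]] = {
    ("Baidu Map", "R&D"): {"R&D", "progress", "trend", "cost", "competitor"},
    ("Baidu Map", "debugging"): {
        "debugging error", "debugging cycle", "bug", "vulnerability", "defect",
    },
    ("Baidu Map", "release"): {
        "release", "launch event", "release schedule", "release date",
    },
}


def matched_keywords(product: str, group: str, message_content: str) -> Set[str]:
    """Match a message only against the dictionary of the group it came from."""
    dictionary = GROUP_DICTIONARIES.get((product, group), set())
    text = message_content.casefold()
    return {kw for kw in dictionary if kw.casefold() in text}


print(matched_keywords("Baidu Map", "debugging", "Found a bug in route planning"))
```

Keeping one dictionary per group means the same message can count as a match in the debugging group and be ignored in the release group, which is how the same object yields feature descriptions at different levels.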
140. When the matching succeeds, capture the successfully matched message content, or the successfully matched message content together with its related content, as feature description information, and save the feature description information.
This operation can be implemented in two ways. In the first, when the matching succeeds, only the successfully matched message content is captured as the feature description information and saved. In the second, when the matching succeeds, the successfully matched message content and its related content are captured together as the feature description information and saved.
The second way is preferred: compared with capturing only the successfully matched message content, it helps obtain complete feature description information for the object.
A capture time interval and/or a capture count can be set for capturing the related content of the successfully matched message content, for example a capture time interval of 15 s and a capture count of 5.
Further, the related content of the successfully matched message content may include: context messages of the successfully matched message content; and/or supplementary content returned by the user who posted the message content, after a session is established with that user and a message content supplement request is sent to that user.
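Capturing context messages under a time interval and a count limit might look like the sketch below. The interpretation is an assumption: context is taken to mean messages on either side of the matched message whose timestamps fall within the interval, nearest first, up to the count. The `Message` type and `capture_context` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Message:
    timestamp: float  # seconds on any common clock
    sender: str
    content: str


def capture_context(
    messages: Sequence[Message],
    matched_index: int,
    window_seconds: float = 15.0,  # capture time interval from the example
    max_count: int = 5,            # capture count from the example
) -> List[Message]:
    """Return up to max_count messages within window_seconds of the match."""
    matched = messages[matched_index]
    nearby = [
        (abs(m.timestamp - matched.timestamp), i)
        for i, m in enumerate(messages)
        if i != matched_index
        and abs(m.timestamp - matched.timestamp) <= window_seconds
    ]
    nearest = sorted(nearby)[:max_count]
    # Return the selected context in chronological (list) order.
    return [messages[i] for i in sorted(i for _, i in nearest)]
```

With both limits set, the tighter one wins: a quiet group yields fewer than five context messages, while a busy one is capped at the five closest in time.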
Example 1
Take the "Baidu Browser" product as the object. The messages posted in a group for this product contain many evaluations and discussions of problems with the product. For example, one designer of the product posts the message "There is a problem with login permissions when logging in to Baidu Browser" in the development group, and another designer then posts "Indeed, the cause is A" in the same group. After the matching operation, the message "There is a problem with login permissions when logging in to Baidu Browser" successfully matches the keyword "problem" in the feature recognition dictionary. Capturing this message content yields the feature description information for the product defect, and capturing its context message "Indeed, the cause is A" yields the feature description information for the cause of the defect, which enriches the feature description information of the product.
Note that the above uses, as an example, capturing the feature description information for a product defect and for the cause of that defect. Besides the cause, other feature description information such as the solution to the defect can also be captured as complete information about the product defect and stored in a formatted form (for example, [product name, defect content, cause]). This embodiment places no restriction on this.
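The formatted storage mentioned above could be pictured as a small record type serialized one record per line. This is only a sketch under assumptions: the `DefectRecord` fields mirror the [product name, defect content, cause] example, and JSON is chosen for illustration rather than taken from the description.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DefectRecord:
    product_name: str
    defect_content: str
    cause: str = ""  # left empty when no cause was captured


def to_json_line(record: DefectRecord) -> str:
    """Serialize one record; ensure_ascii=False keeps Chinese text readable."""
    return json.dumps(asdict(record), ensure_ascii=False)


record = DefectRecord(
    product_name="Baidu Browser",
    defect_content="There is a problem with login permissions when logging in",
    cause="A",
)
print(to_json_line(record))
```

Further fields, such as a solution, could be added to the record in the same way.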
Example 2
After a session is established with the user who posted the message content, a message content supplement request is sent to the user in the form of heuristic questions, asking for a complete description of the product defect. Capture can then be session-based: when a defect has many description dimensions (for example, defect type and cause), a longer capture time (such as one minute) is set, and the supplementary content returned by the user within this time is captured. If no supplementary description arrives within this time, only the basic information is recorded, or a failure is returned because necessary information is missing.
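The session-based capture in Example 2 can be sketched with a queue and a deadline. Everything here is illustrative: the replies are assumed to arrive on a `queue.Queue`, and `collect_supplement` and `build_record` are hypothetical helpers, not part of any real instant messaging API.

```python
import queue
import time
from typing import List, Optional


def collect_supplement(replies: queue.Queue, timeout_seconds: float = 60.0) -> List[str]:
    """Collect the replies that arrive before the capture window closes."""
    deadline = time.monotonic() + timeout_seconds
    collected: List[str] = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            collected.append(replies.get(timeout=remaining))
        except queue.Empty:
            break  # window closed with no further reply
    return collected


def build_record(basic_info: str, supplement: List[str], required: bool = False) -> Optional[dict]:
    """Record basic info only, or signal failure (None) if supplement is required."""
    if not supplement and required:
        return None
    return {"basic": basic_info, "supplement": supplement}


# Usage with a short window so the example finishes quickly.
inbox: queue.Queue = queue.Queue()
inbox.put("defect type: login")
inbox.put("cause: A")
print(build_record("login permission problem", collect_supplement(inbox, 0.1)))
```

The two outcomes of `build_record` with an empty supplement correspond to the two fallbacks in the text: keep only the basic information, or return a failure when the missing information is necessary.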
In the technical solution of this embodiment, messages posted in an instant messaging software application are monitored and parsed. Because messages posted in such an application have clear categories and are highly specialized, matching the parsed message content against keywords in a pre-established feature recognition dictionary, and capturing the successfully matched message content, or that content together with its related content, makes it possible to capture the feature description information of a specific object automatically. This saves labor costs, improves the specialization and accuracy of the obtained feature description information, and makes it easier to improve the specific object based on that information.
In this embodiment, establishing the feature recognition dictionary may specifically include:
receiving manually configured keywords for the feature recognition dictionary; or
searching the chat history of the instant messaging software for manually collected typical sentences, mining keywords that express the corresponding features according to the contextual co-occurrence relationships of those sentences, and adding the keywords to the feature recognition dictionary.
In other words, the keywords in the feature recognition dictionary can be configured manually, for example by adding keywords such as "problem", "defect", or "improvement".
Alternatively, some typical sentences can be collected manually, and, based on the contextual co-occurrence relationships of those sentences in the chat history, the feature-expressing words in typical sentences that reach a certain co-occurrence frequency are taken as keywords and added to the feature recognition dictionary; or semantic templates that express the features are mined.
For example, in the R&D group for the "Baidu Browser" product in Baidu Hi, one person says "query=xxx, wrong image, can someone take a look", and another replies "Right, it is a problem; defect recorded". If "wrong image" and "defect recorded" appear paired many times in the group's messages, the two are considered to have a co-occurrence relationship, indicating that this is a defect that needs to be recorded. On this basis, the semantic template "[any word] wrong image" expressing a defect can be mined.
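The co-occurrence mining described above can be illustrated with a simple counting sketch. It rests on several assumptions: co-occurrence is taken to mean that a message is immediately followed by one containing the manually collected typical phrase, candidates are whitespace-separated word pairs, and the English phrases stand in for the chat phrases of the example. The function `mine_keywords` is a hypothetical name.

```python
from collections import Counter
from typing import Sequence, Set


def mine_keywords(
    history: Sequence[str],
    typical_phrase: str,
    min_count: int = 2,
    ngram: int = 2,
) -> Set[str]:
    """Return n-grams that precede the typical phrase at least min_count times."""
    counts: Counter = Counter()
    for previous, current in zip(history, history[1:]):
        if typical_phrase.casefold() not in current.casefold():
            continue
        words = previous.casefold().split()
        # A set, so each candidate counts at most once per message pair.
        candidates = {
            " ".join(words[i:i + ngram]) for i in range(len(words) - ngram + 1)
        }
        counts.update(candidates)
    return {phrase for phrase, n in counts.items() if n >= min_count}


history = [
    "query=abc wrong image can someone look",
    "right it is a problem defect recorded",
    "lunch anyone",
    "query=xyz wrong image again",
    "confirmed defect recorded",
]
print(mine_keywords(history, "defect recorded"))  # {'wrong image'}
```

Only "wrong image" precedes "defect recorded" twice, so it alone clears the frequency threshold. The varying words around it ("query=abc", "query=xyz") are what a template such as "[any word] wrong image" would abstract away.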
The information mining method of this embodiment can be applied to many scenarios. For example, with a feature recognition dictionary established for product defects, defect description information for a product is obtained; with a dictionary established for product debugging, descriptions of debugging problems are obtained; and with a dictionary established for enterprise management, descriptive information such as management opinions solicited on an enterprise management matter is obtained. This embodiment places no restriction on this.
Specifically, when the method is used to capture defect description information for a product, the keywords in the feature recognition dictionary include keywords reflecting product defects, and the feature description information is information describing product defects. This implementation provides a fully automated flow, from product mining, through capturing defect-related content, to finally saving it to a designated space, and can cover the important product groups of all of an enterprise's product lines.
Embodiment 2
FIG. 2 is a flowchart of an information mining method according to Embodiment 2 of the present invention. Building on the above embodiment, this embodiment provides a preferred solution for the steps performed before monitoring messages posted in the instant messaging software application. The preferred method includes operations 210 to 220.
210. After obtaining access rights to the server corresponding to the instant messaging software application, establish a connection with the server.
For example, access rights to the server corresponding to the instant messaging software application “Baidu Hi” are obtained, and a connection is established with that server.
220. Send to the server a request to join a group account or an individual user account in the instant messaging software application.
For example, a request to join the group account “Baidu Browser-R&D Group” is sent to the server corresponding to “Baidu Hi”, so that the newly joined group member can publish messages related to the product “Baidu Browser” in that group.
As another example, a request to add an individual user account is sent to the server corresponding to “Baidu Hi”. The newly added individual account can chat about the same product with other individual accounts that have already joined the application, and those chats form published messages. The newly added individual account can also apply to join a group account that has already joined the application, so that the newly joined group member publishes messages in that group.
In the technical solution of this embodiment, before messages published in the instant messaging software application are listened to, a connection is established with the server corresponding to the application and an account join request is exchanged, so that the account that has joined the application can publish messages in it.
It should be noted that, after the join request for a group account or an individual user account is sent to the server, listening to messages published in the instant messaging software application may specifically include: after receiving a response message from the server agreeing to the join, listening to messages published by users in the joined group or by the joined individual user.
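The ordering of operations 210 and 220 and the subsequent listening step can be sketched as follows. This is only an illustration under stated assumptions: `FakeIMServer`, `MiningClient` and all of their methods are hypothetical names, and the in-memory server stands in for whatever interface the real instant messaging server exposes, which the text does not describe.

```python
from dataclasses import dataclass, field


@dataclass
class FakeIMServer:
    """Hypothetical in-memory stand-in for the instant messaging server."""
    authorized_clients: set = field(default_factory=set)
    groups: dict = field(default_factory=dict)  # group name -> list of (sender, text)

    def connect(self, client_id: str) -> bool:
        # Operation 210: a connection is accepted only once access rights exist.
        return client_id in self.authorized_clients

    def request_join(self, client_id: str, group: str) -> bool:
        # Operation 220: the server answers the join request (True means "agreed").
        return client_id in self.authorized_clients and group in self.groups


class MiningClient:
    """Hypothetical client that joins a group before listening to it."""

    def __init__(self, client_id: str, server: FakeIMServer):
        self.client_id = client_id
        self.server = server
        self.joined = set()

    def join(self, group: str) -> bool:
        if not self.server.connect(self.client_id):
            return False
        if self.server.request_join(self.client_id, group):
            self.joined.add(group)
            return True
        return False

    def listen(self, group: str):
        # Listening is allowed only after the server agreed to the join.
        if group not in self.joined:
            raise PermissionError(f"not a member of {group!r}")
        return list(self.server.groups[group])
```

A real listener would receive messages as they arrive rather than read a stored list; the list is used here only to keep the sketch self-contained.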
Embodiment 3
FIG. 3a is a flowchart of an information mining method according to Embodiment 3 of the present invention. Building on the above embodiments, this embodiment provides a preferred solution for the steps performed after the successfully matched message content (or that content together with its related content) is captured as the feature description information and before the feature description information is saved.
The preferred method includes operations 310 to 360.
310. Listen to messages published in the instant messaging software application.
320. Parse each message that is heard to obtain the message content.
330. Match the message content against the keywords in a pre-established feature recognition dictionary.
340. When the match succeeds, capture the successfully matched message content, or capture that content together with its related content, as the feature description information.
350. Match the feature description information against the keywords in a pre-established category recognition dictionary, and determine the category corresponding to the feature description information according to the matching result.
As described above, the information mining method provided by the embodiments of the present invention can be applied in a variety of scenarios, so a category recognition dictionary covering several application requirements can be built according to actual needs.
The keywords in the category recognition dictionary can be configured manually. They may include, for example, “Baidu Map development defect”, “Baidu Browser debugging defect” and “Baidu Translate development improvement”; this embodiment places no limitation on them.
360. Save the determined category in association with the feature description information.
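Operations 310 to 360 can be sketched as a small pipeline. This is a minimal illustration, not the patented implementation: the JSON message format, the English keyword sets, the substring matching, the `"Other"` fallback and every function name are assumptions introduced for the example.

```python
import json

# Hypothetical feature recognition dictionary (defect-related keywords).
FEATURE_KEYWORDS = {"crash", "freeze", "error"}

# Hypothetical category recognition dictionary: category -> trigger keywords.
CATEGORY_KEYWORDS = {
    "Baidu Browser debugging defect": {"browser"},
    "Baidu Map development defect": {"map"},
}


def parse(raw: str) -> str:
    """Operation 320: extract the content from a raw message (assumed to be JSON)."""
    return json.loads(raw)["content"]


def matches_feature(content: str) -> bool:
    """Operation 330: substring match against the feature recognition dictionary."""
    text = content.lower()
    return any(keyword in text for keyword in FEATURE_KEYWORDS)


def categorize(description: str, default: str = "Other") -> str:
    """Operation 350: first category whose keywords appear in the description."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return default


def mine(raw_messages, store):
    """Operations 310-360 over an iterable of raw messages."""
    for raw in raw_messages:                      # 310: messages that were heard
        content = parse(raw)                      # 320
        if not matches_feature(content):          # 330
            continue
        description = content                     # 340: matched content only
        store.append({"category": categorize(description),   # 350
                      "description": description})           # 360: saved together
    return store
```

Here only the matched content is captured in operation 340; capturing related content as well would extend `description` before it is categorized.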
In the technical solution of this embodiment, messages published in the instant messaging software application are listened to and parsed. Because messages published in such an application are both clearly categorized and highly specialized, matching the parsed message content against the keywords in the pre-established feature recognition dictionary, and capturing the successfully matched message content (or that content together with its related content), makes it possible to capture the feature description information of a specific object automatically. This saves labor cost, improves the professionalism and accuracy of the feature description information obtained, and facilitates improving the specific object on the basis of that information. After the feature description information is captured, determining its category and saving the category in association with it makes it easier to bind the group responsible for that category, so that the responsible group can learn of valuable feedback on the object in a timely manner from the specialized feature description information.
It should be noted that operation 350 is only one way of determining the category corresponding to the feature description information. The category may also be determined by a natural language processing (NLP) model (operation 351 shown in FIG. 3b).
Specifically, a semantic similarity algorithm model and/or a click similarity algorithm model may be used to determine the category corresponding to the feature description information.
Semantic similarity uses a model trained by a supervised method in the natural language processing cloud back end to analyze the similarity of two pieces of text; the larger the value, the more similar the texts. A networked semantic similarity service provides the similarity computation. For example, for the inputs “laptop computer” and “notebook”, the semantic similarity is 2.08478.
Click similarity can be used when the semantic similarity fails to reach a threshold (for example, 1.8). It analyzes the click similarity of two pieces of text (for example, a query and the title of a search result) by computing the cosine similarity of trained embedding vectors. The value lies in the range [-1, 1], and the larger the value, the stronger the click similarity. For example, the click similarity of “Hello Baidu” and “Hello Zhou Hongyi” is -0.121407, while that of “Hello Baidu” and “Hello Li Yanhong” is 0.218664; the latter pair has the higher click similarity.
In practice, the feature description information is first compared for semantic similarity with each of a set of preset categories, and the category with the highest semantic similarity that reaches the threshold is returned. If the semantic similarity between the feature description information and a preset category does not reach the threshold, the click similarity between the feature description information and that preset category is evaluated next. If the click similarity reaches its threshold, the corresponding category is returned; otherwise a default category (for example, “Other”) is returned. The thresholds are continually refitted against historical data to maintain accuracy.
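The two-stage decision described above can be sketched as follows. The similarity models themselves are not specified in the text, so they are passed in as functions; `classify_by_similarity`, the click threshold of 0.2 and the choice of the highest-scoring category in the second stage are assumptions made for illustration. Only the semantic threshold of 1.8, the cosine measure and the “Other” default come from the description.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; the result lies in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_by_similarity(text, categories, semantic_sim, click_sim,
                           semantic_threshold=1.8, click_threshold=0.2,
                           default="Other"):
    """Semantic similarity first, click similarity as fallback, then a default."""
    if not categories:
        return default
    # Stage 1: best category by semantic similarity, if it reaches the threshold.
    semantic = {c: semantic_sim(text, c) for c in categories}
    best = max(semantic, key=semantic.get)
    if semantic[best] >= semantic_threshold:
        return best
    # Stage 2: fall back to click similarity (assumed: take the best-scoring category).
    click = {c: click_sim(text, c) for c in categories}
    best = max(click, key=click.get)
    if click[best] >= click_threshold:
        return best
    return default
```

Passing the similarity functions as parameters keeps the cascade independent of how the underlying models are trained, and the thresholds can be refitted without touching the control flow.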
It should further be noted that the category corresponding to the feature description information may also be determined using a probability model trained in advance on feature description text labeled with category information. The input of the probability model is feature description text, and its output is the probability that the text belongs to a given category (operation 352 shown in FIG. 3c). Specifically, the probability model is trained in advance on the labeled feature description text; the feature description information is input to the model, which outputs a category A corresponding to the feature description information together with the probability value for category A. If that probability value meets a given threshold, the category corresponding to the feature description information is determined to be category A. For example, a probability model of P(category | feature description information) can be trained from manually assigned category labels in chat records and the corresponding description text. The training method can be chosen flexibly according to the characteristics of the system's business domain; a typical choice is the naive Bayes method. In use, if the probability that a user's problem description belongs to a given problem category meets the threshold, the description is considered to belong to that category.
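As one concrete instance of such a probability model, the sketch below implements a small multinomial naive Bayes classifier with Laplace smoothing and applies a probability threshold. The class name `NaiveBayes`, the function `classify_by_probability`, the threshold of 0.6, the `"Other"` fallback and the whitespace tokenization are all assumptions for illustration; Chinese chat text would need a word segmenter in place of `split()`.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing over whitespace tokens."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # label -> number of training texts
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict_proba(self, text):
        """Return P(label | text) for every label seen during training."""
        words = text.lower().split()
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long texts.
        peak = max(log_scores.values())
        exps = {label: math.exp(s - peak) for label, s in log_scores.items()}
        z = sum(exps.values())
        return {label: e / z for label, e in exps.items()}


def classify_by_probability(model, text, threshold=0.6, default="Other"):
    """Return the most probable category if its probability meets the threshold."""
    probs = model.predict_proba(text)
    label = max(probs, key=probs.get)
    return label if probs[label] >= threshold else default
```

A production system would more likely rely on an established library implementation; the hand-written version is shown only so that the thresholding step on P(category | text) is visible end to end.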
On the basis of this embodiment, after the category corresponding to the feature description information is determined, the following operations may also be included:
determining information about the recipient of the feature description information according to the category; and
sending the feature description information to the recipient according to the recipient's information.
The recipient's information may be the address of a designated website, the SMS number or email address of a designated receiving user, or the instant messaging software account of a designated receiving user.
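The two operations above amount to a lookup followed by a send. A minimal sketch, assuming a hypothetical mapping from category to recipient information and hypothetical sender callables for each channel (the addresses and the function name `dispatch` are placeholders, not part of the original text):

```python
def dispatch(category, description, recipients, senders):
    """Look up the recipient for a category and send the description to it.

    recipients: category -> {"channel": ..., "address": ...}   (hypothetical layout)
    senders:    channel  -> callable(address, body)            (hypothetical layout)
    Returns True if something was sent, False if no recipient is configured.
    """
    info = recipients.get(category)
    if info is None:
        return False
    send = senders[info["channel"]]
    send(info["address"], description)
    return True


# Example wiring with in-memory senders that only record what they were given.
outbox = []
senders = {
    "email": lambda address, body: outbox.append(("email", address, body)),
    "im": lambda address, body: outbox.append(("im", address, body)),
}
recipients = {
    "Baidu Browser debugging defect": {"channel": "email",
                                       "address": "browser-team@example.com"},
    "Baidu Map development defect": {"channel": "im",
                                     "address": "map-owner"},
}
dispatch("Baidu Browser debugging defect", "crash when opening a PDF", recipients, senders)
```

Keeping the channel-specific senders behind a common call signature lets a website address, an SMS number, an email address or an instant messaging account be handled by the same routing step.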
This implementation provides a way for the recipient to learn of the feature description information of the object after that information has been captured and its category determined. The recipient is treated as the group responsible for the category, and the object's specialized feature description information is exchanged with that group, so that the responsible group can learn of valuable feedback on the object in a timely manner.
Embodiment 4
FIG. 4 is a schematic structural diagram of an information mining apparatus according to Embodiment 4 of the present invention. The apparatus includes a message listening module 410, a message parsing module 420, a matching module 430 and a feature description information processing module 440.
The message listening module 410 is configured to listen to messages published in the instant messaging software application. The message parsing module 420 is configured to parse each message that is heard to obtain the message content. The matching module 430 is configured to match the message content against the keywords in a pre-established feature recognition dictionary. The feature description information processing module 440 is configured to, when the match succeeds, capture the successfully matched message content, or capture that content together with its related content, as the feature description information, and to save the feature description information.
In the technical solution of this embodiment, messages published in the instant messaging software application are listened to and parsed. Because messages published in such an application are both clearly categorized and highly specialized, matching the parsed message content against the keywords in the pre-established feature recognition dictionary, and capturing the successfully matched message content (or that content together with its related content), makes it possible to capture the feature description information of a specific object automatically. This saves labor cost, improves the professionalism and accuracy of the feature description information obtained, and facilitates improving the specific object on the basis of that information.
In the above solution, the apparatus may further include a connection establishing module and a request sending module.
The connection establishing module is configured to, before messages published in the instant messaging software application are listened to, establish a connection with the server corresponding to the application after access rights to that server are obtained. The request sending module is configured to send to the server a request to join a group account or an individual user account in the instant messaging software application. The message listening module 410 is specifically configured to, after receiving a response message from the server agreeing to the join, listen to messages published by users in the joined group or by the joined individual user.
In the above solution, the apparatus may further include a feature recognition dictionary establishing module, configured to receive manually configured keywords for the feature recognition dictionary; or
configured to search the chat history of the instant messaging software for manually collected typical sentences, mine keywords that express the corresponding feature from the contextual co-occurrence relationships of each typical sentence, and add those keywords to the feature recognition dictionary.
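The text does not spell out how the contextual co-occurrence relationship is computed. One possible reading, sketched below under that assumption, counts the words that appear in messages adjacent to each occurrence of a typical sentence in the chat history and keeps the most frequent ones. The function name `mine_keywords`, the window size, the stopword handling and the whitespace tokenization are all hypothetical.

```python
from collections import Counter


def mine_keywords(history, typical_sentences, window=1, top_n=3,
                  stopwords=frozenset()):
    """Return the top_n words that co-occur with typical sentences.

    history:           chat messages in chronological order
    typical_sentences: manually collected sentences that signal the feature
    window:            how many neighbouring messages on each side count as context
    """
    counts = Counter()
    for i, message in enumerate(history):
        if message not in typical_sentences:
            continue
        lo = max(0, i - window)
        hi = min(len(history), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue  # only the surrounding context is counted
            for word in history[j].lower().split():
                if word not in stopwords:
                    counts[word] += 1
    return [word for word, _ in counts.most_common(top_n)]
```

The mined words would then be added to the feature recognition dictionary, ideally after a manual review, since raw co-occurrence counts also pick up common function words unless a stopword list is supplied.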
In the above solution, the apparatus may further include a first category determining module, a second category determining module, or a third category determining module.
Each of these modules operates after the successfully matched message content (or that content together with its related content) is captured as the feature description information and before the feature description information is saved. The first category determining module is configured to match the feature description information against the keywords in a pre-established category recognition dictionary and to determine the category corresponding to the feature description information according to the matching result. The second category determining module is configured to determine the category corresponding to the feature description information by a natural language processing (NLP) model. The third category determining module is configured to determine the category corresponding to the feature description information using a probability model trained in advance on feature description text labeled with category information. The feature description information processing module 440 may be specifically configured to save the determined category in association with the feature description information.
The second category determining module may be specifically configured to determine the category corresponding to the feature description information using a semantic similarity algorithm model and/or a click similarity algorithm model.
Further, the apparatus may include a recipient information determining module and a feature description information sending module.
The recipient information determining module is configured to, after the category corresponding to the feature description information is determined, determine information about the recipient of the feature description information according to the category. The feature description information sending module is configured to send the feature description information to the recipient according to the recipient's information.
The recipient's information may be the address of a designated website, the SMS number or email address of a designated receiving user, or the instant messaging software account of a designated receiving user.
The related content of the successfully matched message content may include: context messages of the successfully matched message content; and/or supplementary content returned by the user who published the message content, after a session is established with that user and a message content supplement request is sent to the user.
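Capturing the context messages of a matched message can be sketched as taking a window of neighbouring messages from the same conversation. The window sizes and the function name `capture_with_context` are assumptions for illustration; the text does not state how many surrounding messages count as context.

```python
def capture_with_context(messages, index, before=2, after=2):
    """Return the matched message plus up to `before`/`after` neighbouring messages.

    messages: conversation messages in chronological order
    index:    position of the successfully matched message
    """
    start = max(0, index - before)
    end = min(len(messages), index + after + 1)
    return {
        "matched": messages[index],
        "context": messages[start:index] + messages[index + 1:end],
    }
```

The supplementary-content path (opening a session with the sender and asking for more detail) depends on the messaging server's interface and is not sketched here.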
In the above solution, the keywords in the feature recognition dictionary may include keywords that reflect product defects; correspondingly, the feature description information may be information describing product defects.
The information mining apparatus provided by the embodiments of the present invention can execute the information mining method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method.
Embodiment 5
This embodiment provides a non-volatile computer storage medium storing one or more modules. When the one or more modules are executed by a device that performs an information mining method, the device is caused to perform the following operations:
listening to messages published in an instant messaging software application;
parsing each message that is heard to obtain the message content;
matching the message content against the keywords in a pre-established feature recognition dictionary; and
when the match succeeds, capturing the successfully matched message content, or capturing that content together with its related content, as the feature description information, and saving the feature description information.
When the modules stored in the storage medium are executed by the device, the operations performed before listening to messages published in the instant messaging software application may further include:
after obtaining access rights to the server corresponding to the instant messaging software application, establishing a connection with the server; and
sending to the server a request to join a group account or an individual user account in the instant messaging software application.
Listening to messages published in the instant messaging software application may specifically include:
after receiving a response message from the server agreeing to the join, listening to messages published by users in the joined group or by the joined individual user.
When the modules stored in the storage medium are executed by the device, establishing the feature recognition dictionary may specifically include:
receiving manually configured keywords for the feature recognition dictionary; or
searching the chat history of the instant messaging software for manually collected typical sentences, mining keywords that express the corresponding feature from the contextual co-occurrence relationships of each typical sentence, and adding those keywords to the feature recognition dictionary.
When the modules stored in the storage medium are executed by the device, the operations performed after the successfully matched message content (or that content together with its related content) is captured as the feature description information and before the feature description information is saved may further include:
matching the feature description information against the keywords in a pre-established category recognition dictionary and determining the category corresponding to the feature description information according to the matching result; or determining the category corresponding to the feature description information by a natural language processing (NLP) model; or determining the category corresponding to the feature description information using a probability model trained in advance on feature description text labeled with category information.
Saving the feature description information may include: saving the determined category in association with the feature description information.
When the modules stored in the storage medium are executed by the device, determining the category corresponding to the feature description information by the NLP model may specifically include:
determining the category corresponding to the feature description information using a semantic similarity algorithm model and/or a click similarity algorithm model.
When the modules stored in the storage medium are executed by the device, the operations performed after the category corresponding to the feature description information is determined may further include:
determining information about the recipient of the feature description information according to the category; and
sending the feature description information to the recipient according to the recipient's information.
When the modules stored in the storage medium are executed by the device, the recipient's information may be the address of a designated website, the SMS number or email address of a designated receiving user, or the instant messaging software account of a designated receiving user.
When the modules stored in the storage medium are executed by the device, the related content of the successfully matched message content may include: context messages of the successfully matched message content; and/or supplementary content returned by the user who published the message content, after a session is established with that user and a message content supplement request is sent to the user.
When the modules stored in the storage medium are executed by the device, the keywords in the feature recognition dictionary may include keywords that reflect product defects, and the feature description information is information describing product defects.
Embodiment 6
FIG. 5 is a schematic diagram of the hardware structure of a device that performs an information mining method according to Embodiment 6 of the present invention.
The device includes:
one or more processors 510 (FIG. 5 takes one processor 510 as an example);
a memory 520; and one or more modules.
The device may further include an input apparatus 530 and an output apparatus 540. The processor 510, the memory 520, the input apparatus 530 and the output apparatus 540 in the device may be connected by a bus or by other means; FIG. 5 takes a bus connection as an example.
As a computer-readable storage medium, the memory 520 can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the information mining method in the embodiments of the present invention (for example, the message listening module 410, the message parsing module 420, the matching module 430 and the feature description information processing module 440 of the information mining apparatus shown in FIG. 4). By running the software programs, instructions and modules stored in the memory 520, the processor 510 executes the various functional applications and data processing of the server, that is, it implements the information mining method of the foregoing method embodiments.
The memory 520 may include a program storage area and a data storage area. The program storage area can store an operating system and the application programs required by at least one function; the data storage area can store data created through use of the terminal device, and the like. In addition, the memory 520 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some examples, the memory 520 may further include memory located remotely from the processor 510, and such remote memory can be connected to the terminal device through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The input apparatus 530 can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the terminal. The output apparatus 540 may include a display device such as a display screen.
The one or more modules are stored in the memory 520 and, when executed by the one or more processors 510, perform the following operations:
monitoring messages posted in an instant messaging software application;
parsing the monitored messages to obtain message content;
matching the message content against keywords in a pre-established feature recognition dictionary;
when the matching succeeds, capturing the successfully matched message content, or capturing the successfully matched message content together with content related to it, as feature description information, and saving the feature description information.
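The four operations above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the JSON message envelope, the `FeatureStore` class, the function names, and the sample dictionary entries are all assumptions made for the example, and matching is reduced to case-insensitive substring search.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FeatureStore:
    """In-memory stand-in (hypothetical) for wherever feature descriptions are saved."""
    records: list = field(default_factory=list)

    def save(self, text: str, keywords: list) -> None:
        self.records.append({"text": text, "keywords": keywords})


def parse_message(raw: str) -> str:
    """Parse a monitored message (assumed to be a JSON envelope) into its content."""
    return json.loads(raw).get("content", "")


def match_keywords(content: str, dictionary: set) -> list:
    """Return the dictionary keywords that occur in the content, sorted."""
    lowered = content.lower()
    return sorted(k for k in dictionary if k.lower() in lowered)


def process(raw_messages, dictionary: set, store: FeatureStore) -> None:
    """Run parse -> match -> capture/save over an iterable of monitored messages."""
    for raw in raw_messages:
        content = parse_message(raw)
        hits = match_keywords(content, dictionary)
        if hits:  # matching succeeded: capture the content and save it
            store.save(content, hits)


# Hypothetical feature recognition dictionary and monitored messages.
FEATURE_DICTIONARY = {"crash", "cannot log in", "freezes"}
incoming = [
    '{"sender": "u1", "content": "The app freezes after the update"}',
    '{"sender": "u2", "content": "See you at lunch"}',
]
store = FeatureStore()
process(incoming, FEATURE_DICTIONARY, store)
print(store.records)
```

Only the first message is saved, because it is the only one containing a dictionary keyword.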
Further, before monitoring the messages posted in the instant messaging software application, the operations may further include:
after obtaining access permission to the server corresponding to the instant messaging software application, establishing a connection with the server;
sending the server a request to join a group account or an individual user account in the instant messaging software application;
Monitoring the messages posted in the instant messaging software application specifically includes:
after receiving a response message from the server approving the join, monitoring messages posted by users in the joined group or by the joined individual user.
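The ordering constraint in these steps (connect, request to join, and only then monitor) can be pictured as a small state machine. The sketch below is an assumption-laden illustration: `MonitorSession`, its method names, and the account identifier are hypothetical, and no real instant messaging server API is involved.

```python
from enum import Enum, auto


class State(Enum):
    DISCONNECTED = auto()
    CONNECTED = auto()
    JOIN_REQUESTED = auto()
    LISTENING = auto()


class MonitorSession:
    """Hypothetical session that only records messages after the join is approved."""

    def __init__(self):
        self.state = State.DISCONNECTED
        self.target = None
        self.received = []

    def connect(self, has_access: bool) -> None:
        # A connection is established only after access permission is obtained.
        if not has_access:
            raise PermissionError("no access permission for the IM server")
        self.state = State.CONNECTED

    def request_join(self, account_id: str) -> None:
        # Ask to join a group account or an individual user account.
        if self.state is not State.CONNECTED:
            raise RuntimeError("connect before requesting to join")
        self.target = account_id
        self.state = State.JOIN_REQUESTED

    def on_join_response(self, approved: bool) -> None:
        if self.state is not State.JOIN_REQUESTED:
            raise RuntimeError("no join request is pending")
        self.state = State.LISTENING if approved else State.CONNECTED

    def on_message(self, message: str) -> bool:
        """Record the message only while listening; return whether it was recorded."""
        if self.state is not State.LISTENING:
            return False
        self.received.append(message)
        return True


session = MonitorSession()
session.connect(has_access=True)
session.request_join("group-123")  # hypothetical group account identifier
early = session.on_message("sent before approval")
session.on_join_response(approved=True)
late = session.on_message("app crashes on launch")
print(early, late, session.received)
```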
Further, establishing the feature recognition dictionary may specifically include:
receiving manually configured keywords for the feature recognition dictionary; or
searching the chat history of the instant messaging software for manually collected typical sentences, mining keywords that express the corresponding feature according to the contextual co-occurrence relationships of those typical sentences, and adding the keywords to the feature recognition dictionary.
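One simple reading of "contextual co-occurrence" is to count the words that appear in messages adjacent to a typical sentence and keep the most frequent ones. The sketch below takes that reading as an assumption; the text does not specify the mining algorithm, and the window size, stop-word list, whitespace tokenization, and sample history are all illustrative.

```python
from collections import Counter


def mine_keywords(history, typical_sentences, window=1, stopwords=frozenset(), top_n=2):
    """Count words in messages within `window` positions of a typical sentence
    and return the `top_n` most frequent ones as candidate dictionary keywords."""
    typical = set(typical_sentences)
    counts = Counter()
    for i, message in enumerate(history):
        if message not in typical:
            continue
        lo = max(0, i - window)
        hi = min(len(history), i + window + 1)
        for j in range(lo, hi):
            # Only the surrounding context is counted, not typical sentences themselves.
            if j == i or history[j] in typical:
                continue
            for word in history[j].lower().split():
                if word not in stopwords:
                    counts[word] += 1
    return [word for word, _ in counts.most_common(top_n)]


# Hypothetical chat history and manually collected typical sentences.
history = [
    "login page keeps crashing",
    "this is a bug",
    "crashing again after restart",
    "lunch anyone",
    "the upload keeps crashing",
    "i found a problem",
    "upload fails every time",
]
typical = ["this is a bug", "i found a problem"]
stop = frozenset({"the", "a", "is", "this", "i", "after", "again", "every", "keeps"})

mined = mine_keywords(history, typical, window=1, stopwords=stop, top_n=2)
feature_dictionary = set(mined)
print(mined)
```

A production system would need real word segmentation (especially for Chinese text) and a better relevance measure than raw counts; the sketch only shows the shape of the step.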
Further, after capturing the successfully matched message content, or the successfully matched message content together with its related content, as the feature description information, and before saving the feature description information, the operations may further include:
matching the feature description information against keywords in a pre-established category recognition dictionary and determining the category of the feature description information from the matching result; or determining the category of the feature description information with a natural language processing (NLP) model; or determining the category of the feature description information with a probability model trained in advance on feature description text labeled with category information;
Saving the feature description information may include: saving the determined category in association with the feature description information.
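The first of the three alternatives, matching against a category recognition dictionary and then saving the category together with the feature description, might look like the following. This is a sketch under stated assumptions: the category names, keywords, the "most keyword hits wins" rule, and the list used as storage are all hypothetical.

```python
def classify(feature_text, category_dictionary, default="uncategorized"):
    """Pick the category whose keywords occur most often in the text.
    Falls back to `default` when no keyword matches."""
    lowered = feature_text.lower()
    best, best_hits = default, 0
    for category, keywords in category_dictionary.items():
        hits = sum(1 for keyword in keywords if keyword in lowered)
        if hits > best_hits:
            best, best_hits = category, hits
    return best


def save_with_category(store, feature_text, category):
    """Save the feature description in association with its category."""
    store.append({"category": category, "feature": feature_text})


# Hypothetical category recognition dictionary.
CATEGORY_DICTIONARY = {
    "login": ["log in", "password", "account"],
    "performance": ["slow", "freezes", "lag"],
}

saved = []
for text in ["App freezes and is slow after I log in", "What a nice day"]:
    save_with_category(saved, text, classify(text, CATEGORY_DICTIONARY))
print(saved)
```

The first text matches one "login" keyword and two "performance" keywords, so it is stored under "performance"; the second matches nothing and falls back to the default.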
Further, determining the category of the feature description information with the natural language processing (NLP) model may specifically include:
using a semantic similarity algorithm model and/or a click similarity algorithm model to determine the category of the feature description information.
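The text does not say how the semantic similarity model is built. As a rough stand-in, the sketch below scores a feature description against one example text per category using cosine similarity over bag-of-words counts, which captures only lexical overlap rather than true semantics. The category examples and function names are assumptions, and the click similarity model is not illustrated.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a whitespace-tokenized, lowercased text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar_category(feature_text, category_examples):
    """Return the best-scoring category and the full score table."""
    query = vectorize(feature_text)
    scores = {c: cosine(query, vectorize(e)) for c, e in category_examples.items()}
    return max(scores, key=scores.get), scores


# Hypothetical example text per category.
CATEGORY_EXAMPLES = {
    "payment": "payment failed card charged twice refund",
    "map": "map route navigation wrong location",
}

category, scores = most_similar_category("my card was charged twice", CATEGORY_EXAMPLES)
print(category, scores)
```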
Further, after determining the category of the feature description information, the operations may further include:
determining, according to the category, information about the recipient of the feature description information;
sending the feature description information to the recipient according to the recipient's information.
Further, the recipient's information may be the address of a designated website, or the SMS number, email address, or instant messaging software account of a designated receiving user.
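Routing by category can be sketched as a lookup table from category to recipient information, followed by a per-channel send. Everything concrete here is an assumption for illustration: the `Receiver` type, the routing table, the example.com addresses, and the fake senders that merely record what would be sent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Receiver:
    channel: str  # one of "website", "sms", "email", "im"
    address: str


# Hypothetical mapping from category to recipient information.
ROUTES = {
    "performance": Receiver("email", "qa-team@example.com"),
    "payment": Receiver("website", "https://feedback.example.com/submit"),
}

outbox = []


def record(channel):
    """Build a fake sender that records the delivery instead of performing it."""
    return lambda address, text: outbox.append((channel, address, text))


SENDERS = {channel: record(channel) for channel in ("website", "sms", "email", "im")}


def dispatch(feature_text, category, routes, senders):
    """Look up the recipient for the category and send the feature description.
    Returns the recipient, or None when the category has no configured recipient."""
    receiver = routes.get(category)
    if receiver is None:
        return None
    senders[receiver.channel](receiver.address, feature_text)
    return receiver


sent_to = dispatch("App freezes on launch", "performance", ROUTES, SENDERS)
unrouted = dispatch("Unrelated remark", "unknown", ROUTES, SENDERS)
print(sent_to, unrouted, outbox)
```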
Further, the content related to the successfully matched message content may include: the context messages of the successfully matched message content; and/or supplementary content returned by the user who posted the message content, after a session is established with that user and a request to supplement the message content is sent to that user.
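Collecting the related content might be sketched as taking a window of neighbouring messages around the matched one and appending any supplementary reply from the user. The window sizes, function names, and sample conversation are assumptions; the text does not define how much context is captured.

```python
def context_of(messages, index, before=2, after=1):
    """Return the messages surrounding messages[index], excluding the match itself."""
    start = max(0, index - before)
    end = min(len(messages), index + after + 1)
    return messages[start:index] + messages[index + 1:end]


def build_feature_description(messages, index, supplement=None, before=2, after=1):
    """Combine the matched message, its context, and an optional user supplement."""
    related = context_of(messages, index, before, after)
    if supplement:
        related = related + [supplement]
    return {"matched": messages[index], "related": related}


# Hypothetical conversation; the message at index 2 matched the dictionary.
chat = [
    "hi all",
    "I updated to v2 today",
    "now the app crashes on start",
    "same here",
    "anyone for lunch?",
]
description = build_feature_description(
    chat, 2, supplement="It happens on my phone every time"
)
print(description)
```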
Further, the keywords in the feature recognition dictionary may include keywords that reflect product defects, and the feature description information is information describing product defects.
From the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus the necessary general-purpose hardware, or by hardware alone, although in many cases the former is the better implementation. On this understanding, the essence of the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied as a software product. The computer software product may be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
It should be noted that, in the above embodiment of the information mining apparatus, the units and modules are divided only according to functional logic; the division is not limited to the one described, provided the corresponding functions can be implemented. In addition, the specific names of the functional units serve only to distinguish them from one another and do not limit the scope of protection of the present invention.
The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.
Claims (19)
- An information mining method, comprising:
  monitoring messages posted in an instant messaging software application;
  parsing the monitored messages to obtain message content;
  matching the message content against keywords in a pre-established feature recognition dictionary;
  when the matching succeeds, capturing the successfully matched message content, or capturing the successfully matched message content together with content related to it, as feature description information, and saving the feature description information.
- The method according to claim 1, wherein, before monitoring the messages posted in the instant messaging software application, the method further comprises:
  after obtaining access permission to the server corresponding to the instant messaging software application, establishing a connection with the server;
  sending the server a request to join a group account or an individual user account in the instant messaging software application;
  and wherein monitoring the messages posted in the instant messaging software application specifically comprises:
  after receiving a response message from the server approving the join, monitoring messages posted by users in the joined group or by the joined individual user.
- The method according to claim 1 or 2, wherein establishing the feature recognition dictionary specifically comprises:
  receiving manually configured keywords for the feature recognition dictionary; or
  searching the chat history of the instant messaging software for manually collected typical sentences, mining keywords that express the corresponding feature according to the contextual co-occurrence relationships of those typical sentences, and adding the keywords to the feature recognition dictionary.
- The method according to any one of claims 1 to 3, wherein, after capturing the successfully matched message content, or the successfully matched message content together with its related content, as the feature description information, and before saving the feature description information, the method further comprises:
  matching the feature description information against keywords in a pre-established category recognition dictionary and determining the category of the feature description information from the matching result; or determining the category of the feature description information with a natural language processing (NLP) model; or determining the category of the feature description information with a probability model trained in advance on feature description text labeled with category information;
  and wherein saving the feature description information comprises: saving the determined category in association with the feature description information.
- The method according to claim 4, wherein determining the category of the feature description information with the natural language processing (NLP) model specifically comprises:
  using a semantic similarity algorithm model and/or a click similarity algorithm model to determine the category of the feature description information.
- The method according to claim 4 or 5, wherein, after determining the category of the feature description information, the method further comprises:
  determining, according to the category, information about the recipient of the feature description information;
  sending the feature description information to the recipient according to the recipient's information.
- The method according to claim 6, wherein the recipient's information is the address of a designated website, or the SMS number, email address, or instant messaging software account of a designated receiving user.
- The method according to any one of claims 1 to 7, wherein the content related to the successfully matched message content comprises: the context messages of the successfully matched message content; and/or supplementary content returned by the user who posted the message content, after a session is established with that user and a request to supplement the message content is sent to that user.
- The method according to any one of claims 1 to 8, wherein the keywords in the feature recognition dictionary include keywords that reflect product defects, and the feature description information is information describing product defects.
- An information mining apparatus, comprising:
  a message monitoring module, configured to monitor messages posted in an instant messaging software application;
  a message parsing module, configured to parse the monitored messages to obtain message content;
  a matching module, configured to match the message content against keywords in a pre-established feature recognition dictionary;
  a feature description information processing module, configured to, when the matching succeeds, capture the successfully matched message content, or capture the successfully matched message content together with content related to it, as feature description information, and save the feature description information.
- The apparatus according to claim 10, wherein the apparatus further comprises:
  a connection establishing module, configured to, before the messages posted in the instant messaging software application are monitored and after access permission to the server corresponding to the instant messaging software application is obtained, establish a connection with the server;
  a request sending module, configured to send the server a request to join a group account or an individual user account in the instant messaging software application;
  and wherein the message monitoring module is specifically configured to: after a response message approving the join is received from the server, monitor messages posted by users in the joined group or by the joined individual user.
- The apparatus according to claim 10 or 11, wherein the apparatus further comprises a feature recognition dictionary establishing module, configured to receive manually configured keywords for the feature recognition dictionary; or
  configured to search the chat history of the instant messaging software for manually collected typical sentences, mine keywords that express the corresponding feature according to the contextual co-occurrence relationships of those typical sentences, and add the keywords to the feature recognition dictionary.
- The apparatus according to any one of claims 10 to 12, wherein the apparatus further comprises:
  a first category determining module, configured to, after the successfully matched message content, or the successfully matched message content together with its related content, is captured as the feature description information and before the feature description information is saved, match the feature description information against keywords in a pre-established category recognition dictionary and determine the category of the feature description information from the matching result; or
  a second category determining module, configured to, after the successfully matched message content, or the successfully matched message content together with its related content, is captured as the feature description information and before the feature description information is saved, determine the category of the feature description information with a natural language processing (NLP) model; or
  a third category determining module, configured to, after the successfully matched message content, or the successfully matched message content together with its related content, is captured as the feature description information and before the feature description information is saved, determine the category of the feature description information with a probability model trained in advance on feature description text labeled with category information;
  and wherein the feature description information processing module is specifically configured to: save the determined category in association with the feature description information.
- The apparatus according to claim 13, wherein the second category determining module is specifically configured to: use a semantic similarity algorithm model and/or a click similarity algorithm model to determine the category of the feature description information.
- The apparatus according to claim 13 or 14, wherein the apparatus further comprises:
  a recipient information determining module, configured to, after the category of the feature description information is determined, determine, according to the category, information about the recipient of the feature description information;
  a feature description information sending module, configured to send the feature description information to the recipient according to the recipient's information.
- The apparatus according to claim 15, wherein the recipient's information is the address of a designated website, or the SMS number, email address, or instant messaging software account of a designated receiving user.
- The apparatus according to any one of claims 10 to 16, wherein the content related to the successfully matched message content comprises: the context messages of the successfully matched message content; and/or supplementary content returned by the user who posted the message content, after a session is established with that user and a request to supplement the message content is sent to that user.
- The apparatus according to any one of claims 10 to 17, wherein the keywords in the feature recognition dictionary include keywords that reflect product defects, and the feature description information is information describing product defects.
- A non-volatile computer storage medium storing one or more modules, wherein, when the one or more modules are executed by a device that performs an information mining method, the device is caused to perform the following operations:
  monitoring messages posted in an instant messaging software application;
  parsing the monitored messages to obtain message content;
  matching the message content against keywords in a pre-established feature recognition dictionary;
  when the matching succeeds, capturing the successfully matched message content, or capturing the successfully matched message content together with content related to it, as feature description information, and saving the feature description information.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410710424.7 | 2014-11-27 | ||
CN201410710424.7A CN104346480B (en) | 2014-11-27 | 2014-11-27 | information mining method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016082575A1 true WO2016082575A1 (en) | 2016-06-02 |
Family
ID=52502071
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2015/086095 WO2016082575A1 (en) | 2014-11-27 | 2015-08-05 | Information mining method and apparatus, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN104346480B (en) |
WO (1) | WO2016082575A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210110403A1 (en) * | 2019-10-15 | 2021-04-15 | Microsoft Technology Licensing, Llc | Semantic sweeping of metadata enriched service data |
CN113051476A (en) * | 2021-03-25 | 2021-06-29 | 北京百度网讯科技有限公司 | Method and apparatus for message transmission |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104346480B (en) * | 2014-11-27 | 2018-06-26 | 百度在线网络技术(北京)有限公司 | information mining method and device |
CN105282012A (en) * | 2015-10-23 | 2016-01-27 | 广东小天才科技有限公司 | Method and system for strengthening information reminding during group chat |
CN106649404B (en) * | 2015-11-04 | 2019-12-27 | 陈包容 | Method and device for creating session scene database |
CN108345582B (en) * | 2017-01-23 | 2021-08-24 | 腾讯科技(深圳)有限公司 | Method and device for identifying social group engaged business |
CN107491493A (en) * | 2017-07-22 | 2017-12-19 | 长沙兔子代跑网络科技有限公司 | A kind of intelligence obtains the method and device for running chat record in generation |
CN107526779A (en) * | 2017-07-22 | 2017-12-29 | 长沙兔子代跑网络科技有限公司 | A kind of method and device for excavating generation race client |
CN109063029A (en) * | 2018-07-10 | 2018-12-21 | 苏奇 | A kind of information filing management method based on instant communication software |
CN109582719B (en) * | 2018-10-19 | 2021-08-24 | 国电南瑞科技股份有限公司 | Method and system for automatically linking SCD file of intelligent substation to virtual terminal |
CN113765767A (en) * | 2020-06-02 | 2021-12-07 | 上海回声网络科技有限公司 | Enterprise WeChat supervision method and system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020133477A1 (en) * | 2001-03-05 | 2002-09-19 | Glenn Abel | Method for profile-based notice and broadcast of multimedia content |
CN1987852A (en) * | 2005-12-21 | 2007-06-27 | 腾讯科技(深圳)有限公司 | Method and device for determining communication object attribute according to news content |
CN102323933A (en) * | 2011-08-31 | 2012-01-18 | 张潇 | Information embedding and interaction system facing real-time communication and method |
CN103605690A (en) * | 2013-11-04 | 2014-02-26 | 北京奇虎科技有限公司 | Device and method for recognizing advertising messages in instant messaging |
CN104346480A (en) * | 2014-11-27 | 2015-02-11 | 百度在线网络技术(北京)有限公司 | Information mining method and device |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101166160B (en) * | 2006-10-20 | 2010-09-15 | 阿里巴巴集团控股有限公司 | A method and system for filtering instant communication rubbish information |
CN102419778B (en) * | 2012-01-09 | 2013-03-20 | 中国科学院软件研究所 | Information searching method for discovering and clustering sub-topics of query statement |
CN103577416B (en) * | 2012-07-20 | 2017-09-22 | 阿里巴巴集团控股有限公司 | Expanding query method and system |
CN102970210A (en) * | 2012-11-02 | 2013-03-13 | 北京百度网讯科技有限公司 | Method and device for reminding group messages in instant chat tool |
-
2014
- 2014-11-27 CN CN201410710424.7A patent/CN104346480B/en active Active
-
2015
- 2015-08-05 WO PCT/CN2015/086095 patent/WO2016082575A1/en active Application Filing
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210110403A1 (en) * | 2019-10-15 | 2021-04-15 | Microsoft Technology Licensing, Llc | Semantic sweeping of metadata enriched service data |
US11587095B2 (en) * | 2019-10-15 | 2023-02-21 | Microsoft Technology Licensing, Llc | Semantic sweeping of metadata enriched service data |
CN113051476A (en) * | 2021-03-25 | 2021-06-29 | 北京百度网讯科技有限公司 | Method and apparatus for message transmission |
CN113051476B (en) * | 2021-03-25 | 2023-06-13 | 北京百度网讯科技有限公司 | Method and device for sending message |
Also Published As
Publication number | Publication date |
---|---|
CN104346480A (en) | 2015-02-11 |
CN104346480B (en) | 2018-06-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 15862482; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 15862482; Country of ref document: EP; Kind code of ref document: A1 |