CN114492432A - Cooperative enterprise identification method and device - Google Patents
- Publication number
- CN114492432A (application CN202210099908.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- enterprise
- hot search
- hot
- public opinion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
Abstract
The invention provides a method and a device for identifying a cooperative enterprise. It belongs to the technical field of artificial intelligence and can be applied in finance or other technical fields. The cooperative enterprise identification method comprises: determining a hot search keyword set and a retrieval keyword set according to the result of matching the vocabulary of each hot search abstract against a hot search word bank; integrating the hot search abstracts according to the similarity between their keyword sets to obtain public opinion data corresponding to the integrated hot search abstracts; retrieving with the retrieval keyword set to obtain enterprise data, and inputting the public opinion data and the enterprise data into a prediction model built from public opinion training data, enterprise training data and prediction result data to obtain an enterprise cooperation prediction result; and identifying the cooperative enterprise according to the enterprise cooperation prediction result. The invention makes it possible to seize cooperation opportunities in time and effectively improve the benefit of cooperation.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method and a device for identifying a cooperative enterprise.
Background
Under intense market competition, banks often choose partner organizations with which to release co-branded products, such as themed credit cards or themed certificates of deposit, as a marketing strategy, so that the influence of both parties raises the value of the products. At present, the screening of cooperative enterprises often relies on managers' intuitive judgment of an enterprise. It lacks data support and a comprehensive, automatic system evaluation scheme, and cooperation opportunities with external organizations that arise from public opinion hotspots are not developed in time.
Disclosure of Invention
The main purpose of the embodiments of the invention is to provide a method and a device for identifying a cooperative enterprise, so as to seize cooperation opportunities in time and effectively improve the benefit of cooperation.
In order to achieve the above object, an embodiment of the present invention provides a method for identifying a cooperative enterprise, including:
determining a hot search keyword set and a retrieval keyword set according to the matching result of the vocabulary of each hot search abstract and the hot search word bank;
integrating the hot search abstracts according to the similarity among the keyword sets of the hot search abstracts to obtain public opinion data corresponding to the integrated hot search abstracts;
retrieving the retrieval keyword set to obtain enterprise data, and inputting the public opinion data and the enterprise data into a prediction model established based on the public opinion training data, the enterprise training data and the prediction result data to obtain an enterprise cooperation prediction result;
and identifying the cooperative enterprises according to the enterprise cooperation prediction result.
The embodiment of the invention also provides a device for identifying the cooperative enterprise, which comprises:
the set determining module is used for determining a hot search keyword set and a retrieval keyword set according to the matching result of the vocabulary of each hot search abstract and the hot search word bank;
the public opinion data module is used for integrating the hot search abstracts according to the similarity among the keyword sets of the hot search abstracts to obtain the public opinion data corresponding to the integrated hot search abstracts;
the prediction module is used for retrieving the retrieval keyword set to obtain enterprise data, and inputting the public opinion data and the enterprise data into a prediction model established based on the public opinion training data, the enterprise training data and the prediction result data to obtain an enterprise cooperation prediction result;
and the identification module is used for identifying the cooperative enterprises according to the enterprise cooperation prediction result.
The embodiment of the invention also provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the cooperative enterprise identification method when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the cooperative enterprise identification method.
The method and the device for identifying a cooperative enterprise in the embodiments of the invention first determine a hot search keyword set and a retrieval keyword set according to the result of matching the vocabulary of each hot search abstract against the hot search word bank, then integrate the hot search abstracts according to the similarity between the keyword sets to obtain the corresponding public opinion data, then obtain enterprise data according to the retrieval keyword set, and obtain an enterprise cooperation prediction result from the public opinion data and the enterprise data in order to identify cooperative enterprises. Cooperation opportunities can thus be seized in time and the benefit of cooperation effectively improved.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a cooperative enterprise identification method in an embodiment of the present invention;
FIG. 2 is a flowchart of a cooperative enterprise identification method in another embodiment of the present invention;
FIG. 3 is a diagram of the correspondence between hot search titles, news titles and news comments;
FIG. 4 is a flowchart of S101 in the embodiment of the present invention;
FIG. 5 is a flowchart of obtaining public opinion data in an embodiment of the present invention;
FIG. 6 is a block diagram showing the structure of a cooperative enterprise identification apparatus in the embodiment of the present invention;
FIG. 7 is a block diagram of a computer device in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
The prior art often relies on managers' intuitive judgment of an enterprise when screening cooperative enterprises, lacks data support and a comprehensive automatic system evaluation scheme, and is slow to develop cooperation opportunities with external organizations that arise from public opinion hotspots. In view of this, embodiments of the present invention provide a method and an apparatus for identifying a cooperative enterprise which, based on hotspot news data and by means of artificial intelligence technology, jointly model multiple dimensions such as the public opinion dynamics of a news topic, the market and risk of the organizations involved in the news, and their relationship with the bank, so as to evaluate the benefit that a co-branded product would bring to the bank. The present invention will be described in detail below with reference to the accompanying drawings.
FIG. 1 is a flowchart of a cooperative enterprise identification method according to an embodiment of the present invention. FIG. 2 is a flowchart of a cooperative enterprise identification method according to another embodiment of the present invention. As shown in FIGS. 1 and 2, the cooperative enterprise identification method includes:
S101: Determine a hot search keyword set and a retrieval keyword set according to the matching result of the vocabulary of each hot search abstract and the hot search word bank.
Before S101 is executed, the system uses crawler technology to periodically collect the news hot search lists published by major news platforms at home and abroad, according to a preset collection channel, collection strategy and collection elements. The collection channel is the target news platform to be collected. The collection strategy is the set of parameters preset for collecting news with the crawler, including the collection period, the collection mode (web page, official account, app, etc.), the collection path (the access path by which the news list is reached after entering the news platform) and the number of hot search news items to collect from the list. The collection elements are the contents to be collected for each hot search list, including the collection time, the collection platform, the hot search title, the news text, the news comments, the news like count, the news dislike count, the hot search words and the like. "Periodically" means that the information is collected at a fixed interval; the collection period can be daily, weekly, monthly, etc. (weekly is suggested).
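The collection channel, strategy and elements described above could be captured in a small configuration object. The following is only a sketch: the class names, field names, platform names and the default of 50 list entries are hypothetical and are not specified by the source.

```python
from dataclasses import dataclass, field

@dataclass
class CollectionStrategy:
    period: str = "weekly"   # daily / weekly / monthly; weekly is the suggested value
    mode: str = "web"        # web page, official account, app (labels are hypothetical)
    path: str = ""           # access path to the hot search list; platform specific, left blank
    top_n: int = 50          # number of hot search items to collect (assumed value)

@dataclass
class CollectionConfig:
    channels: list = field(default_factory=list)  # target news platforms
    strategy: CollectionStrategy = field(default_factory=CollectionStrategy)
    # Elements collected for every hot search list entry (field names are hypothetical).
    elements: tuple = (
        "collection_time",
        "platform",
        "hot_search_title",
        "news_title",        # not in the element list above, but used in later steps
        "news_text",
        "news_comments",
        "like_count",
        "dislike_count",
        "hot_search_words",
    )

config = CollectionConfig(channels=["platform_a", "platform_b"])
```

A crawler would read `config.strategy.period` to schedule runs and use `config.elements` as the record schema.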
FIG. 3 is a diagram of the correspondence between hot search titles, news titles and news comments. As shown in FIG. 3, one hot search title is often associated with one or more news titles whose content is highly relevant to it, and the same hot search news item often has multiple news comments. The hot search title, the news titles and the news comments thus form a tree-like relationship; h and j in FIG. 3 are positive integers.
For example, a hot search title on a news hot search list is "# official opening of a garden". Clicking the title brings up multiple related news items, with news titles such as "# official opening of a garden, tickets snapped up", "a family of three takes one trip around, and the tickets easily reach 10,000 ……", "big-track weekend fire explosion …… without tickets", and the like. Through each news title, the news text and the news comments can be viewed.
Next, the news titles collected in a time period are aggregated according to the association between news titles and hot search titles. For example, all news titles associated with the same hot search title are merged into one text (that is, the different news titles are concatenated end to end), and this merged text is the news title used in the subsequent steps.
Similarly, the like counts and dislike counts of all news items associated with the same hot search title are aggregated (that is, the like counts of the different news titles are summed, and the dislike counts of the different news titles are summed); the news like count and news dislike count mentioned in the subsequent steps refer to these aggregated values.
It should be noted that the news comments are not aggregated, because each comment may express a different emotion and must be analyzed and identified separately. In addition, the subsequent steps use the hot search titles, news like counts, news dislike counts and news comments; the other information collected by the crawler (such as the collection time, the collection platform and the news text) can be stored in the system as news details for manual inquiry.
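The aggregation rules above (concatenate news titles, sum like and dislike counts, keep comments separate) can be sketched as follows. The record layout and field names are assumptions made for illustration only.

```python
from collections import defaultdict

def aggregate_by_hot_title(news_items):
    """Group crawled news items by their hot search title.

    News titles are concatenated end to end, like and dislike counts are
    summed, and comments are kept as individual entries because each one
    is analyzed separately later.
    """
    grouped = defaultdict(
        lambda: {"news_title": "", "likes": 0, "dislikes": 0, "comments": []}
    )
    for item in news_items:
        entry = grouped[item["hot_title"]]
        entry["news_title"] += item["news_title"]
        entry["likes"] += item["likes"]
        entry["dislikes"] += item["dislikes"]
        entry["comments"].extend(item["comments"])
    return dict(grouped)

# Hypothetical sample records
sample = [
    {"hot_title": "A", "news_title": "Park opens. ", "likes": 10, "dislikes": 2,
     "comments": ["great"]},
    {"hot_title": "A", "news_title": "Tickets sell out. ", "likes": 5, "dislikes": 1,
     "comments": ["too pricey", "ok"]},
    {"hot_title": "B", "news_title": "Shares cancelled. ", "likes": 3, "dislikes": 0,
     "comments": []},
]
aggregated = aggregate_by_hot_title(sample)
```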
In one embodiment, before executing S101, the method further includes:
generating original hot search data according to the hot search titles and the corresponding news titles; and inputting the original hot search data into a summary identification model created based on historical summary data to obtain the hot search abstracts.
In specific implementation, each hot search title and its associated news titles are merged into one text (that is, the hot search title and the news titles are concatenated end to end), and this text is defined as the original hot search data. The original hot search data is first segmented into words with an open-source word segmentation library, and a summary identification model built from historical summary data then judges whether it is business news. Judging whether an item is business news is a binary classification problem, so the classification model can be trained with supervised machine learning algorithms commonly used in the industry, such as naive Bayes, SVM or LSTM. Model training yields a summary identification model that judges whether news is business news, and each newly obtained piece of original hot search data is classified by this model. News that is not business news is not processed further; the system keeps only the original hot search data of business news as hot search abstracts for subsequent processing.
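As an illustration of the binary classification step, the sketch below trains a small multinomial naive Bayes classifier (one of the algorithms named above) on already-segmented text. The class name, the toy training data and the use of Laplace smoothing are assumptions; a production system would more likely rely on an established machine learning library.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial naive Bayes over tokenized documents."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokens:
                # Laplace (add-one) smoothing so unseen words do not zero the score
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training set: True = business news, False = other news (hypothetical data)
train_docs = [
    ["park", "opens", "tickets"],
    ["company", "shares", "investors"],
    ["team", "wins", "match"],
    ["singer", "concert", "fans"],
]
train_labels = [True, True, False, False]
model = NaiveBayes().fit(train_docs, train_labels)
is_business = model.predict(["park", "tickets", "investors"])
```

Only items for which the model returns `True` would be kept as hot search abstracts.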
FIG. 4 is a flowchart of S101 in the embodiment of the present invention. As shown in FIG. 4, S101 includes:
S201: Determine the adjusting parameter of each word according to the result of matching the vocabulary of each hot search abstract against the hot search word bank.
The hot search words are the words collected from the hot search lists; the system deduplicates them and stores them in a local hot search word bank for use in the subsequent steps. In specific implementation, each word in a hot search abstract can be matched against the hot search word bank. If the word matches a word in the word bank, the word bank is considered hit and the adjusting parameter α is applied; otherwise the word bank is considered not hit.
S202: Determine the frequency data of each word according to the adjusting parameter, the word frequency and the inverse text frequency index of the word.
The invention calculates the frequency data of each word in the hot search abstract by the TF-IDF word frequency analysis method; the more a word contributes to the specific topic of the text, the higher its frequency data. In specific implementation, the frequency data may be determined by a formula in which TFIDF_new is the frequency data of the word, TF is the word frequency of the word, and IDF is the inverse text frequency index of the word. α and α^(-1) are both adjusting parameters: when the word matches a word in the hot search word bank, the adjusting parameter is α; when it does not, the adjusting parameter is α^(-1). Generally, α ≥ 1.
S203: Determine the hot search keyword set and the retrieval keyword set according to the frequency data of each word.
In specific implementation, the N words with the highest TFIDF_new in each hot search summary can be extracted as the keyword set of that hot search summary (N can be 20 to 50 in actual use), and the Np words with the highest TFIDF_new in each hot search summary can be extracted as the retrieval keyword set (Np < N; Np takes a small value, e.g., 3).
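Steps S201 to S203 can be sketched as below. The exact formula is not reproduced in this text, so the sketch assumes the multiplicative form TFIDF_new = α · TF · IDF for words found in the hot search word bank and TFIDF_new = α^(-1) · TF · IDF otherwise. The smoothed IDF variant, the function names and the sample data are likewise assumptions.

```python
import math
from collections import Counter

def adjusted_tfidf(docs, hot_words, alpha=1.5):
    """Return one {word: score} dict per tokenized hot search abstract.

    ASSUMPTION: score = alpha * TF * IDF for hot search words and
    (1 / alpha) * TF * IDF for all other words.
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    all_scores = []
    for doc in docs:
        counts = Counter(doc)
        scores = {}
        for word, count in counts.items():
            tf = count / len(doc)
            # Smoothed inverse document frequency (assumed variant)
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1
            weight = alpha if word in hot_words else 1 / alpha
            scores[word] = weight * tf * idf
        all_scores.append(scores)
    return all_scores

def top_keywords(scores, n):
    """The n words with the highest adjusted score; ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]

docs = [
    ["park", "opens", "tickets", "queue", "park"],
    ["shares", "cancelled", "investors", "tickets"],
]
scores = adjusted_tfidf(docs, hot_words={"park"}, alpha=2.0)
keyword_set = top_keywords(scores[0], n=3)    # N; 20 to 50 in the text, 3 here for brevity
retrieval_set = top_keywords(scores[0], n=1)  # Np < N
```

Because both sets are taken from the same ranking, the retrieval keyword set is always a subset of the keyword set.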
S102: Integrate the hot search abstracts according to the similarity among the keyword sets of the hot search abstracts to obtain public opinion data corresponding to the integrated hot search abstracts.
The integration of the hot search summaries comprises the integration of the hot search summaries in the same period and the integration of the hot search summaries in the current period and the historical period.
In one embodiment, integrating the hot search summaries according to the similarity between the keyword sets of the hot search summaries comprises:
converting the keyword set of each hot search abstract into a word vector; and when the similarity between the word vectors meets a preset similarity condition, integrating the corresponding hot search abstracts.
In specific implementation, each keyword in the keyword set can be converted into a K-dimensional word vector by an open-source pre-trained word vector model (in actual use K may be a power of 2, such as 128 or 256, or another integer), and the word vectors of each abstract are summed to obtain its word vector sum. The cosine similarity between the word vector sums of two hot search abstracts is then calculated; the larger the cosine similarity, the more similar the two hot search abstracts. A threshold M can be set (M may take a value between 0.6 and 0.9 in actual use): hot search abstracts whose similarity is greater than M are judged to cover the same topic and are integrated, whereas those whose similarity is less than or equal to M are judged to cover different topics and are not integrated. N, K, M and Np are all adjustable parameters; in practical application their initial values can be set from experience and then tuned according to the effect of model training.
For example, suppose three hot search titles are obtained from the hot search lists of different platforms: "today's official opening of the circle!", "+: tourists run ##" and "% cancels 220 million shares to advance and retreat together with investors." The first two are both hot search titles related to "+"; they are similar hot search titles and can be integrated. The third is clearly different from the first two, so it is kept as a separate hot search title and is not integrated with them.
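The similarity test can be sketched as follows: sum the word vectors of each keyword set, compute the cosine similarity of the two sums, and compare it with the threshold M. The 3-dimensional embeddings below are invented toy values standing in for a pre-trained word vector model, and the function names are hypothetical.

```python
import math

def sum_vectors(vectors):
    """Element-wise sum of equal-length word vectors."""
    return [sum(components) for components in zip(*vectors)]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for empty or all-zero vectors
    return dot / (norm_a * norm_b)

def same_topic(keywords_a, keywords_b, embeddings, threshold=0.8):
    """True when the summed keyword vectors are more similar than threshold M."""
    vec_a = sum_vectors([embeddings[w] for w in keywords_a])
    vec_b = sum_vectors([embeddings[w] for w in keywords_b])
    return cosine_similarity(vec_a, vec_b) > threshold

# Toy embeddings (hypothetical values, K = 3 for brevity)
embeddings = {
    "park": [1.0, 0.1, 0.0],
    "opens": [0.8, 0.2, 0.1],
    "tourists": [0.9, 0.0, 0.2],
    "shares": [0.0, 1.0, 0.1],
    "investors": [0.1, 0.9, 0.0],
}
first_two = same_topic(["park", "opens"], ["park", "tourists"], embeddings)
third = same_topic(["park", "opens"], ["shares", "investors"], embeddings)
```

With these toy values the first pair exceeds the threshold and would be integrated, while the pair involving the share-related title does not.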
Fig. 5 is a flowchart of obtaining public opinion data according to an embodiment of the present invention. As shown in fig. 5, acquiring public opinion data corresponding to the integrated hot search summary includes:
S301: Determine public opinion index data according to the emotion index data of the comments corresponding to the integrated hot search abstract.
The public opinion index data comprises public opinion parameter data and public opinion fluctuation data.
In one embodiment, S301 includes:
determining public opinion parameter data according to emotion index data of comments corresponding to the integrated hot search abstract;
and determining public opinion fluctuation data according to the emotion index data and the public opinion parameter data.
In specific implementation, the emotion index data of each news comment, before integration, must first be analyzed. The emotion index data has five classes: 0 (strong support), 1 (support), 2 (neutral), 3 (criticism) and 4 (strong criticism). Determining it is therefore a multi-class classification problem, and the classification model can be trained with supervised sentiment analysis algorithms commonly used in the industry, such as a text convolutional neural network (TextCNN), BERT or naive Bayes. Each comment can be regarded as an entity; classifying each comment with the trained sentiment model gives its emotion index, that is, the result of each public opinion evaluation.
After the result of each public opinion evaluation is obtained, the results are aggregated by the hot search title to which the news comments belong, so that the numbers of class 0 (strong support), class 1 (support), class 2 (neutral), class 3 (criticism) and class 4 (strong criticism) evaluations under each hot search title are counted and recorded as Ni (i = 0 to 4). The public opinion parameter data is the mean of the emotion index data, and the public opinion fluctuation data is the variance of the emotion index data.
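Given the class counts Ni, the mean and variance can be computed directly. The sketch below assumes that both statistics are taken over the class indices 0 to 4 weighted by Ni and that the population variance is intended; the text does not state either point explicitly.

```python
def opinion_stats(counts):
    """counts[i] is N_i, the number of comments in emotion class i (0..4).

    Returns (public opinion parameter, public opinion fluctuation) as the
    count-weighted mean and population variance of the class indices.
    """
    total = sum(counts)
    if total == 0:
        return 0.0, 0.0  # assumed convention when there are no comments
    mean = sum(i * n for i, n in enumerate(counts)) / total
    variance = sum(n * (i - mean) ** 2 for i, n in enumerate(counts)) / total
    return mean, variance

# Hypothetical counts N_0..N_4 for one hot search title
parameter, fluctuation = opinion_stats([10, 20, 40, 20, 10])
```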
When hot search abstracts of the current period are integrated with those of historical periods, similarity analysis is performed between each hot search abstract of the current period and the hot search abstracts of the historical periods. If no historical hot search abstract is sufficiently similar to the current one, the current hot search abstract is treated as a new instance. Conversely, if the current hot search abstract is highly similar to a hot search abstract of a historical period (if it is highly similar to several historical hot search abstracts, the most similar one is taken), the emotion index data of the current hot search abstract is used as the data of one statistical period of that historical hot search abstract.
TABLE 1
Table 1 is the public opinion index time series data table of the first embodiment. As shown in Table 1, the historical hot search news is "*** today's business", whose public opinion parameter data and public opinion fluctuation data in week T are 4.2 and 0.02, respectively. The hot search news of week T+1 is ". x. festival attacking and thinking", whose public opinion parameter data and public opinion fluctuation data are 4.0 and 0.01, respectively. If similarity analysis finds the two to be highly similar, a single time series is finally generated.
TABLE 2
Table 2 is the public opinion index time series data table of the second embodiment. As shown in Table 2, the historical hot search news is "*** today's business", whose public opinion parameter data and public opinion fluctuation data in week T are 4.2 and 0.02, respectively. The hot search news of week T+1 is "xx campaign reservation", whose public opinion parameter data and public opinion fluctuation data are 4.4 and 0.01, respectively. If similarity analysis finds that the similarity between the two is low, and that the similarity between every other hot search news item and "*** today's business" is also low, the two items generate two separate time series: "*** today's business" is zero-filled for week T+1, and "xx campaign reservation" is zero-filled for week T.
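The two cases in Tables 1 and 2 (append to an existing series, or start a new series with zero filling) can be sketched as below. The function name, the dictionary layout and the topic labels are hypothetical; the numeric values are those quoted above.

```python
def add_period(series_by_topic, period_index, matches):
    """Append one period of (parameter, fluctuation) values to each series.

    series_by_topic: {topic: [(parameter, fluctuation), ...]}, one tuple per
        past period.
    matches: {topic: (parameter, fluctuation)} for the current period, keyed
        by the historical topic it matched, or by a new topic name when no
        similar historical abstract exists.
    """
    zero = (0.0, 0.0)
    for topic, values in matches.items():
        if topic not in series_by_topic:
            # New topic: zero-fill the periods before it appeared.
            series_by_topic[topic] = [zero] * period_index
        series_by_topic[topic].append(values)
    for series in series_by_topic.values():
        if len(series) <= period_index:
            # Existing topic with no data in this period: zero-fill.
            series.append(zero)
    return series_by_topic

# Week T is index 0 and week T+1 is index 1.
# Table 1 case: the week T+1 item is similar to the historical one.
merged = add_period({"topic_a": [(4.2, 0.02)]}, 1, {"topic_a": (4.0, 0.01)})
# Table 2 case: the week T+1 item is not similar to any historical item.
split = add_period({"topic_a": [(4.2, 0.02)]}, 1, {"topic_b": (4.4, 0.01)})
```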
If the current hot search abstract is highly similar to a historical hot search abstract, then in addition to the aggregation described above, the word vector of the historical hot search abstract must be updated according to the following formula:
updated word vector of the historical hot search summary = (word vector of the historical hot search summary + word vector of the current hot search summary) / 2.
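The update formula is an element-wise average of the two vectors. A minimal sketch, with a hypothetical function name and toy 3-dimensional vectors:

```python
def update_history_vector(history_vec, current_vec):
    """Element-wise average of the historical and current summary vectors."""
    return [(h + c) / 2 for h, c in zip(history_vec, current_vec)]

updated = update_history_vector([1.0, 0.0, 2.0], [3.0, 4.0, 0.0])
```

Repeated averaging of this kind weights recent periods more heavily than older ones, since each update halves the contribution of everything seen before.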
S302: Determine the news approval rate according to the approval data of the news corresponding to the integrated hot search abstract.
The approval data comprises the news like count and the news dislike count. The news like count is obtained by summing the like counts of the news associated with the hot search titles to be integrated across the different platforms; the news dislike count is summed across platforms in the same way. The news approval rate is calculated as: news approval rate = news like count / (news like count + news dislike count).
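The approval rate calculation can be sketched as follows. The per-platform tuple layout is an assumption, as is returning 0.0 when no likes or dislikes exist; the text does not cover that case.

```python
def approval_rate(per_platform):
    """per_platform: list of (like_count, dislike_count), one pair per platform."""
    likes = sum(like for like, _ in per_platform)
    dislikes = sum(dislike for _, dislike in per_platform)
    total = likes + dislikes
    return likes / total if total else 0.0  # 0.0 for no votes is an assumption

# Two platforms with hypothetical counts
rate = approval_rate([(120, 30), (80, 20)])
```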
S303: Acquire the public opinion platform data corresponding to the integrated hot search abstract.
The public opinion platform data is the number of different platforms on whose hot search lists the hot search summary appears.
S103: Retrieve with the retrieval keyword set to obtain enterprise data, and input the public opinion data and the enterprise data into a prediction model established based on the public opinion training data, the enterprise training data and the prediction result data to obtain an enterprise cooperation prediction result.
The public opinion training data is time series data, whereas the enterprise training data and the prediction result data are non-time-series data (that is, data at a historical point in time). The time series data therefore needs to be converted into non-time-series data, which can be done by building new statistical indicators over different time windows, such as the highest value of the public opinion training data over the last 1 month, the highest value over the last 3 months, the highest value of the public opinion index data over the last 3 months, and so on.
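One way to build such window indicators from a weekly series is sketched below. It assumes weekly collection, so that 1 month is approximated by 4 weeks and 3 months by 12 weeks. The mean is included only as an example of a further indicator; the text itself names the highest value. Function and feature names are hypothetical.

```python
def window_features(weekly_values, windows=(4, 12)):
    """Collapse a weekly series into per-window statistics (most recent weeks)."""
    features = {}
    for weeks in windows:
        recent = weekly_values[-weeks:]
        features[f"max_last_{weeks}w"] = max(recent) if recent else 0.0
        features[f"mean_last_{weeks}w"] = sum(recent) / len(recent) if recent else 0.0
    return features

# Hypothetical weekly public opinion parameter values, oldest first
weekly_parameter = [3.1, 3.4, 3.0, 3.8, 4.2, 4.0]
features = window_features(weekly_parameter, windows=(4, 12))
```

The resulting flat dictionary can be joined with the enterprise data to form one non-time-series training row per hot search item.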
TABLE 3
TABLE 4
TABLE 5
Table 3 is a time-series table of public opinion training data, Table 4 is a time-series table of specific public opinion training data, and Table 5 is a non-time-series table of specific public opinion training data. As shown in Tables 3-5, after the time-series data is converted into non-time-series data, the model learns to predict the enterprise cooperation prediction result from the public opinion training data, the enterprise training data and the prediction result data. This is a machine-learning regression problem, and algorithms commonly used in the industry, such as GBRFR (gradient-boosted random forest regression) and ETR (extra-trees regression), can be used to train the regression model.
For example, for a given hot search news item, the model predicts the credit card issuance amount over the next 3 months from the public opinion statistics of that hot search over the last 6 months and the enterprise data at the current time point. For example, when the system acquires hot search news related to "star", the credit card issuance amount for the next 3 months can be estimated.
A Wide and Deep neural network model can also be applied, so that modeling is performed on the time-series data and the non-time-series data simultaneously. In a specific implementation, the time-series data can be used as Deep features, the non-time-series data can be fed in as Wide features through a shallow fully connected network, and the enterprise cooperation prediction result remains a single index value.
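The data flow of such a Wide and Deep arrangement can be sketched as a forward pass. This is an untrained illustration under stated assumptions: the layer sizes, the hand-picked weights, the parameter names in `params` and the choice to sum the two branches are all hypothetical, and a real model would learn its weights from the training data.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, bias):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def wide_and_deep(ts_features, static_features, params):
    """Return a single index value from time-series and non-time-series inputs."""
    # Deep branch: time-series data passes through a hidden layer.
    hidden = relu(dense(ts_features, params["deep_w1"], params["deep_b1"]))
    deep_out = dense(hidden, params["deep_w2"], params["deep_b2"])[0]
    # Wide branch: non-time-series data passes through one shallow layer.
    wide_out = dense(static_features, params["wide_w"], params["wide_b"])[0]
    return deep_out + wide_out


# Hypothetical, hand-picked weights (not trained).
params = {
    "deep_w1": [[0.5, 0.5, 0.0], [-1.0, 0.0, 1.0]],
    "deep_b1": [0.0, 0.0],
    "deep_w2": [[1.0, 0.5]],
    "deep_b2": [0.0],
    "wide_w": [[2.0, -1.0]],
    "wide_b": [0.5],
}
prediction = wide_and_deep([1.0, 2.0, 3.0], [1.0, 3.0], params)
print(prediction)
```

The output is a single scalar, matching the statement that the enterprise cooperation prediction result remains a single index value.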
In one embodiment, retrieving the set of search keywords to obtain the enterprise data comprises:
searching for the retrieval keyword set in the cooperation system to obtain an enterprise name; and acquiring enterprise data according to the enterprise name and the definition of the enterprise cooperation prediction result.
In a specific implementation, enterprise names can be retrieved in an enterprise information retrieval system (such as Tianyancha or a business registration website) based on each of the Np words. For example, if the 3 words with the highest TF-IDF values in a hot search are "**", "price" and "queue", the specific enterprise name can be found by searching for "**". To avoid different keywords retrieving multiple enterprises, the results can be checked manually.
The enterprise data includes basic data on the comprehensive influence of the enterprise and service relationship data of the enterprise. The scale of the enterprise, its credit data and its current operating condition all bear on whether the bank chooses to cooperate with the enterprise. To avoid the market risk and reputation risk that joint-name cooperative products may bring to the bank, enterprises with good credit and sound operations are usually selected for cooperation. The basic comprehensive-influence data of an enterprise, such as enterprise scale, enterprise credit status and number of judicial lawsuits, can be acquired from an enterprise information retrieval system by enterprise name using crawler technology.
TABLE 6
Table 6 is a service relationship data table. As shown in Table 6, the service relationship elements of the enterprise include assets, intermediary business revenue contribution, asset precipitation of the enterprise legal person, and the like. The data can be preprocessed to suit the subsequent machine learning model; for example, continuous variables are binned into discrete variables, and the discrete variables are normalized.
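The preprocessing mentioned above (binning a continuous variable, then normalizing the resulting discrete variable) can be sketched as follows. The asset figures, the cut points in `edges` and the use of min-max scaling are illustrative assumptions; the embodiment does not specify a particular binning or normalization scheme.

```python
from bisect import bisect_right


def bin_value(value, edges):
    """Return the index of the bin that value falls into, given ascending edges."""
    return bisect_right(edges, value)


def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


assets = [120.0, 950.0, 4300.0, 18000.0]  # hypothetical enterprise asset values
edges = [500.0, 5000.0]                   # hypothetical cut points: 3 bins
bins = [bin_value(a, edges) for a in assets]
scaled = min_max_normalize(bins)
print(bins, scaled)
```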
The definition of the enterprise cooperation prediction result can be determined according to the core product operation indexes of different joint-name cooperation products, and appropriate enterprise data is selected for training according to that definition. For example, if the joint-name cooperation product is a credit card product, the core product operation index for a bank is usually the credit card issuance amount or the total consumption amount of credit card customers, so the enterprise cooperation prediction result can be defined as either of these; if the joint-name cooperation product is a large-denomination certificate of deposit product, the enterprise cooperation prediction result can be defined as the total purchase amount of certificate of deposit customers. In a specific implementation, the definition of the enterprise cooperation prediction result and the statistical period of the prediction need to be determined in advance. For example, the enterprise cooperation prediction result is defined as the credit card issuance amount in the 3 months after the joint-name cooperation.
S104: identifying the cooperative enterprises according to the enterprise cooperation prediction result.
In a specific implementation, a threshold Nk can be set (Nk being a preset minimum marketing target value of the bank), and the hot search news and enterprise names for which the predicted credit card issuance amount over the next 3 months is greater than Nk are pushed to bank managers as cooperative enterprises, providing a decision basis for the bank managers to carry out joint-name product cooperation with those enterprises.
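The threshold step in S104 can be sketched as a simple filter. The `Prediction` record, its field names, the sample values and the descending sort are assumptions made for illustration; only the rule "predicted value greater than Nk" comes from the description.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    hot_search: str            # hot search news title
    enterprise: str            # enterprise name retrieved for that hot search
    predicted_issuance: float  # predicted credit card issuance, next 3 months


def select_partners(predictions, nk):
    """Keep predictions strictly above the threshold Nk, largest first."""
    hits = [p for p in predictions if p.predicted_issuance > nk]
    return sorted(hits, key=lambda p: p.predicted_issuance, reverse=True)


# Hypothetical prediction results.
predictions = [
    Prediction("topic A", "Enterprise A", 12000.0),
    Prediction("topic B", "Enterprise B", 4000.0),
    Prediction("topic C", "Enterprise C", 25000.0),
]
partners = select_partners(predictions, nk=10000.0)
print([p.enterprise for p in partners])
```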
The execution subject of the collaborative enterprise identification method shown in fig. 1 may be a computer. As can be seen from the process shown in fig. 1, the method for identifying a collaborative enterprise according to the embodiment of the present invention first determines a hot search keyword set and a retrieval keyword set according to the matching result between the vocabulary of each hot search abstract and the hot search lexicon, then integrates the hot search abstracts according to the similarity between the keyword sets to obtain the corresponding public opinion data, and then obtains enterprise data according to the retrieval keyword set and obtains an enterprise cooperation prediction result from the public opinion data and the enterprise data to identify the collaborative enterprise. In this way, a cooperation opportunity can be grasped in time and the cooperation benefit is effectively improved.
The specific process of the embodiment of the invention is as follows:
1. Generating original hot search data according to the hot search titles and the corresponding news titles.
2. Inputting the original hot search data into a summary identification model created based on historical summary data to obtain the hot search abstracts.
3. Determining the adjusting parameter of each vocabulary according to the matching result between the vocabulary of each hot search abstract and the hot search lexicon.
4. Determining frequency data of each vocabulary according to the adjusting parameter, the word frequency and the frequency index of each vocabulary.
5. Determining a hot search keyword set and a retrieval keyword set according to the frequency data of each vocabulary.
6. Converting the keyword set of each hot search abstract into word vectors, and integrating the corresponding hot search abstracts when the similarity between the word vectors meets a preset similarity condition.
7. Determining public opinion parameter data according to the emotion index data of the comments corresponding to the integrated hot search abstract, and determining public opinion fluctuation data according to the emotion index data and the public opinion parameter data.
8. Determining the news praise rate according to the praise data of the news corresponding to the integrated hot search abstract, and acquiring public opinion platform data corresponding to the integrated hot search abstract.
9. Searching for the retrieval keyword set in the cooperation system to obtain the enterprise name, and acquiring enterprise data according to the enterprise name and the definition of the enterprise cooperation prediction result.
10. Inputting the public opinion data and the enterprise data into a prediction model established based on the public opinion training data, the enterprise training data and the prediction result data to obtain an enterprise cooperation prediction result, and identifying the cooperative enterprises according to the enterprise cooperation prediction result.
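Steps 3 to 6 above can be sketched as follows, under loudly stated assumptions. The combination rule "adjusting parameter × word frequency × inverse document frequency", the boost value of 2.0 for lexicon words, the smoothed IDF, the binary keyword vectors used in place of learned word vectors, and the similarity threshold of 0.6 are all hypothetical choices made for illustration and are not taken from the embodiment.

```python
import math
from collections import Counter


def adjusted_tfidf(tokens, corpus, lexicon, boost=2.0):
    """Score each word as alpha * TF * IDF (assumed combination rule)."""
    counts = Counter(tokens)
    scores = {}
    for word, count in counts.items():
        doc_freq = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + len(corpus)) / (1 + doc_freq)) + 1.0  # smoothed IDF
        alpha = boost if word in lexicon else 1.0  # assumed adjusting parameter
        scores[word] = alpha * (count / len(tokens)) * idf
    return scores


def top_keywords(scores, n):
    """Return the n highest-scoring words; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def cosine_similarity(keywords_a, keywords_b):
    """Cosine similarity of binary keyword vectors (stand-in for word vectors)."""
    vocab = sorted(set(keywords_a) | set(keywords_b))
    vec_a = [1.0 if w in keywords_a else 0.0 for w in vocab]
    vec_b = [1.0 if w in keywords_b else 0.0 for w in vocab]
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm = math.sqrt(sum(x * x for x in vec_a)) * math.sqrt(sum(y * y for y in vec_b))
    return dot / norm if norm else 0.0


# Hypothetical tokenized hot search abstracts and hot search lexicon.
corpus = [
    ["bank", "card", "queue", "price"],
    ["card", "price", "launch", "queue"],
    ["weather", "rain", "city", "rain"],
]
lexicon = {"card"}
keywords = [top_keywords(adjusted_tfidf(doc, corpus, lexicon), 3) for doc in corpus]
sim_01 = cosine_similarity(keywords[0], keywords[1])
sim_02 = cosine_similarity(keywords[0], keywords[2])
should_merge_01 = sim_01 >= 0.6  # assumed preset similarity condition
print(keywords, sim_01, sim_02, should_merge_01)
```

In this toy corpus the first two abstracts share two of their three keywords and would be integrated, while the third has no overlap and stays separate.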
In summary, the cooperative enterprise identification method provided by the embodiment of the invention is based on hot search news data and uses artificial intelligence algorithms to jointly model multiple dimensions, such as the public opinion dynamics of news topics and the market, risk and bank relationship of the organizations associated with the news. It thereby evaluates the benefit that joint-name products bring to the bank, and helps managers better seize opportunities to launch joint-name cooperation with external organizations by using public opinion hot spots, bringing better cooperation benefits.
Based on the same inventive concept, the embodiment of the invention also provides a device for identifying the cooperative enterprise. Since the principle by which the device solves the problem is similar to that of the method for identifying the cooperative enterprise, the implementation of the device can refer to the implementation of the method, and repeated details are omitted.
Fig. 6 is a block diagram showing the configuration of a cooperative enterprise identification apparatus in the embodiment of the present invention. As shown in fig. 6, the cooperative enterprise recognition apparatus includes:
the set determining module is used for determining a hot search keyword set and a search keyword set according to the matching result of the vocabulary of each hot search abstract and the hot search word bank;
the public opinion data module is used for integrating the hot search abstracts according to the similarity among the keyword sets of the hot search abstracts to obtain the public opinion data corresponding to the integrated hot search abstracts;
the prediction module is used for retrieving the search keyword set to obtain enterprise data, inputting the public opinion data and the enterprise data into a prediction model established based on the public opinion training data, the enterprise training data and the prediction result data, and obtaining an enterprise cooperation prediction result;
and the identification module is used for identifying the cooperative enterprises according to the enterprise cooperation prediction result.
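The way the four modules hand data to one another can be sketched as a small composition. The class name, the callable signatures and the stub functions below are purely hypothetical placeholders that show the order of the calls; they do not implement the actual keyword, public opinion or prediction logic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CooperativeEnterpriseIdentifier:
    determine_sets: Callable      # set determining module
    build_opinion_data: Callable  # public opinion data module
    predict: Callable             # prediction module
    identify: Callable            # identification module

    def run(self, summaries):
        hot_sets, retrieval_sets = self.determine_sets(summaries)
        opinion = self.build_opinion_data(summaries, hot_sets)
        predictions = self.predict(retrieval_sets, opinion)
        return self.identify(predictions)


# Stub modules, for illustration only.
def determine_sets(summaries):
    sets = [set(s.split()) for s in summaries]
    return sets, sets  # stand-in: the same sets serve both purposes


def build_opinion_data(summaries, hot_sets):
    return {"count": len(summaries)}


def predict(retrieval_sets, opinion):
    return {"ExampleCo": 10.0 * opinion["count"]}


def identify(predictions):
    return [name for name, value in predictions.items() if value > 15.0]


pipeline = CooperativeEnterpriseIdentifier(
    determine_sets, build_opinion_data, predict, identify
)
result = pipeline.run(["alpha card", "beta card"])
print(result)
```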
To sum up, the cooperative enterprise recognition apparatus according to the embodiment of the present invention determines a hot search keyword set and a search keyword set according to a matching result between a vocabulary of each hot search abstract and a hot search lexicon, integrates each hot search abstract according to a similarity between the keyword sets to obtain corresponding public opinion data, obtains enterprise data according to the search keyword set, and obtains an enterprise cooperation prediction result according to the public opinion data and the enterprise data to recognize a cooperative enterprise, so that a cooperative opportunity can be grasped in time, and a cooperative benefit can be effectively improved.
The embodiment of the invention also provides a specific implementation mode of the computer equipment, which can realize all the steps in the cooperative enterprise identification method in the embodiment. Fig. 7 is a block diagram of a computer device in an embodiment of the present invention, and referring to fig. 7, the computer device specifically includes the following:
a processor 701 and a memory 702.
The processor 701 is configured to call the computer program in the memory 702, and the processor implements all the steps in the collaborative enterprise identification method in the above embodiment when executing the computer program, for example, the processor implements the following steps when executing the computer program:
determining a hot search keyword set and a retrieval keyword set according to the matching result of the vocabulary of each hot search abstract and the hot search word bank;
integrating the hot search abstracts according to the similarity among the keyword sets of the hot search abstracts to obtain public opinion data corresponding to the integrated hot search abstracts;
retrieving the retrieval keyword set to obtain enterprise data, and inputting the public opinion data and the enterprise data into a prediction model established based on the public opinion training data, the enterprise training data and the prediction result data to obtain an enterprise cooperation prediction result;
and identifying the cooperative enterprises according to the enterprise cooperation prediction result.
To sum up, the computer device of the embodiment of the invention determines the hot search keyword set and the search keyword set according to the matching result of the vocabulary of each hot search abstract and the hot search lexicon, integrates each hot search abstract according to the similarity between the keyword sets to obtain the corresponding public opinion data, obtains the enterprise data according to the search keyword set, and obtains the enterprise cooperation prediction result according to the public opinion data and the enterprise data to identify the cooperative enterprises, so that the cooperative opportunity can be grasped in time, and the cooperative benefit is effectively improved.
An embodiment of the present invention further provides a computer-readable storage medium capable of implementing all the steps in the collaborative enterprise identification method in the foregoing embodiment, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements all the steps of the collaborative enterprise identification method in the foregoing embodiment, for example, when the processor executes the computer program, the processor implements the following steps:
determining a hot search keyword set and a retrieval keyword set according to the matching result of the vocabulary of each hot search abstract and the hot search word bank;
integrating the hot search abstracts according to the similarity among the keyword sets of the hot search abstracts to obtain public opinion data corresponding to the integrated hot search abstracts;
retrieving the retrieval keyword set to obtain enterprise data, and inputting the public opinion data and the enterprise data into a prediction model established based on the public opinion training data, the enterprise training data and the prediction result data to obtain an enterprise cooperation prediction result;
and identifying the cooperative enterprises according to the enterprise cooperation prediction result.
To sum up, the computer-readable storage medium of the embodiment of the present invention determines a hot search keyword set and a search keyword set according to a matching result between a vocabulary of each hot search abstract and a hot search lexicon, integrates each hot search abstract according to a similarity between the keyword sets to obtain corresponding public opinion data, obtains enterprise data according to the search keyword set, and obtains an enterprise cooperation prediction result according to the public opinion data and the enterprise data to identify a cooperative enterprise, so that a cooperative opportunity can be grasped in time, and a cooperative benefit can be effectively improved.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Those of skill in the art will further appreciate that the various illustrative logical blocks, units, and steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate the interchangeability of hardware and software, various illustrative components, elements, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design requirements of the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.
The various illustrative logical blocks, or elements, or devices described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may be located in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary designs, the functions described in the embodiments of the present invention may be implemented in hardware, software, firmware, or any combination of the three. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media that facilitate transfer of a computer program from one place to another. Storage media may be any available media that can be accessed by a general-purpose or special-purpose computer. For example, such computer-readable media can include, but are not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store program code in the form of instructions or data structures and which can be read by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Additionally, any connection is properly termed a computer-readable medium; for example, if the software is transmitted from a website, server, or other remote source via a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, that connection is included in the definition of medium. Disk and disc, as used herein, include compact discs, laser discs, optical discs, DVDs, floppy disks and Blu-ray discs, where disks usually reproduce data magnetically, while discs usually reproduce data optically with lasers. Combinations of the above may also be included in the computer-readable medium.
Claims (10)
1. A method for identifying a collaborative enterprise, comprising:
determining a hot search keyword set and a retrieval keyword set according to the matching result of the vocabulary of each hot search abstract and the hot search word bank;
integrating the hot search abstracts according to the similarity among the keyword sets of the hot search abstracts to obtain public opinion data corresponding to the integrated hot search abstracts;
retrieving the retrieval keyword set to obtain enterprise data, and inputting the public opinion data and the enterprise data into a prediction model established based on public opinion training data, enterprise training data and prediction result data to obtain an enterprise cooperation prediction result;
and identifying a cooperative enterprise according to the enterprise cooperation prediction result.
2. The method of claim 1, wherein determining the set of hot search keywords and the set of search keywords based on matching the vocabulary of each hot search summary with the vocabulary library of hot search keywords comprises:
determining the adjusting parameters of the vocabularies according to the matching results of the vocabularies of the hot search abstracts and the hot search lexicon;
determining frequency data of each vocabulary according to the adjusting parameters, the word frequency and the frequency index of each vocabulary;
and determining a hot search keyword set and a retrieval keyword set according to the frequency data of each vocabulary.
3. The method of claim 1, wherein integrating the hot search summaries according to the similarity between the keyword sets of the hot search summaries comprises:
converting the keyword set of each hot search abstract into a word vector;
and when the similarity between the word vectors meets a preset similarity condition, integrating the corresponding hot search abstracts.
4. The method of claim 1, wherein the obtaining public opinion data corresponding to the integrated hot search summary comprises:
determining public opinion index data according to the emotion index data of the comments corresponding to the integrated hot search abstract;
determining news approval rate according to approval data of the news corresponding to the integrated hot search abstract;
and acquiring public opinion platform data corresponding to the integrated hot search abstract.
5. The method as claimed in claim 4, wherein the public opinion index data includes public opinion parameter data and public opinion fluctuation data;
determining public opinion index data according to the emotion index data of the comments corresponding to the integrated hot search abstract comprises the following steps:
determining public opinion parameter data according to emotion index data of comments corresponding to the integrated hot search abstract;
and determining the public opinion fluctuation data according to the emotion index data and the public opinion parameter data.
6. The method of claim 1, wherein retrieving the set of search keywords to obtain enterprise data comprises:
searching the search keyword set in a cooperation system to obtain an enterprise name;
and acquiring enterprise data according to the enterprise name and the definition of the enterprise cooperation prediction result.
7. The method of claim 1, further comprising:
generating original hot searching data according to the hot searching titles and the corresponding news titles;
and inputting the original hot search data into a summary identification model created based on historical summary data to obtain the hot search summary.
8. A collaborative enterprise identification apparatus, comprising:
the set determining module is used for determining a hot search keyword set and a search keyword set according to the matching result of the vocabulary of each hot search abstract and the hot search word bank;
the public opinion data module is used for integrating the hot search abstracts according to the similarity among the keyword sets of the hot search abstracts to obtain the public opinion data corresponding to the integrated hot search abstracts;
the prediction module is used for retrieving the retrieval keyword set to obtain enterprise data, inputting the public opinion data and the enterprise data into a prediction model established based on public opinion training data, enterprise training data and prediction result data, and obtaining an enterprise cooperation prediction result;
and the identification module is used for identifying the cooperative enterprises according to the enterprise cooperation prediction result.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executed on the processor, wherein the processor when executing the computer program implements the steps of the method for identifying a collaborative enterprise of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the collaborative enterprise identification method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210099908.7A CN114492432A (en) | 2022-01-27 | 2022-01-27 | Cooperative enterprise identification method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114492432A (en) | 2022-05-13 |
Family
ID=81475615
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210099908.7A Pending CN114492432A (en) | 2022-01-27 | 2022-01-27 | Cooperative enterprise identification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114492432A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||