
CN118569612B - Work order duplicate checking method, system, equipment and storage medium - Google Patents

Work order duplicate checking method, system, equipment and storage medium Download PDF

Info

Publication number
CN118569612B
Authority
CN
China
Prior art keywords
similarity
query
work order
historical
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411053509.2A
Other languages
Chinese (zh)
Other versions
CN118569612A (en)
Inventor
罗兰
袁勋
陈虎兵
姜智明
许晓波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guotai Epoint Software Co Ltd
Original Assignee
Guotai Epoint Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guotai Epoint Software Co Ltd filed Critical Guotai Epoint Software Co Ltd
Priority to CN202411053509.2A priority Critical patent/CN118569612B/en
Publication of CN118569612A publication Critical patent/CN118569612A/en
Application granted granted Critical
Publication of CN118569612B publication Critical patent/CN118569612B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311Scheduling, planning or task assignment for a person or group
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/254Fusion techniques of classification results, e.g. of results related to same input data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Operations Research (AREA)
  • Biomedical Technology (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • Educational Administration (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a work order duplicate checking method, system, device, and storage medium. The work order duplicate checking method comprises: obtaining query work order data and historical work order data and preprocessing them to obtain a query work order list and a historical work order list; performing word vector conversion and summation on the query work order list and the historical work order list through a pre-trained Word2vector model to obtain a query work order vector and a historical work order vector, and calculating a first similarity; calculating a second similarity through Jaccard based on the query work order list and the historical work order list; extracting emotion feature vectors of the query work order data and the historical work order data through BERT-based and calculating a third similarity; and determining whether the query work order data is a duplicate based on the first similarity, the second similarity, and the third similarity. The invention calculates similarity between the query work order data and the historical work order data from three aspects (semantics, word co-occurrence counts, and emotion) and checks for duplicate work orders through these similarities, thereby improving the accuracy of work order duplicate checking.

Description

Work order duplicate checking method, system, equipment and storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a work order duplicate checking method, a system, equipment and a storage medium.
Background
In hotline service circulation, the process from an operator receiving a citizen appeal by telephone to forming a work order and dispatching it to the corresponding department for handling is a single dispatch link. In this dispatch link, the operator performs duplicate checking after forming the work order, and the department performs duplicate checking again before handling it; when duplicate work orders are found, they are merged, which is called order merging. Work order duplicate checking is the technology that precedes order merging; in essence, it is a method of finding duplicate work orders from the similarity of the work order texts. One existing method segments the work orders into words, compares the segmentation results of different work orders, calculates the similarity between the current work order and a historical work order through cosine similarity, and sets a similarity threshold to confirm duplicates; this method may mistakenly judge work orders whose words are consistent but whose meanings differ as duplicates. Another method is based on semantic similarity: it constructs an element mining model to mine elements from the work order to be checked and a target work order, and calculates the semantic similarity of their elements through the semantic analysis technique LSA to identify duplicate work orders. Both methods have limitations, and the accuracy of work order duplicate checking is low.
Disclosure of Invention
In view of the above, embodiments of the present application are directed to providing a work order duplicate checking method, system, device, and storage medium, which are used to solve the problems that existing work order duplicate checking technologies have limitations and low accuracy.
According to a first aspect of the present invention, the present invention provides a work order duplication checking method, including:
Acquiring inquiry work order data and historical work order data, and preprocessing to acquire an inquiry work order list and a historical work order list;
Performing word vector conversion and summation respectively on the query work word list and the historical work word list through a pre-trained Word2vector model to obtain a query work order vector and a historical work order vector, and calculating a first similarity;
Calculating a second similarity through Jaccard based on the query work word list and the historical work word list;
Extracting emotion feature vectors of the query worksheet data and the historical worksheet data through BERT-based, and calculating a third similarity;
And determining whether the query worksheet data is repeated or not through the first similarity, the second similarity and the third similarity.
Preferably, the acquiring and preprocessing the query worksheet data and the historical worksheet data to obtain the query worksheet table and the historical worksheet table includes:
Performing word segmentation through jieba by utilizing the historical worksheet data and constructing a related stop word list;
performing word segmentation and stop word processing on the query worksheet data and the historical worksheet data through the jieba and the related stop word list;
and splicing the title, the content and the title of the processed query worksheet data and the processed historical worksheet data to obtain a query worksheet table and a historical worksheet table.
Preferably, the training of the Word2vector model includes:
Taking a training sample as input of the Word2vector model, wherein a Word vector is output of the Word2vector model;
The training samples are constructed through training corpus, and each training sample comprises a center word and a context word.
Preferably, the word vector conversion and summation are respectively performed on the query work word list and the history work word list through a pre-trained Word2vector model to obtain a query work order vector and a history work order vector, and the first similarity is calculated, including:
Carrying out vector initialization on the historical work Word list through the Word2vector model;
inputting the historical work list into the Word2vector model to obtain Word vectors of each Word in the list, and summing to obtain the historical work list vectors;
inputting the query work Word list into the Word2vector model to obtain Word vectors of each Word in the Word list, and summing to obtain the query work list vector;
And calculating the first similarity through Cosine by using the historical work order vector and the inquiry work order vector.
Preferably, the calculating the second similarity by Jaccard based on the query word list and the history word list includes:
converting the query work word list and the history work word list into N-Gram sequences;
Counting the co-occurrence times of the N-Gram sequences of the query work word list and the historical work word list;
and calculating the second similarity through the Jaccard based on the co-occurrence times.
Preferably, the extracting the emotion feature vector of the query worksheet data and the historical worksheet data through the BERT-based and calculating a third similarity includes:
carrying out emotion analysis on the query worksheet data and the historical worksheet data through the BERT-based to obtain a query emotion tag and a historical emotion tag;
converting the query emotion label and the history emotion label into a query emotion vector and a history emotion vector respectively;
calculating a third similarity through the query emotion vector and the historical emotion vector;
The emotion vector is represented by the following formula:
E = (e_pos, e_neg, e_neu);
Wherein: E is the emotion vector; e_pos is the positive emotion component; e_neg is the negative emotion component; e_neu is the neutral emotion component;
The third similarity is represented by the following formula:
sim3 = (E_q · E_h) / (||E_q|| ||E_h||);
Wherein: E_q is the query emotion vector; E_h is the historical emotion vector.
Preferably, the determining whether the query worksheet data is repeated based on the first similarity, the second similarity, and the third similarity further includes:
weights are set for the first similarity, the second similarity and the third similarity;
Summing the weighted first similarity, the weighted second similarity and the weighted third similarity to obtain a query work order similarity;
Sorting the query worksheet data according to the query worksheet similarity;
and determining repeated query work order data according to the sequencing.
According to a third aspect of the present invention, there is also provided a work order duplication checking system, including:
The word list acquisition module is used for acquiring inquiry work order data and historical work order data and preprocessing the inquiry work order data and the historical work order data to acquire an inquiry work order list and a historical work order list;
The first similarity module is used for respectively carrying out Word vector conversion and summation on the query work Word list and the history work Word list through a pre-trained Word2vector model to obtain a query work order vector and a history work order vector and calculating first similarity;
The second similarity module is used for calculating a second similarity through Jaccard based on the query work word list and the history work word list;
the third similarity module is used for extracting emotion feature vectors of the query worksheet data and the historical worksheet data through BERT-based and calculating third similarity;
And the query module is used for determining whether the query work order data is repeated or not through the first similarity, the second similarity and the third similarity.
According to a fourth aspect of the present invention there is also provided an electronic device comprising a processor and a memory storing computer executable instructions executable by the processor, the processor executing the computer executable instructions to implement the work order duplicate checking method.
According to a fifth aspect of the present invention there is also provided a computer readable storage medium storing computer executable instructions which, when invoked and executed by a processor, cause the processor to implement the work order duplicate checking method.
According to the first aspect of the invention, query work order data and historical work order data are obtained. The first similarity, calculated by converting the word lists into vectors and summing them, measures whether the semantics of the query work order and the historical work orders are similar; the second similarity, calculated by converting the word lists into N-Gram sequences, measures how often phrases co-occur between the query work order and the historical work orders; and the third similarity, calculated from emotion feature vectors obtained through BERT-based, measures whether the emotions expressed by the query work order and the historical work orders are the same. Even if the words in the query work order word list are identical to the words in the historical work order word list, the similarity obtained from the emotion vectors can show that the work orders are not duplicates; conversely, even if the two word lists are not identical, the same expressed emotion can help confirm a duplicate work order. Checking for duplicates comprehensively through the three similarities improves the accuracy of work order duplicate checking.
Further, since the first similarity, the second similarity and the third similarity represent different meanings, their weights differ; each similarity is weighted according to its importance, and the three weighted similarities are summed, which further improves the accuracy of work order duplicate checking.
Drawings
FIG. 1 shows a schematic flow chart of a work order duplicate checking method according to one embodiment of the present invention;
FIG. 2 is a schematic flow chart diagram illustrating a method of obtaining the query work word list and the history work word list in step S100 shown in FIG. 1;
FIG. 3 is a schematic flow chart of a method for calculating the first similarity in step S200 shown in FIG. 1;
FIG. 4 is a schematic flow chart showing a method of calculating the second similarity in step S300 shown in FIG. 1;
FIG. 5 is a schematic flow chart of a method for calculating the third similarity in step S400 shown in FIG. 1;
FIG. 6 is a schematic flow chart diagram illustrating a method of ordering the query worksheets in step S500 of FIG. 1;
FIG. 7 illustrates an application diagram of the work order duplicate checking method according to one embodiment of the present invention;
FIG. 8 illustrates a schematic diagram of a work order duplicate checking system according to one embodiment of the invention.
Detailed Description
In order that the above objects, features and advantages of the application will be readily understood, a more particular description of the application will be rendered by reference to the appended drawings. It is to be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present application are shown in the drawings. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "comprising" and "having" and any variations thereof herein are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
FIG. 1 shows a schematic flow chart of a work order duplication checking method according to one embodiment of the present invention, as shown in FIG. 1, including:
step S100, acquiring inquiry work order data and history work order data, preprocessing the inquiry work order data and the history work order data, and acquiring an inquiry work order list and a history work order list;
Step 200, word vector conversion and summation are respectively carried out on the query work Word list and the history work Word list through a Word2vector model trained in advance, so as to obtain a query work order vector and a history work order vector, and a first similarity is calculated;
step S300, calculating a second similarity through Jaccard (Jaccard distance) based on the query work word list and the history work word list;
Step S400, extracting emotion feature vectors of the query work order data and the historical work order data through BERT-based (an emotion analysis model) and calculating a third similarity;
Step S500, determining whether the query work order data is repeated through the first similarity, the second similarity and the third similarity.
The method obtains query work order data and historical work order data; calculates the first similarity by converting the word lists into vectors and summing them, which measures whether the semantics of the query work orders are similar; calculates the second similarity by converting the word lists into N-Gram sequences (N-gram models), which counts how often phrases co-occur between the work orders; and calculates the third similarity from emotion feature vectors obtained through BERT-based, which measures whether the emotion expressed by the query work orders is the same. Whether a query work order is a duplicate is then determined from the relationship among the first similarity, the second similarity and the third similarity. Checking for duplicates comprehensively through the three similarities improves the accuracy of work order duplicate checking.
Fig. 2 is a schematic flowchart of a method for obtaining the query work order table and the history work order table in step S100 shown in fig. 1, and in step S100, query work order data and history work order data are obtained and preprocessed, to obtain the query work order table and the history work order table, as shown in fig. 2, including:
step S110, word segmentation is carried out through jieba (a Chinese word segmentation tool) on the historical worksheet data and a related stop word list is constructed;
step S120, performing word segmentation and stop word processing on the spliced query work order data and the history work order data through jieba and related stop word lists;
Step S130, splicing the title, the content and the title of the processed query worksheet data and the processed historical worksheet data to obtain a query worksheet table and a historical worksheet table.
Word segmentation is performed on the historical worksheet data through jieba, words that contribute little to the meaning of a work order are selected, and a related stop word list is constructed from these words. Words in the subsequent query worksheet data and historical worksheet data are then screened against the related stop word list, and words appearing in the list are removed. The screened query worksheet data and historical worksheet data are then spliced in the format title + content + title, with the title concatenated twice because it is a condensed summary of the work order, so that the title carries a higher weight. A minimal Python sketch of this preprocessing step is given below.
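By way of illustration only, the sketch assumes jieba is installed, that each work order is a dict with title and content fields, and that the stop word list is loaded from a hypothetical stopwords.txt file; none of these names are prescribed by the embodiment.

```python
import jieba

# Hypothetical stop word file, one word per line, built from historical work orders.
with open("stopwords.txt", encoding="utf-8") as f:
    STOP_WORDS = {line.strip() for line in f if line.strip()}

def preprocess(work_order: dict) -> list[str]:
    """Splice title + content + title, segment with jieba, and drop stop words."""
    # The title is spliced twice so that title words carry a higher weight.
    text = work_order["title"] + work_order["content"] + work_order["title"]
    tokens = jieba.lcut(text)
    return [t for t in tokens if t.strip() and t not in STOP_WORDS]
```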
In step S200, training of the Word2vector model includes:
Taking a training sample as input of the Word2vector model, wherein a Word vector is output of the Word2vector model;
The training samples are constructed through training corpus, and each training sample comprises a center word and a context word.
Training samples are constructed so that each sample contains a center word and its context words. This prepares the model for the subsequent semantic similarity calculation: training on center words together with their context words allows the model to capture semantics more accurately, as in the sketch below.
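By way of illustration, such training can be carried out with the gensim library; the patent only specifies a Word2vector model trained on center-word/context-word samples, so the skip-gram setting and the hyperparameter values here are assumptions.

```python
from gensim.models import Word2Vec

def train_word2vec(token_lists, dim=100):
    """token_lists: one preprocessed token list per historical work order."""
    # sg=1 selects skip-gram, which is trained on (center word, context word)
    # pairs drawn from a sliding window over each token list.
    return Word2Vec(sentences=token_lists, vector_size=dim, window=5,
                    min_count=1, sg=1, epochs=10)
```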
FIG. 3 is a schematic flowchart of the method for calculating the first similarity in step S200 shown in FIG. 1. As shown in FIG. 3, in step S200, word vector conversion and summation are performed on the query work word list and the history work word list through a pre-trained Word2vector model to obtain a query work order vector and a history work order vector, and the first similarity is calculated, including:
step S210, carrying out vector initialization on a historical work Word list through a Word2vector model;
step S220, inputting the historical work list into a Word2vector model to obtain Word vectors of each Word in the list, and summing to obtain a historical work list vector;
Step S230, inputting the query work Word list into a Word2vector model to obtain Word vectors of each Word in the Word list, and summing the Word vectors to obtain the query work Word list vector;
Step S240, calculating the first similarity through Cosine by using the historical work order vector and the inquiry work order vector.
The Word2vector model initializes the historical work word list by randomly initializing a word vector for each word. The historical work word list and the query work word list are then input into the model to obtain the vector of each word in each list, the vector of each list is obtained by adding the vectors of its words, and the semantic similarity is calculated from these list vectors. When a word in the query work word list has no learned vector, it is initialized to a vector of ones.
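A possible implementation of this summation and Cosine step is sketched below; the all-ones fallback for unseen words follows the reading of the embodiment above and is otherwise an assumption.

```python
import numpy as np

def work_order_vector(tokens, model, dim=100):
    """Sum the word vectors of a token list; unseen words fall back to ones."""
    vecs = [model.wv[t] if t in model.wv else np.ones(dim) for t in tokens]
    return np.sum(vecs, axis=0) if vecs else np.zeros(dim)

def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

# First similarity between a query work order and a historical work order:
# sim1 = cosine_similarity(work_order_vector(q_tokens, model),
#                          work_order_vector(h_tokens, model))
```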
Fig. 4 is a schematic flowchart of the method for calculating the second similarity in step S300 shown in fig. 1, and in step S300, as shown in fig. 4, the second similarity is calculated by Jaccard based on the query word table and the history word table, including:
step S310, converting the query work word list and the history work word list into N-Gram sequences;
Step S320, counting the co-occurrence times of N-Gram sequences of the query work word list and the history work word list;
Step S330, calculating the second similarity through Jaccard based on the co-occurrence times.
The query work word list and the history work word list are restored into character strings and converted into N-Gram sequences, in which every two consecutive words form a sequence. The number of N-Gram sequences that co-occur in the two word lists is counted, which captures the content shared by the two lists, and the second similarity is finally calculated through Jaccard.
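A sketch of this bigram and Jaccard computation is given below; it treats the N-Gram sequences as sets, which is one plausible reading of the co-occurrence counting described above, with N = 2 as in the embodiment.

```python
def ngrams(tokens, n=2):
    """Word n-grams of a token list; with n=2 every two consecutive words form a sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def second_similarity(tokens_q, tokens_h, n=2):
    """Jaccard similarity over the n-gram sets of the two work word lists."""
    q, h = ngrams(tokens_q, n), ngrams(tokens_h, n)
    union = q | h
    return len(q & h) / len(union) if union else 0.0
```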
Fig. 5 is a schematic flowchart showing a method for calculating the third similarity in step S400 shown in fig. 1, and in step S400, extracting emotion feature vectors of query worksheet data and historical worksheet data and calculating the third similarity by BERT-based includes:
s410, carrying out emotion analysis on query work order data and historical work order data through BERT-based to obtain a query emotion tag and a historical emotion tag;
step S420, respectively converting the query emotion label and the history emotion label into a query emotion vector and a history emotion vector;
Step S430, calculating a third similarity by inquiring the emotion vector and the historical emotion vector;
The emotion vector is shown in the following formula:
E = (e_pos, e_neg, e_neu);
Wherein: E is the emotion vector; e_pos is the positive emotion component; e_neg is the negative emotion component; e_neu is the neutral emotion component;
The third similarity is shown in the following formula:
sim3 = (E_q · E_h) / (||E_q|| ||E_h||);
Wherein: E_q is the query emotion vector; E_h is the historical emotion vector.
In work order texts, the same words sometimes express different meanings, so the emotion similarity between the query worksheet data and the historical worksheet data also needs to be calculated. Emotion is divided into positive, negative and neutral. Emotion analysis is performed on the words in the query worksheet table and the historical worksheet table to obtain the emotion label of each word, the labels are converted into vectors, and the third similarity is calculated. Even if the words in the query word list are consistent with the words in the history word list, the similarity calculated from the emotion vectors can show that the work orders are not duplicates; conversely, even if the two word lists do not share the same words, identical semantics and identical expressed emotion can confirm a duplicate work order.
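The label-to-vector conversion and the third similarity can be sketched as follows; the BERT-based sentiment classifier itself is not shown, and the label names, the one-hot encoding, and the cosine aggregation are assumptions consistent with the formulas above.

```python
import numpy as np

# One-hot encoding of the three emotion classes (an assumption).
LABEL_TO_VEC = {"positive": np.array([1.0, 0.0, 0.0]),
                "negative": np.array([0.0, 1.0, 0.0]),
                "neutral":  np.array([0.0, 0.0, 1.0])}

def emotion_vector(labels):
    """Aggregate the emotion labels predicted for a work order into one vector."""
    return np.sum([LABEL_TO_VEC[l] for l in labels], axis=0)

def third_similarity(labels_q, labels_h):
    eq, eh = emotion_vector(labels_q), emotion_vector(labels_h)
    denom = np.linalg.norm(eq) * np.linalg.norm(eh)
    return float(np.dot(eq, eh) / denom) if denom else 0.0
```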
Fig. 6 is a schematic flowchart of the method for ordering the query worksheets in step S500 shown in fig. 1, where in step S500, it is determined whether the query worksheets are repeated based on the first similarity, the second similarity, and the third similarity, and further includes:
step S510, weights are set for the first similarity, the second similarity and the third similarity;
Step S520, summing the weighted first similarity, the weighted second similarity and the weighted third similarity to obtain a query work order similarity;
Step S530, sorting the query worksheet data according to the query work order similarity;
Step S540, determining repeated query worksheet data according to the ranking.
The three similarities are calculated from three aspects, but because they are not equally important, a matching weight is assigned to each similarity. The weighted similarities are summed, and the query worksheets are ranked by the resulting query work order similarity, so that repeated query worksheet data can be selected, as in the sketch below.
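A minimal sketch of the weighted fusion and ranking follows; the weight values and the top-N cutoff are illustrative, since the embodiment leaves them unspecified.

```python
def fuse_and_rank(candidates, w1=0.5, w2=0.3, w3=0.2, top_n=10):
    """candidates: dicts with 'title', 'content', 'sim1', 'sim2', 'sim3' for each
    historical work order compared against the query work order."""
    for c in candidates:
        c["similarity"] = w1 * c["sim1"] + w2 * c["sim2"] + w3 * c["sim3"]
    ranked = sorted(candidates, key=lambda c: c["similarity"], reverse=True)
    # Output fields mirror the embodiment: title, content, similarity.
    return [{"title": c["title"], "content": c["content"],
             "similarity": c["similarity"]} for c in ranked[:top_n]]
```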
FIG. 7 is a schematic diagram showing the application of a work order duplicate checking method according to one embodiment of the present invention. As shown in FIG. 7, the work order duplicate checking method can be applied to both the hotline agent and the handling unit, whose business processes are substantially similar.
At the hotline agent, after an appeal is accepted, duplicate checking is performed on the newly formed work order. If no duplicate is found, conventional dispatch is carried out; if a duplicate is found, merging is performed: the duplicate work order is suspended, attached to the similar work order for joint handling, and finally returned for closure. After a work order is suspended, it is either merged with the similar historical work order for conventional dispatch or the duplicate work order is cancelled.
At the handling unit, work orders dispatched from the hotline agent are signed for, and duplicate checking is performed on the signed tasks. If no duplicate task order is found, conventional processing is carried out; if a duplicate is found, order merging is performed and the task is suspended, after which task feedback is given and the task is finally closed. A suspended task can also be withdrawn and dispatched again.
In one embodiment of the invention, the related stop word list is formed by performing basic jieba word segmentation on the historical work order data and then manually removing, while sorting through the work orders, words that contribute little to the meaning of the work order, so as to reduce interference in the subsequent similarity calculation. Exemplary stop words include: related, department, wish, appeal, help, platform, incoming call, person, request, netizen, requirement, information, privacy, resolution, thank you, trouble, problem.
The historical work order data is input, segmented through jieba, the stop word library is loaded and stop words are removed, and the Word2vector model is trained. The training process of the Word2vector model is as follows. First, a vocabulary is constructed, namely the vocabulary obtained after processing the historical work order data. Word vectors are then initialized, one for each word in the vocabulary. Training samples are then constructed from the training corpus, each sample containing a center word and its context words. The neural network is then trained on the training samples to optimize the word vectors. Finally, the word vectors are extracted: after training is completed, the word vector corresponding to each word is output as the final result. Word2vector is chosen as the core because it can learn semantic relationships between words, generates high-quality word vectors, and is suitable for large-scale corpora.
Two work orders to be compared, namely the historical worksheet data and the query worksheet data, are input, and the title + content + title splicing operation is performed on each. The title is spliced twice because the title is a highly condensed summary of the worksheet data and should therefore be given a higher weight. jieba word segmentation and stop word processing are then performed on each, and finally the two processed work word tables are output.
The trained Word2vector model is loaded and the two processed work word tables are converted into word vectors to obtain the vector of each word; if the vector of a word cannot be obtained, it is initialized to a vector of ones. The vectors of the words in each work word table are summed to obtain the work order vectors, and finally the semantic similarity of the two work orders is calculated using Cosine similarity.
The work word tables are restored into character strings and converted into N-Gram sequences with N = 2, that is, every two consecutive words form a sequence. The number of N-Gram sequences that co-occur in the two texts is counted, the similarity of the co-occurrence data is calculated using the Jaccard distance, and the word co-occurrence similarity of the two work orders is finally output.
Emotion analysis is performed on the query work order data and the historical work order data through BERT-based to obtain a query emotion label and a historical emotion label. In work order texts, the same characters sometimes express different meanings, so the emotion similarity of the query work order data and the historical work order data also needs to be calculated; emotion is classified into positive, negative and neutral. The query emotion label and the historical emotion label are respectively converted into a query emotion vector and a historical emotion vector, and the emotion similarity is calculated from the query emotion vector and the historical emotion vector;
The emotion vector is shown in the following formula:
E = (e_pos, e_neg, e_neu);
Wherein: E is the emotion vector; e_pos is the positive emotion component; e_neg is the negative emotion component; e_neu is the neutral emotion component;
for example, when a word carries a positive emotion label, its emotion vector is E = (1, 0, 0);
The third similarity is shown in the following formula:
sim3 = (E_q · E_h) / (||E_q|| ||E_h||);
Wherein: E_q is the query emotion vector; E_h is the historical emotion vector.
The final results are returned as the top N in descending order of the sum of the Cosine calculation result and the Jaccard calculation result; the output fields are the work order title, the work order content and the similarity.
FIG. 8 illustrates a schematic diagram of a work order duplication checking system according to one embodiment of the present invention, as shown in FIG. 8, and further provides a work order duplication checking system, including:
The word list obtaining module 810 is configured to obtain and preprocess query work order data and historical work order data, and obtain a query work order list and a historical work order list;
The first similarity module 820 is configured to perform Word vector conversion and summation on the query work Word table and the history work Word table through a pre-trained Word2vector model, obtain a query work order vector and a history work order vector, and calculate a first similarity;
A second similarity module 830, configured to calculate a second similarity through Jaccard based on the query work word table and the history work word table;
A third similarity module 840 for extracting emotion feature vectors of the query worksheet data and the historical worksheet data through the BERT-based and calculating a third similarity;
The query module 850 is configured to determine whether the query worksheet data is repeated according to the first similarity, the second similarity, and the third similarity.
In particular, there is also provided an electronic device comprising a processor and a memory storing computer executable instructions executable by the processor, the processor executing the computer executable instructions to implement the work order duplicate checking method, for example, performing the above-described method steps S100 to S500 in fig. 1, and further performing the above-described method steps S510 to S540 in fig. 6.
In particular, there is also provided a computer readable storage medium storing computer executable instructions that, when invoked and executed by a processor, cause the processor to implement the work order duplicate checking method, for example, performing the above-described method steps S100 to S500 in fig. 1, and further performing the above-described method steps S510 to S540 in fig. 6.
The processor is used to implement various control logic for the apparatus, and may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a single-chip microcomputer, an ARM (Acorn RISC Machine) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. Also, the processor may be any conventional processor, microprocessor, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, or one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The memory may include a storage program area that may store an operating system, an application program required for at least one function, and a storage data area that may store data created according to device use, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the apparatus through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
By way of example, nonvolatile storage media can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SynchLink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The disclosed memory components or memories of the operating environments described herein are intended to comprise one or more of these and/or any other suitable types of memory.
Of course, those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-volatile computer readable storage medium, which when executed may comprise the steps of the above described method embodiments, to instruct related hardware (e.g., processors, controllers, etc.). The storage medium may be a memory, a magnetic disk, a floppy disk, a flash memory, an optical memory, etc.
The foregoing is merely exemplary of some embodiments of the application and other modifications may be made without departing from the spirit of the application.

Claims (8)

1. A work order duplication checking method, comprising:
Acquiring inquiry work order data and historical work order data, and preprocessing to acquire an inquiry work order list and a historical work order list;
Word vector conversion and summation are respectively carried out on the query work Word list and the history work Word list through a pre-trained Word2vector model, so that a query work order vector and a history work order vector are obtained, and a first similarity is calculated;
Calculating a second similarity through Jaccard based on the query work word list and the history work word list, including:
converting the query work word list and the history work word list into N-Gram sequences;
Counting the co-occurrence times of the N-Gram sequences of the query work word list and the historical work word list;
calculating the second similarity by the Jaccard based on the co-occurrence times;
Extracting emotion feature vectors of the query worksheet data and the historical worksheet data through BERT-based, and calculating a third similarity;
determining whether the query worksheet data is repeated based on the first similarity, the second similarity and the third similarity, including:
weights are set for the first similarity, the second similarity and the third similarity;
Summing the weighted first similarity, the weighted second similarity and the weighted third similarity to obtain a query work order similarity;
Sorting the query worksheet data according to the query worksheet similarity;
and determining repeated query work order data according to the sequencing.
2. The method of claim 1, wherein the obtaining and preprocessing the query worksheet data and the historical worksheet data to obtain the query worksheet table and the historical worksheet table comprises:
the historical worksheet data are subjected to word segmentation through jieba and a related stop word list is constructed;
performing word segmentation and stop word processing on the query worksheet data and the historical worksheet data through the jieba and the related stop word list;
and splicing the title, the content and the title of the processed query worksheet data and the processed historical worksheet data to obtain a query worksheet table and a historical worksheet table.
3. The method of claim 1, wherein the training of the Word2vector model comprises:
Taking a training sample as input of the Word2vector model, wherein a Word vector is output of the Word2vector model;
The training samples are constructed through training corpus, and each training sample comprises a center word and a context word.
4. The method of claim 3, wherein the performing word vector conversion and summation on the query work word table and the history work word table through a pre-trained Word2vector model to obtain a query work order vector and a history work order vector and calculate a first similarity comprises:
The Word2vector model carries out vector initialization on the historical work Word list;
the historical work list is input into the Word2vector model to obtain Word vectors of each Word in the Word list and summed to obtain the historical work list vector;
the query work list is input into the Word2vector model to obtain Word vectors of each Word in the Word list and summed to obtain the query work list vector;
And calculating the first similarity by using the historical work order vector and the inquiry work order vector through Cosine.
5. The method of claim 1, wherein extracting emotion feature vectors of the query worksheet data and the historical worksheet data by BERT-based and calculating a third similarity comprises:
carrying out emotion analysis on the query worksheet data and the historical worksheet data through the BERT-based to obtain a query emotion tag and a historical emotion tag;
converting the query emotion label and the history emotion label into a query emotion vector and a history emotion vector respectively;
calculating a third similarity through the query emotion vector and the historical emotion vector;
The emotion vector is represented by the following formula:
E = (e_pos, e_neg, e_neu);
Wherein: E is the emotion vector; e_pos is the positive emotion component; e_neg is the negative emotion component; e_neu is the neutral emotion component;
The third similarity is represented by the following formula:
sim3 = (E_q · E_h) / (||E_q|| ||E_h||);
Wherein: E_q is the query emotion vector; E_h is the historical emotion vector.
6. A work order duplication checking system, comprising:
The word list acquisition module is used for acquiring inquiry work order data and historical work order data and preprocessing the inquiry work order data and the historical work order data to acquire an inquiry work order list and a historical work order list;
The first similarity module is used for respectively carrying out Word vector conversion and summation on the query work Word list and the history work Word list through a Word2vector model trained in advance to obtain a query work order vector and a history work order vector and calculating first similarity;
the second similarity module is used for calculating second similarity of the query work word list and the historical work word list through Jaccard, and comprises the following steps:
converting the query work word list and the history work word list into N-Gram sequences;
Counting the co-occurrence times of the N-Gram sequences of the query work word list and the historical work word list;
calculating the second similarity by the Jaccard based on the co-occurrence times;
The third similarity module is used for extracting emotion feature vectors of the query worksheet data and the historical worksheet data through BERT-based and calculating a third similarity;
the query module is used for determining whether the query worksheet data is repeated or not according to the first similarity, the second similarity and the third similarity, and comprises the following steps:
weights are set for the first similarity, the second similarity and the third similarity;
Summing the weighted first similarity, the weighted second similarity and the weighted third similarity to obtain a query work order similarity;
Sorting the query worksheet data according to the query worksheet similarity;
and determining repeated query work order data according to the sequencing.
7. An electronic device comprising a processor and a memory, the memory storing computer-executable instructions executable by the processor, the processor executing the computer-executable instructions to implement the work order duplicate checking method of any one of claims 1 to 5.
8. A computer readable storage medium storing computer executable instructions which, when invoked and executed by a processor, cause the processor to implement the work order duplicate checking method of any one of claims 1 to 5.
CN202411053509.2A 2024-08-02 2024-08-02 Work order duplicate checking method, system, equipment and storage medium Active CN118569612B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411053509.2A CN118569612B (en) 2024-08-02 2024-08-02 Work order duplicate checking method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411053509.2A CN118569612B (en) 2024-08-02 2024-08-02 Work order duplicate checking method, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN118569612A CN118569612A (en) 2024-08-30
CN118569612B true CN118569612B (en) 2024-12-06

Family

ID=92479608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411053509.2A Active CN118569612B (en) 2024-08-02 2024-08-02 Work order duplicate checking method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118569612B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110956033A (en) * 2019-12-04 2020-04-03 北京中电普华信息技术有限公司 A text similarity calculation method and device
CN116562518A (en) * 2022-01-25 2023-08-08 顺丰科技有限公司 Work order recommendation method, device and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239574B (en) * 2017-06-29 2018-11-02 北京神州泰岳软件股份有限公司 A kind of intelligent Answer System knowledge-matched method and device of problem
CN109558484A (en) * 2018-10-24 2019-04-02 浙江华云信息科技有限公司 Electric power customer service work order emotion quantitative analysis method based on similarity word order matrix
CN109670167B (en) * 2018-10-24 2023-07-25 国网浙江省电力有限公司 Electric power customer service work order emotion quantitative analysis method based on similarity word order matrix
CN111325015B (en) * 2020-02-19 2024-01-30 南瑞集团有限公司 Document duplicate checking method and system based on semantic analysis
CN113935387A (en) * 2020-06-29 2022-01-14 中国电信股份有限公司 Text similarity determination method and device and computer readable storage medium
CN113239691A (en) * 2021-05-11 2021-08-10 中国石油大学(华东) Similar appeal work order screening method and device based on topic model
CN114610838A (en) * 2022-03-17 2022-06-10 平安科技(深圳)有限公司 Text emotion analysis method, device and equipment and storage medium
CN116384694A (en) * 2023-04-11 2023-07-04 成都秦川物联网科技股份有限公司 Intelligent gas industrial personal computer optimization method and Internet of things system
CN117350287B (en) * 2023-10-18 2024-09-20 山西首讯信息技术有限公司 Text emotion analysis method based on public opinion big data

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110956033A (en) * 2019-12-04 2020-04-03 北京中电普华信息技术有限公司 A text similarity calculation method and device
CN116562518A (en) * 2022-01-25 2023-08-08 顺丰科技有限公司 Work order recommendation method, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Sentiment tendency analysis of electric power customer service work orders based on dictionary expansion; 顾斌; 彭涛; 车伟; 现代电子技术 (Modern Electronics Technique); 2017-06-01 (No. 11); 163-171 *

Also Published As

Publication number Publication date
CN118569612A (en) 2024-08-30

Similar Documents

Publication Publication Date Title
US11983269B2 (en) Deep neural network system for similarity-based graph representations
US10534863B2 (en) Systems and methods for automatic semantic token tagging
CN110163478B (en) Risk examination method and device for contract clauses
EP3926531A1 (en) Method and system for visio-linguistic understanding using contextual language model reasoners
US10650311B2 (en) Suggesting resources using context hashing
CN111222305A (en) Information structuring method and device
CN112579733A (en) Rule matching method, rule matching device, storage medium and electronic equipment
CN113468414A (en) Commodity searching method and device, computer equipment and storage medium
CN113112282A (en) Method, device, equipment and medium for processing consult problem based on client portrait
CN111522916A (en) Voice service quality detection method, model training method and device
CN112256863A (en) Method and device for determining corpus intentions and electronic equipment
CN112765357A (en) Text classification method and device and electronic equipment
CN112926308B (en) Method, device, equipment, storage medium and program product for matching text
CN117591657A (en) Intelligent dialogue management system and method based on AI
CN111428486B (en) Article information data processing method, device, medium and electronic equipment
CN118569612B (en) Work order duplicate checking method, system, equipment and storage medium
CN118113852A (en) Financial problem answering method, device, equipment, system, medium and product
US12050563B2 (en) Method and system for scalable acceleration of data processing pipeline
CN113139382A (en) Named entity identification method and device
CN117076946A (en) Short text similarity determination method, device and terminal
CN116525093A (en) Prediction method, device, equipment and storage medium for session ending
CN113157896B (en) Voice dialogue generation method and device, computer equipment and storage medium
Vishwanath et al. Deep reader: Information extraction from document images via relation extraction and natural language
CN112988993B (en) Question and answer method and computing device
CN112163585B (en) Text auditing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant