CN110457699A - Stop word mining method, apparatus, electronic device and storage medium - Google Patents

Stop word mining method, apparatus, electronic device and storage medium

Info

Publication number
CN110457699A
Authority
CN
China
Prior art keywords
word
gaze
text
duration
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910721384.9A
Other languages
Chinese (zh)
Other versions
CN110457699B (en
Inventor
俞一鹏
孙子荀
王泽宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910721384.9A
Publication of CN110457699A
Application granted
Publication of CN110457699B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application relates to the field of computer technology, and in particular to a stop word mining method, apparatus, electronic device and storage medium. Reading behavior data of a user's eyes on a text, captured by eye tracking, is obtained, where the reading behavior data includes at least each gaze position on the text, the gaze duration at each gaze position, and the gaze position sequence. The word corresponding to each gaze position on the text is determined, and the corresponding word sequence is determined from the gaze position sequence. A weight value is calculated for each word from the determined words, the word sequence and the gaze durations of the gaze positions. A stop word set is then determined from the weight values of the words. Because the weight values are calculated from the user's perception of, or attention to, the text, the accuracy of stop word mining can be improved.

Description

Stop word mining method, apparatus, electronic device and storage medium
Technical field
This application relates to the field of computer technology, and in particular to a stop word mining method, apparatus, electronic device and storage medium.
Background
In tasks such as information retrieval and natural language processing, stop words may be filtered out. Stop words generally occur frequently and carry no practical meaning, so filtering them out saves storage space and improves search efficiency and accuracy.
In the prior art, stop words are mined mainly by word frequency statistics: weight values are determined from the word frequency statistics, and the words with the smaller weight values form the stop word set. Because this approach relies mainly on word frequency statistics, the resulting stop words have low accuracy.
Summary of the invention
The embodiments of this application provide a stop word mining method, apparatus, electronic device and storage medium, so as to improve the accuracy of stop word mining.
The specific technical solutions provided by the embodiments of this application are as follows:
One embodiment of this application provides a stop word mining method, comprising:
obtaining reading behavior data of a user's eyes on a text, captured by eye tracking, where the reading behavior data includes at least each gaze position on the text, the gaze duration at each gaze position, and the gaze position sequence;
determining the word corresponding to each gaze position on the text, and determining the corresponding word sequence according to the gaze position sequence;
calculating a weight value for each word according to the determined words, the word sequence and the gaze durations of the gaze positions;
determining a stop word set according to the weight values of the words.
Another embodiment of this application provides a stop word mining apparatus, comprising:
an acquisition module, configured to obtain reading behavior data of a user's eyes on a text, captured by eye tracking, where the reading behavior data includes at least each gaze position on the text, the gaze duration at each gaze position, and the gaze position sequence;
a first determining module, configured to determine the word corresponding to each gaze position on the text, and to determine the corresponding word sequence according to the gaze position sequence;
a calculation module, configured to calculate a weight value for each word according to the determined words, the word sequence and the gaze durations of the gaze positions;
a second determining module, configured to determine a stop word set according to the weight values of the words.
In combination with another embodiment of this application, when calculating the weight value of each word according to the determined words, the word sequence and the gaze durations of the gaze positions, the calculation module is specifically configured to:
establish a directed cyclic graph according to the determined words, the word sequence and the gaze durations of the gaze positions, where each node of the directed cyclic graph is one of the determined words, the node size is the gaze duration, the nodes are connected according to the word sequence, and the length of the edge between two connected words in the directed cyclic graph is their distance in the text;
determine the number of nodes to which each word is connected in the directed cyclic graph, and determine the text weight of each word in the text;
determine the weight value of each word according to its text weight, gaze duration and number of connected nodes, where the text weight, the gaze duration and the number of connected nodes are each positively correlated with the weight value.
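The graph construction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the tuple layout `(word, duration_ms, text_position)`, the function names, the choice to sum durations when a word is gazed at more than once, and the choice to count distinct neighbours in either direction as "connected nodes" are all assumptions made for the example.

```python
from collections import defaultdict

def build_reading_graph(fixations):
    """Build a directed graph from fixations given in gaze order.

    fixations: list of (word, duration_ms, text_position) tuples
    (a hypothetical layout chosen for this sketch).
    Returns (duration, edges): node sizes and edge lengths.
    """
    duration = defaultdict(float)  # node size: total gaze duration per word
    edges = {}                     # (src, dst) -> distance in the text
    for word, dur, _ in fixations:
        duration[word] += dur      # assumption: repeated gazes are summed
    # Connect consecutive words in the word sequence.
    for (w1, _, p1), (w2, _, p2) in zip(fixations, fixations[1:]):
        if w1 != w2:
            # A later occurrence of the same edge overwrites the earlier length.
            edges[(w1, w2)] = abs(p2 - p1)
    return dict(duration), edges

def connected_node_counts(edges):
    """Count the distinct neighbours of each node, ignoring edge direction."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    return {word: len(adj) for word, adj in neighbours.items()}

fixations = [("humans", 200, 0), ("function words", 150, 5),
             ("humans", 100, 0), ("the", 50, 3)]
duration, edges = build_reading_graph(fixations)
degree = connected_node_counts(edges)
```

Because the reader looks back at "humans", the graph contains a cycle between "humans" and "function words", which is why a cyclic rather than acyclic graph is needed.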
In combination with another embodiment of this application, when establishing the directed cyclic graph according to the determined words, the word sequence and the gaze durations of the gaze positions, the calculation module is specifically configured to:
delete the words whose gaze duration is not less than a preset duration;
obtain the word segmentation result of the text, compare the word segmentation result with the words corresponding to the gaze positions, and adjust the words corresponding to the gaze positions so that the adjusted words match the tokens in the word segmentation result;
establish the directed cyclic graph according to the words remaining after the deletion and adjustment, their gaze durations and the word sequence.
In combination with another embodiment of this application, when comparing the word segmentation result with the words corresponding to the gaze positions and adjusting the words corresponding to the gaze positions, the calculation module is specifically configured to:
if it is determined that the words corresponding to adjacent gaze positions belong to the same token, merge the words corresponding to the adjacent gaze positions;
if it is determined that the word corresponding to any gaze position contains multiple tokens, split the word corresponding to that gaze position;
if it is determined that the word corresponding to any gaze position is blank, delete the word corresponding to that gaze position.
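The three adjustment rules above (merge, split, delete) can be sketched as follows. This is a hedged illustration under several assumptions that the source does not state: fixations and tokens are represented as half-open character spans, a fixation that covers several tokens has its duration divided equally among them, and merged fixations have their durations summed. The function name `align_fixations` is hypothetical.

```python
def align_fixations(fixations, tokens):
    """Align gazed spans with the word segmentation result.

    fixations: list of (start, end, duration_ms) character spans, in gaze order.
    tokens:    list of (start, end, word) character spans from a word segmenter.
    Returns a list of (word, duration_ms) in gaze order.
    """
    aligned = []  # entries are [token_index, duration]
    for f_start, f_end, dur in fixations:
        # Tokens whose span overlaps the gazed span.
        hits = [i for i, (t_start, t_end, _) in enumerate(tokens)
                if t_start < f_end and f_start < t_end]
        if not hits:
            continue  # rule 3: the gaze landed on blank space, so drop it
        share = dur / len(hits)  # rule 2: split (equal shares are an assumption)
        for i in hits:
            if aligned and aligned[-1][0] == i:
                aligned[-1][1] += share  # rule 1: merge adjacent gazes on one token
            else:
                aligned.append([i, share])
    return [(tokens[i][2], dur) for i, dur in aligned]

tokens = [(0, 5, "human"), (6, 14, "language"), (16, 20, "uses")]
fixations = [(0, 3, 100), (3, 5, 50),   # two gazes inside "human": merged
             (14, 16, 80),              # blank between tokens: deleted
             (10, 18, 120)]             # covers "language" and "uses": split
result = align_fixations(fixations, tokens)
```

Comparing token indices rather than word strings keeps two adjacent occurrences of the same word from being merged by mistake.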
In combination with another embodiment of this application, when determining the weight value of each word according to its text weight, gaze duration and number of connected nodes, the calculation module is specifically configured to:
determine, for each word, the product of its text weight, its gaze duration and its number of connected nodes, and use the product as the weight value of the corresponding word.
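Written as a formula, the product rule above reads as follows. The symbols are introduced here only for illustration and do not appear in the source: $t_i$ is the text weight of word $i$, $d_i$ its gaze duration and $n_i$ the number of nodes it is connected to in the directed cyclic graph.

```latex
w_i = t_i \cdot d_i \cdot n_i
% Worked example with made-up numbers:
%   t_i = 0.2,  d_i = 300 ms, n_i = 2  =>  w_i = 0.2  * 300 * 2 = 120
%   t_j = 0.05, d_j = 50 ms,  n_j = 1  =>  w_j = 0.05 * 50  * 1 = 2.5
```

Each factor is positively correlated with $w_i$, so a word that is rarely looked at, looked at only briefly, or weakly connected receives a small weight and is more likely to be treated as a stop word.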
In combination with another embodiment of this application, when determining the stop word set according to the weight values of the words, the second determining module is specifically configured to:
select the words whose weight value is less than a set value, and determine the stop word set according to the selected words.
Another embodiment of this application provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any of the above stop word mining methods when executing the program.
Another embodiment of this application provides a computer-readable storage medium storing a computer program, where the computer program implements the steps of any of the above stop word mining methods when executed by a processor.
In the embodiments of this application, reading behavior data of a user's eyes on a text, captured by eye tracking, is obtained, the reading behavior data including at least each gaze position on the text, the gaze duration at each gaze position, and the gaze position sequence. The corresponding words and word sequence are determined from this data, a weight value is calculated for each word, and a stop word set is determined from the weight values. In this way, stop words can be collected automatically while the user completes a reading task, which is more efficient and reduces the cost of manual labeling in advance. Moreover, because reading behavior data reflects the user's perception of words or characters, and the weight values and stop words are determined from that data, the resulting stop word set has cognitive significance, is more widely applicable, and is more accurate.
Brief description of the drawings
Fig. 1 is a schematic diagram of the application architecture of the stop word mining method in an embodiment of this application;
Fig. 2 is a schematic diagram of the application principle of the stop word mining method in an embodiment of this application;
Fig. 3 is a flow chart of the stop word mining method in an embodiment of this application;
Fig. 4 is a schematic diagram of the effect of modeling reading behavior data in an embodiment of this application;
Fig. 5 is a schematic diagram of the reading model in an embodiment of this application;
Fig. 6 is a schematic diagram of the technical framework of the stop word mining method in an embodiment of this application;
Fig. 7 is a schematic structural diagram of the stop word mining apparatus in an embodiment of this application;
Fig. 8 is a schematic structural diagram of the electronic device in an embodiment of this application.
Detailed description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments of this application. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of this application without creative effort fall within the protection scope of this application.
To make the embodiments of this application easier to understand, several concepts are first briefly introduced:
Stop words: in tasks such as information retrieval and natural language processing, certain words or characters are automatically filtered out before or after natural language data or text is processed, in order to save storage space and improve search efficiency. These words or characters are called stop words.
Eye tracking device: a device that can track eye activity, that is, track the behavior trajectory of the eyes. For example, it can track gaze duration, gaze position, number of fixations, gaze position sequence, number of blinks, electro-oculographic signals and so on. The eye tracking device may be, for example, an eye tracker, a computer-vision-based eye tracking device, or an eye tracking device based on EEG signals, which is not limited in the embodiments of this application.
Term frequency-inverse document frequency (TF-IDF): a common weighting technique for information retrieval and data mining, used to assess how important a word is to a document set or to one document in a corpus.
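As a brief illustration of the TF-IDF definition above, the following sketch uses one common textbook form, tf × log(N / df). Other variants (smoothing, different normalization) exist, and the source does not say which one is intended; the function name and the tokenized-document layout are assumptions made for the example.

```python
import math

def tf_idf(term, document, corpus):
    """TF-IDF of `term` in `document`, where `corpus` is a list of documents
    and each document is a list of tokens."""
    tf = document.count(term) / len(document)            # term frequency
    df = sum(1 for doc in corpus if term in doc)          # document frequency
    if df == 0:
        return 0.0                                        # term never occurs
    return tf * math.log(len(corpus) / df)                # tf * idf

corpus = [["the", "cat"], ["the", "dog"]]
score_the = tf_idf("the", corpus[0], corpus)  # occurs in every document
score_cat = tf_idf("cat", corpus[0], corpus)  # occurs in one document only
```

A word such as "the" that occurs in every document gets an inverse document frequency of log(1) = 0, which is why frequency-based approaches tend to rank it as a stop word.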
Directed cyclic graph: a graph that is connected and has direction, consisting of a set of vertices (also called nodes) and directed edges. Each vertex in a directed graph also has an out-degree and an in-degree, where the out-degree is the total number of edges leaving the vertex and the in-degree is the total number of edges pointing to the vertex.
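The out-degree and in-degree defined above can be counted directly from an edge list. The sketch below is illustrative only; the edge-list representation and the function name are assumptions.

```python
from collections import Counter

def degrees(edges):
    """Return (out_degree, in_degree) counters for a list of (src, dst) edges."""
    out_degree = Counter(src for src, _ in edges)  # edges leaving each vertex
    in_degree = Counter(dst for _, dst in edges)   # edges pointing to each vertex
    return out_degree, in_degree

# a -> b -> c -> a forms a cycle; a -> c is an extra edge.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
out_degree, in_degree = degrees(edges)
```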
Artificial intelligence (AI) is the theory, methods, technology and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best result. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning and decision-making.
Artificial intelligence is an interdisciplinary field that covers a wide range of areas and includes both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing and machine learning/deep learning.
Computer vision (CV) is the science of how to make machines "see". More specifically, it refers to using cameras and computers in place of human eyes to identify, track and measure targets, and to further process the images so that they are more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies the related theories and technologies in an attempt to build artificial intelligence systems that can obtain information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, three-dimensional (3D) technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition. The embodiments of this application mainly involve the computer vision technology in artificial intelligence: an eye tracking device, combined with computer vision technology, can track and recognize the reading behavior of a user's eyes on a text and obtain reading behavior data, including, for example, gaze positions and gaze durations.
In practice, stop words matter for tasks such as information retrieval and natural language processing. Stop words usually occur frequently and carry no practical meaning, and processing all of them reduces efficiency, so they are usually filtered out during processing. Mining accurate stop words is therefore necessary. The prior art relies mainly on word frequency statistics: weight values are determined from the word frequency statistics, and the words with the smaller weight values form the stop word set. This approach has several drawbacks. It does not consider the user's perception of the text, so the resulting stop words have low accuracy. Because it uses only word frequency statistics, different statistical objects yield different stop words, so the result is not universal and its application field is limited: a stop word in one application field may not be a stop word in another. It also has a cold-start problem: at the outset it is not known which words are stop words, so stop word mining must be carried out before tasks such as information retrieval or natural language processing can be performed.
In view of the above problems, the embodiments of this application provide a stop word mining method that takes the user's perception of the text into account. An eye tracking device obtains reading behavior data of the user's eyes on a text, weight values are determined from the reading behavior data, and a stop word set is then mined. Because stop words are mined from human perception of the text, the result better matches what users actually need, is more accurate, and has cognitive significance. It has both universality and task specificity, so it is more widely applicable. No mining in advance is required: stop words can be screened automatically while the user performs other reading tasks, without the user deliberately judging which words or characters are stop words, which also reduces the cost of manual labeling.
Fig. 1 is a schematic diagram of the application architecture of the stop word mining method in an embodiment of this application, which includes a terminal 100, an eye tracking device 200 and a server 300.
The terminal 100 may be any smart device such as a smartphone, tablet computer or portable personal computer. Various applications (APPs), such as a browser or a reader, may be installed on the terminal 100. The user can read text on the terminal 100, either online or from text stored locally on the terminal 100.
The eye tracking device 200 may be an eye tracker, a computer-vision-based eye tracking device, an eye tracking device based on EEG signals, or similar equipment, and can be used to track the behavior trajectory of the user's eyes. In the embodiments of this application, the eye tracking device 200 may be used on its own or integrated into the terminal 100 as a functional component of the terminal 100, as long as the eye tracking device 200 can be used together with the terminal 100 and lies in the same plane as the display device of the terminal 100. For example, when the user reads a text on the terminal 100, the eye tracking device 200 can track the reading behavior trajectory of the user's eyes on the text and project the eye behavior trajectory onto the text shown on the display device of the terminal 100, thereby obtaining reading behavior data, including, for example, gaze positions, gaze durations and the gaze position sequence.
The server 300 can provide various network services for the terminal 100. For the different applications on the terminal 100, the server 300 can be regarded as the background server that provides the corresponding network service. For example, if the user reads an article online on the terminal 100, the server 300 can provide the corresponding service and return the article content to the terminal 100. In the embodiments of this application, the server 300 may also be communicatively connected to the eye tracking device 200: the server 300 receives the reading behavior data sent by the eye tracking device 200, processes it accordingly, mines the stop word set, and outputs the stop word set.
The server 300 may be a single server, a server cluster composed of several servers, or a cloud computing center.
It should be noted that in the embodiments of this application the stop word mining method is mainly executed on the server 300 side. Fig. 2 is a schematic diagram of the application principle of the stop word mining method in an embodiment of this application. As shown in Fig. 2, the terminal 100 may further include a text display device 110 for displaying text for the user to read. While the user reads the text shown on the text display device 110, the eye tracking device 200 tracks the reading behavior of the user's eyes, obtains the reading behavior data of the user's eyes on the text, and sends it to the server 300. The server 300 collects and outputs stop words according to the reading behavior data.
The terminal 100 and the server 300, and the eye tracking device 200 and the server 300, are connected through a network so that they can communicate with each other. Optionally, the network uses standard communication technologies and/or protocols. The network is usually the Internet, but may be any network, including but not limited to a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a mobile, wired or wireless network, a private network, a virtual private network, or any combination of these. In some embodiments, technologies and/or formats such as Hyper Text Mark-up Language (HTML) and Extensible Markup Language (XML) are used to represent the data exchanged over the network. In addition, conventional encryption technologies such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN) and Internet Protocol Security (IPsec) may be used to encrypt all or some of the links. In other embodiments, customized and/or dedicated data communication technologies may be used in place of, or in addition to, the data communication technologies above.
It should be noted that the application architecture diagram in the embodiments of this application is intended to illustrate the technical solutions of the embodiments more clearly and does not limit the technical solutions provided by the embodiments of this application. Nor are the solutions limited to stop word mining applications: for other application architectures and service applications, the technical solutions provided by the embodiments of this application are equally applicable to similar problems. In the embodiments below, the stop word mining method is described schematically as applied to the application architecture shown in Fig. 1.
Based on the above embodiments, the stop word mining method in the embodiments of this application is described below. Fig. 3 is a flow chart of the stop word mining method in an embodiment of this application. The method includes:
Step 300: obtain reading behavior data of a user's eyes on a text, captured by eye tracking, where the reading behavior data includes at least each gaze position on the text, the gaze duration at each gaze position, and the gaze position sequence.
In the embodiments of this application, collecting stop words does not require annotators to label them specifically; stop words can be collected automatically while the user completes his or her own task. Collection also takes into account the user's perception of the text, so in the embodiments of this application the user's reading behavior data can be tracked and acquired while the user reads.
Specifically, the eye tracking device tracks the behavior trajectory of the user's eyes, and the reading behavior data for the text obtained by this tracking is acquired from the eye tracking device.
The reading behavior data includes at least each gaze position on the text, the gaze duration at each gaze position, and the gaze position sequence; it is not limited to these in the embodiments of this application.
The eye tracking device may obtain the reading behavior data using techniques from the prior art, which is not limited in the embodiments of this application.
For example, an eyeball image can be obtained by the eye tracking device using the pupil-corneal reflection vector method. The pupil is extracted from the eyeball image using the bright-pupil and dark-pupil principle, and the relative position of the eye tracking device and the eyeball is corrected using corneal reflection: the corneal reflection point serves as the reference point for the relative position of the eye tracking device and the eyeball, and the position coordinates of the pupil center indicate the position of the line of sight. The focus of the line of sight can then be projected onto the display device of the terminal to obtain the gaze position on the text. Arranging the gaze positions in chronological order gives the gaze position sequence, and information such as the gaze duration at each gaze position can also be obtained.
As another example, computer vision technology in artificial intelligence can be used to mine, through learning, a measure of the attention of the user's eyes to the words or characters of the text, that is, the user's reading behavior data.
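The reading behavior data obtained in step 300 can be represented with a small record type. The sketch below is only one possible layout: the field names, the use of screen coordinates and millisecond timestamps, and the derivation of the gaze duration from start and end times are assumptions, since the source does not specify a data format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fixation:
    """One gaze position on the displayed text (hypothetical layout)."""
    x: float        # horizontal screen coordinate of the gaze position
    y: float        # vertical screen coordinate of the gaze position
    start_ms: int   # time at which the gaze settled on this position
    end_ms: int     # time at which the gaze left this position

    @property
    def duration_ms(self) -> int:
        """Gaze duration at this position."""
        return self.end_ms - self.start_ms

def gaze_position_sequence(fixations):
    """Arrange the gaze positions in chronological order."""
    return sorted(fixations, key=lambda f: f.start_ms)

recorded = [Fixation(120.0, 40.0, 350, 500),
            Fixation(30.0, 40.0, 0, 200),
            Fixation(75.0, 40.0, 210, 340)]
sequence = gaze_position_sequence(recorded)
```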
Step 310: determine the word corresponding to each gaze position on the text, and determine the corresponding word sequence according to the gaze position sequence.
In the embodiments of this application, the text contains characters or words. When the user's eyes gaze at a position on the text, that position usually corresponds to a word in the text, or else to a blank position in the text, and the corresponding word sequence can be determined from the gaze position sequence.
For example, Fig. 4 is a schematic diagram of the effect of modeling reading behavior data in an embodiment of this application. As shown in Fig. 4, the displayed content is the text that the user reads, and the reading behavior data of the user's eyes on the text is obtained by the eye tracking device. In Fig. 4, each circle marks a gaze position, the diameter of a circle indicates the gaze duration at that position, and the arrows connecting the circles indicate the gaze position sequence. The word at each gaze position and the corresponding word sequence can then be determined. For example, the first circle in Fig. 4 corresponds to the word "humans", the next circle corresponds to the blank position between two characters in the text, the circle after that corresponds to "function words", and so on.
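The mapping in step 310 from gaze positions to words can be sketched with a simple hit test. This sketch assumes that the layout of the displayed text is available as one bounding box per word, which the source does not state; the function names and the coordinates are made up for illustration.

```python
def word_at(x, y, word_boxes):
    """Return the word whose bounding box contains (x, y), or None for blank space.

    word_boxes: list of (word, (left, top, right, bottom)) in screen coordinates.
    """
    for word, (left, top, right, bottom) in word_boxes:
        if left <= x < right and top <= y < bottom:
            return word
    return None  # the gaze position falls on a blank position

def word_sequence(gaze_points, word_boxes):
    """Map a gaze position sequence to the corresponding word sequence."""
    return [word_at(x, y, word_boxes) for x, y in gaze_points]

# Mirrors the Fig. 4 example: a word, then a blank position, then another word.
word_boxes = [("humans", (0, 0, 60, 20)),
              ("function words", (100, 0, 220, 20))]
gaze_points = [(30, 10), (80, 10), (150, 10)]
words = word_sequence(gaze_points, word_boxes)
```

Blank positions are kept as `None` here so that a later preprocessing step can decide how to handle them.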
Step 320: calculate a weight value for each word according to the determined words, the word sequence and the gaze durations of the gaze positions.
Step 330: determine a stop word set according to the weight values of the words.
Step 330 specifically includes: selecting the words whose weight value is less than a set value, and determining the stop word set according to the selected words.
That is, the higher the weight value of a word, the more important the word is in the text and the less likely it is to be a stop word; conversely, the lower the weight value, the more likely the word is a stop word. In the embodiments of this application, a set value can be preset, and the words whose weight value is less than the set value are regarded as stop words, so that a stop word set can be collected from the text.
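Step 330 amounts to a threshold filter over the word weights. A minimal sketch follows, with made-up weights and a made-up set value; the function name is hypothetical.

```python
def stop_word_set(weights, set_value):
    """Return the words whose weight value is less than the set value."""
    return {word for word, weight in weights.items() if weight < set_value}

weights = {"humans": 120.0, "language": 95.0, "the": 2.5, "of": 1.0}
stop_words = stop_word_set(weights, set_value=10.0)
```

The comparison is strict, matching "less than the set value" in the text, so a word whose weight equals the set value is not treated as a stop word.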
A specific implementation of the above step 320 is described below. In the embodiment of the present application, executing step 320 specifically includes:
S1: establish a directed cyclic graph according to the determined words, the word sequence, and the gaze duration of each gaze position. In the directed cyclic graph, each node is one of the determined words, the node size is the gaze duration, the nodes are connected according to the word sequence, and the length of the edge between two connected words is their distance in the text.
In the embodiment of the present application, the directed cyclic graph is also referred to as the established reading model.
In the embodiment of the present application, the reading behavior data of the user can be modeled to obtain a directed cyclic graph. For example, each circle in Fig. 4 above can be taken as a node of a directed cyclic graph, and the nodes can be connected one after another in order. However, a gaze position may correspond to a word, to a blank position, or to content that is not an independent word. If the content of every circled gaze position were used as a node when establishing the reading model, complexity would increase and efficiency would decrease. Therefore, in the embodiment of the present application, when the reading model (that is, the directed cyclic graph) is established, the words corresponding to the gaze positions can first be preprocessed, and the reading model is then established. A possible implementation of the above step S1 specifically includes:
S1.1: delete the words whose gaze duration is not less than a preset duration.
The preset duration can be set based on practical experience and is not limited in the embodiment of the present application.
The main consideration here is that if the gaze duration of a gaze position is too long, the user may have been in a daze or doing something else rather than reading the text. If the gaze duration of such a gaze position were taken into account, the weight value finally calculated for the word at that gaze position would be inaccurate, which would also affect the judgment of whether other words are stop words. Therefore, during preprocessing, circles whose gaze duration is too long can be deleted, that is, the words corresponding to those circles are deleted.
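Step S1.1 can be sketched as a simple filter over (word, gaze duration) pairs. The function name `drop_long_fixations`, the sample durations, and the preset duration of 1.00 second are assumptions for illustration only.

```python
from typing import List, Tuple


def drop_long_fixations(
    fixations: List[Tuple[str, float]], preset_duration: float
) -> List[Tuple[str, float]]:
    """Delete entries whose gaze duration is not less than the preset duration."""
    return [(word, d) for word, d in fixations if d < preset_duration]


# Hypothetical (word, gaze duration in seconds) pairs in gaze order.
fixations = [("mankind", 0.30), ("function", 2.50), ("word", 0.25), ("of", 1.00)]
kept = drop_long_fixations(fixations, preset_duration=1.00)
```

Because the condition in S1.1 is "not less than", an entry whose gaze duration equals the preset duration is also deleted in this sketch.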
S1.2: obtain the word segmentation result of the text, compare the word segmentation result with the word corresponding to each gaze position, and adjust the word corresponding to each gaze position so that each adjusted word matches a segmented word in the word segmentation result.
This specifically includes: (1) obtaining the word segmentation result of the text.
The text can be segmented with a preset word segmentation algorithm, which is not limited in the embodiment of the present application, for example an N-gram model or the jieba word segmentation algorithm. The resulting word segmentation result contains multiple segmented words; each segmented word can be regarded as an independent word, that is, a smallest unit with complete meaning.
(2) Comparing the word segmentation result with the word corresponding to each gaze position, and adjusting the word corresponding to each gaze position so that each adjusted word matches a segmented word in the word segmentation result.
That is, in the embodiment of the present application, the word corresponding to a gaze position may be an independent word, such as "noun"; it may not be an independent word, such as "noun" with a further character attached; or it may be blank. If the content corresponding to each gaze position were used directly as a node to establish the reading model and a weight value were calculated for each node, the stop words finally collected might have low usability. To ensure that every stop word finally collected is an independent word, the embodiment of the present application can use the word segmentation result of the text to adjust and normalize the words, which simplifies and denoises the reading model. Specifically, the following cases are possible:
1) If it is determined that the words corresponding to adjacent gaze positions belong to one segmented word, the words corresponding to the adjacent gaze positions are merged.
For example, suppose the word segmentation result contains the segmented word "function word", the word corresponding to one gaze position is "function", the word corresponding to another gaze position is "word", and the two gaze positions are adjacent. The two gaze positions can then be merged into one gaze position, that is, the corresponding words "function" and "word" are merged into "function word".
Further, in the embodiment of the present application, the gaze duration of the merged gaze position or word can be set to the average of the gaze durations of the adjacent gaze positions, or to their maximum. This is not limited in the embodiment of the present application and can be configured according to the actual situation.
2) If it is determined that the word corresponding to any gaze position contains multiple segmented words, the word corresponding to that gaze position is split.
In the embodiment of the present application, after the word corresponding to a gaze position is compared with the word segmentation result, it may turn out not to be a minimal segmented word, in which case the word corresponding to that gaze position can be split according to the segmented words in the word segmentation result. For example, if the word segmentation result contains the segmented word "noun" but not the longer string that the gaze position covers ("noun" followed by a further character), the gaze position can be split into "noun" and the remaining character, and each word obtained by the split can serve as a node in the reading model.
Further, if a word obtained by the split can form a segmented word of the word segmentation result together with an adjacent character or word, the split word can be adjusted again so that it is merged with the adjacent character or word into one segmented word.
Further, in the embodiment of the present application, the gaze duration of each gaze position or word obtained by the split can be set to the gaze duration of the gaze position before the split; the embodiment is not limited to this.
3) If it is determined that the word corresponding to any gaze position is blank, the word corresponding to that gaze position is deleted.
That is, if the word corresponding to a gaze position is blank, the circle of that gaze position marks a blank position, which means that the gaze position is invalid and cannot be used for stop word screening. To reduce the complexity of the reading model and the amount of calculation, this gaze position can be deleted and not used as a node in the reading model.
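The three cases above (merge, split, delete blank) can be sketched in one pass if gaze positions and segmented words are both represented as character spans. This representation, the function name `align_to_tokens`, and the sample spans are assumptions made for illustration; the embodiment does not specify how gaze positions are matched against the word segmentation result.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # [start, end) character offsets in the text


def align_to_tokens(
    fixations: List[Tuple[int, int, float]],   # (start, end, gaze duration), in gaze order
    tokens: List[Span],                        # spans of the segmented words
    combine: Callable[[List[float]], float] = max,
) -> List[Tuple[int, float]]:
    """Return (token index, gaze duration) pairs in gaze order."""
    hits: List[Tuple[int, float]] = []
    for start, end, duration in fixations:
        # Segmented words overlapping this gaze position; none means a blank position,
        # which contributes nothing (case 3).
        covered = [i for i, (s, e) in enumerate(tokens) if s < end and start < e]
        # Case 2: a gaze position covering several segmented words is split into one
        # entry per word, each keeping the gaze duration measured before the split.
        hits.extend((i, duration) for i in covered)

    merged: List[Tuple[int, List[float]]] = []
    for index, duration in hits:
        if merged and merged[-1][0] == index:
            merged[-1][1].append(duration)     # case 1: adjacent parts of one word
        else:
            merged.append((index, [duration]))
    return [(index, combine(durations)) for index, durations in merged]


# Hypothetical six-character text: two 2-character words, a blank, a 1-character word.
tokens = [(0, 2), (2, 4), (5, 6)]
fixations = [(0, 1, 0.2), (1, 2, 0.4), (4, 5, 0.1), (2, 6, 0.3)]
aligned = align_to_tokens(fixations, tokens)   # merged durations use the maximum
```

Passing a different `combine` function, for example an average, selects the other merging option mentioned in case 1).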
S1.3: establish the reading model according to the words remaining after deletion and adjustment, their gaze durations, and the word sequence.
In the embodiment of the present application, taking the case where the reading model is a directed cyclic graph as an example, each word remaining after adjustment and deletion serves as a node of the directed cyclic graph, the nodes are connected according to the word sequence, the gaze duration is the node size, and the length of the edge connecting two nodes is the distance between the two words in the text, thereby constructing the reading model. For example, Fig. 5 is a schematic diagram of the reading model in the embodiment of the present application. As shown in Fig. 5, each circle represents a word, and the size shown in the circle is the gaze duration, for example in seconds. The direction of the arrow connecting two circles is determined by the word sequence. The label i_j on the edge connecting two circles means that the edge is the i-th in the sequence and that j is the distance in the text between the two words it connects. For example, the label 1_3 on the edge from the first circle to the second circle in Fig. 5 indicates the first step of the word sequence, with a distance of 3 in the text between the two words; it can also be understood as the distance from the first word to the second word within the overall word sequence.
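A minimal sketch of building such a reading model is given below. It assumes each entry of the word sequence carries the position of the word in the text, and it assumes that gaze durations of repeated gazes at the same word are accumulated; neither detail is fixed by the embodiment. The function name `build_reading_model` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_reading_model(
    sequence: List[Tuple[str, int, float]],   # (word, position in text, gaze duration)
) -> Tuple[Dict[str, float], List[Tuple[str, str, str]]]:
    """Return node sizes and directed edges labeled 'i_j' as in Fig. 5."""
    node_size: Dict[str, float] = defaultdict(float)
    for word, _, duration in sequence:
        node_size[word] += duration           # assumption: repeated gazes accumulate

    edges = []
    for i, (prev, curr) in enumerate(zip(sequence, sequence[1:]), start=1):
        distance = abs(curr[1] - prev[1])     # distance between the two words in the text
        edges.append((prev[0], curr[0], f"{i}_{distance}"))
    return dict(node_size), edges


# Hypothetical word sequence; the last gaze returns to the first word, closing a cycle.
sequence = [
    ("mankind", 0, 0.5),
    ("function word", 3, 0.25),
    ("noun", 5, 0.75),
    ("mankind", 0, 0.25),
]
nodes, edges = build_reading_model(sequence)
```

The first edge in this sketch is labeled `1_3`, matching the labeling convention described for Fig. 5: first step of the sequence, distance 3 in the text.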
S2: calculate the weight value of each word according to the reading model.
This specifically includes: 1) determining the text weight of each word in the text.
Specifically, the text weight of a word can be determined according to word frequency statistics of the word in the text. The specific manner is not limited; for example, the text weight is the TF-IDF value.
2) Determining the number of nodes to which each word is connected in the directed cyclic graph.
That is, the out-degree and in-degree of the node of each word in the reading model are determined.
3) Determining the weight value of each word according to the text weight, the gaze duration, and the number of connected nodes corresponding to the word.
The text weight, the gaze duration, and the number of connected nodes are each positively correlated with the weight value.
The embodiment of the present application provides a possible implementation: for each word, determine the product of the corresponding text weight, gaze duration, and number of connected nodes, and use the product as the weight value of the word.
For example, denoting the weight value as w and taking TF-IDF as the text weight:
$$w = \text{TF-IDF} \times S \times D$$
where S is the normalized gaze duration and D is the normalized sum of the out-degree and in-degree, that is, the number of connected nodes.
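The product of text weight, normalized gaze duration, and normalized degree can be sketched as follows. The normalization method is not specified in the embodiment; dividing by the maximum is an assumption made here, as are the function names and the sample TF-IDF values, durations, and edges.

```python
from collections import Counter
from typing import Dict, List, Tuple


def normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Scale values into [0, 1] by dividing by the maximum (assumed normalization)."""
    top = max(values.values())
    return {key: (value / top if top else 0.0) for key, value in values.items()}


def word_weights(
    tfidf: Dict[str, float],
    durations: Dict[str, float],
    edges: List[Tuple[str, str]],
) -> Dict[str, float]:
    """Weight value per word: text weight x normalized duration x normalized degree."""
    degree: Counter = Counter()
    for src, dst in edges:
        degree[src] += 1   # out-degree of the source node
        degree[dst] += 1   # in-degree of the destination node
    s = normalize(durations)
    d = normalize({word: float(degree[word]) for word in durations})
    return {word: tfidf[word] * s[word] * d[word] for word in durations}


# Hypothetical inputs for three words.
tfidf = {"mankind": 0.5, "of": 0.1, "noun": 0.4}
durations = {"mankind": 0.8, "of": 0.2, "noun": 0.4}
edges = [("mankind", "of"), ("of", "noun"), ("noun", "mankind"), ("mankind", "noun")]
weights = word_weights(tfidf, durations, edges)
```

In this sketch the word "of" receives the lowest weight value because all three factors are small for it, which is the behavior the product form is intended to produce for stop word candidates.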
Further, in the embodiment of the present application, the reading behavior data of the user can be used not only to collect stop words but also in other applications. For example, an article abstract can be generated according to the reading behavior data of the user: a sentence that is gazed at for a relatively long time can serve as the abstract. As another example, the data can be used to extract keywords or other words related to cognitive activities. The embodiment of the present application is not limited in this respect; anything that belongs to the inventive concept of the embodiment of the present application shall fall within the protection scope of the present application.
In the embodiment of the present application, the reading behavior data of the tracked user's eye for the text is obtained, and the weight value of each word can be calculated according to the gaze positions, the gaze duration of each gaze position, and the gaze position order in the reading behavior data; the stop word set is then determined according to the weight value of each word. In this way, by incorporating the user's reading behavior data, the calculated weight value of a word takes into account the user's attention to the character or word and the user's perception of the text, rather than relying only on word frequency statistics in the text. The stop word set finally collected is therefore more accurate and cognitively meaningful, combining generality with task specificity. Because the collected stop word set is more accurate, performance can be greatly improved when the stop word set is applied to other natural language processing or recommendation tasks. In addition, the stop word mining method in the embodiment of the present application can collect stop words automatically while the user completes other reading tasks, without requiring the user to make deliberate judgments, which reduces manual annotation cost.
Based on the above embodiment, the overall technical framework of the stop word mining method in the embodiment of the present application is described below. Fig. 6 is a schematic diagram of the technical framework of the stop word mining method in the embodiment of the present application.
1) As shown in Fig. 6, in the embodiment of the present application, the reading model can be established according to the reading behavior data tracked by the eye tracking device and the text-related information, where the reading model is a directed cyclic graph. When the reading model is established, the gaze positions can be adjusted according to the text-related information, which simplifies the reading model and reduces the amount of calculation and the complexity.
The text-related information may include the word segmentation result of the text, the text content, and the like. The reading behavior data includes at least the gaze positions on the text, the gaze duration of each gaze position, and the gaze position order.
2) Weight value calculation. Specifically, the weight value of each word can be calculated based on the reading model and the text weight, where the text weight is the weight of the word determined according to word frequency statistics in the text.
3) Determining and outputting the stop words according to the weight values.
Specifically, the words whose weight value is less than the set value are filtered out as stop words, thereby generating the stop word set.
In the embodiment of the present application, the weight value of each word is determined by combining the reading behavior data of the user with the text-related information, and the stop word set is determined accordingly. The resulting stop words are based both on the text weight and on the user's perception of the text, which improves the accuracy of stop word mining.
Based on the same inventive concept, the embodiment of the present application further provides a stop word mining device. The stop word mining device may be, for example, the server in the foregoing embodiment, and may be a hardware structure, a software module, or a hardware structure plus a software module. Based on the above embodiment, as shown in Fig. 7, the stop word mining device in the embodiment of the present application specifically includes:
An acquisition module 70, configured to obtain the reading behavior data of the tracked user's eye for the text, where the reading behavior data includes at least the gaze positions on the text, the gaze duration of each gaze position, and the gaze position order;
A first determining module 71, configured to determine the word on the text corresponding to each gaze position, and to determine the corresponding word sequence according to the gaze position order;
A computing module 72, configured to calculate the weight value of each word according to the determined words, the word sequence, and the gaze duration of each gaze position;
A second determining module 73, configured to determine the stop word set according to the weight value of each word.
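The division into four modules can be sketched as four pluggable functions chained in order. This is only one possible software arrangement; the class name `StopWordMiner`, the field names, and the toy stand-in functions in the usage part are hypothetical and do not implement the actual weight calculation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Fixation = Tuple[int, float]   # (character offset, gaze duration), in gaze order
WordGaze = Tuple[str, float]   # (word, gaze duration), in gaze order


@dataclass
class StopWordMiner:
    acquire: Callable[[], List[Fixation]]                  # acquisition module 70
    to_words: Callable[[List[Fixation]], List[WordGaze]]   # first determining module 71
    weigh: Callable[[List[WordGaze]], Dict[str, float]]    # computing module 72
    select: Callable[[Dict[str, float]], Set[str]]         # second determining module 73

    def run(self) -> Set[str]:
        """Chain the four modules in the order described for the device."""
        return self.select(self.weigh(self.to_words(self.acquire())))


# Toy stand-ins for a two-word text "the cat"; weight here is just the gaze duration.
miner = StopWordMiner(
    acquire=lambda: [(0, 0.1), (4, 0.6)],
    to_words=lambda fx: [("the" if pos < 3 else "cat", dur) for pos, dur in fx],
    weigh=lambda seq: {word: dur for word, dur in seq},
    select=lambda w: {word for word, value in w.items() if value < 0.3},
)
stop_words = miner.run()
```

Keeping each module behind a plain callable reflects the later remark that the module division is only a logical one and that other divisions are possible in practice.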
Optionally, when calculating the weight value of each word according to the determined words, the word sequence, and the gaze duration of each gaze position, the computing module 72 is specifically configured to:
Establish a directed cyclic graph according to the determined words, the word sequence, and the gaze duration of each gaze position, where each node in the directed cyclic graph is one of the determined words, the node size is the gaze duration, the nodes are connected according to the word sequence, and the length of the edge between two connected words in the directed cyclic graph is their distance in the text;
Determine the number of nodes to which each word is connected in the directed cyclic graph, and determine the text weight of each word in the text;
Determine the weight value of each word according to the text weight, the gaze duration, and the number of connected nodes corresponding to the word, where the text weight, the gaze duration, and the number of connected nodes are each positively correlated with the weight value.
Optionally, when establishing the directed cyclic graph according to the determined words, the word sequence, and the gaze duration of each gaze position, the computing module 72 is specifically configured to:
Delete the words whose gaze duration is not less than the preset duration;
Obtain the word segmentation result of the text, compare the word segmentation result with the word corresponding to each gaze position, and adjust the word corresponding to each gaze position so that each adjusted word matches a segmented word in the word segmentation result;
Establish the directed cyclic graph according to the words remaining after deletion and adjustment, their gaze durations, and the word sequence.
Optionally, when comparing the word segmentation result with the word corresponding to each gaze position and adjusting the word corresponding to each gaze position, the computing module 72 is specifically configured to:
If it is determined that the words corresponding to adjacent gaze positions belong to one segmented word, merge the words corresponding to the adjacent gaze positions;
If it is determined that the word corresponding to any gaze position contains multiple segmented words, split the word corresponding to that gaze position;
If it is determined that the word corresponding to any gaze position is blank, delete the word corresponding to that gaze position.
Optionally, when determining the weight value of each word according to the text weight, the gaze duration, and the number of connected nodes corresponding to the word, the computing module 72 is specifically configured to:
Determine, for each word, the product of the corresponding text weight, gaze duration, and number of connected nodes, and use the product as the weight value of the word.
Optionally, when determining the stop word set according to the weight value of each word, the second determining module 73 is specifically configured to:
Filter out the words whose weight value is less than the set value, and determine the stop word set according to the words filtered out.
The division of modules in the embodiment of the present application is schematic and is only a division by logical function; other divisions are possible in actual implementation. In addition, the functional modules in the embodiment of the present application may be integrated into one processor, may exist separately as physical units, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
Based on the above embodiment, Fig. 8 shows a schematic structural diagram of the electronic device in the embodiment of the present application.
The embodiment of the present application provides an electronic device, which may include a processor 810 (Central Processing Unit, CPU), a memory 820, an input device 830, an output device 840, and the like. The input device 830 may include a keyboard, a mouse, a touch screen, and the like. The output device 840 may include a display device, such as a liquid crystal display (Liquid Crystal Display, LCD) or a cathode ray tube (Cathode Ray Tube, CRT).
The memory 820 may include a read-only memory (ROM) and a random access memory (RAM), and provides the processor 810 with the program instructions and data stored in the memory 820. In the embodiment of the present application, the memory 820 can be used to store a program of any of the stop word mining methods in the embodiment of the present application.
The processor 810 calls the program instructions stored in the memory 820 and is configured to execute any of the stop word mining methods in the embodiment of the present application according to the obtained program instructions.
Based on the above embodiment, the embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the stop word mining method in any of the above method embodiments.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction means, and the instruction means implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present application.
Obviously, those skilled in the art can make various changes and variations to the embodiments of the present application without departing from the spirit and scope of the embodiments of the present application. If these modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include these modifications and variations.

Claims (10)

1. A stop word mining method, characterized by comprising:
Obtaining reading behavior data of a tracked user's eye for a text, wherein the reading behavior data includes at least the gaze positions on the text, the gaze duration of each gaze position, and the gaze position order;
Determining the word on the text corresponding to each gaze position, and determining the corresponding word sequence according to the gaze position order;
Calculating the weight value of each word according to the determined words, the word sequence, and the gaze duration of each gaze position;
Determining a stop word set according to the weight value of each word.
2. The method according to claim 1, characterized in that calculating the weight value of each word according to the determined words, the word sequence, and the gaze duration of each gaze position specifically comprises:
Establishing a directed cyclic graph according to the determined words, the word sequence, and the gaze duration of each gaze position, wherein each node in the directed cyclic graph is one of the determined words, the node size is the gaze duration, the nodes are connected according to the word sequence, and the length of the edge between two connected words in the directed cyclic graph is their distance in the text;
Determining the number of nodes to which each word is connected in the directed cyclic graph, and determining the text weight of each word in the text;
Determining the weight value of each word according to the text weight, the gaze duration, and the number of connected nodes corresponding to the word, wherein the text weight, the gaze duration, and the number of connected nodes are each positively correlated with the weight value.
3. The method according to claim 2, characterized in that establishing the directed cyclic graph according to the determined words, the word sequence, and the gaze duration of each gaze position specifically comprises:
Deleting the words whose gaze duration is not less than a preset duration;
Obtaining the word segmentation result of the text, comparing the word segmentation result with the word corresponding to each gaze position, and adjusting the word corresponding to each gaze position so that each adjusted word matches a segmented word in the word segmentation result;
Establishing the directed cyclic graph according to the words remaining after deletion and adjustment, their gaze durations, and the word sequence.
4. The method according to claim 3, characterized in that comparing the word segmentation result with the word corresponding to each gaze position and adjusting the word corresponding to each gaze position specifically comprises:
If it is determined that the words corresponding to adjacent gaze positions belong to one segmented word, merging the words corresponding to the adjacent gaze positions;
If it is determined that the word corresponding to any gaze position contains multiple segmented words, splitting the word corresponding to that gaze position;
If it is determined that the word corresponding to any gaze position is blank, deleting the word corresponding to that gaze position.
5. The method according to claim 2, characterized in that determining the weight value of each word according to the text weight, the gaze duration, and the number of connected nodes corresponding to the word specifically comprises:
Determining, for each word, the product of the corresponding text weight, gaze duration, and number of connected nodes, and using the product as the weight value of the word.
6. The method according to claim 1, characterized in that determining the stop word set according to the weight value of each word specifically comprises:
Filtering out the words whose weight value is less than a set value, and determining the stop word set according to the words filtered out.
7. A stop word mining device, characterized by comprising:
An acquisition module, configured to obtain reading behavior data of a tracked user's eye for a text, wherein the reading behavior data includes at least the gaze positions on the text, the gaze duration of each gaze position, and the gaze position order;
A first determining module, configured to determine the word on the text corresponding to each gaze position, and to determine the corresponding word sequence according to the gaze position order;
A computing module, configured to calculate the weight value of each word according to the determined words, the word sequence, and the gaze duration of each gaze position;
A second determining module, configured to determine a stop word set according to the weight value of each word.
8. The device according to claim 7, characterized in that, when calculating the weight value of each word according to the determined words, the word sequence, and the gaze duration of each gaze position, the computing module is specifically configured to:
Establish a directed cyclic graph according to the determined words, the word sequence, and the gaze duration of each gaze position, wherein each node in the directed cyclic graph is one of the determined words, the node size is the gaze duration, the nodes are connected according to the word sequence, and the length of the edge between two connected words in the directed cyclic graph is their distance in the text;
Determine the number of nodes to which each word is connected in the directed cyclic graph, and determine the text weight of each word in the text;
Determine the weight value of each word according to the text weight, the gaze duration, and the number of connected nodes corresponding to the word, wherein the text weight, the gaze duration, and the number of connected nodes are each positively correlated with the weight value.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1-6 when executing the program.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1-6.
CN201910721384.9A 2019-08-06 2019-08-06 Method and device for mining stop words, electronic equipment and storage medium Active CN110457699B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910721384.9A CN110457699B (en) 2019-08-06 2019-08-06 Method and device for mining stop words, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910721384.9A CN110457699B (en) 2019-08-06 2019-08-06 Method and device for mining stop words, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110457699A true CN110457699A (en) 2019-11-15
CN110457699B CN110457699B (en) 2023-07-04

Family

ID=68485058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910721384.9A Active CN110457699B (en) 2019-08-06 2019-08-06 Method and device for mining stop words, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110457699B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111130846A (en) * 2019-11-26 2020-05-08 腾讯科技(深圳)有限公司 Target object determination method and device and storage medium
CN111680503A (en) * 2020-06-08 2020-09-18 腾讯科技(深圳)有限公司 Text processing method, device and equipment and computer readable storage medium
CN112954209A (en) * 2021-02-08 2021-06-11 维沃移动通信(杭州)有限公司 Photographing method and device, electronic equipment and medium
CN113537116A (en) * 2021-07-27 2021-10-22 重庆国翔创新教学设备有限公司 Reading material-matched auxiliary learning system, method, equipment and storage medium
CN114625857A (en) * 2022-03-23 2022-06-14 南京硅基智能科技有限公司 Prompter, English text tracking method, storage medium and electronic equipment
CN115238683A (en) * 2022-08-09 2022-10-25 平安科技(深圳)有限公司 Method, device, equipment and medium for recognizing stop words circularly and automatically paying attention
CN115292477A (en) * 2022-07-18 2022-11-04 盐城金堤科技有限公司 Method and device for judging pushing similar articles, storage medium and electronic equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609422A (en) * 2011-01-25 2012-07-25 阿里巴巴集团控股有限公司 Class misplacing identification method and device
CN103902552A (en) * 2012-12-25 2014-07-02 深圳市世纪光速信息技术有限公司 Stop word mining method and device, searching method and device, and evaluating method and device
US20160062458A1 (en) * 2014-09-02 2016-03-03 Tobii Ab Gaze based text input systems and methods
US20170139899A1 (en) * 2015-11-18 2017-05-18 Le Holdings (Beijing) Co., Ltd. Keyword extraction method and electronic device
WO2017157200A1 (en) * 2016-03-17 2017-09-21 阿里巴巴集团控股有限公司 Characteristic keyword extraction method and device
CN109408826A (en) * 2018-11-07 2019-03-01 北京锐安科技有限公司 A kind of text information extracting method, device, server and storage medium
US20190080623A1 (en) * 2017-09-14 2019-03-14 Massachusetts Institute Of Technology Eye Tracking As A Language Proficiency Test
CN109800434A (en) * 2019-01-25 2019-05-24 陕西师范大学 Abstract text header generation method based on eye movement attention
CN109948121A (en) * 2017-12-20 2019-06-28 北京京东尚科信息技术有限公司 Article similarity method for digging, system, equipment and storage medium
CN110059311A (en) * 2019-03-27 2019-07-26 银江股份有限公司 A kind of keyword extracting method and system towards judicial style data

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
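Zhang Junwei; Yang Liu; Wang Shuoning; Wang Zhongjian: "Commodity recommendation based on text mining", Journal of Harbin University of Commerce (Natural Sciences Edition), no. 04 *
Zhang Tingting; Wang Weijun; Huang Yinghui; Liu Kai; Hu Xiangen: "Real-time keyword extraction method for Chinese short texts based on screen visual hot areas", Journal of the China Society for Scientific and Technical Information, no. 12 *
Wang Jigang: "Research on key technologies of text mining", Journal of Luohe Vocational Technology College, no. 05 *
Zhao Yongwei; Zhou Yuan; Li Bicheng; Ke Shengcai: "Image object classification method based on adaptive soft assignment of synonyms and chi-square model", Acta Electronica Sinica, no. 09 *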

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111130846A (en) * 2019-11-26 2020-05-08 腾讯科技(深圳)有限公司 Target object determination method and device and storage medium
CN111130846B (en) * 2019-11-26 2021-09-14 腾讯科技(深圳)有限公司 Target object determination method and device and storage medium
CN111680503A (en) * 2020-06-08 2020-09-18 腾讯科技(深圳)有限公司 Text processing method, device and equipment and computer readable storage medium
CN112954209A (en) * 2021-02-08 2021-06-11 维沃移动通信(杭州)有限公司 Photographing method and device, electronic equipment and medium
CN113537116A (en) * 2021-07-27 2021-10-22 重庆国翔创新教学设备有限公司 Reading material-matched auxiliary learning system, method, equipment and storage medium
CN114625857A (en) * 2022-03-23 2022-06-14 南京硅基智能科技有限公司 Prompter, English text tracking method, storage medium and electronic equipment
CN114625857B (en) * 2022-03-23 2023-08-25 南京硅基智能科技有限公司 Prompter, english text tracking method, storage medium and electronic equipment
CN115292477A (en) * 2022-07-18 2022-11-04 盐城金堤科技有限公司 Method and device for judging pushing similar articles, storage medium and electronic equipment
CN115292477B (en) * 2022-07-18 2024-04-16 盐城天眼察微科技有限公司 Method and device for judging push similar articles, storage medium and electronic equipment
CN115238683A (en) * 2022-08-09 2022-10-25 平安科技(深圳)有限公司 Method, device, equipment and medium for recognizing stop words circularly and automatically paying attention
CN115238683B (en) * 2022-08-09 2023-06-20 平安科技(深圳)有限公司 Method, device, equipment and medium for recognizing stop words of circulating self-attention

Also Published As

Publication number Publication date
CN110457699B (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN110457699A (en) A kind of stop words method for digging, device, electronic equipment and storage medium
RU2714096C1 (en) Method, equipment and electronic device for detecting a face vitality
CN109766445B (en) Knowledge graph construction method and data processing device
CN113257383B (en) Matching information determination method, display method, device, equipment and storage medium
JP2021514087A (en) Connected kiosk for real-time assessment of fall risk
US20120106793A1 (en) Method and system for improving the quality and utility of eye tracking data
WO2022161234A1 (en) Image processing method and apparatus, and electronic device and storage medium
CN103309960B (en) The method and device that a kind of multidimensional information of network public sentiment event is extracted
CN112104642B (en) Abnormal account number determination method and related device
Ehlers et al. Advancing digital earth: beyond the next generation
CN108281197A (en) A method of relationship between analysis environmental factor and juvenile shortsightedness
CN108415653A (en) Screen locking method and device for terminal device
CN117011859A (en) Picture processing method and related device
CN118035945B (en) Label recognition model processing method and related device
CN112037305B (en) Method, device and storage medium for reconstructing tree-like organization in image
CN118860156A (en) A VR interaction method and device based on metaverse virtual reality technology
CN112163095A (en) Data processing method, apparatus, equipment and storage medium
CN111797175A (en) Data storage method and device, storage medium and electronic equipment
CN117274448A (en) Method, device, electronic equipment and medium for generating action animation of virtual model
CN110147464A (en) Video recommendation method, device, electronic equipment and readable storage medium storing program for executing
Alqahtani et al. An agent-based intelligent HCI information system in mixed reality
CN114445757B (en) Nomination acquisition method, network training method, device, storage medium and equipment
CN111950575A (en) Device and method for fall detection
CN117576245B (en) Method and device for converting style of image, electronic equipment and storage medium
CN113658713B (en) Infection tendency prediction method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant