CN110457699A - Stop word mining method and apparatus, electronic device, and storage medium - Google Patents
- Publication number
- CN110457699A (application number CN201910721384.9A)
- Authority
- CN
- China
- Prior art keywords
- word
- fixation
- text
- duration
- sequence
- Legal status
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application relates to the field of computer technology, and in particular to a stop word mining method and apparatus, an electronic device, and a storage medium. The method works in four steps:
1. Obtain tracked reading behavior data of a user's eyes on a text. The data include at least the fixation positions on the text, the fixation duration of each fixation position, and the fixation position sequence.
2. Determine the word on the text that corresponds to each fixation position. Then determine the corresponding word sequence according to the fixation position sequence.
3. Calculate a weight value for each word according to the determined words, the word sequence, and the fixation duration of each fixation position.
4. Determine a stop word set according to the weight values.

Because the weight values are calculated from the user's perception of, or attention to, the text, the accuracy of stop word mining can be improved.
Description
Technical field
This application relates to the field of computer technology, and in particular to a stop word mining method and apparatus, an electronic device, and a storage medium.
Background
In information retrieval, natural language processing, and related tasks, stop words can be filtered out. Stop words generally occur frequently and carry no practical meaning. Filtering them out therefore saves storage space and improves search efficiency and accuracy.
In the prior art, stop words are mined mainly on the basis of word frequency statistics. Weight values are determined from the statistics, and the words with smaller weight values form the stop word set. Because this approach relies mainly on word frequency, the resulting stop words have low accuracy.
Summary
Embodiments of the present application provide a stop word mining method and apparatus, an electronic device, and a storage medium, so as to improve the accuracy of stop word mining.
The specific technical solutions provided by the embodiments of the present application are as follows.
One embodiment of the present application provides a stop word mining method, comprising:
obtaining tracked reading behavior data of a user's eyes on a text, where the reading behavior data include at least the fixation positions on the text, the fixation duration of each fixation position, and the fixation position sequence;
determining the word on the text corresponding to each fixation position, and determining the corresponding word sequence according to the fixation position sequence;
calculating a weight value for each word according to the determined words, the word sequence, and the fixation duration of each fixation position; and
determining a stop word set according to the weight value of each word.
Another embodiment of the present application provides a stop word mining apparatus, comprising:
an obtaining module, configured to obtain tracked reading behavior data of a user's eyes on a text, where the reading behavior data include at least the fixation positions on the text, the fixation duration of each fixation position, and the fixation position sequence;
a first determining module, configured to determine the word on the text corresponding to each fixation position, and to determine the corresponding word sequence according to the fixation position sequence;
a calculation module, configured to calculate a weight value for each word according to the determined words, the word sequence, and the fixation duration of each fixation position; and
a second determining module, configured to determine a stop word set according to the weight value of each word.
In another embodiment of the present application, the calculation module calculates the weight value of each word from the determined words, the word sequence, and the fixation duration of each fixation position. To do so, it is specifically configured to:
establish a directed cyclic graph according to the determined words, the word sequence, and the fixation duration of each fixation position. In this graph:
- each node is one of the determined words;
- the size of a node is its fixation duration;
- the nodes are connected according to the word sequence;
- the length of the edge between two connected words is the distance between those words in the text.
determine the number of nodes connected to each word in the directed cyclic graph, and determine the text weight of each word in the text; and
determine the weight value of each word according to its text weight, fixation duration, and number of connected nodes. The text weight, the fixation duration, and the number of connected nodes are each positively correlated with the weight value.
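The graph construction can be illustrated with a minimal Python sketch. It rests on several assumptions the patent does not fix:
- Words are given as (word, character offset, fixation duration) tuples in fixation order.
- A word fixated several times becomes one node whose duration is the sum of its fixations.
- The edge length is the absolute difference of the two character offsets.
- The "number of connected nodes" is the number of distinct neighbours in either edge direction.

The names `ReadingGraph`, `build_reading_graph`, and `linked_nodes` are hypothetical.

```python
from collections import defaultdict


class ReadingGraph:
    """Directed graph of one reading session; regressions make it cyclic."""

    def __init__(self):
        self.duration = defaultdict(float)  # node (word) -> total fixation time, ms
        self.edges = {}  # (src_word, dst_word) -> distance between them in the text

    def linked_nodes(self, word):
        """Distinct nodes joined to `word` by an edge in either direction."""
        linked = {dst for src, dst in self.edges if src == word}
        linked |= {src for src, dst in self.edges if dst == word}
        linked.discard(word)
        return linked


def build_reading_graph(fixated_words):
    """fixated_words: (word, char_offset_in_text, duration_ms) in fixation order."""
    graph = ReadingGraph()
    for word, _, duration in fixated_words:
        graph.duration[word] += duration  # repeated fixations accumulate (assumption)
    for (w1, off1, _), (w2, off2, _) in zip(fixated_words, fixated_words[1:]):
        if w1 != w2:
            # Edges follow the word sequence; keep the first observed distance.
            graph.edges.setdefault((w1, w2), abs(off2 - off1))
    return graph
```

Take the sequence "human", "function", "human", "language". It produces the cycle human → function → human, and "human" ends up with two connected nodes.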
In another embodiment of the present application, the calculation module establishes the directed cyclic graph from the determined words, the word sequence, and the fixation duration of each fixation position. To do so, it is specifically configured to:
delete the words whose fixation duration is not less than a preset duration;
obtain the word segmentation result of the text and compare it with the word corresponding to each fixation position. Then adjust each of those words so that the adjusted word matches a segment in the word segmentation result; and
establish the directed cyclic graph according to the remaining adjusted words, their fixation durations, and the word sequence.
In another embodiment of the present application, the calculation module compares the word segmentation result with the word corresponding to each fixation position and adjusts those words. To do so, it is specifically configured to:
if the words corresponding to adjacent fixation positions are determined to belong to one segment, merge the words corresponding to the adjacent fixation positions;
if the word corresponding to any fixation position is determined to contain multiple segments, split the word corresponding to that fixation position; and
if the word corresponding to any fixation position is determined to be blank, delete the word corresponding to that fixation position.
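The three adjustment rules (merge, split, delete) can be sketched in Python as follows, under stated assumptions:
- Each fixation is given as the character span it covers, plus a duration.
- The segmentation is a list of character spans.
- When a fixation is split, its duration is shared evenly among the segments.
- When adjacent fixations are merged, their durations are added.

None of these choices is stated in the patent, and `align_to_segmentation` is a hypothetical name.

```python
def align_to_segmentation(text, tokens, fixations):
    """Adjust fixated spans so that each one matches a segment of the text.

    text: the displayed text.
    tokens: segmentation result as (start, end) character spans in text order.
    fixations: (start, end, duration_ms) character spans hit by each fixation,
        in fixation order.
    Returns (segment, duration_ms) pairs in fixation order.
    """
    aligned = []  # [(start, end), duration] pairs being built
    for start, end, duration in fixations:
        if not text[start:end].strip():
            continue  # blank fixation position: delete it
        hit = [tok for tok in tokens if tok[0] < end and start < tok[1]]
        if not hit:
            continue
        share = duration / len(hit)  # split: spread the duration evenly (assumption)
        for tok in hit:
            if aligned and aligned[-1][0] == tok:
                aligned[-1][1] += share  # adjacent fixations on one segment: merge
            else:
                aligned.append([tok, share])
    return [(text[s:e], d) for (s, e), d in aligned]
```

Consider the text "the cat sat" with segments (0, 3), (4, 7), (8, 11):
- two fixations on "th" and "e" merge into "the";
- a fixation on the space is deleted;
- one fixation covering "cat sat" is split in two.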
In another embodiment of the present application, the calculation module determines the weight value of each word from its text weight, fixation duration, and number of connected nodes. To do so, it is specifically configured to:
determine, for each word, the product of its text weight, fixation duration, and number of connected nodes, and use the product as the weight value of that word.
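The product rule can be written as a short Python sketch. Each factor is supplied as a dictionary keyed by word.

The text weight could, for example, come from TF-IDF (introduced among the concepts in the detailed description), but that pairing is an assumption. Treating a missing factor as 0 is also an assumption, and `word_weights` is a hypothetical name.

```python
def word_weights(text_weight, fixation_ms, linked_count):
    """weight(w) = text weight * fixation duration * number of connected nodes.

    Each argument maps a word to its value; a word missing from text_weight or
    linked_count gets 0 for that factor (assumption).
    """
    return {
        word: text_weight.get(word, 0.0) * duration * linked_count.get(word, 0)
        for word, duration in fixation_ms.items()
    }
```

Because the weight is a plain product, doubling any one factor doubles the weight. This is one simple way to make all three factors positively correlated with the weight value.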
In another embodiment of the present application, the second determining module determines the stop word set according to the weight value of each word. To do so, it is specifically configured to:
select the words whose weight values are less than a set value, and determine the stop word set according to the selected words.
Another embodiment of the present application provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of any one of the above stop word mining methods.
Another embodiment of the present application provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of any one of the above stop word mining methods.
In the embodiments of the present application, tracked reading behavior data of a user's eyes on a text are obtained. The data include at least the fixation positions on the text, the fixation duration of each fixation position, and the fixation position sequence. From these, the corresponding words and word sequence are determined and a weight value is calculated for each word. A stop word set is then determined according to the weight values.

In this way, stop words can be collected automatically while the user completes a reading task. This is more efficient and reduces the cost of manual annotation in advance. Moreover, the reading behavior data reflect the user's perception of characters or words. Calculating weight values and determining stop words from these data therefore yields a stop word set with cognitive meaning, which is more widely applicable and more accurate.
Brief description of the drawings
Fig. 1 is a schematic diagram of the application architecture of the stop word mining method in an embodiment of the present application;
Fig. 2 is a schematic diagram of the application principle of the stop word mining method in an embodiment of the present application;
Fig. 3 is a flowchart of the stop word mining method in an embodiment of the present application;
Fig. 4 is a schematic diagram of the effect of modeling reading behavior data in an embodiment of the present application;
Fig. 5 is a schematic diagram of the reading model in an embodiment of the present application;
Fig. 6 is a schematic diagram of the technical framework of the stop word mining method in an embodiment of the present application;
Fig. 7 is a schematic structural diagram of the stop word mining apparatus in an embodiment of the present application;
Fig. 8 is a schematic structural diagram of the electronic device in an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort shall fall within the protection scope of the present application.
To facilitate understanding of the embodiments of the present application, several concepts are briefly introduced first:
Stop words: in information retrieval, natural language processing, and related tasks, certain characters or words are automatically filtered out before or after natural language data or text is processed. This saves storage space and improves search efficiency. The characters or words filtered out in this way are called stop words.
Eye tracking device: a device that can track eye movement, that is, follow the trajectory of eye actions. It can track, for example:
- fixation duration and fixation positions;
- number of fixations and the fixation position sequence;
- number of blinks;
- electrooculography signals.

Examples include eye trackers, computer-vision-based eye tracking devices, and electroencephalogram (EEG)-based eye tracking devices; the embodiments of the present application are not limited in this respect.
Term frequency-inverse document frequency (TF-IDF): a common weighting technique for information retrieval and data mining. It can be used to evaluate how important a word is to a document set, or to one document in a corpus.
Directed cyclic graph: a directed graph that may contain cycles. It consists of a set of vertices (also called nodes) and directed edges. Each vertex in a directed graph has an out-degree and an in-degree:
- the out-degree is the number of edges leaving the vertex;
- the in-degree is the number of edges pointing to the vertex.
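The two degree definitions can be illustrated with a small Python sketch that counts them from a list of directed edges. The edge-list representation and the name `degrees` are assumptions made for illustration.

```python
from collections import Counter


def degrees(edges):
    """Out-degree and in-degree of every vertex in a directed graph.

    edges: iterable of (src, dst) pairs; each pair is one directed edge.
    Returns two Counters; a vertex with no edges in a direction reads as 0.
    """
    out_degree, in_degree = Counter(), Counter()
    for src, dst in edges:
        out_degree[src] += 1  # the edge leaves src
        in_degree[dst] += 1  # the edge points to dst
    return out_degree, in_degree
```

With edges a → b, b → a, and a → c, vertex a has out-degree 2 and in-degree 1, and a → b → a forms a cycle.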
Artificial intelligence (AI) is the theory, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to:
- simulate, extend, and expand human intelligence;
- perceive the environment and acquire knowledge;
- use that knowledge to obtain optimal results.

In other words, AI is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that respond in ways similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, so that machines can perceive, reason, and make decisions.
AI technology is a comprehensive discipline covering a wide range of fields, with technologies at both the hardware level and the software level.
- Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics.
- AI software technologies mainly cover several major directions, including computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer vision (CV) is the science of how to make machines "see". More specifically, it refers to machine vision in which cameras and computers replace human eyes to identify, track, and measure targets. The images are then further processed so that they are more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies the related theories and technologies in an attempt to build AI systems that can obtain information from images or multidimensional data.

Computer vision technologies generally include:
- image processing, image recognition, image semantic understanding, and image retrieval;
- optical character recognition (OCR);
- video processing, video semantic understanding, and video content/behavior recognition;
- 3D object reconstruction and other three-dimensional (3D) technologies;
- virtual reality, augmented reality, and simultaneous localization and mapping;
- common biometric technologies such as face recognition and fingerprint recognition.
The embodiments of the present application mainly involve computer vision in AI. An eye tracking device is combined with computer vision to track and recognize the reading behavior of the user's eyes on a text. This yields reading behavior data, including, for example, fixation positions and fixation durations.
In practice, stop words matter for tasks such as information retrieval and natural language processing. They usually occur frequently and have no practical meaning, so processing all of them reduces efficiency. They are therefore usually filtered out during processing, which makes mining accurate stop words necessary.

The prior art is mainly based on word frequency statistics. Weight values are determined from these statistics, and the words with smaller weight values form the stop word set. This approach has three drawbacks:
- It does not consider the user's perception of the text, so the resulting stop words have low accuracy.
- Because only word frequency is used, different statistical objects yield different stop words. The result lacks universality and has limited fields of application: stop words in one application field may not be stop words in another.
- It suffers from a cold-start problem. At the beginning it is not known which words are stop words, so stop word mining must be performed before tasks such as information retrieval or natural language processing can run.
To address these problems, the embodiments of the present application provide a stop word mining method that takes the user's perception of text into account. An eye tracking device obtains reading behavior data of the user's eyes on a text. Weight values are determined from these data, and a stop word set is then mined.

Because the stop words are mined from human perception of the text, they better match users' actual needs and are more accurate. They also have cognitive meaning, combining universality with task specificity, so applicability is wider. No mining in advance is required: screening happens automatically while the user performs other reading tasks. The user does not have to judge deliberately which characters or words are stop words, which also reduces manual annotation cost.
Fig. 1 is a schematic diagram of the application architecture of the stop word mining method in an embodiment of the present application. As shown in Fig. 1, the architecture includes a terminal 100, an eye tracking device 200, and a server 300.
The terminal 100 may be any smart device, such as a smartphone, tablet computer, or portable personal computer. Various applications (APPs), such as a browser or a reader, may be installed on it. The user can read text on the terminal 100, either online or from text stored locally on the terminal 100.
The eye tracking device 200 may be an eye tracker, a computer-vision-based eye tracking device, an EEG-based eye tracking device, or a similar device. In the embodiments of the present application, it is used to track the movement trajectory of the user's eyes.

The eye tracking device 200 may be used on its own, or it may be integrated into the terminal 100 as one of its functional components. Either way, it must be usable together with the terminal 100, and it must lie in the same plane as the display of the terminal 100.

For example, when the user reads a text on the terminal 100, the eye tracking device 200 tracks the reading trajectory of the user's eyes over the text. It projects that trajectory onto the text shown on the display of the terminal 100, thereby obtaining reading behavior data such as fixation positions, fixation durations, and the fixation position sequence.
The server 300 can provide various network services for the terminal 100. For each application on the terminal 100, the server 300 can be regarded as the background server that provides the corresponding network service. For example, if the user reads an article online on the terminal 100, the server 300 provides the corresponding service and returns the article content to the terminal 100.

In the embodiments of the present application, the server 300 can also communicate with the eye tracking device 200. It receives the reading behavior data sent by the eye tracking device 200 and processes them to mine the stop word set, which it then outputs.
The server 300 may be a single server, a server cluster composed of several servers, or a cloud computing center.
It should be noted that, in the embodiments of the present application, the stop word mining method is mainly executed on the server 300 side. Fig. 2 is a schematic diagram of the application principle of the stop word mining method in an embodiment of the present application. As shown in Fig. 2:
- The terminal 100 may further include a text display device 110 that displays text for the user to read.
- While the user reads the text on the text display device 110, the eye tracking device 200 tracks the reading behavior of the user's eyes.
- The eye tracking device 200 obtains the reading behavior data of the user's eyes for the text and sends them to the server 300.
- The server 300 collects and outputs stop words according to the reading behavior data.
The terminal 100 and the server 300, and the eye tracking device 200 and the server 300, are connected through the Internet and communicate with each other. Optionally, the Internet uses standard communication technologies and/or protocols. The network is usually the Internet, but it may be any network, including but not limited to:
- a local area network (LAN), a metropolitan area network (MAN), or a wide area network (WAN);
- a mobile, wired, or wireless network;
- a private network or a virtual private network;
- any combination of the above.

In some embodiments, data exchanged over the network are represented using technologies and/or formats such as Hypertext Markup Language (HTML) and Extensible Markup Language (XML). All or some links may be encrypted using conventional encryption technologies such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), virtual private network (VPN), and Internet Protocol Security (IPsec). In other embodiments, customized and/or dedicated data communication technologies may replace or supplement the above.
It should be noted that the application architecture diagram is intended only to describe the technical solutions of the embodiments more clearly. It does not limit the technical solutions provided by the embodiments of the present application, nor does it restrict them to stop word mining. For other application architectures and service applications facing similar problems, the technical solutions are equally applicable. In the following embodiments, the stop word mining method is described as applied to the architecture shown in Fig. 1.
Based on the above embodiments, the stop word mining method is described below. Fig. 3 is a flowchart of the stop word mining method in an embodiment of the present application. The method includes the following steps.
Step 300: Obtain tracked reading behavior data of the user's eyes on a text, where the reading behavior data include at least the fixation positions on the text, the fixation duration of each fixation position, and the fixation position sequence.
In the embodiments of the present application, stop words are collected without dedicated annotators. They are collected automatically while users complete their own tasks, and the user's perception of the text is taken into account. The user's reading behavior data are therefore tracked and collected while the user reads.
Specifically, the eye tracking device tracks the movement trajectory of the user's eyes. The reading behavior data for the text produced by this tracking are then acquired from the eye tracking device.
The reading behavior data include at least the fixation positions on the text, the fixation duration of each fixation position, and the fixation position sequence. The embodiments of the present application do not limit the data to these items.
The eye tracking device may obtain the reading behavior data using any existing method, which is not limited in the embodiments of the present application.
For example, the pupil-corneal reflection vector method can be used:
- The eye tracking device captures eyeball images and extracts the pupil from them based on the bright-pupil and dark-pupil principle.
- The corneal reflection method corrects the relative position between the eye tracking device and the eyeball. The corneal reflection point serves as the reference point for this relative position, and the coordinates of the pupil center indicate the line of sight.
- The gaze focus is then projected onto the display of the terminal to obtain the fixation positions on the text.
- Arranging the fixation positions in chronological order gives the fixation position sequence. Information such as the fixation duration of each fixation position can also be obtained.
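The patent says only that gaze points are projected onto the display and arranged chronologically. It does not say how fixations are separated from the rapid eye movements between them.

Dispersion-threshold identification (I-DT) is one widely used approach and is substituted here as an illustration, not as the patent's method. The sketch makes these assumptions:
- Samples are (time in ms, x, y) tuples already projected into screen coordinates.
- The thresholds of 30 px and 100 ms are hypothetical.
- Duration is measured from the first to the last sample of a fixation.

```python
def _dispersion(points):
    """I-DT dispersion of (t, x, y) samples: (max x - min x) + (max y - min y)."""
    xs = [p[1] for p in points]
    ys = [p[2] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples, max_dispersion=30.0, min_duration_ms=100.0):
    """Group time-ordered gaze samples (t_ms, x, y) into fixations.

    Returns (x, y, duration_ms) tuples in chronological order, i.e. the fixation
    position sequence with one fixation duration per position.
    """
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Smallest window starting at i that spans at least min_duration_ms.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j == n:
            break  # not enough samples left for another fixation
        if _dispersion(samples[i:j + 1]) > max_dispersion:
            i += 1  # points too spread out: slide the window forward
            continue
        # Grow the window while the points stay within the dispersion limit.
        while j + 1 < n and _dispersion(samples[i:j + 2]) <= max_dispersion:
            j += 1
        window = samples[i:j + 1]
        cx = sum(p[1] for p in window) / len(window)  # fixation position = centroid
        cy = sum(p[2] for p in window) / len(window)
        fixations.append((cx, cy, window[-1][0] - window[0][0]))
        i = j + 1
    return fixations
```

Suppose samples arrive every 20 ms: ten at (100, 100) followed by ten at (300, 100). The sketch yields two fixations of 180 ms each, in reading order.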
As another example, computer vision techniques in AI, together with machine learning, can be used to measure the user's visual attention to the characters or words of the text, that is, to obtain the user's reading behavior data.
Step 310: Determine the word on the text corresponding to each fixation position, and determine the corresponding word sequence according to the fixation position sequence.
In the embodiments of the present application, the text may contain characters or words. A fixation position of the user's eyes on the text usually corresponds to some word in the text, or otherwise to a blank position in the text. The corresponding word sequence can then be determined according to the fixation position sequence.
For example, Fig. 4 is a schematic diagram of the effect of modeling reading behavior data in an embodiment of the present application. As shown in Fig. 4, the displayed content is the text read by the user, and the eye tracking device obtains the reading behavior data of the user's eyes for that text. In the figure:
- each circle represents a fixation position;
- the circle diameter can represent the fixation duration at that position;
- the arrows connecting the circles represent the fixation position sequence.

From these, the word at each fixation position and the corresponding word sequence can be determined. For instance, the first circle in Fig. 4 corresponds to the word "human". The next circle corresponds to the blank position between the characters "包" and "含" in the text. The circle after that corresponds to "function word", and so on.
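Mapping fixation positions to text units can be sketched in Python under these assumptions:
- The screen bounding box of every displayed character is known, for example from the text renderer.
- The fixated unit is a single character. Adjusting these units into proper words is left to the later segmentation step.
- A fixation that lands on whitespace, between characters, or off the text is marked as a blank position (`None`).

The names `fixated_char` and `fixation_units` are hypothetical.

```python
def fixated_char(x, y, char_boxes):
    """Index of the character whose box (left, top, right, bottom) contains (x, y)."""
    for idx, (left, top, right, bottom) in enumerate(char_boxes):
        if left <= x < right and top <= y < bottom:
            return idx
    return None  # gap between characters or outside the text


def fixation_units(fixations, char_boxes, text):
    """Map (x, y, duration_ms) fixations, in fixation order, to the fixated unit.

    Returns (character or None, duration_ms) pairs in the same order; None marks
    a blank position (whitespace, a gap between characters, or off the text).
    """
    units = []
    for x, y, duration in fixations:
        idx = fixated_char(x, y, char_boxes)
        blank = idx is None or text[idx].isspace()
        units.append((None if blank else text[idx], duration))
    return units
```

Keeping the blank entries in the sequence preserves the fixation order. They can be deleted later, as the adjustment rules in the summary describe.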
Step 320: Calculate the weight value of each word according to the determined words, the word sequence, and the fixation duration of each fixation position.
Step 330: Determine a stop word set according to the weight value of each word.
Step 330 specifically includes selecting the words whose weight values are less than a set value, and determining the stop word set according to the selected words.
That is, the higher a word's weight value, the more important the word is in the text and the less likely it is to be a stop word. Conversely, the lower the weight value, the more likely the word is a stop word. In the embodiments of the present application, a set value can be preset, and words whose weight values fall below it are regarded as stop words. In this way, the stop word set is collected from the text.
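The selection in step 330 amounts to a simple threshold, shown here as a Python sketch. The weights are assumed to be a dictionary from word to weight value. The set value is application-specific, so the number used in the example is hypothetical, as is the name `stop_word_set`.

```python
def stop_word_set(weights, set_value):
    """Words whose weight value is below `set_value`: the mined stop word set."""
    return {word for word, weight in weights.items() if weight < set_value}
```

With a hypothetical set value of 20, weights of 15 for "the", 3 for "of", and 480 for "cat" yield the stop word set {"the", "of"}.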
A specific implementation of step 320 is described below. In the embodiments of the present application, executing step 320 specifically includes the following.
S1: Establish a directed cyclic graph according to the determined words, the word sequence, and the fixation duration of each fixation position. In this graph:
- each node is one of the determined words;
- the size of a node is its fixation duration;
- the nodes are connected according to the word sequence;
- the length of the edge between two connected words is the distance between those words in the text.
The directed cyclic graph may also be referred to as the reading model established in the embodiments of the present application.
In the embodiments of the present application, the user's reading behavior data can be modeled as a directed cyclic graph. For example, each circle in Fig. 4 could be taken as a node, with the nodes connected in order. However, a fixation position may correspond to a word, to a blank position, or to something that is not a single independent word. Treating every fixation circle as a node would increase the complexity of the reading model and reduce efficiency. Therefore, before the reading model (the directed cyclic graph) is established, the words corresponding to the fixation positions can be preprocessed. One possible implementation of step S1 specifically includes the following.
S1.1. Delete the words whose fixation duration is not less than a preset duration.
The preset duration can be configured based on practical experience and is not limited in the embodiments of the present application.
The main consideration is that if the fixation duration of some fixation position is too long, the user may have been staring blankly or doing something else rather than reading the text. If that fixation duration were taken into account, the weight value finally calculated for the word at that position would be inaccurate, and the judgment of whether other words are stop words would also be affected. Therefore, during preprocessing, circles with an excessively long fixation duration can be deleted, i.e., the words corresponding to those circles are deleted.
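The S1.1 filter can be sketched as follows, assuming each fixation is a `(word, duration_in_seconds)` pair. The 2.0-second preset duration and the sample fixations are made up for illustration, since the application leaves the preset duration to practical experience:

```python
def drop_long_fixations(fixations, preset_duration):
    """S1.1: remove fixations whose duration is not less than preset_duration."""
    return [(word, d) for word, d in fixations if d < preset_duration]

# Hypothetical fixations; the 4.8 s fixation on "are" looks like the reader drifted off.
fixations = [("stop", 0.31), ("words", 0.28), ("are", 4.8), ("common", 0.22)]
print(drop_long_fixations(fixations, preset_duration=2.0))
# [('stop', 0.31), ('words', 0.28), ('common', 0.22)]
```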
S1.2. Obtain the word segmentation result of the text, compare it with the words corresponding to the fixation positions, and adjust those words so that each adjusted word matches a token in the word segmentation result.
This specifically includes: (1) Obtain the word segmentation result of the text.
The text can be segmented with a preset word segmentation algorithm, such as an N-gram model or the jieba segmentation algorithm; the algorithm is not limited in the embodiments of the present application. The resulting word segmentation result contains multiple tokens, and each token can be regarded as an independent word, i.e., the smallest unit with a complete meaning.
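The application names N-gram models and jieba without fixing the algorithm. As a dependency-free stand-in (not jieba's actual algorithm), the sketch below uses greedy forward maximum matching against a small hypothetical vocabulary. The input is a space-free string, mimicking unsegmented Chinese text:

```python
def forward_max_match(text: str, vocabulary: set[str]) -> list[str]:
    """Greedy forward maximum matching; unknown characters become one-character tokens."""
    max_len = max((len(w) for w in vocabulary), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens

vocab = {"stop", "word", "stopword", "mining"}  # hypothetical dictionary
print(forward_max_match("stopwordmining", vocab))  # ['stopword', 'mining']
```

Greedy matching prefers the longest dictionary entry, so "stopword" wins over "stop" + "word"; real segmenters add statistics to resolve ambiguous cases.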
(2) Compare the word segmentation result with the words corresponding to the fixation positions, and adjust those words so that each adjusted word matches a token in the word segmentation result.
That is, in the embodiments of the present application, the word corresponding to a fixation position may be an independent word, such as "noun"; it may be something other than an independent word, such as only part of a word; or it may be blank. Therefore, if the content of each fixation position were used directly as a node to establish the reading model and compute weight values, the stop words finally collected might be of limited use. To ensure that every stop word finally collected is an independent word, the embodiments of the present application use the word segmentation result of the text to adjust and normalize the fixation words, which simplifies and denoises the reading model. Specifically, the following cases may arise:
1) If it is determined that the words corresponding to adjacent fixation positions belong to one token, merge those words.
For example, the word segmentation result contains the token "function word", the word corresponding to one fixation position is "function", the word corresponding to another is "word", and the two fixation positions are adjacent. The two fixation positions can then be merged into one, i.e., the corresponding words "function" and "word" are merged into "function word".
Further, in the embodiments of the present application, the fixation duration of the merged fixation position (or word) can be set to the average of the fixation durations of the adjacent fixation positions, or to their maximum; this is not limited and can be configured according to the actual situation.
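Case 1) can be sketched as a greedy left-to-right pass that merges the longest run of adjacent fixation words whose concatenation is a single token. The `(word, duration)` representation, the space-free concatenation (mimicking Chinese text), and the function name are assumptions; `combine` selects the maximum or the average duration, the two options the text mentions:

```python
from statistics import mean

def merge_adjacent(fixations, tokens, combine=max):
    """Case 1: merge runs of adjacent fixation words whose concatenation is one token.

    combine: how to set the merged duration, e.g. max (default) or statistics.mean.
    """
    merged, i = [], 0
    while i < len(fixations):
        end, text = i, fixations[i][0]
        for j in range(i + 1, len(fixations)):
            text += fixations[j][0]
            if text in tokens:
                end = j  # longest run so far that forms a single token
        run = fixations[i:end + 1]
        merged.append(("".join(w for w, _ in run), combine(d for _, d in run)))
        i = end + 1
    return merged

tokens = {"stopword", "mining"}  # hypothetical segmentation result
fixations = [("stop", 0.25), ("word", 0.75), ("mining", 0.5)]
print(merge_adjacent(fixations, tokens))        # [('stopword', 0.75), ('mining', 0.5)]
print(merge_adjacent(fixations, tokens, mean))  # [('stopword', 0.5), ('mining', 0.5)]
```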
2) If it is determined that the word corresponding to any fixation position contains multiple tokens, split that word.
In the embodiments of the present application, comparing the word corresponding to a fixation position with the word segmentation result may show that it is not a minimal token; it can then be split according to the tokens in the word segmentation result. For example, the word segmentation result contains the token "noun", but not the longer string covered by a fixation position that consists of "noun" plus a following part. That fixation position can therefore be split into "noun" and the remaining part, and each word obtained from the split can serve as a node of the reading model.
Further, if a word obtained from the split can form a token of the word segmentation result together with an adjacent word or character, the split words can be adjusted again so that the split word and the adjacent word or character are merged into one token.
Further, in the embodiments of the present application, the fixation duration of each fixation position (or word) obtained from a split can be set to the fixation duration of the fixation position before the split, but is not limited to this.
3) If it is determined that the word corresponding to any fixation position is blank, delete the word corresponding to that fixation position.
That is, if the word corresponding to a fixation position is blank, the circle of that fixation position marks a blank area, so the fixation position is invalid and cannot be used for stop word screening. To reduce the complexity of the reading model and the amount of computation, the fixation position can be deleted and not used as a node of the reading model.
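Cases 2) and 3) can be sketched together, assuming the merge pass of case 1) has already run (otherwise an unmerged fragment with no matching token would be split into single characters). Splitting uses greedy longest-token matching, each piece inherits the pre-split duration, and blank fixations are dropped; the function name and data are hypothetical, and the optional re-merge of split pieces with their neighbours is not shown:

```python
def split_and_drop_blanks(fixations, tokens):
    """Cases 2 and 3: split multi-token fixation words and drop blank fixations."""
    result = []
    for word, duration in fixations:
        word = word.strip()
        if not word:  # case 3: a blank fixation is not a node
            continue
        i = 0
        while i < len(word):  # case 2: greedy longest-token split
            size = next(
                (n for n in range(len(word) - i, 0, -1) if word[i:i + n] in tokens),
                1,  # no token starts here: keep a single character
            )
            # Each piece inherits the fixation duration from before the split.
            result.append((word[i:i + size], duration))
            i += size
    return result

tokens = {"stop", "words", "noun"}  # hypothetical segmentation result
fixations = [("stopwords", 0.6), ("   ", 0.3), ("noun", 0.4)]
print(split_and_drop_blanks(fixations, tokens))
# [('stop', 0.6), ('words', 0.6), ('noun', 0.4)]
```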
S1.3. Establish the reading model according to the remaining and adjusted words, their fixation durations, and the word sequence.
In the embodiments of the present application, taking the reading model as a directed cyclic graph as an example, each word remaining after the adjustment and deletion processing serves as a node of the directed cyclic graph, the nodes are connected according to the word sequence, the fixation duration is the node size, and the length of the edge connecting two nodes is the distance between the two words in the text; this constructs the reading model. For example, refer to Fig. 5, which is a schematic diagram of the reading model in the embodiments of the present application. In Fig. 5, each circle represents a word, and the size of the circle is the fixation duration, for example in seconds. The direction of the arrow connecting two circles is determined by the word sequence. In the label i_j on the edge connecting two circles, i is the position in the word sequence and j is the distance in the text between the two consecutive words. For example, the label 1_3 on the edge from the first circle to the second circle in Fig. 5 indicates the first step of the word sequence and a distance of 3 between these two words in the text, which can also be understood as the distance from the first word to the second word in the overall word sequence.
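A sketch of S1.3 that produces node sizes and `i_j` edge labels in the style of Fig. 5. It assumes each preprocessed fixation carries the word's index in the segmented text, measures distance in tokens, and sums the durations when a word is revisited (which is what makes the graph cyclic); none of these details is fixed by the application:

```python
from collections import defaultdict

def build_reading_model(fixations):
    """Build the directed cyclic graph from preprocessed fixations.

    fixations: (word, duration, text_index) tuples in reading order, where
    text_index is the word's position in the segmented text (assumed field).
    Returns (node sizes, edges), each edge being (src, dst, "i_j").
    """
    node_size = defaultdict(float)
    for word, duration, _ in fixations:
        node_size[word] += duration  # assumption: revisits accumulate duration
    edges = []
    for i, (prev, cur) in enumerate(zip(fixations, fixations[1:]), start=1):
        distance = abs(cur[2] - prev[2])  # distance in the text, counted in tokens
        edges.append((prev[0], cur[0], f"{i}_{distance}"))
    return dict(node_size), edges

fixations = [("stopword", 0.5, 0), ("mining", 0.25, 3), ("stopword", 0.25, 0)]
nodes, edges = build_reading_model(fixations)
print(nodes)  # {'stopword': 0.75, 'mining': 0.25}
print(edges)  # [('stopword', 'mining', '1_3'), ('mining', 'stopword', '2_3')]
```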
S2. Calculate the weight value of each word according to the reading model.
This specifically includes: 1) Determine the text weight of each word in the text.
Specifically, the text weight of a word can be determined from word frequency statistics of the word in the text. The specific method is not limited; for example, the text weight may be the TF-IDF value.
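TF-IDF has several variants and the application does not choose one. This sketch uses term frequency within the document and the smoothed IDF ln((1 + N) / (1 + df)) + 1; the function name and the toy corpus are invented for illustration:

```python
import math
from collections import Counter

def tf_idf(doc, corpus):
    """TF-IDF of every word in `doc` (a token list) against `corpus` (a list of token lists).

    Uses the smoothed IDF ln((1 + N) / (1 + df)) + 1, one common variant.
    """
    n_docs = len(corpus)
    scores = {}
    for word, count in Counter(doc).items():
        tf = count / len(doc)
        df = sum(1 for other in corpus if word in other)
        scores[word] = tf * (math.log((1 + n_docs) / (1 + df)) + 1)
    return scores

doc = ["the", "eye", "tracking"]
corpus = [doc, ["the", "cat"], ["the", "dog"], ["a", "bird"]]
scores = tf_idf(doc, corpus)
print(scores["eye"] > scores["the"])  # True: "the" occurs in more documents
```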
2) Determine the number of nodes connected to each word in the directed cyclic graph.
That is, determine the out-degree and in-degree of each word's node in the reading model.
3) Determine the weight value of each word according to its text weight, fixation duration, and number of connected nodes.
The text weight, the fixation duration, and the number of connected nodes are each directly proportional to the weight value.
The embodiments of the present application provide one possible implementation: for each word, compute the product of its text weight, fixation duration, and number of connected nodes, and use the product as the weight value of that word.
For example, let the weight value be w and take TF-IDF as the text weight; then
$$w = \text{TF-IDF} \times S \times D$$
where S is the normalized fixation duration and D is the normalized sum of the out-degree and in-degree, i.e., the number of connected nodes.
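A sketch of the product rule, assuming node sizes and edges in the shape produced by a graph-building step like S1.3 and taking D as out-degree plus in-degree counted per edge. The application only says S and D are normalized, so dividing by the maximum is an assumption (min-max scaling would force the smallest value to zero and zero out that word's weight). All names and values are hypothetical:

```python
from collections import Counter

def word_weights(tfidf, node_size, edges):
    """w = TF-IDF * S * D for every node of the reading model (sketch).

    tfidf:     word -> text weight
    node_size: word -> fixation duration (node size)
    edges:     (src, dst, ...) tuples; extra fields such as "i_j" labels are ignored
    """
    degree = Counter()
    for src, dst, *_ in edges:
        degree[src] += 1  # out-degree of src
        degree[dst] += 1  # in-degree of dst
    max_s = max(node_size.values())
    max_d = max(degree.values(), default=1)
    return {
        # S = size / max_s and D = degree / max_d (assumed normalization)
        word: tfidf.get(word, 0.0) * (size / max_s) * (degree[word] / max_d)
        for word, size in node_size.items()
    }

tfidf = {"eye": 0.6, "tracking": 0.6, "the": 0.4}
node_size = {"eye": 0.5, "tracking": 0.5, "the": 0.25}
edges = [("the", "eye", "1_1"), ("eye", "tracking", "2_1"), ("tracking", "eye", "3_1")]
weights = word_weights(tfidf, node_size, edges)
print(min(weights, key=weights.get))  # prints: the
```

Feeding these weights into the step 330 threshold rule would then mark the lowest-weight words as stop words.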
Further, in the embodiments of the present application, the user's reading behavior data can be used not only to collect stop words but also in other applications. One example is generating an article summary: a sentence with a long fixation duration can be used as part of the summary. Another example is extracting keywords or other words related to cognitive activities. This is not limited in the embodiments of the present application; anything that falls within the inventive concept of the embodiments of the present application shall fall within the protection scope of the present application.
In the embodiments of the present application, the reading behavior data of the user's eyes on a text, obtained by eye tracking, are acquired. The weight value of each word can be calculated from the fixation positions, the fixation duration of each fixation position, and the fixation position sequence in the reading behavior data, and the stop word set is determined from the weight values of the words. By incorporating the user's reading behavior data, the calculated weight values reflect the user's attention to individual words and phrases and the user's perception of the text, rather than being derived solely from word frequency statistics. The stop word set finally collected is therefore more accurate and cognitively meaningful, combining generality with task specificity. Because the collected stop word set is more accurate, applying it to other natural language processing tasks or related tasks such as recommendation can greatly improve performance. Moreover, the stop word mining method of the present application can collect stop words automatically while the user completes other reading tasks, without requiring the user to make deliberate judgments, which reduces manual annotation cost.
Based on the above embodiments, the overall technical framework of the stop word mining method in the embodiments of the present application is described below; refer to Fig. 6, which is a schematic diagram of the technical framework of the stop word mining method in the embodiments of the present application.
1) As shown in Fig. 6, in the embodiments of the present application, a reading model can be established from the reading behavior data tracked by the eye tracking device and the text-related information, where the reading model is a directed cyclic graph. When the reading model is established, the fixation positions can be adjusted according to the text-related information, which simplifies the reading model and reduces computation and complexity.
The text-related information may include the word segmentation result of the text, the text content, and so on; the reading behavior data include at least the fixation positions on the text, the fixation duration of each fixation position, and the fixation position sequence.
2) Weight value calculation. Specifically, the weight value of each word can be calculated based on the reading model and the text weight, where the text weight is the weight of the word determined from word frequency statistics in the text.
3) Determine and output the stop words according to the weight values.
Specifically, the words whose weight value is less than the set value are filtered out as stop words to generate the stop word set.
In the embodiments of the present application, the weight value of each word is determined by combining the user's reading behavior data with the text-related information, and the stop word set is determined from these weight values. The resulting stop words are based both on the text weight and on the user's perception of the text, which improves the accuracy of stop word mining.
Based on the same inventive concept, an embodiment of the present application further provides a stop word mining apparatus. The stop word mining apparatus may be, for example, the server in the foregoing embodiments, and may be a hardware structure, a software module, or a hardware structure plus a software module. Based on the above embodiments, as shown in Fig. 7, the stop word mining apparatus in the embodiments of the present application specifically includes:
An obtaining module 70, configured to obtain the reading behavior data of the user's eyes on a text obtained by eye tracking, where the reading behavior data include at least the fixation positions on the text, the fixation duration of each fixation position, and the fixation position sequence;
A first determining module 71, configured to determine the word corresponding to each fixation position on the text, and to determine the corresponding word sequence according to the fixation position sequence;
A computing module 72, configured to calculate the weight value of each word according to the determined words, the word sequence, and the fixation duration of each fixation position;
A second determining module 73, configured to determine the stop word set according to the weight value of each word.
Optionally, when calculating the weight value of each word according to the determined words, the word sequence, and the fixation duration of each fixation position, the computing module 72 is specifically configured to:
establish a directed cyclic graph according to the determined words, the word sequence, and the fixation duration of each fixation position, where each node of the directed cyclic graph is one of the determined words, the node size is the fixation duration, the nodes are connected according to the word sequence, and the length of the edge between connected words in the directed cyclic graph is their corresponding distance in the text;
determine the number of nodes connected to each word in the directed cyclic graph, and determine the text weight of each word in the text;
determine the weight value of each word according to its text weight, fixation duration, and number of connected nodes, where the text weight, the fixation duration, and the number of connected nodes are each positively correlated with the weight value.
Optionally, when establishing the directed cyclic graph according to the determined words, the word sequence, and the fixation duration of each fixation position, the computing module 72 is specifically configured to:
delete the words whose fixation duration is not less than a preset duration;
obtain the word segmentation result of the text, compare it with the words corresponding to the fixation positions, and adjust those words so that each adjusted word matches a token in the word segmentation result;
establish the directed cyclic graph according to the remaining and adjusted words, their fixation durations, and the word sequence.
Optionally, when comparing the word segmentation result with the words corresponding to the fixation positions and adjusting those words, the computing module 72 is specifically configured to:
if the words corresponding to adjacent fixation positions are determined to belong to one token, merge the words corresponding to the adjacent fixation positions;
if the word corresponding to any fixation position is determined to contain multiple tokens, split the word corresponding to that fixation position;
if the word corresponding to any fixation position is determined to be blank, delete the word corresponding to that fixation position.
Optionally, when determining the weight value of each word according to its text weight, fixation duration, and number of connected nodes, the computing module 72 is specifically configured to:
compute, for each word, the product of its text weight, fixation duration, and number of connected nodes, and use the product as the weight value of that word.
Optionally, when determining the stop word set according to the weight value of each word, the second determining module 73 is specifically configured to:
filter out the words whose weight value is less than the set value, and determine the stop word set from the filtered-out words.
The division into modules in the embodiments of the present application is illustrative and is only a logical functional division; other divisions are possible in actual implementation. In addition, the functional modules in the embodiments of the present application may be integrated into one processor, may exist physically separately, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
Based on the above embodiments, refer to Fig. 8, which shows a schematic structural diagram of the electronic device in the embodiments of the present application.
An embodiment of the present application provides an electronic device, which may include a processor 810 (Central Processing Unit, CPU), a memory 820, an input device 830, an output device 840, and the like. The input device 830 may include a keyboard, a mouse, a touch screen, and so on; the output device 840 may include a display device, such as a liquid crystal display (LCD) or a cathode ray tube (CRT).
The memory 820 may include read-only memory (ROM) and random access memory (RAM), and provides the processor 810 with the program instructions and data stored in the memory 820. In the embodiments of the present application, the memory 820 may be used to store the program of any stop word mining method in the embodiments of the present application.
By calling the program instructions stored in the memory 820, the processor 810 is configured to execute any stop word mining method in the embodiments of the present application according to the obtained program instructions.
Based on the above embodiments, an embodiment of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the stop word mining method in any of the above method embodiments.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and so on) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the present application.
Obviously, those skilled in the art can make various modifications and variations to the embodiments of the present application without departing from the spirit and scope of the embodiments of the present application. Thus, if these modifications and variations fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to include them.
Claims (10)
1. A stop word mining method, characterized by comprising:
obtaining reading behavior data of a user's eyes on a text obtained by eye tracking, wherein the reading behavior data include at least fixation positions on the text, a fixation duration of each fixation position, and a fixation position sequence;
determining the word corresponding to each fixation position on the text, and determining a corresponding word sequence according to the fixation position sequence;
calculating a weight value of each word according to the determined words, the word sequence, and the fixation duration of each fixation position;
determining a stop word set according to the weight value of each word.
2. The method according to claim 1, characterized in that calculating the weight value of each word according to the determined words, the word sequence, and the fixation duration of each fixation position specifically comprises:
establishing a directed cyclic graph according to the determined words, the word sequence, and the fixation duration of each fixation position, wherein each node of the directed cyclic graph is one of the determined words, the node size is the fixation duration, the nodes are connected according to the word sequence, and the length of the edge between connected words in the directed cyclic graph is their corresponding distance in the text;
determining the number of nodes connected to each word in the directed cyclic graph, and determining a text weight of each word in the text;
determining the weight value of each word according to its text weight, fixation duration, and number of connected nodes, wherein the text weight, the fixation duration, and the number of connected nodes are each positively correlated with the weight value.
3. The method according to claim 2, characterized in that establishing the directed cyclic graph according to the determined words, the word sequence, and the fixation duration of each fixation position specifically comprises:
deleting the words whose fixation duration is not less than a preset duration;
obtaining a word segmentation result of the text, comparing the word segmentation result with the words corresponding to the fixation positions, and adjusting the words corresponding to the fixation positions so that each adjusted word matches a token in the word segmentation result;
establishing the directed cyclic graph according to the remaining and adjusted words, their fixation durations, and the word sequence.
4. The method according to claim 3, characterized in that comparing the word segmentation result with the words corresponding to the fixation positions and adjusting the words corresponding to the fixation positions specifically comprises:
if the words corresponding to adjacent fixation positions are determined to belong to one token, merging the words corresponding to the adjacent fixation positions;
if the word corresponding to any fixation position is determined to contain multiple tokens, splitting the word corresponding to that fixation position;
if the word corresponding to any fixation position is determined to be blank, deleting the word corresponding to that fixation position.
5. The method according to claim 2, characterized in that determining the weight value of each word according to its text weight, fixation duration, and number of connected nodes specifically comprises:
determining, for each word, the product of its text weight, fixation duration, and number of connected nodes, and using the product as the weight value of that word.
6. The method according to claim 1, characterized in that determining the stop word set according to the weight value of each word specifically comprises:
filtering out the words whose weight value is less than a set value, and determining the stop word set according to the filtered-out words.
7. A stop word mining apparatus, characterized by comprising:
an obtaining module, configured to obtain reading behavior data of a user's eyes on a text obtained by eye tracking, wherein the reading behavior data include at least fixation positions on the text, a fixation duration of each fixation position, and a fixation position sequence;
a first determining module, configured to determine the word corresponding to each fixation position on the text, and to determine a corresponding word sequence according to the fixation position sequence;
a computing module, configured to calculate a weight value of each word according to the determined words, the word sequence, and the fixation duration of each fixation position;
a second determining module, configured to determine a stop word set according to the weight value of each word.
8. The apparatus according to claim 7, characterized in that, when calculating the weight value of each word according to the determined words, the word sequence, and the fixation duration of each fixation position, the computing module is specifically configured to:
establish a directed cyclic graph according to the determined words, the word sequence, and the fixation duration of each fixation position, wherein each node of the directed cyclic graph is one of the determined words, the node size is the fixation duration, the nodes are connected according to the word sequence, and the length of the edge between connected words in the directed cyclic graph is their corresponding distance in the text;
determine the number of nodes connected to each word in the directed cyclic graph, and determine a text weight of each word in the text;
determine the weight value of each word according to its text weight, fixation duration, and number of connected nodes, wherein the text weight, the fixation duration, and the number of connected nodes are each positively correlated with the weight value.
9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910721384.9A CN110457699B (en) | 2019-08-06 | 2019-08-06 | Method and device for mining stop words, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110457699A | 2019-11-15 |
CN110457699B | 2023-07-04 |
Family ID: 68485058
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910721384.9A Active CN110457699B (en) | 2019-08-06 | 2019-08-06 | Method and device for mining stop words, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110457699B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111130846A (en) * | 2019-11-26 | 2020-05-08 | 腾讯科技(深圳)有限公司 | Target object determination method and device and storage medium |
CN111680503A (en) * | 2020-06-08 | 2020-09-18 | 腾讯科技(深圳)有限公司 | Text processing method, device and equipment and computer readable storage medium |
CN112954209A (en) * | 2021-02-08 | 2021-06-11 | 维沃移动通信(杭州)有限公司 | Photographing method and device, electronic equipment and medium |
CN113537116A (en) * | 2021-07-27 | 2021-10-22 | 重庆国翔创新教学设备有限公司 | Reading material-matched auxiliary learning system, method, equipment and storage medium |
CN114625857A (en) * | 2022-03-23 | 2022-06-14 | 南京硅基智能科技有限公司 | Prompter, English text tracking method, storage medium and electronic equipment |
CN115238683A (en) * | 2022-08-09 | 2022-10-25 | 平安科技(深圳)有限公司 | Method, device, equipment and medium for recognizing stop words based on recurrent self-attention |
CN115292477A (en) * | 2022-07-18 | 2022-11-04 | 盐城金堤科技有限公司 | Method and device for judging pushing similar articles, storage medium and electronic equipment |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102609422A (en) * | 2011-01-25 | 2012-07-25 | 阿里巴巴集团控股有限公司 | Class misplacing identification method and device |
CN103902552A (en) * | 2012-12-25 | 2014-07-02 | 深圳市世纪光速信息技术有限公司 | Stop word mining method and device, searching method and device, and evaluating method and device |
US20160062458A1 (en) * | 2014-09-02 | 2016-03-03 | Tobii Ab | Gaze based text input systems and methods |
US20170139899A1 (en) * | 2015-11-18 | 2017-05-18 | Le Holdings (Beijing) Co., Ltd. | Keyword extraction method and electronic device |
WO2017157200A1 (en) * | 2016-03-17 | 2017-09-21 | 阿里巴巴集团控股有限公司 | Characteristic keyword extraction method and device |
CN109408826A (en) * | 2018-11-07 | 2019-03-01 | 北京锐安科技有限公司 | Text information extraction method, device, server and storage medium |
US20190080623A1 (en) * | 2017-09-14 | 2019-03-14 | Massachusetts Institute Of Technology | Eye Tracking As A Language Proficiency Test |
CN109800434A (en) * | 2019-01-25 | 2019-05-24 | 陕西师范大学 | Abstract text header generation method based on eye movement attention |
CN109948121A (en) * | 2017-12-20 | 2019-06-28 | 北京京东尚科信息技术有限公司 | Article similarity method for digging, system, equipment and storage medium |
CN110059311A (en) * | 2019-03-27 | 2019-07-26 | 银江股份有限公司 | Keyword extraction method and system for judicial style data |
2019-08-06: application CN201910721384.9A filed in CN (patent CN110457699B, active)
Non-Patent Citations (4)
Title |
---|
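张俊伟, 杨柳, 王硕宁, 王忠建: "基于文本挖掘的商品推荐" (Product recommendation based on text mining), 哈尔滨商业大学学报(自然科学版) (Journal of Harbin University of Commerce, Natural Sciences Edition), no. 04 * |
张婷婷, 王伟军, 黄英辉, 刘凯, 胡祥恩: "基于屏幕视觉热区的中文短文本关键词实时提取方法" (Real-time keyword extraction method for Chinese short texts based on screen visual hot zones), 情报学报 (Journal of the China Society for Scientific and Technical Information), no. 12 * |
王继钢: "文本挖掘重点技术研究" (Research on key technologies of text mining), 漯河职业技术学院学报 (Journal of Luohe Vocational Technical College), no. 05 * |
赵永威, 周苑, 李弼程, 柯圣财: "基于近义词自适应软分配和卡方模型的图像目标分类方法" (Image object classification method based on adaptive soft assignment of synonyms and a chi-square model), 电子学报 (Acta Electronica Sinica), no. 09 * |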
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111130846A (en) * | 2019-11-26 | 2020-05-08 | 腾讯科技(深圳)有限公司 | Target object determination method and device and storage medium |
CN111130846B (en) * | 2019-11-26 | 2021-09-14 | 腾讯科技(深圳)有限公司 | Target object determination method and device and storage medium |
CN111680503A (en) * | 2020-06-08 | 2020-09-18 | 腾讯科技(深圳)有限公司 | Text processing method, device and equipment and computer readable storage medium |
CN112954209A (en) * | 2021-02-08 | 2021-06-11 | 维沃移动通信(杭州)有限公司 | Photographing method and device, electronic equipment and medium |
CN113537116A (en) * | 2021-07-27 | 2021-10-22 | 重庆国翔创新教学设备有限公司 | Reading material-matched auxiliary learning system, method, equipment and storage medium |
CN114625857A (en) * | 2022-03-23 | 2022-06-14 | 南京硅基智能科技有限公司 | Prompter, English text tracking method, storage medium and electronic equipment |
CN114625857B (en) * | 2022-03-23 | 2023-08-25 | 南京硅基智能科技有限公司 | Prompter, English text tracking method, storage medium and electronic equipment |
CN115292477A (en) * | 2022-07-18 | 2022-11-04 | 盐城金堤科技有限公司 | Method and device for judging pushing similar articles, storage medium and electronic equipment |
CN115292477B (en) * | 2022-07-18 | 2024-04-16 | 盐城天眼察微科技有限公司 | Method and device for judging push similar articles, storage medium and electronic equipment |
CN115238683A (en) * | 2022-08-09 | 2022-10-25 | 平安科技(深圳)有限公司 | Method, device, equipment and medium for recognizing stop words based on recurrent self-attention |
CN115238683B (en) * | 2022-08-09 | 2023-06-20 | 平安科技(深圳)有限公司 | Method, device, equipment and medium for recognizing stop words based on recurrent self-attention |
Also Published As
Publication number | Publication date |
---|---|
CN110457699B (en) | 2023-07-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110457699A (en) | Stop word mining method, device, electronic equipment and storage medium | |
RU2714096C1 (en) | Method, equipment and electronic device for detecting a face vitality | |
CN109766445B (en) | Knowledge graph construction method and data processing device | |
CN113257383B (en) | Matching information determination method, display method, device, equipment and storage medium | |
JP2021514087A (en) | Connected kiosk for real-time assessment of fall risk | |
US20120106793A1 (en) | Method and system for improving the quality and utility of eye tracking data | |
WO2022161234A1 (en) | Image processing method and apparatus, and electronic device and storage medium | |
CN103309960B (en) | The method and device that a kind of multidimensional information of network public sentiment event is extracted | |
CN112104642B (en) | Abnormal account number determination method and related device | |
Ehlers et al. | Advancing digital earth: beyond the next generation | |
CN108281197A (en) | A method of relationship between analysis environmental factor and juvenile shortsightedness | |
CN108415653A (en) | Screen locking method and device for terminal device | |
CN117011859A (en) | Picture processing method and related device | |
CN118035945B (en) | Label recognition model processing method and related device | |
CN112037305B (en) | Method, device and storage medium for reconstructing tree-like organization in image | |
CN118860156A (en) | A VR interaction method and device based on metaverse virtual reality technology | |
CN112163095A (en) | Data processing method, apparatus, equipment and storage medium | |
CN111797175A (en) | Data storage method and device, storage medium and electronic equipment | |
CN117274448A (en) | Method, device, electronic equipment and medium for generating action animation of virtual model | |
CN110147464A (en) | Video recommendation method, device, electronic equipment and readable storage medium storing program for executing | |
Alqahtani et al. | An agent-based intelligent HCI information system in mixed reality | |
CN114445757B (en) | Nomination acquisition method, network training method, device, storage medium and equipment | |
CN111950575A (en) | Device and method for fall detection | |
CN117576245B (en) | Method and device for converting style of image, electronic equipment and storage medium | |
CN113658713B (en) | Infection tendency prediction method, device, equipment and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |