CN108268438A - Page content extraction method, device and client - Google Patents
Page content extraction method, device and client
- Publication number
- CN108268438A CN201611260567.8A
- Authority
- CN
- China
- Prior art keywords
- alternative word
- character
- word
- alternative
- sentence
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
- G06V10/235—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on user input or interaction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0484—Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/28—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
- G06V30/287—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a page content extraction method, device and client. The method includes: obtaining a selected region in a page; identifying the characters in the selected region one by one, obtaining the meta-sentences (punctuation-delimited sentences) that contain the characters, and splitting the meta-sentences to obtain alternative words; sorting the alternative words by at least one attribute of the alternative words to obtain a ranking result; and choosing the highest-ranked alternative word in the ranking result as the target alternative word and extracting it. By splitting meta-sentences and sorting by alternative-word attributes, the invention can quickly and effectively extract the page content selected by the user. The extracted content is more accurate, the user no longer needs to adjust the selection manually, time is saved, and the user experience is improved.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a page content extraction method, device and client.
Background
With the rapid development of the mobile Internet, daily life has become closely tied to the Internet. The Internet now produces massive amounts of data and has become the main source of information, reaching into every field. People's demand for information analysis and information processing keeps growing. When reading web page text on a client device, users often need to copy text for further operations, for example to search for it or to paste it into a dialog box for further editing. Because people demand ever higher accuracy and timeliness of information analysis, users want to copy text efficiently and accurately.
In the prior art, several problems arise when a user selects and copies text. In some cases marking is slow, so the operation takes too long to complete. In some cases the content the user wants to copy is not in the default selection, so the intended content cannot be selected correctly and the user experience is poor. In other cases the selection cursor has to be adjusted repeatedly before the content can be selected, or the words the user wants still cannot be copied correctly even after repeated adjustment, so operating efficiency is low.
Summary of the invention
In order to solve the above technical problems, the present invention proposes a page content extraction method, device and client.
In a first aspect, a page content extraction method is provided. The method includes: obtaining a selected region in a page; identifying the characters in the selected region one by one, obtaining the meta-sentences containing the characters, and splitting the meta-sentences to obtain alternative words; sorting the alternative words by at least one attribute of the alternative words to obtain a ranking result; and choosing a target alternative word according to the ranking result and extracting the target alternative word.
In a second aspect, a page content extraction device is provided. The device includes: a region acquisition module for obtaining the selected region in the page; an alternative word generation module for identifying the characters in the selected region one by one, obtaining the meta-sentences containing the characters, and splitting the meta-sentences into alternative words; an attribute sorting module for sorting the alternative words according to multiple attributes of the alternative words to obtain a ranking result; and a page content extraction module for choosing a target alternative word according to the ranking result and extracting the target alternative word.
In a third aspect, a client is provided. The client includes the aforementioned page content extraction device and is installed in a user terminal to extract page content according to the user's input.
The beneficial effects of the technical solution provided by the embodiments of the present invention include: by splitting meta-sentences into alternative words and sorting the alternative words by at least one of their attributes, the content in the user's selected region can be extracted quickly and accurately, which makes operations such as copying and searching convenient for the user and greatly improves the user experience.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an application scenario provided by an embodiment of the present invention;
Fig. 2 is a flowchart of the page content extraction method provided by an embodiment of the present invention;
Fig. 3 is a flowchart of the page content extraction method provided by an embodiment of the present invention;
Fig. 4 is a flowchart of the page content extraction method provided by an embodiment of the present invention;
Fig. 5 is a flowchart of the page content extraction method provided by an embodiment of the present invention;
Fig. 6 is a flowchart of the page content extraction method provided by an embodiment of the present invention;
Fig. 7 is a flowchart of the page content extraction method provided by an embodiment of the present invention;
Fig. 8 is a flowchart of the page content extraction method provided by an embodiment of the present invention;
Fig. 9 is a block diagram of the page content extraction device provided by an embodiment of the present invention;
Fig. 10 is a block diagram of the page content extraction device provided by an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of a terminal provided by an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
An embodiment of the present invention provides a mobile device evaluation method. Refer to Fig. 1, which shows a schematic structural diagram of the implementation environment involved in the page content extraction method provided by the embodiment. The implementation environment includes the user equipment 101 under evaluation. The user equipment 101 can display a page whose content is to be extracted, and the user can perform an operation to select page content. The user equipment can display the selected content according to the user's selection.
In one embodiment of the invention, a page content extraction method is provided. As shown in Fig. 2, the method includes:
S210. Obtain the selected region in the page.
Specifically, the client can obtain, through the human-machine interface, the region that the user selects in the page. For example, the selected region may be the region the user selects by pressing a finger on a touch interface. The selected region may also be a region selected by drawing or clicking in the interface with an input tool such as a stylus.
S220. Identify the characters in the selected region one by one, obtain the meta-sentences containing the characters, and split the meta-sentences into alternative words.
Specifically, the client can identify the characters contained in the selected region. These characters may be complete characters that lie entirely within the selected region, or incomplete characters that lie only partly within it. The selected region is the area the user forms on the user interface by pressing, touching, sliding and the like. If a character lies entirely within the selected region, it is complete with respect to that region. If a character sits exactly on the boundary of the selected region, partly inside and partly outside, it is incomplete with respect to that region. Both complete and incomplete characters are identified as characters in the selected region, and during identification a flag is used to distinguish complete characters from incomplete ones.
In one example, a complete character that lies entirely within the selected region is represented by flag 1, and an incomplete character is represented by flag 0.
In another example, the degree of completeness of a character can be represented by a quantized flag value. A complete character that lies entirely within the selected region is represented by flag value 1. An incomplete character is represented by flag value X, where X is a number between 0 and 1 giving the fraction of the complete character's area that lies inside the selected region.
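The two flag schemes above can be sketched as follows. This is a minimal illustration, assuming that both the selected region and each character's bounding box are axis-aligned rectangles (the source does not fix the shape of the region); `Rect`, `completeness` and `binary_flag` are hypothetical names, not part of the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle (assumed shape for both region and glyph box)."""
    left: float
    top: float
    right: float
    bottom: float

    def area(self) -> float:
        # A negative width or height means "empty", so clamp to zero.
        return max(0.0, self.right - self.left) * max(0.0, self.bottom - self.top)

    def intersect(self, other: "Rect") -> "Rect":
        return Rect(max(self.left, other.left), max(self.top, other.top),
                    min(self.right, other.right), min(self.bottom, other.bottom))


def completeness(char_box: Rect, region: Rect) -> float:
    """Quantized flag: fraction of the character's area inside the region."""
    if char_box.area() == 0:
        return 0.0
    return char_box.intersect(region).area() / char_box.area()


def binary_flag(char_box: Rect, region: Rect) -> int:
    """Binary flag: 1 for a complete character, 0 for an incomplete one."""
    return 1 if completeness(char_box, region) == 1.0 else 0


region = Rect(0, 0, 100, 20)
inside = Rect(10, 0, 30, 20)    # entirely inside the region
on_edge = Rect(90, 0, 110, 20)  # half inside, half outside
```

With these sample boxes, `inside` gets a quantized flag of 1.0 and `on_edge` gets 0.5; a non-rectangular finger contact area would need a different overlap computation.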
In one example, the meta-sentence containing a character is obtained by searching at the corresponding position in the page content. A meta-sentence is the character string in which the character is located, delimited by the adjacent punctuation marks. Take the page content "AAAAAA, BBBBBBB, CCCCCCCCCCC. DDDDDDD、EEE、FF; G, HHHHHHH; IIIIIIIIII". The meta-sentences it contains are "AAAAAA", "BBBBBBB", "CCCCCCCCCCC", "DDDDDDD", "EEE", "FF", "G", "HHHHHHH" and "IIIIIIIIII". Here A, B, C, D, E, F, G, H and I stand for the characters in each meta-sentence; the characters may be the same or different.
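The punctuation-based division described above can be sketched as follows. This is an illustration only: the delimiter set `PUNCTUATION` is an assumption (the source only speaks of adjacent punctuation marks), and both function names are hypothetical.

```python
import re

# Assumed delimiter set: common ASCII and full-width punctuation marks.
PUNCTUATION = ",.;:!?，。；：！？、"


def split_into_sentences(text):
    """Split page text into punctuation-delimited strings, dropping empties."""
    pattern = "[" + re.escape(PUNCTUATION) + "]"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


def sentence_at(text, index):
    """Return the punctuation-delimited string that contains text[index]."""
    start = index
    while start > 0 and text[start - 1] not in PUNCTUATION:
        start -= 1
    end = index
    while end < len(text) and text[end] not in PUNCTUATION:
        end += 1
    return text[start:end].strip()


page = "AAAAAA, BBBBBBB, CCCCCCCCCCC. DDDDDDD、EEE、FF; G, HHHHHHH; IIIIIIIIII"
sentences = split_into_sentences(page)
```

`sentence_at` expands outward from a single character position, which matches the idea of looking up the sentence at the position of a selected character rather than splitting the whole page first.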
Specifically, the client splits the meta-sentence into alternative words. Different word segmentation techniques can be used: a segmentation technique from the prior art, or an improved segmentation technique such as the one in this embodiment. A word is the smallest meaningful language unit that can be used independently. English words are separated by spaces as natural delimiters, whereas Chinese uses the character as its basic written unit and has no explicit separator between words, so Chinese lexical analysis is the foundation and key of Chinese information processing. A mature segmentation technique is therefore needed when processing Chinese text. The client of this embodiment uses a mature segmentation technique to split each sentence of the target sentences into a group of alternative words, where each group contains multiple alternative words.
In one example, splitting out alternative words includes: setting the maximum granularity of the split, where granularity is the number of characters contained in an alternative word that is split out; reading the continuous character string in the meta-sentence; matching the continuous character string against a preset vocabulary in left-to-right order; when a character string of a first length in the continuous character string matches the preset vocabulary, judging whether the character string of the first length plus 1 also matches the preset vocabulary; if not, taking the character string of the first length as an alternative word, cutting it off from the continuous character string, and continuing to match with the remaining character string; if so, increasing the first length by 1 and repeating the step of judging whether the character string of the first length plus 1 matches the preset vocabulary.
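The left-to-right procedure above can be sketched as follows. This is a minimal reading of the described steps, not a confirmed implementation: it assumes that a single character always counts as a match (so the loop always makes progress), and the toy vocabulary and the name `split_forward` are hypothetical.

```python
def split_forward(sentence, vocabulary, max_len=5):
    """Left-to-right split that grows the current match one character at a time.

    max_len is the maximum granularity (characters per alternative word).
    Assumption: a single character is always accepted as a match.
    """
    words = []
    rest = sentence
    while rest:
        length = 1
        # Keep extending while the string of length + 1 is still in the vocabulary.
        while length < min(max_len, len(rest)) and rest[:length + 1] in vocabulary:
            length += 1
        words.append(rest[:length])  # take the matched string as an alternative word
        rest = rest[length:]         # cut it off and continue with the remainder
    return words


toy_vocabulary = {"ab", "abc", "de"}
print(split_forward("abcde", toy_vocabulary))  # ['abc', 'de']
```

Because the match grows one character at a time, a long word is found only if its shorter prefixes are also in the vocabulary; the right-to-left variant would mirror this by slicing from the end of the string.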
In one example, splitting out alternative words includes: setting the maximum granularity of the split, where granularity is the number of characters contained in an alternative word that is split out; reading the continuous character string in the meta-sentence; matching the continuous character string against a preset vocabulary in right-to-left order; when a character string of a first length in the continuous character string matches the preset vocabulary, judging whether the character string of the first length plus 1 also matches the preset vocabulary; if not, taking the character string of the first length as an alternative word, cutting it off from the continuous character string, and continuing to match with the remaining character string; if so, increasing the first length by 1 and repeating the step of judging whether the character string of the first length plus 1 matches the preset vocabulary.
In another example, the splitting processes of the two preceding examples are both run, and the split result to output is selected according to the maximum-granularity principle and the minimum-word-count principle. For example, for the meta-sentence "we play in the safari park", left-to-right matching splits out the alternative words "we / out of office / lively / object / garden / play", while right-to-left matching splits out "we / in / safari park / play". By the maximum-granularity principle, "we / in / safari park / play" is selected as the output.
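The bidirectional selection can be sketched as follows. Several things here are assumptions rather than statements from the source: the code uses standard forward and reverse maximum matching (longest candidate first) in place of the one-character-at-a-time growth described earlier, the Chinese string is a reconstruction of the example sentence inferred from the English glosses, and the vocabulary and function names are hypothetical.

```python
def forward_max_match(sentence, vocabulary, max_len):
    """Standard forward maximum matching: try the longest piece first."""
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in vocabulary:  # single characters always pass
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(sentence, vocabulary, max_len):
    """Standard reverse maximum matching: same idea, scanning from the right."""
    words, j = [], len(sentence)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = sentence[j - size:j]
            if size == 1 or piece in vocabulary:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


def choose_split(sentence, vocabulary, max_len=5):
    """Prefer the split with fewer words; on a tie, the one with the longer word."""
    candidates = [forward_max_match(sentence, vocabulary, max_len),
                  backward_max_match(sentence, vocabulary, max_len)]
    return min(candidates, key=lambda ws: (len(ws), -max(map(len, ws), default=0)))


# Assumed reconstruction of the example sentence and a hypothetical vocabulary.
vocabulary = {"我们", "在野", "生动", "野生", "动物", "动物园", "野生动物园"}
sentence = "我们在野生动物园玩"
print(choose_split(sentence, vocabulary))  # ['我们', '在', '野生动物园', '玩']
```

On this input the forward pass yields six words and the reverse pass four, so the reverse result is chosen, which agrees with the outcome stated in the example.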
As can be seen, the alternative-word splitting method of this embodiment improves the accuracy of alternative-word selection and thus the accuracy of page content extraction.
S230. Sort the alternative words according to multiple attributes of the alternative words to obtain a ranking result.
An alternative word has several attributes, for example its usage heat, its part of speech, and the number of characters it contains. The attributes of a word can be used to differentiate the content the user selected and to rank it by importance, which makes it easier to identify and extract the page content selected by the user.
In this embodiment, the alternative words obtained by splitting are sorted by the integrity of the characters in the alternative word, the heat of the alternative word, and the part of speech of the alternative word, hereinafter called the first attribute, the second attribute and the third attribute.
When the alternative words are sorted by their attributes, the first sort uses the integrity attribute value of each alternative word. The integrity attribute reflects where the user pressed when selecting page content and is the most important indicator for page content extraction. As described above, the degree of completeness of a character can be represented by a quantized flag value: a complete character that lies entirely within the selected region is represented by flag 1, and an incomplete character is represented by flag X, where X is a number between 0 and 1 giving the fraction of the complete character's area that lies inside the selected region. The integrity attribute value of a word is then the average of the integrity values of its characters. For example, if the integrity values of the characters in the alternative word "safari park" are X1, X2, X3, X4 and X5, the integrity of the word is (X1+X2+X3+X4+X5) divided by 5. In general, the integrity formula is:
$$\text{integrity} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
where i is the ordinal of a character in the alternative word, n is the number of characters in the alternative word, and X_i is the integrity of the i-th character.
In the example above, if X1 to X5 are 0.6, 1, 1, 1 and 0.8 respectively, the integrity is:
$$\frac{0.6 + 1 + 1 + 1 + 0.8}{5} = 0.88$$
After the alternative words are sorted, the method further includes judging whether the integrity of each alternative word is greater than a first preset threshold. If the integrity of an alternative word is too low, the word lies away from the center of the user's selected region, so the threshold screens out words the user did not select. In one example, if the first preset threshold is set to 50% of the area of the selected region, then any alternative word whose integrity exceeds 50% of the area of the selected region is stored by the client in a first alternative word group. The alternative words in the first alternative word group are the objects of the second sort.
In one example, the client obtains the integrity ranking of the alternative words and takes the highest-ranked alternative word as the target alternative word.
In one example, the client sorts the first alternative word group again using the heat and part of speech of the alternative words to obtain the ranking result. The alternative words in the ranking result have priorities, and the word with the highest priority is the target word. The heat of an alternative word is the number of times it has been searched in a hot-word service. The part of speech of an alternative word is the grammatical class by which words are categorized. A hot-word service is a service related to hot words, such as a search engine or an input method.
S240. Choose a target alternative word according to the ranking result, and extract the target alternative word.
Specifically, the alternative words in the ranking result are necessarily ordered. Using this order, one or several alternative words that have higher coverage and satisfy the heat and part-of-speech criteria can be chosen as target alternative words.
In one example, the ranking result yields a single target alternative word. The client highlights the target alternative word in the page and copies it at the same time. The user can then perform related operations on the target word copied by the client, such as pasting it into a chat dialog box for editing, or running a related search on it.
In one example, the ranking result yields multiple target alternative words. The client highlights the multiple target alternative words in the page and waits for the user to choose. The client copies the target alternative word the user chooses, and the user can then perform related operations on the copied target word, such as pasting it into a chat dialog box for editing, or running a related search on it.
The client highlights the alternative word in the text. The highlighted word is the user's target word, and the client then copies the target word.
In conclusion, by splitting meta-sentences and sorting by alternative-word attributes, this embodiment can quickly and effectively extract the page content selected by the user. The extracted content is more accurate, the user no longer needs to adjust the selection manually, time is saved, and the user experience is improved.
Referring to Fig. 3, this embodiment proposes a page content extraction method that includes the following steps:
S310. Obtain the selected region in the page.
For example, suppose the user is operating a mobile phone client and needs to copy text while browsing a web page. The user operates on the touch screen of the mobile phone client, and the contact between the user's finger and the touch screen yields an annular selected region. As shown in Fig. 3, the annular region in Fig. 3 is the selected region in the text.
S320. Identify the characters in the selected region, obtain all sentences corresponding to the characters, and delete the repeated sentences among them to obtain the target sentences.
Step S320 includes the following sub-steps:
S3201. Identify the characters in the selected region. Referring to Fig. 5, this step includes:
S32011. Identify the complete characters in the selected region, and add a complete-character flag to each complete character.
S32012. Identify the incomplete characters in the selected region, and add an incomplete-character flag to each incomplete character.
In step S320, the client identifies all characters in the selected region using a character acquisition technique. Referring to Fig. 4, the characters belonging to the selected region are:
【two, ten, state, collection, go out, color】
Here "ten" is a complete character in the selected region, while "two", "state", "collection", "go out" and "color" are incomplete characters in the selected region. A character flag is added to each of these characters to indicate whether the character is complete, or its degree of completeness.
S3202. Obtain all meta-sentences corresponding to the characters. Referring to Fig. 6, this step includes:
S32021. Search for the characters in the page content to obtain the multiple meta-sentences corresponding to the characters in the selected region.
S32022. Examine the multiple meta-sentences to judge whether any of them are repeated.
S32023. If so, delete the repeated meta-sentences.
Specifically, to determine the meta-sentence a character belongs to, the client takes the punctuation marks between sentences as boundaries and identifies in turn the meta-sentence corresponding to each character in the selected region. For sentences that are repeated among all the sentences, the client deletes the repeats using a deduplication technique. For example, still referring to Fig. 4, the meta-sentence corresponding to "two" is "The Group of Twenty Leaders' Antalya Summit achieved great success", and the meta-sentence corresponding to "ten" is also "The Group of Twenty Leaders' Antalya Summit achieved great success". The meta-sentences for "two" and "ten" are identical, so only one copy of the repeated sentence is kept and the other identical sentences are deleted. Identifying and deduplicating in this way finally yields the sentences corresponding to the characters, that is, the target sentences:
【The Group of Twenty Leaders' Antalya Summit achieved great success.
Thanks again to the presidency, Turkey, for its outstanding work and the positive results achieved last year.】
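The lookup-and-deduplicate step can be sketched as follows. This is an illustration under assumptions: the delimiter set is hypothetical, selected characters are given as positions in the page text, and repeats are detected by sentence span (several selected characters landing in the same sentence), which is one reasonable reading of the deduplication described above.

```python
# Assumed delimiter set; the source only refers to punctuation between sentences.
PUNCTUATION = set(",.;:!?，。；：！？、")


def sentence_span(text, index):
    """Return (start, end) of the punctuation-delimited string around text[index]."""
    start = index
    while start > 0 and text[start - 1] not in PUNCTUATION:
        start -= 1
    end = index
    while end < len(text) and text[end] not in PUNCTUATION:
        end += 1
    return start, end


def target_sentences(text, selected_indices):
    """Collect the sentence of every selected character, keeping each one once."""
    # dict.fromkeys keeps the first occurrence of each span in page order.
    spans = dict.fromkeys(sentence_span(text, i) for i in sorted(selected_indices))
    return [text[start:end].strip() for start, end in spans]


page = "The summit was a success. Thanks again to the host; see you next year."
picked = [4, 9, page.index("Thanks")]  # two characters in sentence 1, one in sentence 2
result = target_sentences(page, picked)
```

Here the two positions inside the first sentence produce the same span, so the sentence appears only once in `result`.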
S3203. Split the meta-sentences to obtain alternative words. Referring to Fig. 7, this step includes the following sub-steps:
S32031. Read the continuous character string in the meta-sentence.
S32032. Match the continuous character string against the preset vocabulary in left-to-right order.
S32033. When a character string of a first length in the continuous character string matches the preset vocabulary, judge whether the character string of the first length plus 1 also matches the preset vocabulary.
S32034. If not, take the character string of the first length as an alternative word, cut it off from the continuous character string, and continue matching with the remaining character string.
S32035. If so, increase the first length by 1 and repeat the step of judging whether the character string of the first length plus 1 matches the preset vocabulary.
Specifically, the sentences selected in Fig. 4 are split as follows.
"The Group of Twenty Leaders' Antalya Summit achieved great success." is split into:
【twenty states, group, leaders, Antalya, summit, achieved, great, success】
"Thanks again to the presidency, Turkey, for its outstanding work and the positive results achieved last year." is split into:
【again, thanks, last year, presidency, state, Turkey, outstanding, work and, achieved, positive, results】
S330. Sort the alternative words by at least one attribute of the alternative words to obtain a ranking result. An alternative word has several attributes, for example its usage heat, its part of speech, and the number of characters it contains. The attributes of a word can be used to differentiate the content the user selected and to rank it by importance, which makes it easier to identify and extract the page content selected by the user.
In one example, the alternative words are sorted by a single attribute to obtain the ranking result. For example, they can be sorted by the integrity attribute of the characters in the alternative word, because the integrity of each character is obtained while the characters are acquired. According to the formula:
$$\text{integrity} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
where i is the ordinal of a character in the alternative word, n is the number of characters in the alternative word, and X_i is the integrity of the i-th character, the integrity value of each alternative word can be obtained, and the alternative words can then be sorted by these values.
In one example, the alternative words are sorted by a single attribute to obtain the ranking result. For example, they can be sorted by the heat attribute of the alternative word. The heat of an alternative word can be looked up from the hot-word labels in a word library, and the heat labels in the library are obtained by big-data collection from Internet search engines or instant messaging tools. For example, if the heat values of "roast duck", "park" and "caravan" are 3,700,000 searches, 1,500,000 searches and 80 searches respectively, the heat order of the three is "roast duck, park, caravan".
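Sorting by heat can be sketched with a simple lookup table. The table below stands in for the word library; its structure, the function name, and the choice to give unknown words a heat of 0 are assumptions, while the three heat values come from the example above.

```python
# Stand-in for the word library: alternative word -> number of searches.
heat_table = {"roast duck": 3_700_000, "park": 1_500_000, "caravan": 80}


def sort_by_heat(words, table):
    """Order words from hottest to coldest; unknown words count as heat 0."""
    return sorted(words, key=lambda word: table.get(word, 0), reverse=True)


ranking = sort_by_heat(["caravan", "park", "roast duck"], heat_table)
print(ranking)  # ['roast duck', 'park', 'caravan']
```

Because `sorted` is stable, words with equal heat keep their incoming order, which leaves room for a later tie-break by another attribute.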
In one example, sorting uses two or three attributes of the alternative words: the alternative words are first sorted by the first attribute, and the ranking is then refined using the second attribute and/or the third attribute. Specifically, step S330 may then include the following sub-steps:
S3301. Sort the multiple alternative words by priority according to the first attribute value of each alternative word to obtain a first ranking result.
S3302. Judge whether the first attribute value of an alternative word is greater than the first preset threshold; if so, store the alternative word in the first alternative word group.
S3303. Sort the alternative words in the first alternative word group again according to their second attribute value or third attribute value to obtain the ranking result.
Suppose the first attribute is the completeness of the alternative word, the second attribute is its popularity, and the third attribute is its part of speech. The alternative words are first sorted by completeness; the completeness of each word is then compared with the preset threshold, and the words whose completeness exceeds the threshold form the first alternative phrase, which is then sorted by popularity. If sorting by popularity still cannot determine a unique alternative word, the words are further sorted by part of speech.
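Sub-steps S3301 to S3303 can be sketched as a sort, a threshold filter and a second sort. The class name, field names and threshold value below are hypothetical; completeness and popularity are used as the first and second attributes, as in the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class AlternativeWord:
    text: str
    completeness: float  # first attribute (illustrative field name)
    popularity: int      # second attribute (illustrative field name)


def two_stage_rank(words, completeness_threshold):
    # S3301: first ranking result, ordered by the first attribute.
    first_ranking = sorted(words, key=lambda w: w.completeness, reverse=True)
    # S3302: words above the first preset threshold form the first alternative phrase.
    first_phrase = [w for w in first_ranking if w.completeness > completeness_threshold]
    # S3303: re-sort the first alternative phrase by the second attribute.
    # sorted() is stable, so words with equal popularity keep their
    # completeness order from S3301.
    return sorted(first_phrase, key=lambda w: w.popularity, reverse=True)
```

Because the second sort is stable, the first ranking still breaks ties, which is one simple way to let the second attribute correct the first ranking rather than replace it.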
The sorting is of course not limited to the three attributes mentioned above; other attributes, such as the number of characters contained in an alternative word, can also take part. The order in which the attributes are applied can likewise be permuted. For example, popularity can be chosen as the first attribute so that the words are sorted by popularity first, which favors the direct selection of network hot words and improves the efficiency and accuracy of content extraction.
In one example, referring to Fig. 8, step S3303 can further include the following sub-steps:
S33031: obtain the second attribute value of each alternative word in the first alternative phrase, and compare it with a second preset threshold;
S33032: if there is an alternative word whose second attribute value is greater than the second preset threshold, re-sort the alternative words in the first alternative phrase according to their second attribute values;
S33033: if there is no alternative word whose second attribute value is greater than the second preset threshold, re-sort the alternative words in the first alternative phrase according to their third attribute.
Specifically, if checking the alternative words in the first ranking result one by one shows that none of them is a hot word, in other words that their network popularity is too low to serve as a sorting basis, the alternative words in the first alternative phrase are re-sorted according to their third attribute to obtain the ranking results.
The third attribute includes the part of speech of the alternative word. Statistics on the copying behavior of a large number of users show that users are more likely to want to copy nouns, adjectives and verbs, with nouns being the most likely. The alternative phrase is therefore sorted in the order:
Noun > Adjective > Verb > Other words
Here, other words include numerals, measure words, pronouns and the like. Because a word of any of these parts of speech is very unlikely to be the content the user copies by default, they need not be distinguished from one another.
For example, in Fig. 3, consider the popularity of the alternative words. If "20 state" is found to have been searched 10,000 times, "20 state" is a hot word with a hot value of 10,000. If a lookup in the hot-word library finds that "outstanding" is also a hot word that has been searched 5,000 times, its hot value is 5,000. Sorting the two by hot value ranks "20 state" above "outstanding". If, however, the preset popularity threshold is above 10,000, the hot value is not used as the sorting reference, and part of speech is used as the sorting criterion instead.
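The threshold-based choice between popularity and part of speech (sub-steps S33031 to S33033) can be sketched as follows. The tuple layout, the part-of-speech tag strings and the sample words are assumptions for illustration; only the order Noun > Adjective > Verb > Other words comes from the text.

```python
# Priority of each part of speech; a lower number sorts first.
PART_OF_SPEECH_RANK = {"noun": 0, "adjective": 1, "verb": 2}
OTHER_WORDS_RANK = 3  # numerals, measure words, pronouns, ... are not distinguished


def resort_first_phrase(words, popularity_threshold):
    """Re-sort (text, popularity, part_of_speech) tuples.

    S33032: if any word's popularity exceeds the threshold, sort by popularity.
    S33033: otherwise fall back to the part-of-speech order.
    """
    if any(popularity > popularity_threshold for _, popularity, _ in words):
        return sorted(words, key=lambda w: w[1], reverse=True)
    return sorted(words, key=lambda w: PART_OF_SPEECH_RANK.get(w[2], OTHER_WORDS_RANK))
```

For two hypothetical words, a verb searched 10,000 times and an adjective searched 5,000 times, a threshold of 8,000 puts the verb first by popularity, while a threshold of 20,000 disables popularity and puts the adjective first by part of speech.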
S340: choose a target alternative word according to the ranking results, and extract the target alternative word.
According to the ranking results, the client can take the highest-ranked alternative word as the target alternative word and extract it. Specifically, extraction can include two operations: first, copying the alternative word; second, copying the alternative word into memory in advance.
Specifically, the client highlights in the text the word whose priority is first in the second ranking results; the highlighted word is the user's target alternative word, and the client then copies it. The highlighting can be brightness highlighting, color highlighting, shape highlighting or the like. Brightness highlighting changes the background color of the target alternative word so that the region containing the word is displayed brightly; color highlighting changes the text color of the word to make it stand out; shape highlighting changes the font of the alternative word or the shape of the region containing it.
In conclusion content of pages extracting method provided in this embodiment, it can be big being sorted and being screened using more attributes
The big efficiency and accuracy for improving extraction content.For example, after the integrity degree of alternative word sorts to the alternative word, it is further right
Judgement is identified in alternative word in first ranking results, and the temperature of alternative word or the part of speech of alternative word is selected to be arranged again
Sequence, so as to more efficiently copy the operation target of user.
Referring to Fig. 9, this embodiment provides a page content extraction device, which includes:
Region acquisition module, performing step S210, for obtaining the selected region in the page;
Alternative word generation module, performing step S220, for identifying the characters in the selected region one by one, obtaining the first sentences containing the characters, and splitting the first sentences into alternative words;
Attribute sorting module, performing step S230, for sorting the alternative words according to multiple attributes of the alternative words to obtain ranking results;
Page content extraction module, performing step S240, for taking the highest-ranked alternative word in the ranking results as the target alternative word and extracting the target alternative word.
Referring to Fig. 10, this embodiment provides a page content extraction device, which includes:
Region acquisition module, performing step S310, for obtaining the selected region in the page.
Alternative word generation module, performing step S320, for identifying the characters in the selected region, obtaining all sentences corresponding to the characters, and deleting the repeated sentences among them to obtain the target sentences.
The alternative word generation module includes the following submodules:
Character recognition submodule, performing step S3201, for identifying the characters in the selected region.
The character recognition submodule includes:
Complete character recognition submodule, performing step S32011, for identifying the complete characters in the selected region and adding a complete-character flag bit to each complete character.
Incomplete character recognition submodule, performing step S32012, for identifying the incomplete characters in the selected region and adding an incomplete-character flag bit to each incomplete character.
First sentence acquisition submodule, performing step S3202, for obtaining all first sentences corresponding to the characters.
The first sentence acquisition submodule includes the following submodules:
First sentence retrieval submodule, performing step S32021, for retrieving the characters in the page content to obtain the multiple first sentences corresponding to each character in the selected region.
Query submodule, performing step S32022, for querying the multiple first sentences to judge whether they contain repeated first sentences.
Duplicate removal submodule, performing step S32023, for deleting the repeated first sentences when repeated first sentences exist.
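The retrieval, query and duplicate removal submodules can be sketched together as below. How a first sentence is delimited is not stated here, so the sketch assumes that a sentence is a run of text ending at a Chinese or Western full stop, question mark or exclamation mark; the function name and this delimiter rule are illustrative assumptions.

```python
import re

# ASSUMPTION: a sentence ends at one of these punctuation marks.
SENTENCE_PATTERN = re.compile(r"[^。！？.!?]+[。！？.!?]?")


def first_sentences_for(selected_chars, page_text):
    """Return the sentences of page_text that contain any selected character,
    with repeated sentences removed and first-seen order preserved."""
    sentences = [s.strip() for s in SENTENCE_PATTERN.findall(page_text) if s.strip()]
    found = []
    for char in selected_chars:        # retrieve each character in the page content
        for sentence in sentences:
            if char in sentence:
                found.append(sentence)
    # Two selected characters from the same sentence yield that sentence twice;
    # dict.fromkeys drops the repeats while keeping insertion order.
    return list(dict.fromkeys(found))
```

Deduplication matters because adjacent characters in a selection usually belong to the same sentence, so without it the same sentence would be split into alternative words several times.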
First sentence splitting submodule, performing step S3203, for splitting the first sentences to obtain alternative words. It includes the following submodules:
Character string reading submodule, performing step S32031, for reading the continuous character string in the first sentence;
Matching submodule, performing step S32032, for matching the continuous character string against a preset vocabulary in left-to-right order;
First matching judgment submodule, performing step S32033, for judging, when the character string of a first length in the continuous character string matches the preset vocabulary, whether the character string of the first length plus 1 matches the preset vocabulary;
First logic judgment submodule, performing step S32034, for taking, when the judgment result of the first matching judgment submodule is no, the character string of the first length as an alternative word, cutting it from the continuous character string, and continuing the matching with the remaining continuous character string;
Second logic judgment submodule, performing step S32035, for updating, when the judgment result of the first matching judgment submodule is yes, the first length to the first length plus 1, and returning to the step of judging whether the character string of the first length plus 1 matches the preset vocabulary.
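The splitting procedure of steps S32031 to S32035 can be sketched as a left-to-right loop that keeps extending the current match by one character while the longer string is still in the preset vocabulary. The function name is hypothetical, and the sketch assumes that a single character which matches nothing is emitted as a one-character alternative word so that the loop always advances.

```python
def split_into_alternative_words(text, vocabulary):
    """Split text from left to right against a preset vocabulary (a set of strings).

    ASSUMPTION: the first length starts at 1, and a lone character is kept
    as its own alternative word even if it is not in the vocabulary.
    """
    words = []
    rest = text
    while rest:
        length = 1
        # S32033 / S32035: while the string of length + 1 still matches,
        # update the first length to length + 1 and test again.
        while length < len(rest) and rest[:length + 1] in vocabulary:
            length += 1
        # S32034: no longer match exists, so cut the current string off
        # as an alternative word and continue with the remainder.
        words.append(rest[:length])
        rest = rest[length:]
    return words
```

With the vocabulary {"ab", "abc", "d"}, the string "abcd" is split into "abc" and "d". Under this reading a longer entry is reached only if each shorter prefix from length 2 upward is also in the vocabulary, which suits two-character words but may stop early on longer ones.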
Attribute sorting module, performing step S330, for sorting the alternative words by at least one attribute of the alternative words to obtain ranking results. An alternative word has several attributes, for example its usage popularity, its part of speech and the number of characters it contains. Using these attributes makes it possible to distinguish, and rank by importance, the content chosen by the user, so that the page content selected by the user is easier to identify and extract.
In one example, the attribute sorting module can include the following submodules:
First attribute sorting submodule, performing step S3301, for sorting the multiple alternative words by priority according to their first attribute values to obtain a first ranking result.
First attribute judgment submodule, performing step S3302, for judging whether the first attribute value of each alternative word is greater than the first preset threshold and, if so, storing the alternative word in the first alternative phrase;
Secondary sorting submodule, performing step S3303, for re-sorting the alternative words in the first alternative phrase according to their second or third attribute values to obtain the ranking results.
Suppose the first attribute is the completeness of the alternative word, the second attribute is its popularity, and the third attribute is its part of speech. The alternative words are first sorted by completeness; the completeness of each word is then compared with the preset threshold, and the words whose completeness exceeds the threshold form the first alternative phrase, which is then sorted by popularity. If sorting by popularity still cannot determine a unique alternative word, the words are further sorted by part of speech.
The sorting is of course not limited to the three attributes mentioned above; other attributes, such as the number of characters contained in an alternative word, can also take part. The order in which the attributes are applied can likewise be permuted. For example, popularity can be chosen as the first attribute so that the words are sorted by popularity first, which favors the direct selection of network hot words and improves the efficiency and accuracy of content extraction.
In one example, the secondary sorting submodule can further include the following submodules:
Second attribute threshold comparison submodule, performing step S33031, for obtaining the second attribute value of each alternative word in the first alternative phrase and comparing it with the second preset threshold;
First logic sorting submodule, performing step S33032, for re-sorting the alternative words in the first alternative phrase according to their second attribute values when there is an alternative word whose second attribute value is greater than the second preset threshold;
Second logic sorting submodule, performing step S33033, for re-sorting the alternative words in the first alternative phrase according to their third attribute when there is no alternative word whose second attribute value is greater than the second preset threshold.
Page content extraction module, performing step S340, for choosing the highest-ranked alternative word in the ranking results as the target alternative word and extracting the target alternative word.
Referring to Fig. 11, this embodiment provides a terminal that can be used to implement the page content extraction method provided in the above embodiments. Specifically:
The terminal 700 can include an RF (Radio Frequency) circuit 110, a memory 120 including one or more computer-readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a WiFi (wireless fidelity) module 170, a processor 180 including one or more processing cores, a power supply 190 and other components. Those skilled in the art will understand that the terminal structure shown in the figure does not limit the terminal, which can include more or fewer components than illustrated, combine certain components, or arrange the components differently. Wherein:
The RF circuit 110 can be used to receive and send signals while information is being sent and received or during a call. In particular, after receiving downlink information from a base station, it passes the information to one or more processors 180 for processing; in addition, it sends uplink data to the base station. In general, the RF circuit 110 includes but is not limited to an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer and the like. The RF circuit 110 can also communicate with networks and other devices by wireless communication. The wireless communication can use any communication standard or protocol, including but not limited to GSM (Global System of Mobile communication), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service) and the like.
The memory 120 can be used to store software programs and modules; the processor 180 performs various functional applications and data processing by running the software programs and modules stored in the memory 120. The memory 120 can mainly include a program storage area and a data storage area. The program storage area can store the operating system, the application programs required by functions (such as a sound playback function and an image playback function) and the like; the data storage area can store data created through use of the terminal 700 (such as audio data and a phone book) and the like. In addition, the memory 120 can include high-speed random access memory and can also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Correspondingly, the memory 120 can also include a memory controller to provide the processor 180 and the input unit 130 with access to the memory 120.
The input unit 130 can be used to receive input digit or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. Specifically, the input unit 130 may include a touch-sensitive surface 131 and other input devices 132. The touch-sensitive surface 131, also called a touch display screen or touchpad, collects touch operations by the user on or near it (such as operations by the user on or near the touch-sensitive surface 131 using a finger, a stylus or any other suitable object or accessory) and drives the corresponding connection device according to a preset program. Optionally, the touch-sensitive surface 131 may include two parts, a touch detection device and a touch controller. The touch detection device detects the user's touch position, detects the signal produced by the touch operation, and passes the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 180, and can receive and execute commands sent by the processor 180. The touch-sensitive surface 131 can be implemented in several types, such as resistive, capacitive, infrared and surface acoustic wave. Besides the touch-sensitive surface 131, the input unit 130 can also include other input devices 132, which can include but are not limited to one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick and the like.
The display unit 140 can be used to display information entered by the user or provided to the user, as well as the various graphical user interfaces of the terminal 700, which can be composed of graphics, text, icons, video and any combination thereof. The display unit 140 may include a display panel 141, which can optionally be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) or the like. Further, the touch-sensitive surface 131 can cover the display panel 141; when the touch-sensitive surface 131 detects a touch operation on or near it, the operation is passed to the processor 180 to determine the type of touch event, and the processor 180 then provides the corresponding visual output on the display panel 141 according to the type of touch event. Although in Fig. 11 the touch-sensitive surface 131 and the display panel 141 implement the input and output functions as two independent components, in some embodiments they can be integrated to implement the input and output functions.
The terminal 700 may also include at least one sensor 150, such as an optical sensor, a motion sensor and other sensors. Specifically, the optical sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display panel 141 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 141 and/or the backlight when the terminal 700 is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications that recognize the terminal's posture (such as switching between landscape and portrait, related games, and magnetometer posture calibration) and for vibration-recognition functions (such as a pedometer and tapping). Other sensors that the terminal 700 can also be configured with, such as a gyroscope, a barometer, a hygrometer, a thermometer and an infrared sensor, are not described here.
The audio circuit 160, a speaker 161 and a microphone 162 can provide an audio interface between the user and the terminal 700. The audio circuit 160 can transmit the electrical signal converted from received audio data to the speaker 161, which converts it into a sound signal for output; conversely, the microphone 162 converts a collected sound signal into an electrical signal, which the audio circuit 160 receives and converts into audio data. After the audio data is output to the processor 180 for processing, it is sent through the RF circuit 110 to, for example, another terminal, or output to the memory 120 for further processing. The audio circuit 160 may also include an earphone jack to provide communication between a peripheral earphone and the terminal 700.
WiFi is a short-range wireless transmission technology. Through the WiFi module 170, the terminal 700 can help the user send and receive e-mail, browse web pages, access streaming media and so on; it provides the user with wireless broadband Internet access. Although Fig. 11 shows the WiFi module 170, it can be understood that the module is not an essential part of the terminal 700 and can be omitted as needed without changing the essence of the invention.
The processor 180 is the control center of the terminal 700. It connects the parts of the entire terminal through various interfaces and lines, and performs the various functions of the terminal 700 and processes data by running or executing the software programs and/or modules stored in the memory 120 and calling the data stored in the memory 120, thereby monitoring the terminal as a whole. Optionally, the processor 180 may include one or more processing cores. Preferably, the processor 180 can integrate an application processor, which mainly handles the operating system, the user interface, application programs and the like, and a modem processor, which mainly handles wireless communication. It can be understood that the modem processor need not be integrated into the processor 180.
The terminal 700 further includes a power supply 190 (such as a battery) that powers the components. Preferably, the power supply is logically connected to the processor 180 through a power management system, so that charging, discharging and power consumption are managed through the power management system. The power supply 190 can also include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
Although not shown, the terminal 700 can also include a camera, a Bluetooth module and the like, which are not described here. Specifically, in this embodiment, the display unit of the terminal is a touch-screen display, and the terminal further includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
Obtain the selected region in the text;
Identify the characters in the selected region, and obtain the sentences corresponding to the characters;
Split the sentences into multiple alternative words;
Sort the multiple alternative words by priority according to the attributes of the alternative words to obtain ranking results;
Mark the target word according to the ranking results, and copy the target word.
Further, the processor of the terminal is also used to execute instructions for the following operations: identify all sentences corresponding to the characters in the selected region; delete the repeated sentences among them to obtain the sentences corresponding to the characters.
Further, the processor of the terminal is also used to execute instructions for the following operation: split the sentences using the forward maximum matching algorithm to obtain multiple alternative words.
Further, the processor of the terminal is also used to execute instructions for the following operations: sort the multiple alternative words by priority according to the first attribute of the alternative words to obtain a first ranking result; judge whether the first attribute of each alternative word is greater than the first preset threshold and, if so, store the alternative word in the first alternative phrase; re-sort the alternative words in the first alternative phrase according to their second or third attribute to obtain the ranking results.
Specifically, the first attribute includes the completeness of the alternative word, and the completeness of the alternative word is the area that the alternative word occupies in the selected region.
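Completeness as an occupied area can be sketched with axis-aligned rectangles. The text says only that completeness is the area the word occupies in the selected region; dividing that area by the word's own area to obtain a value between 0 and 1, as well as the `Rect` type and function names, are assumptions made for illustration.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def area(rect):
    """Area of a rectangle; an inverted (empty) rectangle has area 0."""
    return max(0.0, rect.right - rect.left) * max(0.0, rect.bottom - rect.top)


def overlap_area(a, b):
    """Area of the intersection of two rectangles."""
    return area(Rect(max(a.left, b.left), max(a.top, b.top),
                     min(a.right, b.right), min(a.bottom, b.bottom)))


def completeness_in_region(word_box, selected_region):
    """Fraction of the word's bounding box that lies inside the selected region.

    ASSUMPTION: the occupied area is normalized by the word's own area.
    """
    total = area(word_box)
    if total == 0:
        return 0.0
    return overlap_area(word_box, selected_region) / total
```

A word whose box is half covered by the selection gets 0.5, a fully covered word gets 1.0, and a word outside the selection gets 0.0, which gives values that can be compared directly with a preset threshold.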
Further, the processor of the terminal is also used to execute instructions for the following operations: obtain the second attribute of each alternative word in the first alternative phrase, and compare it with the second preset threshold; if there is an alternative word whose second attribute is greater than the second preset threshold, re-sort the alternative words in the first alternative phrase according to their second attribute; if there is no alternative word whose second attribute is greater than the second preset threshold, re-sort the alternative words in the first alternative phrase according to their third attribute.
Further, the second attribute of the alternative word includes the popularity of the alternative word, and the third attribute includes the part of speech of the alternative word.
In conclusion terminal provided in this embodiment, by obtaining partially complete and incomplete character in chosen area, into
The alternative word that one step splits the corresponding sentence of the character and obtained to fractionation is repeatedly sorted, can be correct
It marks out user and wants the object content replicated, reduce the number of user's operation;By combining peripheral parts, advanced optimize
User replicates the experience sense of text.
The technical solution of this embodiment in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing one or more terminal devices to perform all or part of the steps of the methods described in the embodiments of the present invention.
The division into modules/units described in this embodiment is only a division by logical function; other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not performed. Some or all of the modules/units can be selected according to actual needs to achieve the purpose of the solution of the present invention.
In addition, the modules/units in the embodiments of the present invention can be integrated in one processing unit, each unit can exist physically on its own, or two or more units can be integrated in one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
The above is only the preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (19)
1. A page content extraction method, characterized in that the method includes the following steps:
obtaining the selected region in the page;
identifying the characters in the selected region one by one, obtaining the first sentences containing the characters, and splitting the first sentences to obtain alternative words;
sorting the alternative words using at least one attribute of the alternative words to obtain ranking results;
choosing a target alternative word according to the ranking results, and extracting the target alternative word.
2. The method according to claim 1, characterized in that identifying the characters in the selected region one by one includes:
identifying the complete characters in the selected region;
identifying the incomplete characters in the selected region;
adding a flag bit to each complete character and each incomplete character, the flag bit being used to identify the completeness of the complete characters and the incomplete characters.
3. The method according to claim 1, characterized in that obtaining the first sentences containing the characters includes:
retrieving the characters in the page content to obtain the first sentences corresponding to each character in the selected region;
querying the first sentences to judge whether the multiple first sentences contain repeated first sentences;
if so, deleting the repeated first sentences.
4. The method according to claim 1, characterized in that splitting the first sentences into at least one alternative word includes:
reading the continuous character string in the first sentence;
matching the continuous character string against a preset vocabulary in left-to-right order;
when the character string of a first length in the continuous character string matches the preset vocabulary, judging whether the character string of the first length plus 1 matches the preset vocabulary;
if not, taking the character string of the first length as an alternative word, cutting it from the continuous character string, and continuing the matching with the remaining continuous character string;
if so, updating the first length to the first length plus 1, and returning to the step of judging whether the character string of the first length plus 1 matches the preset vocabulary.
5. The method according to claim 1, characterized in that sorting the alternative words according to multiple attributes of the alternative words includes:
sorting the multiple alternative words by priority according to the first attribute values of the alternative words to obtain a first ranking result;
judging whether the first attribute value of each alternative word is greater than a first preset threshold and, if so, storing the alternative word in a first alternative phrase;
re-sorting the alternative words in the first alternative phrase according to their second or third attribute values to obtain the ranking results.
6. The method according to claim 5, wherein the first attribute value comprises the integrity of the alternative word, and the integrity of the alternative word is calculated by the following formula:
Where X denotes the integrity of the alternative word, i denotes the ordinal of a character in the alternative word, n denotes the number of characters in the alternative word, and X_i denotes the integrity of the i-th character.
7. The method according to claim 5, wherein re-sorting the alternative words in the first alternative word group according to the second attribute or the third attribute of the alternative words comprises:
Obtaining the second attribute value of each alternative word in the first alternative word group, and comparing the second attribute value of the alternative word with a second preset threshold;
If there is an alternative word whose second attribute value is greater than the second preset threshold, re-sorting the alternative words in the first alternative word group according to the second attribute value of the alternative words;
If there is no alternative word whose second attribute value is greater than the second preset threshold, re-sorting the alternative words in the first alternative word group according to the third attribute of the alternative words.
8. The method according to claim 7, wherein the second attribute value of the alternative word comprises a hot value of the alternative word, and the third attribute comprises the part of speech of the alternative word.
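The multi-attribute ranking of claims 5, 7 and 8 can be sketched as below. The sketch takes the integrity, hot value and part of speech of each alternative word as given inputs. The names `Candidate`, `rank_alternative_words` and `POS_PRIORITY`, the descending sort order, and the part-of-speech priority table are assumptions, since the claims do not fix them.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    integrity: float  # first attribute value
    heat: float       # second attribute value (hot value)
    pos: str          # third attribute (part of speech)


# Hypothetical priority of parts of speech; a lower number ranks earlier.
POS_PRIORITY = {"noun": 0, "verb": 1, "adjective": 2}


def rank_alternative_words(candidates, integrity_threshold, heat_threshold):
    # First sort: priority by the first attribute value (assumed descending).
    first_result = sorted(candidates, key=lambda c: c.integrity, reverse=True)
    # Keep only words whose first attribute value exceeds the first preset threshold.
    group = [c for c in first_result if c.integrity > integrity_threshold]
    # Re-sort by hot value if any word exceeds the second preset threshold...
    if any(c.heat > heat_threshold for c in group):
        return sorted(group, key=lambda c: c.heat, reverse=True)
    # ...otherwise re-sort by part of speech.
    return sorted(group, key=lambda c: POS_PRIORITY.get(c.pos, len(POS_PRIORITY)))


words = [
    Candidate("a", 0.9, 5.0, "noun"),
    Candidate("b", 0.8, 50.0, "verb"),
    Candidate("c", 0.2, 99.0, "noun"),
]
print([c.text for c in rank_alternative_words(words, 0.5, 10.0)])
```

Python's `sorted` is stable, so words that tie on the re-sort key keep the order produced by the first sort on integrity.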
9. The method according to claim 1, wherein the target alternative word is highlighted and/or the target alternative word is copied.
10. A page content extraction device, wherein the device comprises the following modules:
A region acquisition module, configured to obtain a selected region in a page;
An alternative word generation module, configured to identify the characters in the selected region one by one, obtain first sentences containing the characters, and split the first sentences into alternative words;
An attribute sorting module, configured to sort the alternative words according to multiple attributes of the alternative words to obtain a ranking result;
A page content extraction module, configured to choose a target alternative word according to the ranking result and to extract the target alternative word.
11. The device according to claim 10, wherein the alternative word generation module comprises a character recognition submodule, the character recognition submodule being configured to: identify the complete characters in the selected region; identify the incomplete characters in the selected region; and add a flag to the complete characters and the incomplete characters, the flag being used to identify the integrity of the complete characters and of the incomplete characters.
12. The device according to claim 10, wherein the alternative word generation module comprises a first-sentence acquisition submodule, the first-sentence acquisition submodule being configured to: retrieve the characters in the page content to obtain multiple first sentences corresponding to the characters in the selected region; query the multiple first sentences to determine whether a repeated first sentence exists among them; and, if so, delete the repeated first sentence.
13. The device according to claim 10, wherein the alternative word generation module comprises a word segmentation submodule, the word segmentation submodule being configured to: read a continuous character string in the first sentence; match the continuous character string against a preset vocabulary in left-to-right order; when a character string of a first length in the continuous character string matches the preset vocabulary, determine whether the character string of the first length plus 1 matches the preset vocabulary; if not, take the character string of the first length as an alternative word, cut the character string of the first length off the continuous character string, and continue to match using the continuous character string that remains; if so, update the first length to the first length plus 1, and repeat the step of determining whether the character string of the first length plus 1 matches the preset vocabulary.
14. The device according to claim 10, wherein the attribute sorting module comprises:
A first attribute sorting submodule, configured to sort the multiple alternative words by priority according to a first attribute value of each alternative word to obtain a first ranking result;
A first attribute threshold judging submodule, configured to determine whether the first attribute value of an alternative word is greater than a first preset threshold and, if so, to store the alternative word in a first alternative word group;
A secondary sorting submodule, configured to re-sort the alternative words in the first alternative word group according to a second attribute value or a third attribute value of the alternative words to obtain the ranking result.
15. The device according to claim 14, wherein the first attribute value comprises the integrity of the alternative word, and the integrity of the alternative word is calculated by the following formula:
Where X denotes the integrity of the alternative word, i denotes the ordinal of a character in the alternative word, n denotes the number of characters in the alternative word, and X_i denotes the integrity of the i-th character.
16. The device according to claim 14, wherein the secondary sorting submodule comprises:
A second attribute value acquisition submodule, configured to obtain the second attribute value of each alternative word in the first alternative word group;
A second attribute threshold judging submodule, configured to compare the second attribute value of the alternative word with a second preset threshold; if there is an alternative word whose second attribute value is greater than the second preset threshold, to re-sort the alternative words in the first alternative word group according to the second attribute value of the alternative words; and if there is no alternative word whose second attribute value is greater than the second preset threshold, to re-sort the alternative words in the first alternative word group according to the third attribute of the alternative words.
17. The device according to claim 16, wherein the second attribute value of the alternative word comprises a hot value of the alternative word, and the third attribute comprises the part of speech of the alternative word.
18. The device according to claim 10, wherein the page content extraction module comprises:
A highlighting submodule, configured to highlight the target alternative word;
A copying submodule, configured to copy the target alternative word.
19. A client, comprising the device according to any one of claims 10 to 18.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611260567.8A CN108268438B (en) | 2016-12-30 | 2016-12-30 | Page content extraction method and device and client |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108268438A (en) | 2018-07-10 |
CN108268438B (en) | 2021-10-22 |
Family
ID=62755020
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611260567.8A Active CN108268438B (en) | 2016-12-30 | 2016-12-30 | Page content extraction method and device and client |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108268438B (en) |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101201841A (en) * | 2007-02-15 | 2008-06-18 | 刘二中 | Convenient method and system for electronic text-processing and searching |
CN101377855A (en) * | 2007-08-27 | 2009-03-04 | 富士施乐株式会社 | Document image processing apparatus, and information processing method |
CN102301366A (en) * | 2008-11-18 | 2011-12-28 | 夏普株式会社 | Information processing device |
CN102708147A (en) * | 2012-03-26 | 2012-10-03 | 北京新发智信科技有限责任公司 | Recognition method for new words of scientific and technical terminology |
CN103778200A (en) * | 2014-01-09 | 2014-05-07 | 中国科学院计算技术研究所 | Method for extracting information source of message and system thereof |
US20140157168A1 (en) * | 2012-11-30 | 2014-06-05 | International Business Machines Corporation | Copy and paste experience |
CN104462085A (en) * | 2013-09-12 | 2015-03-25 | 腾讯科技(深圳)有限公司 | Method and device for correcting search keywords |
US20150104764A1 (en) * | 2013-10-15 | 2015-04-16 | Apollo Education Group, Inc. | Adaptive grammar instruction for commas |
CN104699809A (en) * | 2015-03-20 | 2015-06-10 | 广东睿江科技有限公司 | Method and device for controlling optimized word bank |
CN104750661A (en) * | 2013-12-30 | 2015-07-01 | 腾讯科技(深圳)有限公司 | Method and device for selecting words and sentences of text |
US20150199091A1 (en) * | 2010-05-15 | 2015-07-16 | Roddy McKee Bullock | Enhanced E-Book and Enhanced E-Book Reader |
CN105446955A (en) * | 2015-11-27 | 2016-03-30 | 贺惠新 | Adaptive word segmentation method |
CN105550170A (en) * | 2015-12-14 | 2016-05-04 | 北京锐安科技有限公司 | Chinese word segmentation method and apparatus |
US20160147879A1 (en) * | 2014-11-24 | 2016-05-26 | Qiurong Huang | Fuzzy Search and Highlighting of Existing Data Visualization |
CN105808512A (en) * | 2016-03-04 | 2016-07-27 | 北京奇虎科技有限公司 | Editing method and editing apparatus for encyclopedic entries |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020029210A1 (en) * | 2018-08-09 | 2020-02-13 | 深圳市柔宇科技有限公司 | Copy content selection method, terminal and storage medium |
CN111475093A (en) * | 2019-08-02 | 2020-07-31 | 广州三星通信技术研究有限公司 | Word selection method and electronic equipment |
CN110691028A (en) * | 2019-09-16 | 2020-01-14 | 腾讯科技(深圳)有限公司 | Message processing method, device, terminal and storage medium |
CN110691028B (en) * | 2019-09-16 | 2022-07-08 | 腾讯科技(深圳)有限公司 | Message processing method, device, terminal and storage medium |
CN113220191A (en) * | 2020-01-21 | 2021-08-06 | 佳能株式会社 | Image processing system for computerizing document, control method thereof and storage medium |
CN111796952A (en) * | 2020-08-12 | 2020-10-20 | Oppo(重庆)智能科技有限公司 | Content operation method and device and computer readable storage medium |
CN112181167A (en) * | 2020-10-27 | 2021-01-05 | 维沃移动通信有限公司 | Input method candidate word processing method and electronic equipment |
CN112181167B (en) * | 2020-10-27 | 2024-11-15 | 维沃移动通信有限公司 | Candidate word processing method for input method and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN108268438B (en) | 2021-10-22 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN106227774B (en) | Information search method and device | |
CN104239535B (en) | A kind of method, server, terminal and system for word figure | |
CN108541310B (en) | Method and device for displaying candidate words and graphical user interface | |
CN109783798A (en) | Method, apparatus, terminal and the storage medium of text information addition picture | |
CN109309751B (en) | Voice recording method, electronic device and storage medium | |
CN104123937B (en) | Remind method to set up, device and system | |
CN108268438A (en) | A kind of content of pages extracting method, device and client | |
CN104866511B (en) | A kind of method and apparatus of addition multimedia file | |
KR20170047268A (en) | Orphaned utterance detection system and method | |
CN111368063B (en) | Information pushing method based on machine learning and related device | |
CN103605656A (en) | Music recommendation method and device and mobile terminal | |
WO2014176750A1 (en) | Reminder setting method, apparatus and system | |
CN109815363A (en) | Generation method, device, terminal and the storage medium of lyrics content | |
CN108563965A (en) | Character input method and device, computer readable storage medium, terminal | |
WO2024036616A1 (en) | Terminal-based question and answer method and apparatus | |
CN110069769B (en) | Application label generation method and device and storage device | |
CN110278141A (en) | A kind of processing method of instant communication information, device and storage medium | |
CN109543014B (en) | Man-machine conversation method, device, terminal and server | |
CN114631094A (en) | Intelligent e-mail headline suggestion and remake | |
CN103366010A (en) | Method and device for searching audio file | |
CN108427761B (en) | News event processing method, terminal, server and storage medium | |
CN106534528A (en) | Processing method and device of text information and mobile terminal | |
US10140265B2 (en) | Apparatuses and methods for phone number processing | |
CN108549681A (en) | Data processing method and device, electronic equipment, computer readable storage medium | |
CN110020429B (en) | Semantic recognition method and device |
Legal Events
Code | Title | Description | Date |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||