CN102541863A - Webpage compression method applied to mobile terminal - Google Patents

Webpage compression method applied to mobile terminal

Info

Publication number
CN102541863A
Authority
CN
China
Prior art keywords
webpage
content
subject
content blocks
subject content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010105885003A
Other languages
Chinese (zh)
Other versions
CN102541863B (en)
Inventor
胡晨鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lusheng Technology Co.,Ltd.
Original Assignee
Leadcore Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Leadcore Technology Co Ltd
Priority to CN201010588500.3A
Publication of CN102541863A
Application granted
Publication of CN102541863B
Current legal status: Active
Anticipated expiration

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a webpage compression method applied to a mobile terminal. The method first parses the HTML document and the CSS document separately to generate a document object model tree and a render tree, downloads the required resources according to the links in the HTML document, and finally embeds the resources into the rendered webpage document and presents the webpage. The webpage is compressed after the document object model tree is generated, and the required resources are downloaded according to the links in the HTML document only after the webpage has been compressed. The invention relates to the field of mobile communication; the method is applicable to various internet-connected mobile terminals and can effectively improve the webpage browsing speed of a mobile terminal browser.

Description

A webpage compression method applied to a mobile terminal
Technical field
The present invention relates to the field of mobile communication, and in particular to a webpage compression method applied to a mobile terminal.
Background technology
A mobile phone browser is an Internet browser that runs in the embedded environment of a mobile phone. Compared with a traditional personal computer, a mobile phone has limited computing power, relatively little memory, short battery life, and a distinctive mode of user interaction. An Internet browser running on a mobile phone therefore needs special design to adapt to the resource limits of the embedded environment and to provide a good user experience.
At present, most webpages on the Internet are designed for ordinary computer screens; they are large and rich in content. The screen of a mobile phone and its resolution are very small by comparison, so these webpages are difficult to present well. Moreover, webpages often contain a large amount of useless information (for example advertisement links, logo pictures and so on). This content is unrelated to the actual subject of the webpage, yet it is still downloaded to the client, where it occupies computing and storage resources; and because the mobile phone screen is small, such irrelevant content seriously degrades the user's browsing experience. To improve webpage browsing on a mobile phone, the terminal therefore needs to analyze and filter the webpage that the browser downloads, removing irrelevant content as far as possible and reducing the download of linked resources unrelated to the subject.
Many commercial mobile phone browsers already compress webpages, but essentially all of them use a client-server (C-S) structure, which generally comprises the following steps:
The browser on the mobile phone does not access the website on the Internet directly, but browses the webpage indirectly through a server of the browser vendor;
The server of the browser vendor adjusts the page structure of the original webpage, compresses pictures, and performs similar work;
The server of the browser vendor sends the processed webpage to the browser on the mobile phone for presentation.
As can be seen, this compression technique requires a huge server farm to be maintained, and the cost in bandwidth and hardware is very high. The browser is also subject to the control of a third-party vendor, which may conflict with the business models of many mobile phone manufacturers. The webpage compression technique proposed here relies entirely on the computing power of the client to compress the original webpage, and has clear advantages in cost control and product integration.
Besides mobile phones, other handheld mobile terminals currently face the same problem when going online, for reasons such as screen size and memory.
Summary of the invention
In view of the above problems, the present invention provides a webpage compression method applied to a mobile terminal, which effectively improves the webpage browsing speed of a mobile terminal browser.
To achieve the above object, the present invention provides the following technical solution:
A webpage compression method applied to a mobile terminal: the method first parses the HTML document and the CSS document separately to generate a document object model tree and a render tree, downloads the required resources according to the links in the HTML document, and finally embeds the resources into the rendered webpage document and presents the webpage; the webpage is compressed after the document object model tree is generated, and the required resources are downloaded according to the links in the HTML document only after the webpage compression.
The webpage compression comprises the following steps:
Step 1, dividing the webpage into different content blocks;
Step 2, dividing the content blocks into a subject content set and a non-subject content set according to their degree of correlation with the subject of the webpage;
Step 3, comparing the similarity of the elements in the non-subject content set with the elements in the subject content set; if the similarity is lower than a set threshold, the element in the non-subject content set is filtered out; if the similarity is higher than the set threshold, the element in the non-subject content set is kept.
The present invention analyzes a webpage by dividing it into subject content and non-subject content, and filters out the non-subject content whose similarity to the subject of the webpage is low, thereby compressing the webpage. It has the following advantages:
1. The content of the webpage is analyzed, and non-subject content unrelated to the subject of the webpage is treated as noise and filtered out, which improves the browsing experience;
2. Filtering is based on a similarity comparison between subject content and non-subject content; the computational complexity is low and few resources are consumed, which suits mobile terminals with limited computing resources;
3. Filtering removes a large number of useless resource links, such as advertisement pictures and logos, and so reduces the data traffic consumed by the mobile terminal.
Description of drawings
Fig. 1 is a flow chart of how a mobile terminal parses and renders a downloaded webpage;
Fig. 2 is a flow chart of the webpage compression method applied to a mobile terminal provided by the invention.
Embodiment
Specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
Referring to Fig. 1, which shows how a mobile terminal parses and renders a downloaded webpage: the HTML document and the CSS document are first parsed separately to generate a document object model tree (DOM Tree) and a render tree (Rendering Tree); the webpage is then compressed with the webpage compression method provided by the invention, and the required resources (multimedia elements such as pictures, audio and video) are downloaded according to the links in the HTML document; once the download is complete, the browser embeds the resources into the rendered webpage document and presents the webpage.
Referring to Fig. 2, which is a flow chart of the webpage compression method applied to a mobile terminal provided by the invention:
Step 201: divide the webpage into N different content blocks;
Step 202: according to their degree of correlation with the subject of the webpage, divide the N content blocks into x subject content blocks and y non-subject content blocks (x ≥ 1, y ≥ 1, x + y = N);
Step 203: compare the similarity of the y non-subject content blocks with the x subject content blocks;
Step 204: if the similarity is lower than the threshold set by the user, execute step 205; if it is higher than the threshold set by the user, keep the non-subject content (step 206). During the comparison, each of the y non-subject content blocks may be compared with the x subject content blocks one by one, and step 205 is executed when its similarity to one of the subject content blocks is lower than the threshold set by the user;
Step 205: filter out this non-subject content, then execute step 207;
Step 206: keep this non-subject content, then execute step 207;
Step 207: determine whether all non-subject content has been compared; if not, return to step 203 and continue with the next non-subject content; if so, end the flow.
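The loop formed by steps 203 to 207 can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the name `filter_blocks`, the `similarity` callback, the choice to judge each non-subject block by its best match among the subject blocks, and the treatment of a similarity exactly equal to the threshold as "keep" are all assumptions that the text does not fix.

```python
from typing import Callable, List, Sequence, TypeVar

Block = TypeVar("Block")


def filter_blocks(
    subject: Sequence[Block],
    non_subject: Sequence[Block],
    similarity: Callable[[Block, Block], float],
    threshold: float,
) -> List[Block]:
    """Return the non-subject blocks that survive filtering.

    `subject` must be non-empty, matching the condition x >= 1 in step 202.
    """
    kept: List[Block] = []
    for candidate in non_subject:  # steps 203 and 207: one block per pass
        # Assumption: a block is judged by its best match among the subject blocks.
        best = max(similarity(candidate, s) for s in subject)
        if best < threshold:
            continue  # steps 204 -> 205: filter the block out
        kept.append(candidate)  # step 206: keep the block
    return kept
```

A stricter reading of step 204, in which a block is filtered as soon as any single comparison falls below the threshold, would replace `max` with `min`.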
In step 201 above, dividing the webpage content into N content blocks specifically comprises the following step:
Step 2011: traverse the DOM tree and divide the whole webpage into N content blocks according to the different tags in the DOM tree. The finer the granularity of the division, the better the compression effect, but the amount of computation increases accordingly. The granularity can therefore be adapted to the hardware configuration of the mobile terminal: for example, on a mobile terminal whose processor clock frequency is below 200 MHz and whose user-available memory is below 20 MB, the division can be limited to the third level of the DOM tree, while a mobile phone with a higher configuration can use a finer granularity.
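A depth-limited traversal of this kind might look like the sketch below. The `Node` class is a hypothetical stand-in for a real DOM node, the root is counted as level 1, and the depth of 5 used for better-equipped devices is an arbitrary assumption; only the 200 MHz, 20 MB, and third-level figures come from the description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal DOM node: a tag name and its child nodes."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def split_into_blocks(root: Node, max_depth: int) -> List[Node]:
    """Collect subtrees as content blocks without descending below max_depth."""
    blocks: List[Node] = []

    def walk(node: Node, depth: int) -> None:
        # A node becomes a block when the depth limit is reached or it is a leaf.
        if depth == max_depth or not node.children:
            blocks.append(node)
            return
        for child in node.children:
            walk(child, depth + 1)

    walk(root, 1)  # assumption: the root element is level 1
    return blocks


def choose_depth(cpu_mhz: float, free_mem_mb: float) -> int:
    """Pick a division depth from the hardware configuration."""
    # 200 MHz, 20 MB and depth 3 follow the description; depth 5 is an assumption.
    return 3 if cpu_mhz < 200 and free_mem_mb < 20 else 5
```

On a low-end device the two `div` elements under `body` would each become one block; with a larger depth their children are split out individually, which gives finer filtering at the cost of more comparisons.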
In step 202 above, dividing the content blocks into a subject content set and a non-subject content set specifically comprises the following steps:
Step 2021: obtain the weight $CW_j$ of content block j, that is, the proportion that the weight value of content block j makes up among all the content blocks into which the webpage is divided, where $W_j$ denotes the weight value of content block j:
$$CW_j = \frac{W_j}{\sum_{i=1}^{N} W_i} \qquad \text{(Formula 1)}$$
The weight value $W_j$ is determined mainly by the position of content block j in the webpage and by the MIME type (the media type of a resource) of the links inside content block j. If content block j lies in the middle or upper-middle part of the webpage, its weight value is increased. If the MIME type of the links inside content block j is highly relevant to the content of the webpage being browsed, the weight value is also increased; for example, if the current webpage belongs to a video website, links of the flv type inside content block j increase the weight value of that block.
For example, suppose a webpage contains several text blocks and several video blocks and belongs to a news website. The weight value of text blocks in the middle region of the webpage and above it is set to 10; text blocks outside the middle region take a value in the range [1, 6] according to their distance from the middle region. In addition, because the MIME type of the links inside a text block matches the type of the webpage, the weight value can take a value in the range [7, 9]. The weight value $W_j$ of content block j is obtained from these criteria, and the weight $CW_j$ of content block j is then calculated with Formula 1.
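As a small worked illustration of Formula 1, the sketch below normalizes a list of weight values. The function name `normalized_weights` and the three sample values are assumptions chosen to fit the ranges in the news-site example; they are not taken from the patent.

```python
from typing import List


def normalized_weights(raw: List[float]) -> List[float]:
    """Formula 1: CW_j = W_j divided by the sum of all W_i."""
    total = sum(raw)
    if total == 0:
        raise ValueError("at least one block needs a non-zero weight value")
    return [w / total for w in raw]


# Hypothetical news page: a central article block (10), a text block whose
# links match the page type (8), and a text block far from the middle (2).
raw = [10, 8, 2]
cw = normalized_weights(raw)  # approximately [0.5, 0.4, 0.1]
```

Because the weights are normalized, they sum to 1 regardless of how many blocks the page was divided into, which makes a single threshold usable across pages of different sizes.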
Step 2022: according to the weights, divide the N content blocks into a subject content set $C(C_1, C_2, \ldots, C_k, \ldots, C_K)$ and a non-subject content set $\theta(\theta_1, \theta_2, \ldots, \theta_k, \ldots, \theta_{N-K})$, where $K < N$.
When the weight $CW_j$ of content block j is greater than a set threshold, content block j is assigned to the subject content set; otherwise it is assigned to the non-subject content set.
The set threshold can be defined by the user: in a concrete mobile terminal browser, a configuration interface can be provided in the settings, in which the user adjusts the threshold.
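The threshold rule of step 2022 amounts to a simple partition of block indices. In the sketch below, the function name `partition` and the sample threshold of 0.3 are assumptions; the comparison itself (strictly greater than the threshold means subject content) follows the text.

```python
from typing import List, Tuple


def partition(weights: List[float], threshold: float) -> Tuple[List[int], List[int]]:
    """Split block indices into (subject, non_subject) by weight CW_j."""
    subject = [j for j, cw in enumerate(weights) if cw > threshold]
    non_subject = [j for j, cw in enumerate(weights) if cw <= threshold]
    return subject, non_subject


# With weights 0.5, 0.4 and 0.1 and a user-chosen threshold of 0.3,
# blocks 0 and 1 form the subject set and block 2 the non-subject set.
subject_ids, non_subject_ids = partition([0.5, 0.4, 0.1], 0.3)
```

The sketch does not enforce the conditions x ≥ 1 and y ≥ 1 from step 202; a caller would need to handle a threshold that leaves either set empty.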
In step 203 above, the similarity comparison further comprises the following steps:
Step 2031: traverse the text in the webpage and extract the phrases that occur in it to form the keyword set of the webpage. If the total number of phrases is n, the keyword set of the webpage is $T(T_1, T_2, \ldots, T_i, \ldots, T_n)$;
Step 2032: construct a feature vector $W(w_1, w_2, \ldots, w_i, \ldots, w_n)$ for each content block. The feature vector has n components (n being the total number of phrases in the webpage), and each component is calculated from the term frequency, in this content block, of the corresponding element of the keyword set $T(T_1, T_2, \ldots, T_i, \ldots, T_n)$, using the following formula:
$$w_i = \frac{CW_j \times Tf_{ij}}{\sqrt{\sum_{i=1}^{n} \left( \sum_{j=1}^{N} CW_j \times Tf_{ij} \right)^2}} \qquad \text{(Formula 2)}$$
where $Tf_{ij}$ is the term frequency of keyword $T_i$ in content block j, and $CW_j$ is the weight of content block j.
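Steps 2031 and 2032 can be sketched as below. Several points are assumptions rather than statements of the patent: each block is taken to be already split into a list of phrases (the text does not say how phrases are extracted), the function name `feature_vectors` is hypothetical, and Formula 2 is read with a square root over its denominator, so that every component is the weighted term frequency divided by one page-wide normalization constant.

```python
import math
from collections import Counter
from typing import List


def feature_vectors(blocks: List[List[str]], cw: List[float]) -> List[List[float]]:
    """Build one feature vector per content block from weighted term frequencies."""
    # Step 2031: the keyword set T is every phrase that occurs in the page.
    keywords = sorted({phrase for block in blocks for phrase in block})
    # Tf_ij: how often keyword i occurs in block j (a Counter returns 0 if absent).
    tf = [Counter(block) for block in blocks]
    # Assumed denominator of Formula 2: sqrt( sum_i ( sum_j CW_j * Tf_ij )^2 ).
    norm = math.sqrt(sum(
        sum(cw[j] * tf[j][t] for j in range(len(blocks))) ** 2
        for t in keywords
    ))
    if norm == 0:
        return [[0.0] * len(keywords) for _ in blocks]
    # Step 2032: component i of block j is CW_j * Tf_ij / norm.
    return [[cw[j] * tf[j][t] / norm for t in keywords] for j in range(len(blocks))]


vecs = feature_vectors([["a", "a"], ["b"]], [0.5, 0.5])
```

Since the same constant divides every vector, this normalization does not change the cosine similarity computed between two blocks; it only rescales the components.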
Step 2033: calculate the cosine distance between the feature vector of each element of the non-subject content set $\theta(\theta_1, \theta_2, \ldots, \theta_k, \ldots, \theta_{N-K})$ and the feature vectors of the elements of the subject content set $C(C_1, C_2, \ldots, C_k, \ldots, C_K)$. This cosine distance serves as the measure of similarity between a non-subject content module and a subject content module. A non-subject content module whose similarity is lower than a certain threshold is regarded as content to be filtered, and is removed from the DOM tree. The threshold depends on the user's personal settings; in a typical end product, the browser provides a configuration interface in which the user can adjust the threshold to suit actual use.
The similarity is calculated as follows, where $X_i$ and $Y_i$ denote the i-th components of the two feature vectors being compared:
$$d(X, Y) = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^2}\,\sqrt{\sum_{i=1}^{n} Y_i^2}} \qquad \text{(Formula 3)}$$
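Formula 3 is the standard cosine measure between two vectors, and a direct sketch is given below. The function name `cosine_similarity` is an assumption, as is the choice to return 0.0 when either vector is all zeros (for instance a block that contains no keywords), a case the description does not address.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Formula 3: dot product of x and y divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # Assumption: a block with an all-zero vector is treated as dissimilar.
        return 0.0
    return dot / (norm_x * norm_y)
```

Two blocks that share no keywords score 0, and two blocks whose keyword frequencies are proportional score 1, so the user-set threshold lies between these extremes.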
The above is merely a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of protection defined by the claims.

Claims (6)

1. A webpage compression method applied to a mobile terminal, in which the HTML document and the CSS document are first parsed separately to generate a document object model tree and a render tree, the required resources are downloaded according to the links in the HTML document, and finally the resources are embedded into the rendered webpage document and the webpage is presented; characterized in that: the webpage is compressed after the document object model tree is generated, and the required resources are downloaded according to the links in the HTML document only after the webpage compression.
2. The webpage compression method applied to a mobile terminal as claimed in claim 1, characterized in that the webpage compression comprises the following steps:
Step 1, dividing the webpage into different content blocks;
Step 2, dividing the content blocks into a subject content set and a non-subject content set according to their degree of correlation with the subject of the webpage;
Step 3, comparing the similarity of the elements in the non-subject content set with the elements in the subject content set; if the similarity is lower than a set threshold, the element in the non-subject content set is filtered out; if the similarity is higher than the set threshold, the element in the non-subject content set is kept.
3. The webpage compression method applied to a mobile terminal as claimed in claim 2, characterized in that: in step 1, the webpage is divided into different content blocks by traversing the different tags in the document object model tree.
4. The webpage compression method applied to a mobile terminal as claimed in claim 2, characterized in that step 2 further comprises the following steps:
Obtaining the weight $CW_j$ of content block j, that is, the degree of importance of the weight value of content block j among all the content blocks into which the webpage is divided, where $W_j$ denotes the weight value of content block j:
$$CW_j = \frac{W_j}{\sum_{i=1}^{N} W_i}$$
Dividing the N content blocks, according to the weights, into a subject content set $C(C_1, C_2, \ldots, C_k, \ldots, C_K)$ and a non-subject content set $\theta(\theta_1, \theta_2, \ldots, \theta_k, \ldots, \theta_{N-K})$, where $K < N$.
When the weight $CW_j$ of content block j is greater than a set threshold, content block j is assigned to the subject content set; otherwise it is assigned to the non-subject content set.
5. The webpage compression method applied to a mobile terminal as claimed in claim 3, characterized in that: the weight value $W_j$ depends mainly on the position of content block j in the webpage and on the media type of the resources linked inside content block j.
6. The webpage compression method applied to a mobile terminal as claimed in claim 2, characterized in that step 3 further comprises the following steps:
Traversing the text in the webpage and extracting the phrases that occur in it to form the keyword set of the webpage; if the total number of phrases is n, the keyword set of the webpage is $T(T_1, T_2, \ldots, T_i, \ldots, T_n)$;
Constructing a feature vector $W(w_1, w_2, \ldots, w_i, \ldots, w_n)$ for each content block, the feature vector having n components, each component being calculated from the term frequency, in this content block, of the corresponding element of the keyword set $T(T_1, T_2, \ldots, T_i, \ldots, T_n)$:
$$w_i = \frac{CW_j \times Tf_{ij}}{\sqrt{\sum_{i=1}^{n} \left( \sum_{j=1}^{N} CW_j \times Tf_{ij} \right)^2}}$$
where $Tf_{ij}$ is the term frequency of keyword $T_i$ in content block j, and $CW_j$ is the weight of content block j;
Obtaining the cosine distance between the feature vector of each element of the non-subject content set $\theta(\theta_1, \theta_2, \ldots, \theta_k, \ldots, \theta_{N-K})$ and the feature vectors of the elements of the subject content set $C(C_1, C_2, \ldots, C_k, \ldots, C_K)$, this cosine distance serving as the similarity between a non-subject content module and a subject content module:
$$d(X, Y) = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^2}\,\sqrt{\sum_{i=1}^{n} Y_i^2}}$$
where $X_i$ and $Y_i$ denote the i-th components of the two feature vectors being compared.
CN201010588500.3A 2010-12-14 2010-12-14 A kind of Webpage compression method being applied to mobile terminal Active CN102541863B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010588500.3A CN102541863B (en) 2010-12-14 2010-12-14 A kind of Webpage compression method being applied to mobile terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010588500.3A CN102541863B (en) 2010-12-14 2010-12-14 A kind of Webpage compression method being applied to mobile terminal

Publications (2)

Publication Number Publication Date
CN102541863A true CN102541863A (en) 2012-07-04
CN102541863B CN102541863B (en) 2015-08-05

Family

ID=46348785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010588500.3A Active CN102541863B (en) 2010-12-14 2010-12-14 A kind of Webpage compression method being applied to mobile terminal

Country Status (1)

Country Link
CN (1) CN102541863B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473347A (en) * 2013-09-24 2013-12-25 北京大学 Web page similarity-based browser rendering optimization method
CN103500118A (en) * 2013-10-24 2014-01-08 北京奇虎科技有限公司 Method and device for optimizing cascading style sheet
CN104965871A (en) * 2015-06-09 2015-10-07 北京金山安全软件有限公司 Page loading method and device and electronic equipment
CN106649344A (en) * 2015-10-31 2017-05-10 华为数字技术(苏州)有限公司 Network log compression method and apparatus
CN108536864A (en) * 2018-04-20 2018-09-14 平安科技(深圳)有限公司 Page numeric displaying method, device, computer equipment and storage medium
CN109003313A (en) * 2017-06-06 2018-12-14 腾讯科技(深圳)有限公司 A kind of methods, devices and systems transmitting Web page picture

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101097570A (en) * 2006-06-29 2008-01-02 上海唯客网广告传播有限公司 Advertisement classification method capable of automatic recognizing classified advertisement type
CN101246494A (en) * 2008-03-19 2008-08-20 腾讯科技(深圳)有限公司 Internet web page conversion method, system and equipment
CN100502309C (en) * 2006-09-12 2009-06-17 成都迈普产业集团有限公司 Embedded Web network management system and its interaction method
CN101639856A (en) * 2009-09-11 2010-02-03 清华大学 Webpage correlation evaluation device for detecting internet information spreading
CN101639853A (en) * 2009-08-26 2010-02-03 王建军 Text display method used for household electrical appliance terminal
CN101814118A (en) * 2009-07-02 2010-08-25 西安电子科技大学 Method for protecting web texts based on pictures

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473347A (en) * 2013-09-24 2013-12-25 北京大学 Web page similarity-based browser rendering optimization method
CN103473347B (en) * 2013-09-24 2017-01-11 北京大学 Web page similarity-based browser rendering optimization method
CN103500118A (en) * 2013-10-24 2014-01-08 北京奇虎科技有限公司 Method and device for optimizing cascading style sheet
CN103500118B (en) * 2013-10-24 2017-01-04 北京奇虎科技有限公司 A kind of Cascading Style Sheet optimization method and device
CN104965871A (en) * 2015-06-09 2015-10-07 北京金山安全软件有限公司 Page loading method and device and electronic equipment
CN106649344A (en) * 2015-10-31 2017-05-10 华为数字技术(苏州)有限公司 Network log compression method and apparatus
CN106649344B (en) * 2015-10-31 2020-01-10 华为数字技术(苏州)有限公司 Weblog compression method and device
CN109003313A (en) * 2017-06-06 2018-12-14 腾讯科技(深圳)有限公司 A kind of methods, devices and systems transmitting Web page picture
CN109003313B (en) * 2017-06-06 2021-09-03 腾讯科技(深圳)有限公司 Method, device and system for transmitting webpage picture
CN108536864A (en) * 2018-04-20 2018-09-14 平安科技(深圳)有限公司 Page numeric displaying method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN102541863B (en) 2015-08-05

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20170421

Address after: 201206 China (Shanghai) free trade zone, the moon Road, No. 3, building fourth, room B412, level 1258

Patentee after: Shanghai Li Ke Semiconductor Technology Co., Ltd.

Address before: 201206 Pudong New Area Mingyue Road, Shanghai, No. 1258

Patentee before: Leadcore Technology Co., Ltd.

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20200825

Address after: 610299 in Chengdu core Valley Industrial Park, Dongsheng Street, Shuangliu District, Chengdu City, Sichuan Province

Patentee after: Lusheng Technology Co.,Ltd.

Address before: 201206 China (Shanghai) free trade zone, the moon Road, No. 3, building fourth, room B412, level 1258

Patentee before: Shanghai Li Ke Semiconductor Technology Co.,Ltd.

TR01 Transfer of patent right