Background technology
A mobile phone browser is an Internet browser that runs in the embedded environment of a mobile phone. Compared with a traditional personal computer, a mobile phone has limited computing power, relatively little memory, short battery life, and a distinctive mode of user interaction. An Internet browser running on a mobile phone therefore needs a special design to cope with the resource limits of the embedded environment and to provide a good user experience.
Most web pages on the Internet are currently designed for ordinary computer screens; they are large and contain a wide variety of content. The screen of a mobile phone and its resolution are very small compared with those of an ordinary computer, so it is difficult to present these pages well. Moreover, web pages often contain a large amount of useless information (for example, advertisement links, logo pictures and the like). This content is unrelated to the actual topic of the page, yet it is still downloaded to the client, where it occupies computing and storage resources, and because the phone screen is small, the irrelevant content seriously degrades the user's viewing experience. To improve web browsing on the mobile phone, the terminal therefore needs to analyze and filter the pages that the browser downloads, remove irrelevant content as far as possible, and reduce the downloading of linked resources unrelated to the topic.
Many commercial mobile phone browsers already compress web pages, but essentially all of them do so with a C-S (Client-Server) architecture, which generally comprises the following steps:
The browser on the mobile phone does not access websites on the Internet directly, but browses web pages indirectly through a server operated by the browser vendor;
The browser vendor's server processes the original web page, for example by adjusting the page layout and compressing pictures;
The browser vendor's server sends the processed web page to the browser on the mobile phone for presentation;
As can be seen, this compression technique requires a large server cluster to be maintained, and its bandwidth and hardware costs are very high. The browser is also under the control of a third-party vendor, which may conflict with the business models of many mobile phone manufacturers. The web page compression technique proposed here relies entirely on the computing power of the client to compress the original web page, and has considerable advantages in both cost control and product integration.
The same problem exists not only on mobile phones but also on other handheld mobile terminals used to access the Internet, owing to limits such as screen size and memory.
Summary of the invention
In view of the above problems, the present invention provides a web page compression method applied to a mobile terminal, which effectively improves the web browsing speed of the mobile terminal's browser.
To achieve the above object, the present invention provides the following technical solution:
A web page compression method applied to a mobile terminal: the method first parses the HTML document and the CSS document separately to generate a document object model tree and a rendering tree, downloads the required resources according to the links in the HTML document, and finally embeds the resources in the rendered web document and presents the web page; web page compression is performed after the document object model tree is generated, and the required resources are downloaded according to the links in the HTML document only after the web page compression.
The web page compression comprises the following steps:
Step 1: divide the web page into different content blocks;
Step 2: according to their relevance to the topic of the web page, divide the content blocks into a subject content set and a non-subject content set;
Step 3: compare the similarity between the elements of the non-subject content set and the elements of the subject content set; if the similarity is below a set threshold, filter out the element of the non-subject content set; if the similarity is above the set threshold, keep the element of the non-subject content set.
The present invention analyzes the web page by dividing it into subject content and non-subject content, and filters out the non-subject content that has low similarity to the topic of the page, thereby compressing the web page. It has the following advantages:
1. The content of the web page is analyzed, and non-subject content unrelated to the topic of the page is treated as noise and filtered out, which improves the viewing experience;
2. Filtering is based on a similarity comparison between subject content and non-subject content, which has low computational complexity and consumes few resources, and is therefore suitable for mobile terminals with limited computing resources;
3. Filtering removes a large number of useless resource links, such as advertisement pictures and logos, which reduces the data traffic consumed by the mobile terminal.
Embodiment
Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Referring to Fig. 1, which is a flow chart of how the mobile terminal parses and renders a downloaded web page: the HTML document and the CSS document are first parsed separately to generate a document object model tree (DOM tree) and a rendering tree; the web page is then compressed with the web page compression method provided by the present invention; the required resources (multimedia elements such as pictures, audio and video) are downloaded according to the links in the HTML document; once the download is complete, the browser embeds the resources in the rendered web document and presents the web page.
Referring to Fig. 2, which is a flow chart of the principle of the web page compression method applied to a mobile terminal provided by the present invention.
Step 201: divide the web page into N different content blocks;
Step 202: according to their relevance to the topic of the web page, divide the N content blocks into x subject content blocks and y non-subject content blocks (x ≥ 1, y ≥ 1, x + y = N);
Step 203: compare the similarity of each of the y non-subject content blocks with the x subject content blocks;
Step 204: if the similarity is below the threshold set by the user, execute step 205; if the similarity is above the threshold set by the user, execute step 206. In the comparison, a non-subject content block may be compared with the x subject content blocks one by one, and step 205 is executed when its similarity to any one of them is below the threshold set by the user;
Step 205: filter out this non-subject content block, and execute step 207;
Step 206: keep this non-subject content block, and execute step 207;
Step 207: determine whether all non-subject content blocks have been compared; if not, return to step 203 and continue with the next non-subject content block; if so, end the flow.
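Steps 203 to 207 can be sketched as a simple loop. The following Python fragment is only an illustrative sketch: the function name `filter_non_subject` and the `similarity` callable are hypothetical, and the treatment of a similarity exactly equal to the threshold (kept here) is an assumption, since the text does not specify it.

```python
from typing import Callable, List, Sequence, TypeVar

Block = TypeVar("Block")


def filter_non_subject(
    subject: Sequence[Block],
    non_subject: Sequence[Block],
    similarity: Callable[[Block, Block], float],
    threshold: float,
) -> List[Block]:
    """Return the non-subject blocks that survive steps 203-207."""
    kept: List[Block] = []
    for candidate in non_subject:  # step 207: move on to the next block
        # Steps 203/204: compare with the subject blocks one by one; a
        # similarity below the threshold for any one of them means the
        # candidate is filtered (step 205).
        below = any(similarity(candidate, s) < threshold for s in subject)
        if not below:
            kept.append(candidate)  # step 206: keep this block
    return kept
```

Any similarity measure can be passed in as `similarity`; the embodiment described later uses a cosine measure on feature vectors.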
In step 201 above, dividing the web page content into N content blocks specifically comprises the following step:
Step 2011: traverse the DOM tree and, according to the different tags in the DOM tree, divide the whole web page into N content blocks. The finer the granularity of the division, the better the compression of the web page, but the computational load increases accordingly. The granularity can therefore be adapted to the hardware configuration of each mobile terminal. For example, for a mobile terminal whose processor clock frequency is below 200 MHz and whose available memory is below 20 MB, the division can be limited to the third level of the DOM tree; a mobile terminal with a higher configuration can use a finer granularity.
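A depth-limited division of this kind might look like the following Python sketch. The `Node` class, the function names, and the depth value 5 used for better-equipped terminals are hypothetical; the sketch also splits purely by depth, whereas the method described here additionally takes the tags into account.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Minimal stand-in for a DOM element (hypothetical, for illustration)."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def choose_max_depth(cpu_mhz: float, mem_mb: float) -> int:
    # Thresholds 200 MHz / 20 MB and level 3 come from the text;
    # the deeper value 5 is an assumed example of "finer granularity".
    return 3 if cpu_mhz < 200 and mem_mb < 20 else 5


def split_blocks(root: Node, max_depth: int) -> List[Node]:
    """Cut the tree into blocks: stop at max_depth or at a leaf."""
    blocks: List[Node] = []

    def walk(node: Node, depth: int) -> None:
        if depth >= max_depth or not node.children:
            blocks.append(node)  # this subtree becomes one content block
        else:
            for child in node.children:
                walk(child, depth + 1)

    walk(root, 1)  # the root element counts as level 1
    return blocks
```

On a low-end terminal, `split_blocks(root, choose_max_depth(150, 16))` stops at the third level, so each element at that level (with everything beneath it) is treated as one block.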
In step 202 above, dividing the content blocks into the subject content set and the non-subject content set specifically comprises the following steps:
Step 2021: obtain the weight $CW_j$ of content block j, i.e. the proportion that the weight value of content block j occupies among all the content blocks into which the web page is divided, where $W_j$ denotes the weight value of content block j:
$$CW_j = \frac{W_j}{\sum_{k=1}^{N} W_k} \qquad \text{(Formula 1)}$$
The weight value $W_j$ is determined mainly by the position of content block j in the web page and by the MIME type (the media type of a resource) of the links inside content block j: if content block j is located in the middle or upper-middle part of the web page, its weight value is increased; if the MIME type of the links inside content block j is highly relevant to the content of the page being browsed, its weight value is also increased. For example, if the current web page belongs to a video website, a link of the flv type inside content block j increases the weight value of that block.
For example, if a web page contains several text blocks and several video blocks and the page belongs to a news website, the text in the middle region of the page and above the middle region is given a weight value of 10; a text block outside the middle region takes a value in the range [1, 6] according to its distance from the middle region; in addition, where the MIME type of the links inside a text block matches the type of the page, the weight value may take a value in the range [7, 9]. The weight value $W_j$ of content block j is obtained according to these criteria, and the weight $CW_j$ of content block j can then be calculated with Formula 1.
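The normalization described for Formula 1, in which each block's weight value is divided by the sum over all blocks, can be sketched in Python as follows. The function name `normalized_weights` and the example weight values are hypothetical; the values are simply picked from the ranges mentioned above.

```python
from typing import List, Sequence


def normalized_weights(raw: Sequence[float]) -> List[float]:
    """Return CW_j = W_j / sum(W) for every content block j."""
    total = sum(raw)
    if total <= 0:
        raise ValueError("at least one block must have a positive weight")
    return [w / total for w in raw]


# Assumed example for a news page: a central text block (10), a text block
# whose links match the page type (8), and two peripheral blocks (4 and 2).
cw = normalized_weights([10, 8, 4, 2])
```

Because the weights are proportions, they sum to 1, so a threshold on $CW_j$ is relative to the page as a whole rather than to any absolute scale.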
Step 2022: according to the weights, divide the N content blocks into a subject content set $C(C_1, C_2, \ldots, C_k, \ldots, C_K)$ and a non-subject content set $\theta(\theta_1, \theta_2, \ldots, \theta_k, \ldots, \theta_{N-K})$, where $K < N$.
When the weight $CW_j$ of content block j is greater than a set threshold, content block j is assigned to the subject content set; otherwise, content block j is assigned to the non-subject content set.
The set threshold can be defined by the user: the settings of the mobile terminal's browser can provide a configuration interface in which the user adjusts the size of the threshold.
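The threshold split of step 2022 can be sketched as below. The function name `partition_blocks` is hypothetical, and the blocks are represented only by their indices for brevity.

```python
from typing import List, Sequence, Tuple


def partition_blocks(
    cw: Sequence[float], threshold: float
) -> Tuple[List[int], List[int]]:
    """Split block indices into (subject, non_subject) by weight CW_j."""
    subject = [j for j, w in enumerate(cw) if w > threshold]
    non_subject = [j for j, w in enumerate(cw) if w <= threshold]
    return subject, non_subject


subject_ids, non_subject_ids = partition_blocks([0.4, 0.1, 0.35, 0.15], 0.3)
```

With these assumed weights and a threshold of 0.3, blocks 0 and 2 form the subject content set and blocks 1 and 3 the non-subject content set.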
In step 203 above, the similarity comparison further comprises the following steps:
Step 2031: traverse the text of the web page and extract the terms that appear in it to form the keyword set of the page. If the total number of terms is n, the keyword set of the page is $T(T_1, T_2, \ldots, T_i, \ldots, T_n)$;
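Building the keyword set might be sketched as follows. The function name `keyword_set` is hypothetical, and splitting on word characters with a regular expression is an assumption made for illustration; a real browser would need a tokenizer or word segmenter suited to the page's language.

```python
import re
from typing import Iterable, List


def keyword_set(block_texts: Iterable[str]) -> List[str]:
    """Collect each distinct term once, in order of first appearance."""
    seen = {}
    for text in block_texts:
        for term in re.findall(r"\w+", text.lower()):
            seen.setdefault(term, None)  # dict keeps insertion order
    return list(seen)


keywords = keyword_set(["Phone news today", "news ads"])
```

Keeping a fixed order matters because the position of each keyword determines the position of the corresponding component in every block's feature vector.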
Step 2032: construct a feature vector $W(w_1, w_2, \ldots, w_i, \ldots, w_n)$ for each content block. The feature vector has n components (n being the total number of terms in the web page), and each component is calculated from the term frequency, within the content block, of the corresponding element of the keyword set $T(T_1, T_2, \ldots, T_i, \ldots, T_n)$. The calculation formula is as follows:
Formula 2
where $tf_{ij}$ is the term frequency of keyword $T_i$ in content block j, and $CW_j$ is the weight of content block j.
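A feature vector of this kind might be computed as in the following sketch. Because Formula 2 is not reproduced in this text, the sketch assumes that each component is the product of the term frequency $tf_{ij}$ and the block weight $CW_j$, and that the term frequency is a raw occurrence count; both are assumptions, and the function name `feature_vector` is hypothetical.

```python
import re
from collections import Counter
from typing import List, Sequence


def feature_vector(
    block_text: str, keywords: Sequence[str], cw: float
) -> List[float]:
    """One component per keyword T_i, built from tf_ij and CW_j."""
    counts = Counter(re.findall(r"\w+", block_text.lower()))
    # ASSUMPTION: w_i = tf_ij * CW_j, with tf_ij taken as a raw count.
    return [counts[k] * cw for k in keywords]


vec = feature_vector("news news ads", ["phone", "news", "ads"], 0.5)
```

Every block's vector has the same length n, so any two blocks can be compared component by component in the next step.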
Step 2033: calculate the cosine distance between the feature vectors of the elements of the non-subject content set $\theta(\theta_1, \theta_2, \ldots, \theta_k, \ldots, \theta_{N-K})$ and the feature vectors of the elements of the subject content set $C(C_1, C_2, \ldots, C_k, \ldots, C_K)$. This cosine distance serves as the measure of similarity between a non-subject content block and a subject content block. A non-subject content block whose similarity is below a certain threshold is regarded as content to be filtered, and is removed from the DOM tree. The threshold depends on the user's personal settings; in an end product, the browser would typically provide a configuration interface in which the user can adjust the threshold to suit actual use.
The similarity is calculated as follows, where $X_i$ and $Y_i$ denote the i-th components of the two feature vectors being compared:
$$\mathrm{Sim}(X, Y) = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^2}\,\sqrt{\sum_{i=1}^{n} Y_i^2}} \qquad \text{(Formula 3)}$$
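The cosine measure between two feature vectors can be sketched in Python as follows. The function name `cosine_similarity` is hypothetical, and returning 0.0 when either vector is all zeros is an assumption made to avoid division by zero, since the text does not cover that case.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # ASSUMPTION: an empty block is similar to nothing
    return dot / (norm_x * norm_y)
```

Since the components of the feature vectors are non-negative, the result lies between 0 (no shared keywords) and 1 (same keyword distribution), which makes a single user-set threshold easy to interpret.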
The above is merely a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited to it. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of protection defined by the claims.