
CN102033926A - Page content processing method and device - Google Patents


Info

Publication number
CN102033926A
CN102033926A CN2010105897689A CN201010589768A CN102033926A CN 102033926 A CN102033926 A CN 102033926A CN 2010105897689 A CN2010105897689 A CN 2010105897689A CN 201010589768 A CN201010589768 A CN 201010589768A CN 102033926 A CN102033926 A CN 102033926A
Authority
CN
China
Prior art keywords
url
user
parameters
target page
domain name
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010105897689A
Other languages
Chinese (zh)
Other versions
CN102033926B (en)
Inventor
王岩
霍景
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN 201010589768 priority Critical patent/CN102033926B/en
Publication of CN102033926A publication Critical patent/CN102033926A/en
Application granted granted Critical
Publication of CN102033926B publication Critical patent/CN102033926B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a page content processing method that is applied at an internet server and provides a page content processing service for small terminals. The method comprises: extracting the content of the target page that a user requests to access; judging whether a URL (Uniform Resource Locator) exists in the target page content; and, if so, truncating the URL according to a predetermined strategy, wherein the predetermined strategy comprises: taking the main domain name part of the URL as a first part; taking a number of characters after the main domain name in the URL as a second part; and inserting an ellipsis as a third part that connects the first part and the second part to form a truncated URL. The invention can effectively improve the experience of browsing the internet on a mobile phone and helps prevent the user from being deceived by a phishing site.

Description

Page content processing method and device
Technical Field
The present invention relates to a web content processing technology, and more particularly, to a web content processing technology for providing a good web browsing experience to a specific user terminal.
Background
Internet technology has developed vigorously since the 1990s. Internet applications such as web search, portals and online transactions have had a great influence on people's work and life, the efficiency of information transfer in economic life has improved rapidly, and every link of the internet industry chain has largely matured. However, with the maturing of 3G and other mobile broadband access technologies and the emergence of various new intelligent mobile terminals, the way users access the internet is changing. More and more users use mobile terminals, including mobile phones, to obtain information from the network and share it with others, and the mobile internet has entered everyday life.
Today, internet application providers need to adapt their products and services technically to give the many mobile internet users a better experience, because on the mobile internet the user's terminal device is usually a mobile phone. Compared with a personal computer, a mobile phone has the advantage of mobility, but it also has inherent bottlenecks such as a small screen, a low access rate, a high access cost and low processing capacity. For this reason, application providers work on the applications themselves to help users overcome these bottlenecks. For example, to suit browsing on a mobile phone screen, some well-known portals have launched mobile websites for mobile phone users; whereas the desktop versions of the websites make extensive use of multimedia content, the mobile websites mainly serve text content. As another example, an application service provider can compress pictures in a targeted manner to make them easier to display, which also reduces traffic and the processing burden on the mobile phone. Nevertheless, a number of technical challenges remain for internet application service providers seeking to improve the user experience.
As mentioned above, a server can currently push a page adapted to mobile phone browsing to a user. However, there are few effective means of handling the URLs that often appear in page content. In many cases, to keep the content concise, an editor of page content inserts a URL into the page to reference the content of another page to which the URL points. For example, a user may ask a question about National Day holiday arrangements on Baidu Zhidao (zhidao.baidu.com); the respondent may not give the answer directly, but instead post the URL of an official government site and invite the questioner to visit it for the information needed. As another example, to let the reader understand the context of a news event, the editor of a news page may refer to several related past events and include a URL pointing to the news page of each. URLs are convenient for referencing, but mobile phone users may suffer from overlong URLs when browsing web pages. Referring to fig. 1, in some cases a URL may occupy more than half of the mobile phone screen, or even more than the entire screen. When a URL is too long, the browsing experience of the mobile phone user deteriorates sharply, and the continuity of the text on either side of the URL is badly disrupted.
Disclosure of Invention
The invention provides a method and a device for processing page content, which are used for improving the use experience of a user when the user uses a small terminal to browse a page, and are realized by the following technical scheme:
the invention provides a page content processing method, which is applied to an internet server and provides page content processing service for a small terminal, and the method comprises the following steps:
A. extracting the content of a target page which is requested to be accessed by a user;
B. judging whether a URL exists in the target page content; if so, proceeding to step C;
C. truncating the URL in the target page content according to a preset strategy, wherein the preset strategy comprises the following steps:
strategy C1, taking the main domain name part of the URL as a first part;
strategy C2, taking a number of characters after the main domain name of the URL as a second part;
strategy C3, when characters between the first part and the second part have been cut off, inserting an ellipsis as a third part to connect the first part with the second part to form a truncated URL;
D. and returning the processed target page to the user.
Preferably, strategy C2 is specifically: taking all characters of the URL from the last "/" onward as the second part.
Preferably, strategy C2 is specifically: judging whether the length of the characters after the main domain name exceeds a preset threshold length;
if not, retaining all characters after the main domain name as the second part;
otherwise, taking a preset number of characters counted back from the end of the URL as the second part.
Preferably, the method further comprises the following steps:
E. extracting parameters in the user access request, wherein the parameters at least comprise user terminal parameters, and judging whether the user terminal belongs to an intelligent type or a common type according to the user terminal parameters;
the specific step of judging whether the character length after the main domain name exceeds a preset threshold length is as follows: judging whether the character length behind the main domain name exceeds a preset threshold length corresponding to a terminal;
the intelligent terminal corresponds to a second preset threshold length, the common terminal corresponds to a first preset threshold length, and the second preset threshold length is larger than the first preset threshold length.
Preferably, the method further comprises:
E. extracting parameters in the user access request, wherein the parameters at least comprise user terminal parameters;
F. and transcoding the target page content.
Preferably, the step F specifically is: and carrying out code conversion on the target page content according to the user terminal parameters and the target page content.
Preferably, step D is preceded by:
E. extracting parameters in the user access request, wherein the parameters at least comprise user terminal parameters;
G. judging, based on the user terminal parameters and the target page pointed to by the truncated URL, whether a corresponding conversion address code needs to be added; if so, inserting the conversion address code, and otherwise proceeding to step D.
The invention also provides a page content processing device, which is applied to an internet server and provides page content processing service for a small terminal, and is characterized in that the device comprises: the system comprises a content extraction unit, a URL identification unit, a page rendering unit and a user interaction unit; wherein,
the content extraction unit is used for extracting the target page content which is requested to be accessed by the user;
the URL identification unit is used for judging whether a URL exists in the target page content; if so, submitting the identified URL to the page rendering unit for truncation processing, and otherwise skipping the truncation processing;
the page rendering unit is configured to truncate the URL in the target page content according to a predetermined policy, where the predetermined policy includes:
strategy C1, taking the main domain name part of the URL as a first part;
strategy C2, taking a number of characters after the main domain name of the URL as a second part;
strategy C3, when characters between the first part and the second part have been cut off, inserting an ellipsis as a third part to connect the first part with the second part to form a truncated URL;
and the user interaction unit is used for returning the processed target page to the user.
Preferably, strategy C2 is specifically: taking all characters of the URL from the last "/" onward as the second part.
Preferably, strategy C2 is specifically: judging whether the length of the characters after the main domain name exceeds a preset threshold length;
if not, retaining all characters after the main domain name as the second part;
otherwise, taking a preset number of characters counted back from the end of the URL as the second part.
Preferably, the device further comprises:
the parameter extraction unit is used for extracting parameters in the user access request, wherein the parameters at least comprise user terminal parameters;
the page rendering unit is further used for judging whether the user terminal belongs to an intelligent type or a common type according to the user terminal parameters;
the specific step of judging whether the character length after the main domain name exceeds a preset threshold length is as follows: judging whether the character length behind the main domain name exceeds a preset threshold length corresponding to a terminal;
the intelligent terminal corresponds to a second preset threshold length, the common terminal corresponds to a first preset threshold length, and the second preset threshold length is larger than the first preset threshold length.
Preferably, the device further comprises: a code conversion unit used for carrying out code conversion on the target page.
Preferably, the device further comprises:
the parameter extraction unit is used for extracting parameters in the user access request, wherein the parameters at least comprise user terminal parameters;
the code conversion unit is used for carrying out code conversion on the target page according to user terminal parameters and the target page content.
Preferably, the device further comprises:
the parameter extraction unit is used for extracting parameters in the user access request, wherein the parameters at least comprise user terminal parameters;
and the page rendering unit is further used for judging whether a corresponding conversion address code needs to be added or not by combining the user terminal parameter and the target page pointed by the truncated URL, inserting the corresponding conversion address code if the corresponding conversion address code needs to be added, and returning the processed target page to the user by the user interaction unit if the corresponding conversion address code does not need to be added.
Compared with the prior art, this technical scheme effectively improves the user's experience when browsing web pages on a small terminal such as a mobile phone: when the browsed content contains a URL, the URL no longer occupies a large area of the screen, and the user can be protected from being deceived by phishing websites.
Drawings
Fig. 1 is a schematic diagram of web browsing using a mobile phone.
FIG. 2 is a flow chart of a page content processing method of the present invention.
Fig. 3 is a logical block diagram of the page content processing apparatus of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments. The invention aims to provide page content processing service for the small terminal at the server side and improve the webpage browsing experience of a user on the small terminal. The small-sized terminals mainly include mobile phones, PDAs, handheld game terminals, handheld reading terminals, and the like. The terminal is convenient to carry, but when the user uses the terminal, the terminal screen has difficulty in completely displaying standard Web page content under the condition of readable font size. The invention will be described in terms of how it can be implemented, using the most typical cell phone as an example. The present invention is described in the preferred embodiment as a software implementation, but does not exclude a hardware or firmware implementation.
Fig. 2 and fig. 3 show the flow of the page content processing method of the present invention and the corresponding logical structure of the page content processing apparatus. Referring first to fig. 2, a processing flow according to a preferred embodiment of the present invention includes the following steps.
Step 101, extracting user parameters from the user access request, and then going to step 102. The user usually sends an access request to the server via the HTTP protocol, and the access request includes many user-related parameters. These parameters typically include the address of the target page that the user requests to access (usually indicated by the URL field in the message) and user terminal parameters. When a user sends a request to a server through the HTTP protocol, the message naturally carries the accessed URL; in general, the message also includes a User-Agent field, which is generally used to store software platform information such as the operating system and browser of the mobile phone terminal, manufacturer information, and hardware platform information such as processor clock frequency, memory size and screen size. The user terminal parameters characterize the mobile phone to some extent. For example, if the operating system is that of Apple Inc., the mobile phone is usually a high-end smartphone; if the processor runs at a clock frequency of 1 GHz and the screen is larger than 3 inches, the mobile phone is usually a mid-range smartphone.
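The classification of a terminal from its User-Agent field can be illustrated with a minimal Python sketch. The keyword list and the function name `classify_terminal` are assumptions made for illustration; the description does not specify how User-Agent values map to terminal types.

```python
# Hypothetical keyword rules: the description does not define how a
# User-Agent value is mapped to a terminal type.
SMART_KEYWORDS = ("iphone", "android")


def classify_terminal(user_agent: str) -> str:
    """Return "smart" or "common" for a User-Agent header value."""
    ua = user_agent.lower()
    if any(keyword in ua for keyword in SMART_KEYWORDS):
        return "smart"
    # Anything unrecognised is treated as a common (non-smart) phone.
    return "common"
```

Defaulting unknown terminals to the common type is a conservative choice in this sketch, since the common type receives the more aggressive truncation and the transcoding service.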
Step 102, code conversion is carried out on the target page accessed by the user.
In the present invention, step 102 is an optional step, because in implementing the present invention, there may be two situations:
I. The content of the target page that the user requests to access may come from a third party. Taking the internet service provided by the applicant, Baidu, as an example, suppose the URL that the user currently requests has a special form such as http://gate.baidu.com/?src=http%3A%2F%2Fwww.java2s.com%2F. The content that the user actually wants then comes from http://www.java2s.com. The target page to which http://www.java2s.com points may be a standard Web page designed for personal computers, which many mobile phone platforms cannot access or can access only with a poor experience. In another situation, the user may be accessing a product or service that Baidu has only just developed and that does not yet have a built-in version of its pages suited to mobile phone browsing. In both situations the server provides a code conversion service for the user. Further, since the user terminal parameters have already been obtained in step 101, the target page content can be converted according to the user terminal parameters in combination with the target page content. For example, code conversion is performed according to the operating system of the mobile phone and the browser it carries, so that the converted page is suitable for browsing on the mobile phone screen.
II. The target page that the user requests to access has a suitable version at the server. For example, when the user accesses the more mature Baidu Zhidao channel (http://zhidao.baidu.com), the server already has one or more mobile phone versions of the target page (usually corresponding to several types of mobile phones), so the code conversion of step 102 is not needed.
In the preferred embodiment, cases I and II are handled by different servers. For example, a request whose URL contains gate.baidu.com or another predetermined domain name is handled on a class A server, which provides the code conversion service described in step 102. All other requests are handled on a class B server, which skips step 102. This is merely a matter of how the server side allocates services and is not described in further detail.
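The split between class A and class B servers can be sketched in Python as a simple host check. The host set and the name `server_class` are assumptions for illustration; the gateway host name is taken from the gate URL quoted in case I.

```python
from urllib.parse import urlsplit

# Assumption: hosts whose requests go to the transcoding ("class A") servers.
TRANSCODING_HOSTS = {"gate.baidu.com"}


def server_class(requested_url: str) -> str:
    """Return "A" if the request needs the code conversion service, else "B"."""
    host = urlsplit(requested_url).hostname or ""
    return "A" if host in TRANSCODING_HOSTS else "B"
```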
Step 103, extracting the content of the target page that the user requests to access, and then going to step 104.
The target page can be obtained from its address (i.e. the URL in the user's request message). In many cases, however, such a target page is not suitable for being provided to the user directly. Many service providers render the page content they provide and then deliver the rendered page to the user. As described in step 102, the server may hold multiple built-in versions of the target page, so this step can additionally load different versions of the target page according to the user terminal parameters.
And step 104, identifying and extracting the URL in the target page content, if the URL exists in the target page content, processing in step 105, and if not, processing in step 106.
For a target page accessed by a user, the content thereof is generally composed of text as well as multimedia content. For textual content, it may include some referenced URLs in addition to the ordinary textual content. The goal of this step is to identify the URL in the text content. In terms of the recognition method, since the character composition of the URL is greatly different from that of the ordinary text content, the URL can be recognized according to the character composition characteristics of the URL. The invention provides an exemplary piece of code to aid understanding.
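// Split the URL into the host part ($ma[1]) and everything after it ($ma[2]).
$r = preg_match('/(?:https?\:\/\/)?([^\/]+\/?)(.*)/i', $text, $ma);
// If the tail is longer than the threshold $urllen, keep only its last 6 characters after an ellipsis.
if (mb_strlen($ma[2], 'utf-8') > $urllen) {
    $ma[2] = '...' . mb_substr($ma[2], -6, mb_strlen($ma[2], 'utf-8'), 'utf-8');
}
return $ma[1] . $ma[2];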
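The identification part of step 104, finding URLs by their character composition, can be sketched separately in Python. The pattern below is a simplified assumption (an explicit http/https scheme or a leading "www."), not the recognition rule used by the invention, and `find_urls` is a hypothetical name.

```python
import re

# Simplified pattern (an assumption): an http/https scheme or a leading "www.",
# followed by characters that commonly occur in URLs.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def find_urls(text: str) -> list[str]:
    """Return the URL-like substrings found in plain text, in order."""
    return URL_PATTERN.findall(text)
```

Because the character class contains only ASCII characters, a URL embedded directly in Chinese text ends where the Chinese characters begin.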
Step 105, performing truncated rendering on the identified URL.
Because the URL included in the text content of the target page may be too long, resulting in poor browsing experience of the user using the mobile phone, the step truncates the longer URL in the text content of the target page. However, in order to ensure that the user has a rough idea of the source of the truncated URL, the truncation needs to be performed according to a certain policy. The truncation strategy of the invention comprises the following steps:
1) intercepting a main domain name part in the URL as a first part;
2) intercepting a plurality of characters behind the URL main domain name as a second part;
3) when characters between the first part and the second part have been cut off, inserting an ellipsis as a third part to connect the first part and the second part, finally forming the truncated URL.
Intercepting the main domain name in the truncation process can ensure that the user can know the source site of the truncated URL, and the user is prevented from being cheated by a phishing website. And intercepting a plurality of characters behind the main domain name can assist the user in judging the content of the page pointed by the truncated URL. Some preferred auxiliary truncation strategies are provided below for illustration.
Policy example 1: on the basis of the foregoing truncation strategy, the interception of the second part is refined: all characters from the last "/" in the URL onward are taken as the second part. Such a policy is easy to understand. An example of a truncated URL is: zhidao.baidu.com/../123456.html. A longer URL is thus truncated to a shorter one, while the key information that helps the user assess the URL, namely the domain name and the file name, is retained.
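Policy example 1 can be sketched in Python as follows. The function name `truncate_last_segment`, the three-dot rendering of the ellipsis, and the sample path used in the usage note are assumptions for illustration.

```python
import re


def truncate_last_segment(url: str) -> str:
    """Keep the host and everything from the last "/" onward."""
    without_scheme = re.sub(r"^https?://", "", url, flags=re.IGNORECASE)
    host, slash, tail = without_scheme.partition("/")
    if not slash:
        return host  # the URL has no path at all
    last = tail.rsplit("/", 1)[-1]
    if last == tail:
        # Nothing lies between the host and the last segment, so no ellipsis.
        return host + "/" + tail
    return host + "/.../" + last
```

For a hypothetical input such as http://zhidao.baidu.com/question/2010/123456.html, the sketch returns zhidao.baidu.com/.../123456.html.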
Policy example 2: on the basis of the foregoing truncation strategy, it is further determined, when intercepting the second part, whether the length of the characters after the main domain name, i.e. after the first "/" following the main domain name, exceeds a first predetermined threshold length. If not, the main domain name and all characters after it are retained. Otherwise, a preset number of characters counted back from the end of the URL is taken as the second part. Assume that the first predetermined threshold length is 15 and the preset length is 6. For a URL such as http://zhidao.baidu.com/error.html, the part after the main domain name does not exceed 15 characters, so the truncation result is zhidao.baidu.com/error.html: only the http protocol identifier is cut off, and the useful information in the URL is essentially preserved. As another example, http://zhidao.baidu.com/query/153956116.html?push=bening, truncated according to policy example 2, is displayed as zhidao.baidu.com/...bening. Here the main domain name and part of the URL parameter information are retained to form the truncated URL, which helps the user understand the URL.
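Policy example 2 can be sketched in Python with the threshold (15) and the preset length (6) from the text as defaults. The function name `truncate_url` is an assumption, and the sketch handles only the http and https schemes.

```python
import re


def truncate_url(url: str, threshold: int = 15, keep: int = 6) -> str:
    """Shorten a URL for display following policy example 2."""
    without_scheme = re.sub(r"^https?://", "", url, flags=re.IGNORECASE)
    host, slash, tail = without_scheme.partition("/")
    if len(tail) <= threshold:
        # Short enough: only the protocol identifier is dropped.
        return host + slash + tail
    # Too long: keep the host, an ellipsis and the last `keep` characters.
    return host + slash + "..." + tail[-keep:]
```

Calling `truncate_url("http://zhidao.baidu.com/error.html")` returns `zhidao.baidu.com/error.html`, and calling it on `http://zhidao.baidu.com/query/153956116.html?push=bening` returns `zhidao.baidu.com/...bening`.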
Policy example 3: since the user terminal parameter information has already been obtained in step 101, different truncation policies can be applied for different user terminal parameters. Assume that all mobile phones are classified into two types, common and smart. The type of the user's mobile phone is determined from the user terminal parameters obtained in step 101, and a different truncation policy is executed for each type. For example, the policy of example 2 can be invoked for a common mobile phone and the policy of example 1 for a smart phone. The policy of example 2 may of course also be applied to a smart phone, except that a second predetermined threshold length is then used; preferably the second predetermined threshold length is greater than the first, for example 30. The above is only a simple example, and mobile phone types can be subdivided further according to the phones commonly found on the market. For example, mobile phones can be divided into four types: WML edition (e.g. low-end phones without an operating system), normal edition (e.g. ordinary smartphones with an operating system), colorful edition (e.g. smartphones with good performance), and high-end edition (e.g. iPhone and some Android phones).
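The variant of policy example 3 that applies the threshold policy with a per-terminal threshold can be sketched in Python. The thresholds 15 and 30 come from the text; the type labels, the function name `truncate_for_terminal` and the sample URL in the test are assumptions.

```python
import re

# Threshold lengths from the text: 15 for common phones, 30 for smart phones.
THRESHOLDS = {"common": 15, "smart": 30}


def truncate_for_terminal(url: str, terminal_type: str, keep: int = 6) -> str:
    """Apply the threshold policy with a threshold chosen by terminal type."""
    # Unknown types fall back to the stricter common-phone threshold.
    threshold = THRESHOLDS.get(terminal_type, THRESHOLDS["common"])
    without_scheme = re.sub(r"^https?://", "", url, flags=re.IGNORECASE)
    host, slash, tail = without_scheme.partition("/")
    if len(tail) <= threshold:
        return host + slash + tail
    return host + slash + "..." + tail[-keep:]
```

With a 23-character path, the same URL is shortened for a common phone but shown in full (minus the protocol identifier) on a smart phone.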
Step 106, continuing to perform other rendering on the target page content, and then going to step 107. Conventional rendering processes can follow prior art implementations and are not essential to the present invention. The following mainly introduces a process that works together with URL truncation: inserting a conversion address code. After the URL has been truncated and rendered, it is determined, based on the user terminal parameters and the target page pointed to by the truncated URL, whether a corresponding conversion address code (for example, the domain name prefix gate.baidu.com) needs to be added. If so, the conversion address code is inserted into the truncated URL; otherwise the process goes to step 107. Some high-end mobile phones can browse the target page smoothly without the code conversion of step 102, so no conversion address code needs to be inserted for them. For many common mobile phones, however, a conversion address code must be inserted into the truncated URL so that the target page undergoes the code conversion of step 102 and can be browsed smoothly. When the user clicks such a truncated URL, the corresponding request is sent to the class A server for processing, as described in case I of step 102. Since the class A server needs both the clicked URL and the user terminal parameters to trigger step 102, the flow returns to step 101 and then proceeds to step 102. Two examples are provided below for illustration:
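The insertion of a conversion address code can be sketched in Python. The gateway prefix is modelled on the gate URL quoted in case I of step 102; the function name `link_target` and the `needs_transcoding` flag are hypothetical, since the description does not say how the need for conversion is decided.

```python
from urllib.parse import quote

# Assumed gateway prefix, modelled on the gate URL quoted in the description.
GATE_PREFIX = "http://gate.baidu.com/?src="


def link_target(url: str, terminal_type: str, needs_transcoding: bool) -> str:
    """Return the link address to emit for a truncated URL."""
    if terminal_type == "common" and needs_transcoding:
        # Percent-encode the whole target so it survives as one query value.
        return GATE_PREFIX + quote(url, safe="")
    return url
```

Only the link address changes here; the visible text of the link remains the truncated URL.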
Example 1:
[Original code]
Figure BDA0000038421050000111
[Code after truncated rendering for a common mobile phone]
Figure BDA0000038421050000112
[Code after truncated rendering for a smart mobile phone]
Figure BDA0000038421050000113
Example 2:
[Original code]
<a href="http://www.java2s.com/" target="_blank">http://www.java2s.com/</a>
Because the length after the main domain name is less than 15, the code after truncated rendering is the same for common and smart mobile phones:
Figure BDA0000038421050000114
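The rendered code in the figures is not reproduced in this text. As one plausible shape of the truncated rendering of a link, the Python sketch below keeps the link address and replaces only the visible text with the shortened URL. The pattern, the helper names `shorten` and `render_anchors`, and the assumption that the visible text is the URL itself are all illustrative.

```python
import re
from html import escape

# Simplified anchor pattern (an assumption): href is the first attribute.
ANCHOR = re.compile(r'<a\s+href="([^"]+)"([^>]*)>(.*?)</a>', re.IGNORECASE | re.DOTALL)


def shorten(url: str, threshold: int = 15, keep: int = 6) -> str:
    """Threshold-based truncation as in policy example 2."""
    without_scheme = re.sub(r"^https?://", "", url, flags=re.IGNORECASE)
    host, slash, tail = without_scheme.partition("/")
    if len(tail) <= threshold:
        return host + slash + tail
    return host + slash + "..." + tail[-keep:]


def render_anchors(html: str) -> str:
    """Replace the visible text of each link with its shortened URL."""
    def replace(match: re.Match) -> str:
        href, attrs = match.group(1), match.group(2)
        return f'<a href="{href}"{attrs}>{escape(shorten(href))}</a>'
    return ANCHOR.sub(replace, html)
```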
Step 107, returning the processed target page to the user. After the processing of steps 101 to 106, the user obtains a rendered page that is generally better suited to browsing on a mobile phone. In particular, when the text content of the target page contains a long URL, the URL has been truncated and is easy to display on the user's mobile phone screen. When the target page pointed to by a truncated URL needs code conversion, the conversion address code inserted earlier ensures that clicking the truncated URL triggers the code conversion of step 102. For example, for http://gate.baidu.com/?src=http%3A%2F%2Fwww.java2s.com, the service provider performs the code conversion and returns a mobile phone version of the www.java2s.com page, suited to the mobile phone screen, to the user's mobile phone.
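On the gateway side, recovering the page to be converted from such a request can be sketched in Python. The parameter name `src` follows the gate URL quoted above; the function name `transcoding_target` is an assumption.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def transcoding_target(gate_url: str) -> Optional[str]:
    """Return the page a gate URL points at, or None if it has no src parameter."""
    query = parse_qs(urlsplit(gate_url).query)
    values = query.get("src")
    # parse_qs already percent-decodes the value.
    return values[0] if values else None
```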
Referring to fig. 3, the present invention further provides a page content processing apparatus, including: a user interaction unit 210, a parameter extraction unit 220, a transcoding unit 230, a content extraction unit 240, a URL identification unit 250, and a page rendering unit 260.
The user interaction unit 210 is configured to receive an access request of a user and return a target page processed by a server to the user.
The parameter extraction unit 220 is configured to extract user parameters from the user access request. The user usually sends an access request to the server via the HTTP protocol, and the access request includes many user-related parameters. These parameters typically include the address of the target page that the user requests to access (usually indicated by the URL field in the message) and user terminal parameters. When a user sends a request to a server through the HTTP protocol, the message naturally carries the accessed URL; in general, the message also includes a User-Agent field, which is generally used to store software platform information such as the operating system and browser of the mobile phone terminal, manufacturer information, and hardware platform information such as processor clock frequency, memory size and screen size. The user terminal parameters characterize the mobile phone to some extent. For example, if the operating system is that of Apple Inc., the mobile phone is usually a high-end smartphone; if the processor runs at a clock frequency of 1 GHz and the screen is larger than 3 inches, the mobile phone is usually a mid-range smartphone.
The code conversion unit 230 is configured to perform code conversion (transcoding) on the target page accessed by the user. In the present invention, the code conversion unit 230 is an optional logic module, because two situations may arise when implementing the invention:
I. The content of the target page that the user requests to access may come from a third party. Taking the Internet services provided by the applicant, Baidu, as an example, the URL that the user requests may take a special form, for example: http://gate.baidu.com/?src=http%3A%2F%2Fwww.java2s.com%2F. The content that the user actually wants in this case comes from http://www.java2s.com. The target page at http://www.java2s.com may be a standard Web page designed for personal computers, which many mobile phone platforms cannot access or can access only with a poor experience. Alternatively, the user may be accessing a product or service that Baidu has only just developed and that does not yet have a built-in page version suitable for mobile browsing. In both cases the server provides a transcoding service for the user. Further, since the parameter extraction unit 220 has already obtained the user terminal parameters, the transcoding can be performed according to those parameters in combination with the characteristics of the target page. For example, the conversion may take into account the operating system of the mobile phone and its browser, so that the converted page is suitable for browsing on the phone screen.
II. A suitable version of the target page that the user requests to access already exists on the server. For example, when the user accesses the relatively mature Baidu Zhidao (Baidu Knows) channel (http://zhidao.baidu.com), the server holds one or more mobile versions of the target page (usually corresponding to several types of handset), so the code conversion unit 230 naturally does not need to perform transcoding.
In the preferred embodiment, cases I and II are handled by different servers. For example, requests whose URL contains gate.baidu.com or another predetermined domain name are processed on class A servers, which provide the function of the code conversion unit 230. All other requests are processed on class B servers, which may omit the code conversion unit 230. This is merely a matter of how the server side allocates its services and is not described in detail.
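The split between class A and class B servers can be pictured with a short sketch. It assumes that the predetermined gateway domain name is `gate.baidu.com` and that the third-party address travels in a percent-encoded `src` query parameter, as in the example URL of case I; `GATEWAY_HOSTS` and `route_request` are hypothetical names, not part of the described system.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed set of predetermined domain names served by class A servers.
GATEWAY_HOSTS = {"gate.baidu.com"}


def route_request(request_url: str):
    """Return (server_class, target_url) for a requested URL."""
    parts = urlsplit(request_url)
    if parts.hostname in GATEWAY_HOSTS:
        # parse_qs percent-decodes values, so src comes back as a plain URL.
        src = parse_qs(parts.query).get("src", [])
        if src:
            return "A", src[0]
    # Everything else (including a gateway request without src, an
    # assumption of this sketch) is handled without transcoding.
    return "B", request_url
```

For the example of case I, the function yields class "A" together with the decoded third-party address, which is the page the code conversion unit 230 would then transcode.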
The content extracting unit 240 is configured to extract the content of the target page that the user requests to access. The target page can be obtained from its address (i.e., the URL in the user's request message). However, such a target page is generally not suitable for being provided to the user directly: many service providers render the content of their own pages and only then provide the rendered pages to users. As described above, the server may hold multiple versions of the target page, so in this step a different version can be loaded according to the user terminal parameters.
The URL identifying unit 250 is configured to identify and extract URLs in the target page content. If a URL exists in the target page content, the page rendering unit 260 renders the identified URL. If no URL exists in the target page content, processing may pass to the page rendering unit 260 for other rendering, or may skip rendering and pass directly to the user interaction unit 210.
For a target page accessed by a user, the content is generally composed of text and multimedia content. The text may include referenced URLs in addition to ordinary text. The function of the URL identifying unit 250 is to identify URLs in the text content. Because the character composition of a URL differs greatly from that of ordinary text, a URL can be recognized by its character composition. The invention provides an exemplary piece of code to aid understanding.
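// Split the URL: $ma[1] is the main domain name (with its trailing "/"),
// $ma[2] is everything after it. The protocol prefix is not captured.
$r = preg_match('/(?:https?\:\/\/)?([^\/]+\/?)(.*)/i', $text, $ma);
// If the part after the domain name exceeds the threshold $urllen,
// keep only its last 6 characters, preceded by an ellipsis.
if (mb_strlen($ma[2]) > $urllen) {
    $ma[2] = '...' . mb_substr($ma[2], -6, mb_strlen($ma[2]), 'utf-8');
}
return $ma[1] . $ma[2];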
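Recognition by character composition can itself be sketched with a regular expression. The pattern below is a heuristic assumed for illustration (an optional http or https prefix, a dotted host name ending in letters, and an optional path without whitespace, quotes, or angle brackets); it is not the pattern used by the invention, and `find_urls` is a hypothetical helper.

```python
import re

# Heuristic URL pattern, assumed for this sketch only.
URL_PATTERN = re.compile(
    r"(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/[^\s<>\"']*)?",
    re.IGNORECASE,
)


def find_urls(text: str) -> list:
    """Return every substring of text that looks like a URL."""
    return [match.group(0) for match in URL_PATTERN.finditer(text)]
```

A heuristic of this kind can absorb trailing punctuation into the match, so a production recognizer would need additional rules for sentence-final periods and similar cases.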
The page rendering unit 260 is configured to perform truncated rendering on the identified URL.
Because a URL included in the text content of the target page may be too long, resulting in a poor browsing experience on a mobile phone, the page rendering unit 260 truncates longer URLs in the text content of the target page. However, to ensure that the user still has a rough idea of where the URL leads, the truncation must follow certain policies. The truncation strategy of the invention comprises the following steps:
1) intercepting the main domain name part of the URL as a first part;
2) intercepting a plurality of characters after the main domain name of the URL as a second part;
3) when characters between the first part and the second part have been cut off, inserting an ellipsis as a third part to connect the first part and the second part, finally forming the truncated URL.
Intercepting the main domain name during truncation ensures that the user can tell which site the truncated URL comes from, which helps prevent the user from being deceived by a phishing website. Intercepting several characters after the main domain name helps the user judge the content of the page that the truncated URL points to. Some preferred auxiliary truncation strategies are described below for illustration.
Policy example 1: on the basis of the foregoing truncation policy, the interception of the second part is refined: the characters after the last "/" in the URL are intercepted as the second part. This policy is easy to understand from an example of a truncated URL: zhidao.baidu.com/../123456.html. A longer URL is thus truncated to a shorter one, while the key information that helps the user assess the URL, namely the domain name and the file name, is retained.
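A minimal sketch of policy example 1 follows. The function name `truncate_keep_last_segment` is hypothetical, and the omission marker is written as "..." to match the exemplary PHP fragment; the exact marker and separator are assumptions of this sketch.

```python
import re


def truncate_keep_last_segment(url: str) -> str:
    """Keep the main domain name and the characters after the last '/'."""
    # Drop the protocol prefix, then split the domain name from the path.
    rest = re.sub(r"^https?://", "", url, flags=re.IGNORECASE)
    domain, slash, path = rest.partition("/")
    if not slash or "/" not in path:
        # No intermediate directory levels: nothing is cut off,
        # so no ellipsis is inserted.
        return rest
    last_segment = path.rsplit("/", 1)[1]
    return f"{domain}/.../{last_segment}"
```

Because the ellipsis is inserted only when something was actually removed, a short URL such as zhidao.baidu.com/error.html passes through with nothing but its protocol prefix dropped.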
Policy example 2: on the basis of the foregoing truncation strategy, when intercepting the second part it is further determined whether the length of the characters after the main domain name, i.e., after the first "/" following the main domain name, exceeds a first predetermined threshold length. If not, the main domain name and all characters after it are retained. Otherwise, a run of characters of a predetermined length is intercepted from the end of the URL as the second part. Assume that the first predetermined threshold length is 15 and the predetermined length is 6. For a URL such as http://zhidao.baidu.com/error.html, the part after the main domain name does not exceed 15 characters, so the result of the truncation is zhidao.baidu.com/error.html: only the http protocol identifier is cut off, and the useful information in the URL is essentially preserved. As another example, http://zhidao.baidu.com/query/153956116.html?push=bening, truncated according to policy 2, is displayed as zhidao.baidu.com/...bening. Here the main domain name and part of the URL parameter information are retained to form the truncated URL, which helps the user understand the URL.
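Policy example 2 can be sketched in a few lines that mirror the exemplary PHP fragment given earlier. The function name `truncate_url` and its parameter names are hypothetical; the defaults of 15 and 6 are the values assumed in the example above.

```python
import re


def truncate_url(url: str, threshold: int = 15, keep: int = 6) -> str:
    """Truncate the part after the main domain name when it is too long."""
    # Group 1: main domain name with its trailing "/"; group 2: the rest.
    match = re.match(r"(?:https?://)?([^/]+/?)(.*)", url, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return url  # not URL-shaped; leave it unchanged
    domain, tail = match.group(1), match.group(2)
    if len(tail) > threshold:
        # Keep only the last `keep` characters, preceded by an ellipsis.
        tail = "..." + tail[-keep:]
    return domain + tail
```

Applied to a URL whose part after the main domain name ends in "bening" and exceeds 15 characters, the sketch yields the main domain name, an ellipsis, and the six characters "bening".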
Policy example 3: since the parameter extraction unit 220 has already obtained the user terminal parameter information, the page rendering unit 260 may apply different truncation policies for different user terminal parameters. Assume that all handsets are classified into two types: normal and smart. The page rendering unit 260 determines the type of the user's mobile phone from the previously obtained user terminal parameters and applies a truncation policy according to that type. For example, the policy of example 2 can be invoked for a normal handset and the policy of example 1 for a smartphone. The policy of example 2 may of course also be applied to a smartphone, except that a second predetermined threshold length is then used, preferably greater than the first predetermined threshold length, such as 30. This is only a simple example; handset types can be subdivided further according to the models currently common on the market. For example, handsets can be divided into four types: WML edition (e.g., low-end phones without an operating system), normal edition (e.g., ordinary smartphones with an operating system), colorful edition (e.g., smartphones with good performance), and high-end edition (e.g., iPhone and Android phones).
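The variant in which both handset types use policy example 2 with different thresholds might look as follows. The mapping `THRESHOLDS` and the function `truncate_for_terminal` are hypothetical; the values 15, 30, and 6 come from the examples in the text, and unknown handset types are assumed to fall back to the normal threshold.

```python
# Threshold lengths per handset type, taken from the examples in the text.
THRESHOLDS = {"normal": 15, "smart": 30}
KEEP = 6  # number of trailing characters retained, as in policy example 2


def truncate_for_terminal(url: str, terminal_type: str) -> str:
    """Apply policy example 2 with a threshold chosen by handset type."""
    threshold = THRESHOLDS.get(terminal_type, THRESHOLDS["normal"])
    rest = url.split("://", 1)[-1]  # drop the protocol prefix if present
    domain, slash, tail = rest.partition("/")
    if len(tail) <= threshold:
        return rest
    return f"{domain}{slash}...{tail[-KEEP:]}"
```

The same URL is therefore shown in full on a smartphone but shortened on a normal handset whenever the part after the main domain name is between 16 and 30 characters long.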
The page rendering unit 260 is further configured to perform other rendering on the content of the target page and to submit the rendered target page to the user interaction unit 210. The user interaction unit 210 returns the rendered target page to the user; such a page is generally better suited to browsing on a mobile phone. In particular, when the target page content contains a long URL, the URL is truncated so that it displays easily on the user's phone screen. Conventional rendering processes can follow prior-art implementations and are not essential to the present invention. What is mainly introduced here is a method of inserting a conversion address code in cooperation with URL truncation. After the URL has been truncated and rendered, it is determined, by combining the user terminal parameters with the target page that the truncated URL points to, whether a corresponding conversion address code (for example, the domain name prefix gate.baidu.com) needs to be added. If so, the conversion address code is inserted into the truncated URL; otherwise, the user interaction unit 210 returns the processed target page to the user. For some high-end mobile phones, the target page can be browsed smoothly without inserting a conversion address code to trigger the code conversion unit 230. For many ordinary mobile phones, however, a conversion address code must be inserted into the truncated URL so that the code conversion unit 230 transcodes the target page and the page can be browsed smoothly. When the user clicks the truncated URL, the corresponding request is then sent to a class A server for processing, as described in case I, and the class A server, which has the code conversion unit 230, provides the transcoding service.
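The insertion of the conversion address code can be illustrated with a small sketch. It assumes that the code takes the form of the prefix `http://gate.baidu.com/?src=` followed by the percent-encoded target address; the decision rule in `needs_conversion` is likewise an assumption, and both function names are hypothetical.

```python
from html import escape
from urllib.parse import quote

# Assumed form of the conversion address code.
GATEWAY_PREFIX = "http://gate.baidu.com/?src="


def needs_conversion(terminal_type: str, target_has_mobile_version: bool) -> bool:
    # Assumed rule: only normal handsets viewing pages that lack a
    # mobile version are routed through the transcoding gateway.
    return terminal_type == "normal" and not target_has_mobile_version


def build_link(target_url: str, display_text: str, convert: bool) -> str:
    """Render a link whose visible text is the truncated URL."""
    if convert:
        href = GATEWAY_PREFIX + quote(target_url, safe="")
    else:
        href = target_url
    return f'<a href="{escape(href)}" target="_blank">{escape(display_text)}</a>'
```

Only the link target changes in this sketch; the text shown to the user remains the truncated URL, so the main domain name of the real destination stays visible.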
Two examples are provided below for illustration:
Example 1:
[Original code]
[Figure BDA0000038421050000161: code listing, image not reproduced]
Code after truncated rendering for an ordinary mobile phone:
[Figure BDA0000038421050000171: code listing, image not reproduced]
Code after truncated rendering for a smartphone:
Example 2:
[Original code]
<a href="http://www.java2s.com/" target="_blank">http://www.java2s.com/</a>
Because the length after the main domain name is less than 15, the code after truncated rendering is the same for ordinary mobile phones and smartphones:
[Figure BDA0000038421050000173: code listing, image not reproduced]
When the target page that the truncated URL points to needs transcoding, the conversion address code inserted earlier ensures that the user's click on the truncated URL reaches a class A server for transcoding. For example, the request goes to http://gate.baidu.com/?src=http%3A%2F%2Fwww.java2s.com, and after transcoding on the server, a mobile version of the www.java2s.com page suited to the phone screen is returned to the user's mobile phone.
The invention effectively improves the experience of a user browsing pages on a small terminal. Its flexible URL shortening strategies preserve key URL information for the user while keeping pages easy to browse, which helps prevent the user from being deceived by a phishing website and mitigates the security risks that URL shortening could otherwise introduce.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (14)

1. A page content processing method, applied to an Internet server to provide a page content processing service for a small terminal, characterized by comprising the following steps:
A. extracting the content of a target page that a user requests to access;
B. judging whether a URL exists in the target page content; if so, proceeding to step C;
C. truncating the URL in the target page content according to a predetermined policy, wherein the predetermined policy comprises:
policy C1, intercepting the main domain name part of the URL as a first part;
policy C2, intercepting a plurality of characters after the main domain name of the URL as a second part;
policy C3, when characters between the first part and the second part have been cut off, inserting an ellipsis as a third part to connect the first part and the second part, forming a truncated URL;
D. returning the processed target page to the user.
2. The method according to claim 1, wherein the policy C2 is specifically: intercepting all characters of the URL starting from the last "/" as the second part.
3. The method according to claim 1, wherein the policy C2 is specifically: judging whether the length of the characters after the main domain name exceeds a preset threshold length;
if not, retaining all characters after the main domain name as the second part;
otherwise, intercepting characters of a preset length, counted back from the end of the URL, as the second part.
4. The method of claim 3, further comprising:
E. extracting parameters in the user access request, wherein the parameters at least comprise user terminal parameters, and judging whether the user terminal belongs to an intelligent type or a common type according to the user terminal parameters;
the specific step of judging whether the character length after the main domain name exceeds a preset threshold length is as follows: judging whether the character length behind the main domain name exceeds a preset threshold length corresponding to a terminal;
the intelligent terminal corresponds to a second preset threshold length, the common terminal corresponds to a first preset threshold length, and the second preset threshold length is larger than the first preset threshold length.
5. The method of claim 1, further comprising:
E. extracting parameters in the user access request, wherein the parameters at least comprise user terminal parameters;
F. transcoding the target page content.
6. The method according to claim 5, wherein step F is specifically: transcoding the target page content according to the user terminal parameters and the target page content.
7. The method of claim 1, further comprising, prior to step D:
E. extracting parameters in the user access request, wherein the parameters at least comprise user terminal parameters;
G. judging whether a corresponding conversion address code needs to be added by combining the user terminal parameters and the target page pointed to by the truncated URL; if so, inserting the conversion address code; otherwise, proceeding to step D.
8. A page content processing device, applied to an Internet server to provide a page content processing service for a small terminal, characterized in that the device comprises: a content extraction unit, a URL identification unit, a page rendering unit and a user interaction unit; wherein,
the content extraction unit is used for extracting the target page content which is requested to be accessed by the user;
the URL identification unit is used for judging whether a URL exists in the target page content; if a URL exists, submitting the identified URL to the page rendering unit for truncation processing; otherwise, skipping the truncation processing;
the page rendering unit is used for truncating the URL in the target page content according to a predetermined policy, wherein the predetermined policy comprises:
policy C1, intercepting the main domain name part of the URL as a first part;
policy C2, intercepting a plurality of characters after the main domain name of the URL as a second part;
policy C3, when characters between the first part and the second part have been cut off, inserting an ellipsis as a third part to connect the first part and the second part, forming a truncated URL;
the user interaction unit is used for returning the processed target page to the user.
9. The apparatus according to claim 8, wherein the policy C2 is specifically: intercepting all characters of the URL starting from the last "/" as the second part.
10. The apparatus according to claim 8, wherein the policy C2 is specifically: judging whether the length of the characters after the main domain name exceeds a preset threshold length;
if not, retaining all characters after the main domain name as the second part;
otherwise, intercepting characters of a preset length, counted back from the end of the URL, as the second part.
11. The apparatus of claim 10, further comprising:
the parameter extraction unit is used for extracting parameters in the user access request, wherein the parameters at least comprise user terminal parameters;
the page rendering unit is further used for judging whether the user terminal belongs to an intelligent type or a common type according to the user terminal parameters;
the specific step of judging whether the character length after the main domain name exceeds a preset threshold length is as follows: judging whether the character length behind the main domain name exceeds a preset threshold length corresponding to a terminal;
the intelligent terminal corresponds to a second preset threshold length, the common terminal corresponds to a first preset threshold length, and the second preset threshold length is larger than the first preset threshold length.
12. The apparatus of claim 8, further comprising:
the code conversion unit is used for carrying out code conversion on the target page.
13. The apparatus of claim 12, further comprising:
the parameter extraction unit is used for extracting parameters in the user access request, wherein the parameters at least comprise user terminal parameters;
the code conversion unit is used for carrying out code conversion on the target page according to user terminal parameters and the target page content.
14. The apparatus of claim 8, further comprising:
the parameter extraction unit is used for extracting parameters in the user access request, wherein the parameters at least comprise user terminal parameters;
the page rendering unit is further used for judging whether a corresponding conversion address code needs to be added by combining the user terminal parameters and the target page pointed to by the truncated URL, and for inserting the conversion address code if so; otherwise, the user interaction unit returns the processed target page to the user.
CN 201010589768 2010-12-15 2010-12-15 Page content processing method and device Active CN102033926B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010589768 CN102033926B (en) 2010-12-15 2010-12-15 Page content processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010589768 CN102033926B (en) 2010-12-15 2010-12-15 Page content processing method and device

Publications (2)

Publication Number Publication Date
CN102033926A true CN102033926A (en) 2011-04-27
CN102033926B CN102033926B (en) 2013-09-04

Family

ID=43886819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010589768 Active CN102033926B (en) 2010-12-15 2010-12-15 Page content processing method and device

Country Status (1)

Country Link
CN (1) CN102033926B (en)


Also Published As

Publication number Publication date
CN102033926B (en) 2013-09-04

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant