
CN102298621B - Attention-based aggregated display method for a homologous-information search engine, and system for obtaining the web page user attention score PageFocus - Google Patents


Info

Publication number
CN102298621B
Authority
CN (China)
Prior art keywords
pagefocus, webpage, search, content, browser
Legal status
Expired - Fee Related (as listed by Google Patents, which notes that the status is an assumption, not a legal conclusion)
Application number
CN 201110228853
Other languages
Chinese (zh)
Other versions
CN102298621A
Inventor
王东 (Wang Dong)
Current assignee
Individual
Original assignee
Individual
Events
Application filed by an individual, with priority to CN 201110228853
Publication of CN102298621A
Application granted
Publication of CN102298621B
Expired - Fee Related (current legal status)
Anticipated expiration


Landscapes

  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an attention-based method and system for aggregating and displaying homologous (same-source) information in a search engine. In the method, the search engine first finds all target websites that match the query as the original search results; these are then aggregated into a single title search result according to factors such as content quality, the account information of the party that has purchased the right to be displayed, and service quality. Only the title search result is shown to the inquirer, and the full set of results is expanded only on request. In the system, a statistics server works together with the web browser to convert all of the user's operations on a page into an attention score for that page, PageFocus, which is sent back to the statistics server as a measure of content quality; the search engine can then use it to select the title search result and to rank the displayed results. The invention further relates to a method that automatically identifies the user's state and serves an appropriate page style and content.

Description

Attention-based aggregated display method for a homologous-information search engine, and system for obtaining the web page user attention score PageFocus
Technical field
The present invention relates to computer networking, and in particular to search-engine technology by which computers provide search services on the Internet or on enterprise intranets. The invention also relates to a system for obtaining the attention that users pay to web pages, and to a device and method for adapting the style of website content.
Background art
The Internet currently contains a large number of "web pages or network services from the same (or a similar) source", for example: 1. articles, opinion pieces, and intelligence pages written by one person or organization and copied on a massive scale; 2. news-report pages about an interview (or release) given by one person or organization, copied on a massive scale; 3. posts made by one person or organization on BBS forums; 5. multimedia files produced by one person or organization in different data formats or compression ratios; 6. executable programs, data, and design documents produced by one person or organization; 7. information content produced in other ways and widely copied. Current search engines list these same-source pages and services individually, where they take up a great deal of space while having identical content, which makes the results inconvenient for the inquirer to browse.
Existing search engines and web page ranking services measure the popularity of a page only by click traffic and page dwell time. The main approaches are: 1) search engines such as Google and Baidu, which compute page popularity from the clicks users make on search results; 2) website rankings such as Alexa, which rely on toolbar software embedded in the browser to send the user's hyperlink clicks and page dwell time back to a server (parameters include the current page address and the page-open time), but use no other measure of evaluation. For how Alexa works, see:
http://www.singtaonet.com/it/it_sp/t20051110_43674.html
http://www.people.com.cn/GB/it/8219/41552/41597/3109586.html
Existing websites fall into the following categories:
Category 1: all site content (for example, on a news website) has the same style and content for every user at any given moment.
Category 2: the style and content shown can differ according to the user's settings (for example, Google News).
However, none of these websites can provide a different display style and content according to the user's real-time state.
Summary of the invention
To remedy these shortcomings, the invention provides a search method that aggregates search results which have identical content, and therefore the same use value to the searcher, into a single record, the title search result, together with an apparatus and method for expanding it on demand to view the other results. To prevent the title search result from being clicked so often that the target server is overloaded and goes down, it also provides an apparatus and method that automatically distribute clicks on the title search result across the other search-result targets. The invention further provides a system in which a web browser cooperates with a statistics server on the network: all of the user's operations are converted into a score for the web page and sent back to the statistics server as a rating of the attention paid to that page, so that the score can serve as a ranking method and tool for a search engine. Finally, the invention provides a method that uses all obtainable information that helps judge the user's environment and state in order to provide different display styles and content to users in different states at the same moment, on the same website, and even on the same page.
To achieve these goals, a search method that displays homologous-information search results in aggregated form comprises the following steps:
(1) The inquirer accesses the search engine through a web browser or an application program and enters the keywords to be queried;
(2) the search engine finds all matching target websites as the original search results;
(3) the "homologous information processing module" looks up the account information of the purchaser of the right to "become the title search result" and, combined with other judgment rules, chooses from the original search results the object to serve as the "title search result";
(4) the search-engine web server or application server shows the inquirer only the chosen title search result as the search result, and provides for it a button meaning "expand to view details or other information";
(5) the inquirer can press the corresponding button, whereupon the search engine shows the original search results found in step (2).
The "homologous information processing module" consists of several per-type homologous information processing modules, such as the homologous web page processing module, homologous multimedia processing module, homologous picture processing module, homologous document processing module, homologous software processing module, homologous data or database processing module, homologous GIS information processing module, equal-value network service processing module, and equal-value business information processing module.
The "homologous information processing module" performs the following steps:
(1) the "information category judgment module" first determines the type of the information received from the web crawler;
(2) information of the same type is sent together to the homologous information processing module for that type;
(3) after processing by that module, the search information is filed into either the "non-homologous result database" or the "homologous result database" for that information type;
(4) the system publishes the non-homologous and homologous result databases on a web server for inquirers to query. In another implementation, a dynamic-web-page query service can be provided to inquirers directly from these two databases.
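The five-step search flow above (find all matches, choose one title search result per same-source group, show only that result, expand on request) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes each original result already carries a hypothetical `source_id` assigned by the homologous information processing module, and it takes the title-selection rule as a caller-supplied function.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Result:
    url: str
    source_id: str  # hypothetical same-source cluster key


@dataclass
class ResultGroup:
    title: Result                                  # the "title search result"
    members: list[Result] = field(default_factory=list)
    expanded: bool = False                         # set when the inquirer presses "expand"

    def visible(self) -> list[Result]:
        # Collapsed: only the title result; expanded: all original results of this source
        return self.members if self.expanded else [self.title]


def aggregate(original: list[Result],
              pick_title: Callable[[list[Result]], Result]) -> list[ResultGroup]:
    """Group original search results by source and pick one title result per group."""
    by_source: dict[str, list[Result]] = {}
    for r in original:
        by_source.setdefault(r.source_id, []).append(r)  # dicts keep first-seen order
    return [ResultGroup(title=pick_title(members), members=members)
            for members in by_source.values()]


results = [Result("u1", "a"), Result("u2", "b"), Result("u3", "a")]
groups = aggregate(results, pick_title=lambda members: members[0])
print([[r.url for r in g.visible()] for g in groups])  # [['u1'], ['u2']]
```

Passing `pick_title` in keeps the grouping independent of how the title result is chosen, so a probability-based selection rule can be plugged in later.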
The "homologous web page processing module" processes web page information as follows:
(1) When the search part of the search engine receives the keywords to be queried, the "published-results decision device on the web server" first determines whether the keywords have recently been queried by someone else. If they have, and the results are already published on the "search-engine search results web server", the results are returned directly. In these results, web pages that share a source are aggregated into one search result; clicking the "same-source web pages" button shows another search-results page on the search-engine search results web server that contains all of the results. This completes the query.
(2) If the published-results decision device determines that the keywords have not been queried recently and no corresponding results are published on the search-engine search results web server:
A. the "web page searcher" searches the "non-homologous web results database" and the "homologous web results database" for web page addresses matching the keywords, and obtains the content of those pages;
B. if the web page searcher finds no matching address in either database, it returns "no matching web pages" to the inquirer and adds the keywords to the next round of update tasks for the two databases. During the update, each matching address that is found is filed into the non-homologous or the homologous web results database depending on whether it has same-source pages, so that results can be found the next time someone searches for the same keywords;
(3) the "web page content separator" breaks the content and hyperlink targets of the found pages into types such as multimedia, pictures, text, and hyperlinks;
(4) the individual content decision devices produce their judgments:
A. the "multimedia content decision device" produces the target page's degree of identical multimedia files, SMS (Same Media Score);
B. the "picture content decision device" produces the degree of identical pictures, SPS (Same Photo Score);
C. the "text content decision device" produces the degree of identical text, STS (Same Text Score);
D. the "link content decision device" produces the degree of identical hyperlinks, SHS (Same Hyperlinks Score);
(5) the multimedia decision weight SMP, picture decision weight SPP, text decision weight STP, and link decision weight SHP are obtained from the "same-source web page decision rule base" and multiplied by SMS, SPS, STS, and SHS from step (4), respectively;
(6) the products from step (5) are added to give the page's same-source degree SSS (Same Source Score): SSS = (SMS*SMP) + (SPS*SPP) + (STS*STP) + (SHS*SHP);
(7) if the page's SSS exceeds the threshold, the page is judged a "same-source web page" with respect to the other page; otherwise it is judged a "non-homologous web page";
(8) non-homologous web pages from step (7) enter the non-homologous web results database through the "non-homologous web page processing module"; same-source web pages from step (7) enter the homologous web results database through the homologous web page processing module;
(9) the "search-result web page publisher" dynamically generates static search-result pages from the contents of the homologous and non-homologous web results databases, publishes them to the search-engine search results web server, and presents them to the inquiring user through the browser;
(10) as an alternative to step (9), results can be presented to the inquiring user's browser directly by a "dynamic web page web server".
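Steps (5) to (7) amount to a weighted sum followed by a threshold test. The sketch below shows that arithmetic; the weight values and the threshold are illustrative assumptions, since the source gives no concrete numbers, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionWeights:
    """Weights from the same-source web page decision rule base (values are hypothetical)."""
    smp: float  # multimedia decision weight SMP
    spp: float  # picture decision weight SPP
    stp: float  # text decision weight STP
    shp: float  # link decision weight SHP


def same_source_score(sms: float, sps: float, sts: float, shs: float,
                      w: DecisionWeights) -> float:
    # Step (6): SSS = (SMS*SMP) + (SPS*SPP) + (STS*STP) + (SHS*SHP)
    return sms * w.smp + sps * w.spp + sts * w.stp + shs * w.shp


def classify_page(sss: float, threshold: float) -> str:
    # Step (7): strictly above the threshold means same-source, otherwise non-homologous
    return "same-source" if sss > threshold else "non-homologous"


weights = DecisionWeights(smp=0.1, spp=0.2, stp=0.5, shp=0.2)
sss = same_source_score(1.0, 0.5, 0.8, 0.0, weights)
print(round(sss, 3), classify_page(sss, threshold=0.5))  # 0.6 same-source
```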
The "homologous information processing module" can alternatively perform the following steps:
(1) receive the inquirer's search keywords, and let the software determine from the keyword content and keyword syntax which files or network services are to be searched for;
(2) determine whether the content to be searched for is already published on the web server. If the search target is published on the search-engine search results web server, return the results directly: files that meet the search condition and share a source, or entry points to same-source network services, are aggregated into one title search result, and clicking the "same-source files" button shows another page on the search-engine search results web server containing all results, so the inquirer can see every result that meets the query condition. This completes the search. If the search target is not published on the search-engine search results web server, continue from step (3);
(3) return the prompt "no matching results" to the inquirer;
(4) add the search keywords to the next round of update tasks for the "homologous information index database" and the "non-homologous information index database", and periodically start the update process for the two databases;
(5) the update process for the homologous and non-homologous information index databases:
A. the searcher finds new target files, web pages, or service entry points, and the software follows each entry point to obtain the file or network service;
B. the "content decision device" determines whether the newly found information has the same content as a category in the current homologous information index database. If yes, it is added to that category as a new element; if no, the content decision device determines whether it has the same content as an entry in the current non-homologous information index database;
C. if yes: create a new category for the current information and the same-source information stored in the non-homologous information index database, and transfer all of it to the homologous information index database;
D. if no: create a new category for the current information and store it in the non-homologous information index database;
(6) the search-result web page publisher dynamically generates static search-result pages from the contents of the homologous and non-homologous web results databases, publishes them to the search-engine search results web server, and presents them through the browser to the inquirer;
(7) as an alternative to step (6), results can be presented to the inquiring user's browser directly by a dynamic web page web server.
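Update step (5), cases B to D, can be sketched as an in-place update of two in-memory indexes. This is a minimal sketch under assumptions not in the source: the homologous index is a hypothetical `dict` of category number to member list, the non-homologous index is a plain list, each homologous category is compared only through its first member, and the content decision device is a caller-supplied predicate.

```python
from typing import Callable

SameContent = Callable[[str, str], bool]  # stands in for the "content decision device"


def update_indexes(new_item: str,
                   homologous: dict[int, list[str]],
                   non_homologous: list[str],
                   same_content: SameContent) -> None:
    """One pass of update step (5) for a newly found item (modifies both indexes in place)."""
    # B: same content as an existing homologous category? (compared with its first member)
    for members in homologous.values():
        if same_content(new_item, members[0]):
            members.append(new_item)
            return
    # B (continued): split the non-homologous index into matches and the rest
    matches, rest = [], []
    for item in non_homologous:
        (matches if same_content(new_item, item) else rest).append(item)
    if matches:
        # C: new category with the new item and its matches, moved to the homologous index
        new_id = max(homologous, default=-1) + 1
        homologous[new_id] = matches + [new_item]
        non_homologous[:] = rest
    else:
        # D: the item becomes its own category in the non-homologous index
        non_homologous.append(new_item)


def same_first_letter(a: str, b: str) -> bool:
    # Toy decision device for the example only
    return a[:1] == b[:1]


homo: dict[int, list[str]] = {}
non_homo: list[str] = []
for word in ["apple", "avocado", "banana", "apricot"]:
    update_indexes(word, homo, non_homo, same_first_letter)
print(homo, non_homo)  # {0: ['apple', 'avocado', 'apricot']} ['banana']
```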
When documents are processed by the homologous information processing module, the update process for the homologous and non-homologous index databases is:
A. the "document searcher" finds new document files, web pages, or link entry points, and the software follows each entry point to obtain the document or service;
B. the "text content decision device" and the "picture content decision device" determine whether the newly found document has the same content as a category in the current "homologous document index database". If yes, it is added to that category as a new element; if no, the "document content decision device" determines whether it has the same content as an entry in the current "non-homologous document index database";
C. if yes: create a new category for the current document and the same-source documents stored in the non-homologous document index database, and transfer all of them to the homologous document index database; if no: create a new category for the current document and store it in the non-homologous document index database.
The generic content decision device module performs the following steps:
(1) receive the "judged objects" (for example, multimedia from several sources) and record their number, InputQuantity;
(2) take one attribute of the judged objects that participates in the comparison, and record SameQuantity, the number of judged objects that have the same value for that attribute;
(3) input the weight Power of the current attribute in the decision process;
(4) compute the goodness of fit of all judged objects on the current attribute: PSame = SameQuantity*Power;
(5) return to (1) and perform steps (1) to (4) for the next attribute, until PSame has been obtained for every attribute;
(6) compute and return the identical-content degree of the judged objects: SameMediaPower = (sum of all PSame values)/InputQuantity.
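The generic decision steps above can be sketched as follows. The source does not say which group of objects SameQuantity counts when several values occur, so this sketch assumes it is the size of the largest group sharing one value; the attribute names and weights in the example are hypothetical.

```python
from collections import Counter


def same_media_power(objects: list[dict], powers: dict[str, float]) -> float:
    """SameMediaPower = (sum of PSame over all attributes) / InputQuantity."""
    input_quantity = len(objects)              # step (1): InputQuantity
    if input_quantity == 0:
        return 0.0
    total = 0.0
    for attr, power in powers.items():         # steps (2) to (5), one attribute at a time
        counts = Counter(obj[attr] for obj in objects if attr in obj)
        # Assumption: SameQuantity = size of the largest group sharing one value
        same_quantity = max(counts.values(), default=0)
        total += same_quantity * power         # step (4): PSame = SameQuantity * Power
    return total / input_quantity              # step (6)


clips = [
    {"duration_s": 180, "bitrate_kbps": 128},
    {"duration_s": 180, "bitrate_kbps": 320},
    {"duration_s": 180, "bitrate_kbps": 128},
]
print(round(same_media_power(clips, {"duration_s": 0.5, "bitrate_kbps": 0.5}), 4))  # 0.8333
```

With weights that sum to 1, identical objects score 1.0, which makes the result easy to compare against a threshold.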
When the content decision device module is the text content decision device, it performs the following steps:
(1) find the total length, SameLenth, of the parts of the texts that contain identical words or sentences;
(2) find the length, MinLenth, of the shortest of the input texts;
(3) return the text similarity SameTextPower = SameLenth/MinLenth.
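For two texts, SameTextPower can be sketched with Python's standard `difflib.SequenceMatcher`, summing the sizes of its matching blocks as SameLenth. This is a swapped-in technique, not the patent's method: it matches at the character level rather than by whole words or sentences, and it handles only two inputs.

```python
from difflib import SequenceMatcher


def same_text_power(a: str, b: str) -> float:
    """SameTextPower = SameLenth / MinLenth for two texts (character-level sketch)."""
    min_length = min(len(a), len(b))           # MinLenth: length of the shorter text
    if min_length == 0:
        return 0.0
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    same_length = sum(block.size for block in blocks)  # SameLenth: total matched length
    return same_length / min_length


print(same_text_power("breaking news today", "breaking news today, via agency"))  # 1.0
```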
When the content decision device module is the link content decision device, it performs the following steps:
(1) receive the "judged objects": the URLs of several hyperlinks;
(2) count the target URLs that appear on every one of the hyperlinked pages being judged, which gives the similarity degree of the judged objects: SameURLPower = …;
(3) return SameURLPower.
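Step (2) of the link decision device can be sketched as a set intersection over each page's outbound targets. The source omits the final SameURLPower formula, so this sketch returns only the raw count of shared targets and leaves any normalization open; the input shape (page URL mapped to its set of target URLs) is an assumption.

```python
def shared_target_count(pages: dict[str, set[str]]) -> int:
    """Count target URLs that appear on every hyperlinked page being judged."""
    if not pages:
        return 0
    link_sets = iter(pages.values())
    common = set(next(link_sets))              # start from the first page's targets
    for links in link_sets:
        common &= links                        # keep only targets present on every page
    return len(common)


pages = {
    "http://example.com/p1": {"http://example.org/a", "http://example.org/b"},
    "http://example.net/p2": {"http://example.org/a", "http://example.org/b",
                              "http://example.org/c"},
}
print(shared_target_count(pages))  # 2
```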
When the content decision device module is the business information content decision device, it performs the following steps:
(1) check whether the business information being compared concerns the same product or service; if not, return "inconsistent"; if so, go to step (2);
(2) check whether the business information being compared is sensitive to geographic location; if not, return "consistent"; if so, go to step (3);
(3) check whether the suppliers of the business information being compared are in the same city or region; if not, return "inconsistent"; if so, return "consistent".
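The three-step business information decision maps directly onto early returns. In this sketch the record fields are hypothetical, and treating an offer as location-sensitive when either side is marked sensitive is an assumption, since the source does not say how to combine the two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessInfo:
    product_id: str            # hypothetical identifier of the product or service
    location_sensitive: bool   # whether geography matters for this offer
    region: str                # supplier's city or region


def business_info_consistent(a: BusinessInfo, b: BusinessInfo) -> bool:
    # (1) different product or service -> inconsistent
    if a.product_id != b.product_id:
        return False
    # (2) not location-sensitive -> consistent (assumption: sensitive if either side is)
    if not (a.location_sensitive or b.location_sensitive):
        return True
    # (3) location-sensitive -> consistent only if suppliers share a city or region
    return a.region == b.region


pizza_a = BusinessInfo("pizza-delivery", True, "Beijing")
pizza_b = BusinessInfo("pizza-delivery", True, "Shanghai")
print(business_info_consistent(pizza_a, pizza_b))  # False
```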
The title search result is selected as follows:
(1) compute the probability weight PWn with which each same-source search result becomes the title search result:
PWn = TP*PageFocus/(RespDelay-K)
n: index of the search result (the n-th result)
When (RespDelay-K) is less than or equal to zero, (RespDelay-K) is taken as 1
PageFocus: web page attention score
RespDelay: web service response delay
K: service response constant; a value of 50 milliseconds (ms) is suggested
TP: title search result power (the weight of the purchased right to become the title search result)
(2) sum the probability weights PWn of all original same-source search results to obtain the total probability weight PWall;
(3) compute the probability that each same-source search result becomes the title search result: Pn = PWn/PWall;
(4) as searchers access the results, dynamically and randomly select the title search result according to the probabilities Pn and present it to the searcher.
The probability weight PWn of the title search result can also be computed as:
a. PWn = (TP+PageFocus)/(RespDelay-K), or
b. PWn = (TP+PageFocus)/RespDelay/K, or
c. PWn = TP*PageFocus/RespDelay/K.
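The main selection rule, PWn = TP*PageFocus/(RespDelay-K) with the denominator clamped to 1, followed by Pn = PWn/PWall and a weighted random draw, can be sketched as below using Python's standard `random.Random.choices`. Only the primary formula is implemented, not alternatives a to c; raising an error when every weight is zero is an assumption, and the candidate URLs are placeholders.

```python
import random
from dataclasses import dataclass

K_MS = 50.0  # service response constant K suggested in the text (milliseconds)


@dataclass(frozen=True)
class Candidate:
    url: str
    tp: float             # TP: weight of the purchased title-result right
    page_focus: float     # PageFocus attention score
    resp_delay_ms: float  # RespDelay: measured response delay in ms


def probability_weight(c: Candidate, k_ms: float = K_MS) -> float:
    # PWn = TP * PageFocus / (RespDelay - K); a non-positive (RespDelay - K) is taken as 1
    denom = c.resp_delay_ms - k_ms
    if denom <= 0:
        denom = 1.0
    return c.tp * c.page_focus / denom


def selection_probabilities(cands: list[Candidate]) -> list[float]:
    weights = [probability_weight(c) for c in cands]
    pw_all = sum(weights)                      # PWall
    if pw_all <= 0:
        raise ValueError("no candidate has a positive probability weight")
    return [w / pw_all for w in weights]       # Pn = PWn / PWall


def pick_title(cands: list[Candidate], rng: random.Random) -> Candidate:
    # Step (4): draw one title result per access, in proportion to Pn
    return rng.choices(cands, weights=selection_probabilities(cands), k=1)[0]


cands = [Candidate("http://mirror-a.example", 1.0, 100.0, 150.0),
         Candidate("http://mirror-b.example", 2.0, 100.0, 30.0)]
print([round(p, 4) for p in selection_probabilities(cands)])  # [0.005, 0.995]
```

Drawing per access, rather than fixing one winner, is what spreads clicks across same-source servers so that no single target is overloaded.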
The "homologous information processing module":
A. can be embedded in the search engine;
B. can be placed between the search engine and the search-engine search results web server;
C. can also be placed, as a preprocessing module, between the search engine and the websites being searched.
The button meaning "expand to view details or other information" can be a hyperlink or any software interface control.
A system for obtaining users' attention to web pages in search results comprises a PageFocus web server, a PageFocus web browser, and a web page score server.
The PageFocus web server comprises a PageFocus browser ID registration server, the PageFocusAccServer web page attention statistics server, a PageFocus browser online upgrade server, and a data encryption/decryption module;
The PageFocus web browser comprises a PageFocus browser ID registration module and an attention score (PageFocus) computation module.
The system works as follows:
(1) each PageFocus web browser is given a globally unique ID at installation, or actively contacts the PageFocus browser ID registration server on the network during use to obtain one;
(2) the PageFocus web browser has the functions of an ordinary web browser; in addition, it converts the user's operations on the browser and on the web page, according to their weights, into an attention score PageFocus for the page, forms a "PageFocus packet", and sends it encrypted over a network protocol to the search engine's PageFocusAccServer web page attention statistics server;
(3) after receiving the PageFocus packets sent by PageFocus web browsers around the world, the PageFocusAccServer server adds the PageFocus score contained in each packet to the corresponding web page;
(4) the PageFocus scores for web pages worldwide held on the PageFocusAccServer server can be processed in various ways: they can serve as the search engine's basis for selecting the title search result among results with identical content, can be published directly as a "web page popularity ranking list", or can serve as a basis for web page ranking.
The PageFocusAccServer web page attention statistics server can record scores as logarithms or in scientific notation.
A PageFocus packet can be formed when the browser finally closes the page, at regular intervals, or once the accumulated score reaches a certain value.
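The server-side accumulation in step (3), together with the logarithmic recording option, can be sketched as an in-memory counter. The class is a hypothetical stand-in for PageFocusAccServer: it performs no decryption or networking, and it returns `None` for pages without a positive total because the source does not say how such pages are recorded.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageFocusPacket:
    browser_id: str   # globally unique PageFocus browser ID
    url: str          # web page URL
    score: float      # PageFocus score for this page


class PageFocusAccumulator:
    """Hypothetical in-memory stand-in for the PageFocusAccServer statistics server."""

    def __init__(self) -> None:
        self._totals: dict[str, float] = defaultdict(float)

    def receive(self, packet: PageFocusPacket) -> None:
        # Add the packet's score to the corresponding web page
        self._totals[packet.url] += packet.score

    def total(self, url: str) -> float:
        return self._totals.get(url, 0.0)

    def log_score(self, url: str) -> Optional[float]:
        # Logarithmic record of the accumulated score; None if nothing positive was received
        t = self.total(url)
        return math.log10(t) if t > 0 else None


server = PageFocusAccumulator()
for p in [PageFocusPacket("b-001", "http://example.com/", 10.0),
          PageFocusPacket("b-002", "http://example.com/", 90.0)]:
    server.receive(p)
print(server.total("http://example.com/"))  # 100.0
```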
The attention score PageFocus is computed from the weights listed in the following table:
[Table image BSA00000554541100081: weights of user operations used to compute PageFocus]
Note:
1. The weights in the table are one embodiment; other values can also be used and fall within the scope of the invention.
The text reading speed is computed as follows:
A. mouse-wheel scrolling: text reading speed = (display area width/set width)*text lines scrolled per step/scroll interval;
B. keyboard page turning: text reading speed = (display area width/set width)*text lines per page turn/page-turn interval;
C. window scroll bar: text reading speed = (display area width/set width)*text lines scrolled per step/scroll interval.
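All three input modes use the same formula, so a single function covers them. Units are assumptions not stated in the source: widths in pixels, the interval in seconds, and the result in lines per second; the example values are illustrative.

```python
def text_reading_speed(view_width_px: float, set_width_px: float,
                       lines_per_step: float, interval_s: float) -> float:
    """(display area width / set width) * text lines per scroll or page turn / time interval."""
    if set_width_px <= 0 or interval_s <= 0:
        raise ValueError("set width and interval must be positive")
    return (view_width_px / set_width_px) * lines_per_step / interval_s


# Mouse wheel: 3 lines per notch, 0.5 s between notches, 1200 px view vs. 600 px set width
print(text_reading_speed(1200, 600, 3, 0.5))  # 12.0
```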
A PageFocus packet contains the fields PageFocus browser ID, web page URL, and web page PageFocus score.
When web pages that have same-source pages take part in the search engine's page ranking, the sum of the PageFocus user attention scores obtained by all of their same-source pages can be used as the ranking basis. That is: A. when the title search result of a group of same-source pages takes part in search-result ranking, it can use the sum of the PageFocus scores obtained by every page in the group; B. each page in a same-source group can likewise use the sum of the PageFocus scores obtained by the same-source pages under it when it takes part in search-result ranking.
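Ranking option A can be sketched as summing PageFocus over each same-source group and sorting the title results by that sum. The data layout is assumed: each group is keyed by its title page and lists all member URLs, including the title page itself, and pages with no recorded score count as 0.

```python
def rank_same_source_groups(groups: dict[str, list[str]],
                            page_focus: dict[str, float]) -> list[tuple[str, float]]:
    """Rank each title result by the summed PageFocus of all pages in its same-source group."""
    totals = {title: sum(page_focus.get(url, 0.0) for url in members)
              for title, members in groups.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


groups = {"t1": ["t1", "m1"], "t2": ["t2"]}  # assumption: each group lists its title page too
focus = {"t1": 5.0, "m1": 10.0, "t2": 12.0}
print(rank_same_source_groups(groups, focus))  # [('t1', 15.0), ('t2', 12.0)]
```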
A method that automatically determines the user's state and provides an appropriate web page style and content comprises the following steps:
(1) when the "website server cluster entry" receives a user's first request for a page of the website, it first obtains the user's IP address from the access protocol or the IP-layer protocol;
(2) it looks up the IP address in the "IP address property database" to determine whether it is a "workplace IP address" or a "private or leisure-location IP address"; for a workplace IP address, go to step (3); for a private or leisure-location IP address, go to step (4);
(3) obtain the geographic location of the workplace IP address and the local time of that region; if the region is currently within working hours, direct the access to the "work-style server", which serves pages suited to workplace use; otherwise go to step (4);
(4) direct the access to the "personal and casual-style server", which serves pages suited to personal and leisure use.
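The four routing steps can be sketched with the standard `datetime` module. Everything concrete here is hypothetical: the IP property table, its fixed UTC offsets (a real deployment would need a geolocation database and time-zone rules), and the default of treating unknown addresses as private. Working hours follow the Monday to Friday, 9:00 to 18:00 definition given later in this description.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical "IP address property database": IP -> (kind, UTC offset in hours)
IP_PROPERTIES = {
    "203.0.113.10": ("workplace", 8),
    "198.51.100.7": ("private", 8),
}


def is_working_time(local: datetime) -> bool:
    # Monday to Friday, 9:00 to 18:00 local time
    return local.weekday() < 5 and 9 <= local.hour < 18


def choose_style_server(ip: str, now_utc: datetime) -> str:
    kind, offset_h = IP_PROPERTIES.get(ip, ("private", 0))  # unknown IPs: assumed private
    if kind == "workplace":
        local = now_utc.astimezone(timezone(timedelta(hours=offset_h)))
        if is_working_time(local):
            return "work-style server"                      # step (3)
    return "personal and casual-style server"               # step (4)


wednesday_10am_utc8 = datetime(2024, 1, 3, 2, 0, tzinfo=timezone.utc)
print(choose_style_server("203.0.113.10", wednesday_10am_utc8))  # work-style server
```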
With the scheme above, search results that have identical content and the same use value to the searcher can be aggregated into one record, the title search result, with an apparatus and method for expanding it on demand to view the other results. An apparatus is also provided that automatically distributes clicks on the title search result to the other search-result targets, so that frequent clicks on the title search result do not overload and bring down the target server. Beyond the functions of existing search engines, the invention can also search all kinds of multimedia, documents, software, hardware and software source code or design documents, data or databases, information, and network services such as file sharing, FTP services, and P2P services.
A web browser that cooperates with a statistics server on the network converts all of the user's operations into a score for the web page and sends it back to the statistics server as a rating of the attention paid to the page, which can then be used as a ranking tool for the search engine.
With the adaptive website content-style method, users are served as follows:
1. Monday to Friday, 9:00 to 18:00, is working time; people at work need a concise, relatively rigorous style and content as closely related to their work as possible.
2. Monday to Friday from 18:00 to 9:00 the next morning, and all day Saturday and Sunday, is leisure time; people at leisure need a lively, relaxed style and content.
3. People at a workplace need a concise, relatively rigorous style and content as closely related to their work as possible.
4. People at home or in leisure venues need a lively, relaxed style and content.
5. People in other environments or states need a style and content suited to that environment and state.
Brief description of the drawings
Fig. 1 is a structural diagram of the system implementing the aggregated display method for a homologous-information search engine;
Fig. 2 is an internal structure diagram of the homologous information processing module;
Fig. 3 is a flowchart of the homologous web page processing module;
Fig. 4 is a flowchart of the homologous multimedia processing module;
Fig. 5 is a flowchart of the homologous picture processing module;
Fig. 6 is a flowchart of the homologous document processing module;
Fig. 7 is a flowchart of the homologous software processing module;
Fig. 8 is a flowchart of the homologous data or database processing module;
Fig. 9 is a flowchart of the homologous GIS information processing module;
Figure 10 is a flowchart of the equal-value network service processing module;
Figure 11 is a flowchart of the equal-value business information processing module;
Figure 12 is a structural diagram of the system for obtaining web page user attention;
Figure 13 shows an existing conventional search-engine website system without content and style adaptation;
Figure 14 shows the search-engine website system of the present invention with content and style adaptation.
Embodiment
The present invention is further described below with reference to the accompanying drawings.
Fig. 1 is a structural diagram of the system implementing the aggregated display method for a homologous-information search engine. Step 1: the inquirer accesses the search engine through a web browser or an application program and enters the keywords to be queried. Step 2: the search engine finds all matching target websites as the "original search results". Step 3: the "homologous information processing module" looks up the account information of the purchaser of the right to "become the title search result" and, combined with other judgment rules, chooses from the original search results the object to serve as the title search result. Here, A. the homologous information processing module can be embedded in the search engine; B. it can be placed between the search engine and the search-engine search results web server; C. it can also be placed, as a preprocessing module, between the search engine and the websites being searched. Step 4: the search-engine web server or application server shows the inquirer only the chosen title search result as the search result, and provides for it a button (a hyperlink or any software interface control) meaning "expand to view details or other information". Step 5: only when the inquirer wants to expand a title search result further and presses its button does the search engine show the original search results found in Step 2.
Fig. 2 shows the internal structure of the "homologous information processing module", which is defined as follows. It is mainly used 1) to judge whether, within the group of information nodes found for a search keyword, several nodes are one or more repeated websites sharing the same information source (such websites have the same search value or use value to the querier and usually need not all be shown directly), to aggregate these repeated websites into a single search result issued to the querier, and to present them only when the querier wants the other websites of equal value; and 2) unlike existing search engines, which concentrate mainly on searching web pages, to process, besides "HTML web pages", all kinds of network services for "multimedia", "documents", "software", "hardware and software source code or design documents", "data or databases", and "information", such as file sharing, FTP services, and P2P services.
" homologous information processing module " adopts modular construction, can progressively develop as required and implement each module wherein, and possess extended capability, and each module also can further be strengthened the accuracy of its automatic decision simultaneously, comprising:
1 " information category judge module ": the kind of judgement information, and information of the same type is concentrated the processing module that sends to respective type information, as following module.
2 " homology Web Page Processing modules ": be used for judging and process belonging to same source and the inquiry being had the webpage of equal value of finding, for example: Html, ASP, JSP, PHP, content of BBS forum etc.
3 " homology multimedia processing modules ": be used for judging and processing the same source that belongs to of finding, and the inquiry had multimedia file or a network service of equal value, for example: .MP3, .AVI, .WMV .MPEG .WAV, .RM wait various video files, and various Video service access interface based on stream media technology.
4 " homology picture processing modules ": be used for judging and process belonging to same source or having identical content of finding, and the inquiry being had the picture of equal value, for example: .GIF .JPG .BMP .PNG etc.
5 " homology document process modules ": be used for judging and process belonging to same source, having identical or related content of finding, and the inquiry had various format file files or a network service of equal value, for example: " .Doc ", " .Txt ", " .Pdf ", " .XLS ", " .PPT " etc.
6 " homology software processing module ": can judge and process the same software that the computer application software installation procedure that finds belongs to same author, they can be to adapt to similar and different operating system, the software installation procedure of identical or different version.
7 " same source data or database processing modules ": be used for judging and process belonging to same source or having identical content of finding, and the inquiry is had equal value, the data file of known format or database file, for example: .DAT, .XLS .MDF .DBF etc.
8 " homology GIS message processing modules ": be used for judging and process belonging to same source or having identical content of finding, and the inquiry being had numerical map file or the service of equal value.
9 " with the value network service processing module ": be used for judging and process belonging to same source or having identical content of finding, and the inquiry had a network service of equal value, for example: the FTP download service of same file, relay simultaneously the IPTV service of a TV station, the mail service of 1GB capacity etc. is provided simultaneously.
10 " with being worth the business information processing modules ": be used for judging and process belonging to same source or having identical content of finding, be in identical geography or administrative region, and the inquiry had equal value, issue the commercial product of oneself or the ad content of service by network, for example: the egg that provides in same block is sold information, the haircut that provides in same block service sale information is in the operable telephonic communication service in same city etc." information category judge module "
" information category judge module " is mainly used in sorting out its type in the information of collecting, and delivers to corresponding message processing module.
The information source that " information category judge module " processed mainly contains 3 kinds of forms:
(1) form web page: information comes from the web page contents of website, also contains the hyperlink of pointing to particular file types in webpage simultaneously, for example: " http://www.008.org.cn/up/the_quiet_american.mp3 "
(2) network service form: comprise the network service entrance that the various network services device provides, for example: the kind sub-services of ftp file download service, various P2P (Pear To Pear) software (for example: BT download, eMule download), NEWS SERVER service etc.For knowing of network service entrance, two kinds of approach can be arranged:
A. the network service that can find on webpage: the network service entrance that can know by the analyzing web page content.
B. directly submit its network service entrance or content by Internet Service Provider to this search engine.
(3) data or database form: directly provide the Data Enter service to network by search engine, submitted to the information of oneself by the network user, the final information that forms data file or database form, when this search engine was queried, therefrom inquiry's requirement was satisfied in information extraction.
The type of "web page form" information is determined as follows:
A web page itself can be output directly, as a "web page", to the "homologous web page processing module" for processing. In addition, from the hyperlink syntax of the page language (for example HTML, Java, JSP, ASP, ASPX, or PHP), the "information category judging module" can directly parse the file type a hyperlink points to and distinguish the information type by file type, as detailed in the following table:
[Table: file types and the information types they indicate (image BSA00000554541100131, not reproduced)]
For example:
1. If a page contains the hyperlink "http://xxx/xxx/song.mp3", its target can be judged to be "multimedia" information.
2. If a page contains the hyperlink "http://xxx/xxx/song.rar" and the target file, once fetched and decompressed, turns out to contain only "song.mp3", the target can still be judged to be "multimedia" information.
3. If a page contains the hyperlink "http://xxx/xxx/song.rar" and the target file, once fetched and decompressed, turns out to contain the same number of files, the same file and directory names, and the same file sizes as the installation disc of some known software, it can be judged to be "software" information.
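The three examples can be sketched as a small classifier. This is a minimal sketch under assumptions: the extension table is hypothetical, assembled from the extensions listed for each processing module because the patent's own table is not reproduced; the archive's (name, size) members and the manifests of known installation discs are assumed to be supplied by some other component that fetches and decompresses the target.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical table built from the module list; ".xls" appears there under both
# documents and data, and this sketch simply assigns it to documents.
EXT_TO_CATEGORY = {
    **dict.fromkeys([".mp3", ".avi", ".wmv", ".mpeg", ".wav", ".rm"], "multimedia"),
    **dict.fromkeys([".gif", ".jpg", ".bmp", ".png"], "picture"),
    **dict.fromkeys([".doc", ".txt", ".pdf", ".ppt", ".xls"], "document"),
    **dict.fromkeys([".exe"], "software"),
    **dict.fromkeys([".dat", ".mdf", ".dbf"], "data"),
}
ARCHIVE_EXTS = {".rar", ".zip"}


def category_of_name(name: str) -> str:
    """Map a file name or URL path to an information type by its extension."""
    return EXT_TO_CATEGORY.get(PurePosixPath(name).suffix.lower(), "web page")


def classify_link(url, archive_members=None, known_installers=()):
    """Classify a hyperlink target.

    archive_members: (name, size) pairs found after decompressing an archive.
    known_installers: (name, size) manifests of known installation discs.
    """
    path = urlparse(url).path
    if PurePosixPath(path).suffix.lower() not in ARCHIVE_EXTS:
        return category_of_name(path)  # example 1: decided by extension alone
    if archive_members is None:
        return "unknown"  # the archive still has to be fetched and decompressed
    manifest = sorted(archive_members)
    if any(manifest == sorted(disc) for disc in known_installers):
        return "software"  # example 3: same file count, names, and sizes
    kinds = {category_of_name(name) for name, _ in archive_members}
    return kinds.pop() if len(kinds) == 1 else "mixed"  # example 2
```

Comparing sorted (name, size) lists checks file count, names, and sizes in one step, which is the match criterion of the third example.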
The type of "network service form" information is determined as follows:
The 1st step: access the service as an ordinary user to obtain its content.
The 2nd step: classify the obtained content according to the following table.
[Table: content types obtained from network services and their information types (image BSA00000554541100141, not reproduced)]
The 3rd step: if what is obtained is a compressed file, expand it and classify its contents as in the 2nd step.
The type of "data or database form" information is determined as follows:
The 1st step: access the data file or database to obtain its content.
The 2nd step: if the information obtained from the data file or database is a file, go directly to the 4th step.
The 3rd step: if the information obtained from the data file or database is the location where a file is stored, access that location to obtain the target file.
The 4th step: classify the obtained content according to the following table.
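[Table: content types obtained from data files or databases and their information types (image BSA00000554541100142, not reproduced)]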
The 5th step: if what is obtained is a compressed file, expand it and classify its contents as in the 4th step.
"Homologous web page processing module"
Fig. 3 is the flowchart of the "homologous web page processing module". Its main function is to present the web pages found for a search keyword that have identical main content to the querier as a single "title search result", and to let the querier see all query results with that identical main content through a button meaning "expand". To substantially improve the practicality of the system, the following techniques are used:
Web page publishing: the "search result web page publisher" publishes search results to the "search engine search results Web server" in advance, so that searches that have already been queried are answered directly instead of generating dynamic web pages from the database, with heavy computation, on every request.
The "homologous information processing module" stores the results by category in the "non-homologous web page result database" and the "homologous web page result database", and the "search result web page publisher" publishes them periodically to the "search engine search results Web server", which avoids repeated computation and shortens the wait for computation.
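The publish-in-advance idea can be sketched as a keyword-keyed store of static result pages plus a queue of keywords awaiting the next update round. This is a simplified sketch, not the patent's design: the class name, the `build_page` callback that renders a page from the result databases, and the keyword normalization are all hypothetical, and the dynamic database search that the web page flow performs on a miss is left out.

```python
from __future__ import annotations


class ResultPublisher:
    """Hypothetical stand-in for the search result web page publisher together
    with the check of whether a keyword's results are already published."""

    def __init__(self, build_page):
        self.published: dict[str, str] = {}  # keyword -> static result page
        self.pending: set[str] = set()       # keywords queued for the next round
        self.build_page = build_page         # renders a page from the result databases

    def lookup(self, keyword: str) -> str | None:
        """Return the published page, or queue the keyword and return None."""
        key = keyword.strip().lower()
        page = self.published.get(key)
        if page is None:
            self.pending.add(key)
        return page

    def publish_round(self) -> int:
        """Periodic job: render and publish pages for all queued keywords."""
        for key in sorted(self.pending):
            self.published[key] = self.build_page(key)
        count = len(self.pending)
        self.pending.clear()
        return count
```

A repeated query is then a dictionary lookup; the rendering cost is paid once per keyword per update round rather than once per request.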
" homologous information processing module " treatment scheme is as follows:
the 1st step: when " search engine search part " receives the keyword that needs inquiry, at first judge by " Search Results has been distributed on the decision device on Web server " whether this keyword was inquired about by other people in the recent period, if be queried, and result is in " search engine search results Web server " upper issue, directly return to Search Results (seeing figure " M1 " mark), the Web syndication that will have identical source in this result becomes a Search Results, after clicking " same source web page " button, can see the search result web page that another comprises whole Search Results on " search engine search results Web server ", complete whole query script.
The 2nd step: if when " search engine search part " receives the keyword that needs inquiry, judge that by " Search Results has been distributed on the decision device on Web server " this keyword do not inquired about by other people in the recent period, and also do not have corresponding Query Result in " search engine search results Web server " upper issue:
Start " Webpage search device " search " non-homogeneous web results database " and " homology web results database " and find the web page address that meets searching key word, and obtain the content of these webpages.
if " Webpage search device " do not find the web page address that meets searching key word in " non-homogeneous web results database " and " homology web results database ", return to the result that the inquiry " does not have eligible webpage ", and this searching key word is joined next round to be upgraded in the task of " non-homogeneous web results database " and " homology web results database ", select into " non-homogeneous web results database " or " homology web results database " if found qualified web page address whether have with source web page according to it in renewal process, if so again someone to search for same keyword be just can find result.
The 3rd step: the "web page content separator" breaks the content and hyperlink targets of each page found into categories such as multimedia, pictures, text, and hyperlinks.
The 4th step: the content decision devices each produce a verdict:
A. The "multimedia content decision device" produces the target page's "same multimedia degree" SMS (Same Media Score). Multimedia here includes Flash-type playback services; playback or file services for video and audio files; playback or file services for real-time information such as IPTV, live satellite broadcasting, audio/video monitoring, performances, or manual question answering; and other multimedia services.
B. The "picture content decision device" produces the target page's "same picture degree" SPS (Same Photo Score).
C. The "text content decision device" produces the target page's "same text degree" STS (Same Text Score).
D. The "link content decision device" produces the target page's "same hyperlink degree" SHS (Same Hyperlinks Score).
The 5th step: the "multimedia decision weight" SMP, "picture decision weight" SPP, "text decision weight" STP, and "link decision weight" SHP are obtained from the "homologous web page decision rule base" and multiplied, respectively, by the SMS, SPS, STS, and SHS values produced in the 4th step.
The 6th step: the products obtained in the 5th step are added to give the page's "homology degree" SSS (Same Source Score):
SSS = (SMS*SMP) + (SPS*SPP) + (STS*STP) + (SHS*SHP)
The 7th step: judge whether the page's homology degree SSS exceeds the threshold. If it does, the page is judged to be a "homologous web page" of the other page; if it does not, it is judged to be a "non-homologous web page".
The 8th step: the "non-homologous web pages" produced in the 7th step enter the "non-homologous web page result database" through the "non-homologous web page processing module", and the "homologous web pages" produced in the 7th step enter the "homologous web page result database" through the "homologous web page processing module".
The 9th step: the "search result web page publisher" generates static search result pages from the content of the "homologous web page result database" and the "non-homologous web page result database" and publishes them to the "search engine search results Web server", from which they are presented to the querying user through a browser (see mark "M2" in the figure).
As an alternative implementation of the 9th step, the results can also be presented to the querying user through a browser directly by the "dynamic web page Web server" (see mark "M3" in the figure).
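The 5th to 7th steps can be sketched as a weighted sum followed by a threshold test. This is a sketch under assumptions: the patent does not say how each decision device computes its degree, so here every degree is a Jaccard overlap of two item sets (a swapped-in technique), and the weights and threshold are hypothetical values. Pages are represented as dictionaries of multimedia, picture, text, and hyperlink items, as produced by the web page content separator of the 3rd step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionWeights:
    """SMP, SPP, STP, SHP from the homologous web page decision rule base.

    The default values are hypothetical; the patent gives no numbers.
    """
    smp: float = 0.2
    spp: float = 0.2
    stp: float = 0.4
    shp: float = 0.2


def jaccard(a, b) -> float:
    """Assumed per-channel degree in [0, 1]: overlap of two item sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def homology_degree(page, other, w=DecisionWeights()) -> float:
    """5th and 6th steps: SSS = SMS*SMP + SPS*SPP + STS*STP + SHS*SHP."""
    sms = jaccard(page["multimedia"], other["multimedia"])
    sps = jaccard(page["picture"], other["picture"])
    sts = jaccard(page["text"], other["text"])
    shs = jaccard(page["hyperlink"], other["hyperlink"])
    return sms * w.smp + sps * w.spp + sts * w.stp + shs * w.shp


def is_homologous(page, other, threshold=0.8, w=DecisionWeights()) -> bool:
    """7th step: homologous only if SSS exceeds the (hypothetical) threshold."""
    return homology_degree(page, other, w) > threshold
```

Because the weights in this sketch sum to 1 and each degree lies in [0, 1], SSS also lies in [0, 1], which makes a single threshold easy to reason about.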
" web page contents sorter " can be realized by software, direct basis " Html grammer ", " ASP/ASPX grammer ", and " PHP ", the syntax parsing that uses on various webpages such as " JSP " goes out the type of each content.
" homology multimedia processing module "
Fig. 4 is " homology multimedia processing module " process flow diagram.For the multimedia file that meets search condition or service, " homology multimedia processing module " all adopts the hyperlink mode in the Html webpage to offer the person of being queried.For improving substantially the serviceability of native system, we have adopted following technology:
Adopted the webpage distribution technology, use " search result web page distributor " Search Results to be published in advance " search engine search results Web server ", the searching requirement that direct response had been queried is avoided according to a large amount of calculating of request dynamic from the database generating dynamic web page.
" homologous information processing module " is placed in " non-homogeneous multimedia index database " and " homology multimedia index database " result is sub-category, and regularly be published to " search engine search results Web server " by " search result web page distributor ", avoid double counting and reduced the calculating stand-by period.
" homology multimedia processing module " treatment scheme is as follows:
The 1st step: receiving inquiry's searching key word, and what look for according to key words content and keyword grammer judgement needs by software is multimedia file or service (for example, contain in keyword the searching of " .MP3 " expression needs be .MP3 file rather than the webpage that contains this word).
the 2nd the step: the judgement " content that will search for is distributed on Web server? " if the target of search is distributed on " search engine search results Web server " directly returns to Search Results (seeing figure " M1 " mark), will meet the multimedia Ploymerized Interface that obtains that search condition has identical source in this result and become one " title search result ", after clicking " same source file " button, can see the webpage that another comprises whole Search Results on " search engine search results Web server ", the inquiry can be seen meet whole Search Results of querying condition, complete search procedure.If the target of search is not distributed on " search engine search results Web server " since the 3rd step.
The 3rd step: return to the result that the inquiry " does not have eligible multimedia ".
The 4th step: this searching key word is joined next round upgrade in the task of " homology multimedia index database " and " non-homogeneous multimedia index database ", and regularly start the renewal process of two databases.
The 5th step: the renewal process of " homology multimedia index database " and " non-homogeneous multimedia index database ":
A. by the emerging multimedia file of " multimedia search device " search and webpage or service entrance, enter this entrance by software and obtain this document or service.
B. by " content of multimedia decision device " judge new-found content of multimedia " belonging to same content with the content of current " homology multimedia index database "? " if "Yes" it is included into this classification of " homology multimedia index database " as a new element; If "No" judge that by " content of multimedia decision device " content of its " with current non-homogeneous multimedia index database " belongs to same content? "
If C. "Yes": " for current multimedia and with it homology and be stored in multimedia in ' non-homogeneous multimedia index database ', a newly-built classification is also all transferred to ' homology multimedia index database ' "; If "No" " be the current newly-built classification of multimedia, and deposit in ' non-homogeneous multimedia index database ' ";
The 6th step: the "search result web page publisher" generates static search result pages from the content of the "homologous multimedia index database" and the "non-homologous multimedia index database" and publishes them to the "search engine search results Web server", from which they are presented through a browser to the querier who searched (see mark "M2" in the figure).
As an alternative implementation of the 6th step, the results can also be presented to the querying user through a browser directly by the "dynamic web page Web server" (see mark "M3" in the figure).
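Steps A to C of the update process describe an incremental grouping of new items into a pair of index databases, and the same pattern recurs in the picture, document, software, and data modules. A minimal in-memory sketch follows, assuming the content decision device can be modeled as a hypothetical `same(a, b)` callable; real index databases and the searcher of step A are not modeled.

```python
from __future__ import annotations

from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class SourceIndex(Generic[T]):
    """Paired homologous / non-homologous index databases (5th step, A to C).

    `same` stands in for the content decision device: an assumed callable
    that returns True when two items belong to the same content.
    """

    def __init__(self, same: Callable[[T, T], bool]):
        self.same = same
        self.homologous: list[list[T]] = []  # classes of two or more same-source items
        self.non_homologous: list[T] = []    # items with no known same-source match

    def add(self, item: T) -> str:
        # B: does the new item match an existing class in the homologous index?
        for cls in self.homologous:
            if any(self.same(item, other) for other in cls):
                cls.append(item)
                return "joined homologous class"
        # C: does it match anything still in the non-homologous index?
        matches, rest = [], []
        for other in self.non_homologous:
            (matches if self.same(item, other) else rest).append(other)
        if matches:
            # New class for the item and its matches, moved to the homologous index.
            self.non_homologous = rest
            self.homologous.append([item, *matches])
            return "new homologous class"
        # No match anywhere: keep it as its own entry in the non-homologous index.
        self.non_homologous.append(item)
        return "new non-homologous entry"
```

Only the comparison changes from module to module, so each module can reuse this structure with its own decision function.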
" homology picture processing module "
Fig. 5 is homology picture processing module process flow diagram.For the picture file that meets search condition or link, " homology picture processing module " all adopts the hyperlink mode in the Html webpage to offer the person of being queried.For improving substantially the serviceability of native system, we have adopted following technology:
Adopted the webpage distribution technology, use " search result web page distributor " Search Results to be published in advance " search engine search results Web server ", the searching requirement that direct response had been queried is avoided according to a large amount of calculating of request dynamic from the database generating dynamic web page.
" homologous information processing module " is placed in " non-homogeneous picture indices database " and " homology picture indices database " result is sub-category, and regularly be published to " search engine search results Web server " by " search result web page distributor ", avoid double counting and reduced the calculating stand-by period." homology picture processing module " treatment scheme is as follows:
The 1st step: receive the querier's search keyword, and use software to judge from the keyword content and keyword syntax whether what is wanted is a picture file or link (for example, a keyword containing ".JPG" indicates that .JPG files are wanted rather than web pages containing that word).
The 2nd step: judge "Has the content to be searched been published on the Web server?" If the search target has been published on the "search engine search results Web server", return the search results directly (see mark "M1" in the figure), aggregating the matching pictures that share the same source into a single "title search result". Clicking the "homologous files" button displays another page on the "search engine search results Web server" that contains all the search results, so the querier can see every result meeting the query conditions, which completes the search. If the search target has not been published on the "search engine search results Web server", continue from the 3rd step.
The 3rd step: tell the querier that there are "no eligible pictures".
The 4th step: add this search keyword to the next round's update task for the "homologous picture index database" and the "non-homologous picture index database", and start the update process of the two databases periodically.
The 5th step: the update process of the "homologous picture index database" and the "non-homologous picture index database":
A. The "picture searcher" searches for newly appearing picture files and for web pages or link entry points, and software follows each entry point to obtain the file or service.
B. The "picture content decision device" judges: "Does the newly found picture content belong to the same content as content in the current 'homologous picture index database'?" If "yes", the picture is added as a new element of that class in the "homologous picture index database". If "no", the "picture content decision device" judges: "Does it belong to the same content as content in the current 'non-homologous picture index database'?"
C. If "yes", create a new class for the current picture together with the same-source pictures stored in the 'non-homologous picture index database', and move them all to the 'homologous picture index database'. If "no", create a new class for the current picture and store it in the 'non-homologous picture index database'.
The 6th step: the "search result web page publisher" generates static search result pages from the content of the "homologous picture index database" and the "non-homologous picture index database" and publishes them to the "search engine search results Web server", from which they are presented through a browser to the querier who searched (see mark "M2" in the figure).
As an alternative implementation of the 6th step, the results can also be presented to the querying user through a browser directly by the "dynamic web page Web server" (see mark "M3" in the figure).
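The patent does not say how the "picture content decision device" decides that two pictures have identical content. One common technique that would fit is an average hash compared by Hamming distance, sketched below as a swapped-in method, not the patent's. It assumes each picture has already been downscaled to a small grayscale grid (for example 8x8, values 0 to 255) elsewhere, and the `max_distance` tolerance is a hypothetical value.

```python
def average_hash(pixels):
    """Average hash of a small grayscale grid.

    Each bit is 1 where the pixel is at least the grid's mean brightness.
    """
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return sum(1 << i for i, v in enumerate(flat) if v >= mean)


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def same_picture(p, q, max_distance=5):
    """Hypothetical decision rule: hashes differing in at most max_distance bits."""
    return hamming(average_hash(p), average_hash(q)) <= max_distance
```

A uniformly brightened copy keeps the same above/below-mean pattern and therefore the same hash, while an inverted picture flips every bit; resized or recompressed copies of one picture typically land within a few bits of each other.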
" homology document process module "
Fig. 6 is homology document process module process flow diagram.Homology document process module " support common document format: " .Txt ", " .Doc ", " .PPT ", " .PDF ", " .XLS " etc.For the document files that meets search condition or link, " homology document process module " all adopts the hyperlink mode in the Html webpage to offer the person of being queried.For improving substantially the serviceability of native system, we have adopted following technology:
Adopted the webpage distribution technology, use " search result web page distributor " Search Results to be published in advance " search engine search results Web server ", the searching requirement that direct response had been queried is avoided according to a large amount of calculating of request dynamic from the database generating dynamic web page.
" homologous information processing module " is placed in " non-homogeneous document index database " and " homology document index database " result is sub-category, and regularly be published to " search engine search results Web server " by " search result web page distributor ", avoid double counting and reduced the calculating stand-by period." homology document process module " treatment scheme is as follows:
The 1st step: receive the querier's search keyword, and use software to judge from the keyword content and keyword syntax whether what is wanted is a document file or link (for example, a keyword containing ".PDF" indicates that .PDF files are wanted rather than web pages containing that word).
The 2nd step: judge "Has the content to be searched been published on the Web server?" If the search target has been published on the "search engine search results Web server", return the search results directly (see mark "M1" in the figure), aggregating the matching documents that share the same source into a single "title search result". Clicking the "homologous files" button displays another page on the "search engine search results Web server" that contains all the search results, so the querier can see every result meeting the query conditions, which completes the search. If the search target has not been published on the "search engine search results Web server", continue from the 3rd step.
The 3rd step: tell the querier that there are "no eligible documents".
The 4th step: add this search keyword to the next round's update task for the "homologous document index database" and the "non-homologous document index database", and start the update process of the two databases periodically.
The 5th step: the update process of the "homologous document index database" and the "non-homologous document index database":
A. The "document searcher" searches for newly appearing document files and for web pages or link entry points, and software follows each entry point to obtain the file or service.
B. The "text content decision device" and the "picture content decision device" judge: "Does the newly found document content belong to the same content as content in the current 'homologous document index database'?" If "yes", the document is added as a new element of that class in the "homologous document index database". If "no", the "document content decision device" judges: "Does it belong to the same content as content in the current 'non-homologous document index database'?"
C. If "yes", create a new class for the current document together with the same-source documents stored in the 'non-homologous document index database', and move them all to the 'homologous document index database'. If "no", create a new class for the current document and store it in the 'non-homologous document index database'.
The 6th step: the "search result web page publisher" generates static search result pages from the content of the "homologous document index database" and the "non-homologous document index database" and publishes them to the "search engine search results Web server", from which they are presented through a browser to the querier who searched (see mark "M2" in the figure).
As an alternative implementation of the 6th step, the results can also be presented to the querying user through a browser directly by the "dynamic web page Web server" (see mark "M3" in the figure).
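For the text side of the document decision, one well-known technique is word shingling with Jaccard resemblance; the sketch below uses it as a swapped-in method, since the patent does not specify how the "text content decision device" works. It assumes the document text has already been extracted from its .Doc, .PDF, or other format, leaves out the picture comparison the step also mentions, and uses a hypothetical shingle length and threshold.

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles after lowercasing and dropping punctuation."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a, b, k=3):
    """Jaccard resemblance of two texts' shingle sets (0.0 for two empty texts)."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def same_document(a, b, threshold=0.8, k=3):
    """Hypothetical text decision: resemblance at or above the threshold."""
    return resemblance(a, b, k) >= threshold
```

Shingles make the comparison sensitive to word order as well as vocabulary, so two unrelated texts that happen to share common words still score low.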
" homology software processing module "
Fig. 7 is homology software processing module process flow diagram.For the software document that meets search condition or link, " homology software processing module " all adopts the hyperlink mode in the Html webpage to offer the person of being queried.For improving substantially the serviceability of native system, we have adopted following technology:
Adopted the webpage distribution technology, use " search result web page distributor " Search Results to be published in advance " search engine search results Web server ", the searching requirement that direct response had been queried is avoided according to a large amount of calculating of request dynamic from the database generating dynamic web page.
" homologous information processing module " is placed in " non-homogeneous software index data base " and " with the source software index data base " result is sub-category, and regularly be published to " search engine search results Web server " by " search result web page distributor ", avoid double counting and reduced the calculating stand-by period." homology software processing module " treatment scheme is as follows:
The 1st step: receive the querier's search keyword, and use software to judge from the keyword content and keyword syntax whether what is wanted is a software file or link (for example, a keyword containing ".EXE" indicates that .EXE files are wanted rather than web pages containing that word).
The 2nd step: judge "Has the content to be searched been published on the Web server?" If the search target has been published on the "search engine search results Web server", return the search results directly (see mark "M1" in the figure), aggregating the matching software that shares the same source into a single "title search result". Clicking the "homologous files" button displays another page on the "search engine search results Web server" that contains all the search results, so the querier can see every result meeting the query conditions, which completes the search. If the search target has not been published on the "search engine search results Web server", continue from the 3rd step.
The 3rd step: tell the querier that there is "no eligible software".
The 4th step: add this search keyword to the next round's update task for the "homologous software index database" and the "non-homologous software index database", and start the update process of the two databases periodically.
The 5th step: the update process of the "homologous software index database" and the "non-homologous software index database":
A. The "software searcher" searches for newly appearing software files and for web pages or link entry points, and a program follows each entry point to obtain the file or service.
B. The "software content decision device" judges: "Does the newly found software content belong to the same content as content in the current 'homologous software index database'?" If "yes", the software is added as a new element of that class in the "homologous software index database". If "no", the "software content decision device" judges: "Does it belong to the same content as content in the current 'non-homologous software index database'?"
C. If "yes", create a new class for the current software together with the same-source software stored in the 'non-homologous software index database', and move them all to the 'homologous software index database'. If "no", create a new class for the current software and store it in the 'non-homologous software index database'.
The 6th step: the "search result web page publisher" generates static search result pages from the content of the "homologous software index database" and the "non-homologous software index database" and publishes them to the "search engine search results Web server", from which they are presented through a browser to the querier who searched (see mark "M2" in the figure).
As an alternative implementation of the 6th step, the results can also be presented to the querying user through a browser directly by the "dynamic web page Web server" (see mark "M3" in the figure).
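The module list defines same-source software as installation programs of the same software by the same author, possibly for different operating systems or versions. A minimal sketch of that rule follows, assuming hypothetical installer metadata (publisher, product name, version, platform) has already been extracted; how it is extracted, and the name normalization used here, are assumptions rather than the patent's method.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Installer:
    # Hypothetical metadata assumed to be read from an installation program.
    publisher: str
    product: str
    version: str
    platform: str


def product_key(inst: Installer) -> tuple[str, str]:
    """Normalized (publisher, product) pair; version and platform are ignored."""
    name = re.sub(r"[^a-z0-9]+", " ", inst.product.lower()).strip()
    return (inst.publisher.strip().lower(), name)


def same_software(a: Installer, b: Installer) -> bool:
    """Same author and same product, whatever the version or target OS."""
    return product_key(a) == product_key(b)


def group_by_product(installers: list[Installer]) -> dict:
    """Group installers into same-source classes keyed by product_key."""
    groups: dict[tuple[str, str], list[Installer]] = {}
    for inst in installers:
        groups.setdefault(product_key(inst), []).append(inst)
    return groups
```

Ignoring version and platform in the key is what lets a Windows 1.0 installer and a Linux 2.1 installer of the same product fall into one "title search result".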
" same source data or database processing module "
Fig. 8 is the flow chart of the same-source data or database processing module. The "same-source data processing module" provides every data file or link that meets the search condition to the querying user as a hyperlink in an HTML web page. To substantially improve the serviceability of the system, the following techniques are used:
Web page distribution: the "search result web page distributor" publishes search results in advance to the "search engine search results Web server", which answers queries that have already been asked directly, avoiding the heavy computation of generating a dynamic web page from the database for every request.
The "same-source information processing module" stores results by category in the "non-same-source data index database" and the "same-source data index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server". This avoids repeated computation and reduces waiting time. The processing flow of the "same-source data processing module" is as follows:
Step 1: Receive the querying user's search keyword and, from the keyword content and keyword syntax, determine whether the user is looking for a data file or a link (for example, a keyword containing ".DBF" indicates that a .DBF file is wanted, not a web page containing that word).
Step 2: Determine whether the content to be searched has already been published on the Web server. If the search target is published on the "search engine search results Web server", the search results are returned directly (see mark "M1" in the figure): results that meet the search condition and contain data from the same source are aggregated into one "title search result", and clicking its "same-source files" button opens another page on the "search engine search results Web server" containing all of those results, so that the querying user can see every result meeting the query condition. This completes the search. If the search target is not published on the "search engine search results Web server", continue from step 3.
Step 3: Return the result "no matching data" to the querying user.
Step 4: Add this search keyword to the next round of update tasks for the "same-source data index database" and the "non-same-source data index database", and start the update process of the two databases periodically.
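Steps 1 to 4 can be sketched as below, assuming the pre-published results are a dictionary keyed by the exact keyword, each value holding groups of same-source results, and that keywords awaiting the next update round are collected in a set. `target_kind`, `handle_query`, `published` and `pending` are hypothetical names; only the `.DBF` suffix rule comes from the text.

```python
from typing import Dict, List, Set, Tuple, Union

# Suffixes that mark a request for a data file rather than a web page.
# ".dbf" comes from the text; extend the tuple as needed.
DATA_SUFFIXES = (".dbf",)


def target_kind(keyword: str) -> str:
    """Step 1: decide from the keyword syntax what the user is looking for."""
    tokens = keyword.lower().split()
    if any(tok.endswith(DATA_SUFFIXES) for tok in tokens):
        return "data file"
    return "link"


def handle_query(keyword: str,
                 published: Dict[str, List[List[str]]],
                 pending: Set[str]) -> Tuple[str, Union[List[str], str]]:
    kind = target_kind(keyword)
    groups = published.get(keyword)
    if groups:
        # Step 2: answer from pre-published results; each same-source group
        # is shown as one "title search result" (its first member here).
        return kind, [group[0] for group in groups]
    # Step 4: queue the keyword for the next index update round.
    pending.add(keyword)
    # Step 3: report that nothing matched.
    return kind, "no matching data"
```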
Step 5: Update process of the "same-source data index database" and the "non-same-source data index database":
A. The "data searcher" finds newly appeared data files and web page or link entry points, and follows each entry point to obtain the file or service.
B. The "data content decision device" judges whether the newly found data content is the same content as a class already in the "same-source data index database". If "Yes", the data is added to that class as a new element. If "No", the "data content decision device" judges whether it is the same content as an item in the current "non-same-source data index database".
C. If "Yes", a new class is created for the current data together with the same-source data already stored in the "non-same-source data index database", and the whole class is transferred to the "same-source data index database". If "No", a new class is created for the current data alone and stored in the "non-same-source data index database".
Step 6: The "search result web page distributor" generates static search result web pages from the contents of the "same-source web results database" and the "non-same-source web results database" and publishes them to the "search engine search results Web server", which presents them through the browser to the querying user (see mark "M2" in the figure).
As an alternative implementation of step 6, the results can also be presented directly to the querying user's browser by the "dynamic web page Web server" (see mark "M3" in the figure).
"Same-source GIS information processing module"
Fig. 9 is the flow chart of the "same-source GIS information processing module". The module provides every GIS information file or link that meets the search condition to the querying user as a hyperlink in an HTML web page. To substantially improve the serviceability of the system, the following techniques are used:
Web page distribution: the "search result web page distributor" publishes search results in advance to the "search engine search results Web server", which answers queries that have already been asked directly, avoiding the heavy computation of generating a dynamic web page from the database for every request.
The "same-source information processing module" stores results by category in the "non-same-source GIS information index database" and the "same-source GIS information index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server". This avoids repeated computation and reduces waiting time. The processing flow of the "same-source GIS information processing module" is as follows:
Step 1: Receive the querying user's search keyword; the software determines from the keyword content and keyword syntax whether the user is looking for a GIS information file or a link (for example, a keyword containing ".JPG" indicates that a .JPG file is wanted, not a web page containing that word).
Step 2: Determine whether the content to be searched has already been published on the Web server. If the search target is published on the "search engine search results Web server", the search results are returned directly (see mark "M1" in the figure): results that meet the search condition and contain GIS information from the same source are aggregated into one "title search result", and clicking its "same-source files" button opens another page on the "search engine search results Web server" containing all of those results, so that the querying user can see every result meeting the query condition. This completes the search. If the search target is not published on the "search engine search results Web server", continue from step 3.
Step 3: Return the result "no matching GIS information" to the querying user.
Step 4: Add this search keyword to the next round of update tasks for the "same-source GIS information index database" and the "non-same-source GIS information index database", and start the update process of the two databases periodically.
Step 5: Update process of the "same-source GIS information index database" and the "non-same-source GIS information index database":
A. The "GIS information searcher" finds newly appeared GIS information files and web page or link entry points, and follows each entry point to obtain the file or service.
B. The "GIS information content decision device" judges whether the newly found GIS information content is the same content as a class already in the "same-source GIS information index database". If "Yes", the item is added to that class as a new element. If "No", the "GIS information content decision device" judges whether it is the same content as an item in the current "non-same-source GIS information index database".
C. If "Yes", a new class is created for the current GIS information together with the same-source GIS information already stored in the "non-same-source GIS information index database", and the whole class is transferred to the "same-source GIS information index database". If "No", a new class is created for the current GIS information alone and stored in the "non-same-source GIS information index database".
Step 6: The "search result web page distributor" generates static search result web pages from the contents of the "same-source web results database" and the "non-same-source web results database" and publishes them to the "search engine search results Web server", which presents them through the browser to the querying user (see mark "M2" in the figure).
As an alternative implementation of step 6, the results can also be presented directly to the querying user's browser by the "dynamic web page Web server" (see mark "M3" in the figure).
"Same-value network service processing module"
Figure 10 is the flow chart of the "same-value network service processing module". The module provides every network service that meets the search condition to the querying user as a hyperlink in an HTML web page. To substantially improve the serviceability of the system, the following techniques are used:
Web page distribution: the "search result web page distributor" publishes search results in advance to the "search engine search results Web server", which answers queries that have already been asked directly, avoiding the heavy computation of generating a dynamic web page from the database for every request.
The "same-value information processing module" stores results by category in the "non-same-value network service index database" and the "same-value network service index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server". This avoids repeated computation and reduces waiting time. The processing flow of the "same-value network service processing module" is as follows:
Step 1: Receive the querying user's search keyword; the software determines from the keyword content and keyword syntax whether the user is looking for a network service file or a link (for example, a keyword containing ".JPG" indicates that a .JPG file is wanted, not a web page containing that word).
Step 2: Determine whether the content to be searched has already been published on the Web server. If the search target is published on the "search engine search results Web server", the search results are returned directly (see mark "M1" in the figure): results that meet the search condition and contain network services from the same source are aggregated into one "title search result", and clicking its "same-value documents" button opens another page on the "search engine search results Web server" containing all of those results, so that the querying user can see every result meeting the query condition. This completes the search. If the search target is not published on the "search engine search results Web server", continue from step 3.
Step 3: Return the result "no matching network service" to the querying user.
Step 4: Add this search keyword to the next round of update tasks for the "same-value network service index database" and the "non-same-value network service index database", and start the update process of the two databases periodically.
Step 5: Update process of the "same-value network service index database" and the "non-same-value network service index database":
A. The "network service searcher" finds newly appeared network service files and web page or link entry points, and follows each entry point to obtain the file or service.
B. The "network service content decision device" judges whether the newly found network service content is the same content as a class already in the "same-value network service index database". If "Yes", the service is added to that class as a new element. If "No", the "network service content decision device" judges whether it is the same content as an item in the current "non-same-value network service index database".
C. If "Yes", a new class is created for the current network service together with the network services of the same value already stored in the "non-same-value network service index database", and the whole class is transferred to the "same-value network service index database". If "No", a new class is created for the current network service alone and stored in the "non-same-value network service index database".
Step 6: The "search result web page distributor" generates static search result web pages from the contents of the "same-value web results database" and the "non-same-value web results database" and publishes them to the "search engine search results Web server", which presents them through the browser to the querying user (see mark "M2" in the figure).
As an alternative implementation of step 6, the results can also be presented directly to the querying user's browser by the "dynamic web page Web server" (see mark "M3" in the figure).
"Same-value business information processing module"
Figure 11 is the flow chart of the "same-value business information processing module". The module provides every piece of business information that meets the search condition to the querying user as a hyperlink in an HTML web page. To substantially improve the serviceability of the system, the following techniques are used:
Web page distribution: the "search result web page distributor" publishes search results in advance to the "search engine search results Web server", which answers queries that have already been asked directly, avoiding the heavy computation of generating a dynamic web page from the database for every request.
The "same-value information processing module" stores results by category in the "non-same-value business information index database" and the "same-value business information index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server". This avoids repeated computation and reduces waiting time. The processing flow of the "same-value business information processing module" is as follows:
Step 1: Receive the querying user's search keyword; the software determines from the keyword content and keyword syntax whether the user is looking for a business information file or a link (for example, a keyword containing ".JPG" indicates that a .JPG file is wanted, not a web page containing that word).
Step 2: Determine whether the content to be searched has already been published on the Web server. If the search target is published on the "search engine search results Web server", the search results are returned directly (see mark "M1" in the figure): results that meet the search condition and contain business information from the same source are aggregated into one "title search result", and clicking its "same-value documents" button opens another page on the "search engine search results Web server" containing all of those results, so that the querying user can see every result meeting the query condition. This completes the search. If the search target is not published on the "search engine search results Web server", continue from step 3.
Step 3: Return the result "no matching business information" to the querying user.
Step 4: Add this search keyword to the next round of update tasks for the "same-value business information index database" and the "non-same-value business information index database", and start the update process of the two databases periodically.
Step 5: Update process of the "same-value business information index database" and the "non-same-value business information index database":
A. The "business information searcher" finds newly appeared business information files and web page or link entry points, and follows each entry point to obtain the file or service.
B. The "business information content decision device" judges whether the newly found business information content is the same content as a class already in the "same-value business information index database". If "Yes", the item is added to that class as a new element. If "No", the "business information content decision device" judges whether it is the same content as an item in the current "non-same-value business information index database".
C. If "Yes", a new class is created for the current business information together with the business information of the same value already stored in the "non-same-value business information index database", and the whole class is transferred to the "same-value business information index database". If "No", a new class is created for the current business information alone and stored in the "non-same-value business information index database".
Step 6: The "search result web page distributor" generates static search result web pages from the contents of the "same-value web results database" and the "non-same-value web results database" and publishes them to the "search engine search results Web server", which presents them through the browser to the querying user (see mark "M2" in the figure).
As an alternative implementation of step 6, the results can also be presented directly to the querying user's browser by the "dynamic web page Web server" (see mark "M3" in the figure).
The "same-value business information processing module" is characterized by automatically judging, from the features of the goods or services and from the distribution of suppliers and of the querying user, whether several business information targets have the same use value for the querying user. This judgment is the basis both for aggregating them into one "title search result" and for ranking the results.
The content decision devices can be shared by all of the "same-source (same-value) information processing modules".
Specific implementation of the "content decision devices"
Specific implementation of the "multimedia content decision device":
1. Input: multimedia files from multiple sources (for a playback service, the stream is first recorded to a file, or the media file information is obtained from the playback server).
2. Processing: compare how closely the multimedia contents match.
3. Return: the degree of identical content among the input multimedia items: SameMediaPower.
Specific implementation method:
Step 1: Receive the "judged objects", i.e. multimedia items from multiple sources, and record the number of judged objects: InputQuantity.
Step 2: Look up, in the table below, an attribute of the judged objects that can take part in the comparison, and record the number of judged objects that share the same value of this attribute: SameQuantity (for example, if 3 of 5 judged objects have the same value of this attribute, SameQuantity = 3 for this attribute).
Step 3: Look up the "weight" value of the current attribute in the table below: Power.
Step 4: Calculate the goodness of fit of all judged objects on the current attribute: PSame = SameQuantity * Power.
Step 5: Repeat steps 1 to 4 for the next attribute to obtain its PSame, until PSame has been obtained for every attribute.
Step 6: Calculate and return the identical-content degree of the judged objects: SameMediaPower = (sum of all PSame values) / InputQuantity.
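The six steps reduce to a weighted count of agreeing attributes. The sketch below assumes each judged object is a dictionary of attribute values and that the weights come from an attribute table; the attribute names and weights in the test data are invented, since the original table is only available as an image. It also assumes SameQuantity is the size of the largest group of objects sharing one value (0 when no two objects agree), and follows the note that empty (Null) values are never treated as equal. `same_media_power` is a hypothetical name.

```python
from collections import Counter
from typing import Any, Dict, List


def same_media_power(objects: List[Dict[str, Any]],
                     weights: Dict[str, float]) -> float:
    input_quantity = len(objects)                    # Step 1: InputQuantity
    if input_quantity == 0:
        return 0.0
    psame_total = 0.0
    for attr, power in weights.items():              # Step 5: every attribute
        # Null (None) values never count as equal.
        values = [o[attr] for o in objects if o.get(attr) is not None]
        largest = max(Counter(values).values(), default=0)
        # Step 2 (assumption): SameQuantity is the size of the largest group
        # of objects sharing one value, and 0 if no two objects agree.
        same_quantity = largest if largest >= 2 else 0
        psame_total += same_quantity * power          # Steps 3-4: PSame
    return psame_total / input_quantity               # Step 6
```

With five hypothetical video records in which three share a duration (weight 0.5) and two share a title (weight 0.3), the result is (3 * 0.5 + 2 * 0.3) / 5, about 0.42.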
Judgment content for video files or playback services:
[Table of attributes and weights, provided as images BSA00000554541100261 and BSA00000554541100271 in the original; not reproduced.]
Note:
1. The invention lies in the method of using "weight" values to express the relative importance of each attribute in the comparison, not only in the specific values listed in the table. The "weight" values in the table are only typical values; changing them according to actual needs still falls within the scope of the invention.
2. In practice some attribute values may be empty (Null); during the calculation, empty values must not be treated as equal attribute values.
Judgment content for audio files:
[Table of attributes and weights, provided as images BSA00000554541100272 and BSA00000554541100281 in the original; not reproduced.]
Note:
1. The invention lies in the method of using "weight" values to express the relative importance of each attribute in the comparison, not only in the specific values listed in the table. The "weight" values in the table are only typical values; changing them according to actual needs still falls within the scope of the invention.
2. In practice some attribute values may be empty (Null); during the calculation, empty values must not be treated as equal attribute values.
Judgment content for Flash files:
[Table of attributes and weights, provided as image BSA00000554541100282 in the original; not reproduced.]
Note:
1. The invention lies in the method of using "weight" values to express the relative importance of each attribute in the comparison, not only in the specific values listed in the table. The "weight" values in the table are only typical values; changing them according to actual needs still falls within the scope of the invention.
2. In practice some attribute values may be empty (Null); during the calculation, empty values must not be treated as equal attribute values.
Specific implementation of the "image content decision device"
1. Input: pictures from multiple sources.
2. Processing: compare how closely the picture contents match.
3. Return: the degree of identical content among the input pictures: SamePicPower.
Specific implementation method:
Step 1: Receive the "judged objects", i.e. pictures from multiple sources, and record the number of judged objects: InputQuantity.
Step 2: Look up, in the table below, an attribute of the judged objects that can take part in the comparison, and record the number of judged objects that share the same value of this attribute: SameQuantity (for example, if 3 of 5 judged objects have the same value of this attribute, SameQuantity = 3 for this attribute).
Step 3: Look up the "weight" value of the current attribute in the table below: Power.
Step 4: Calculate the goodness of fit of all judged objects on the current attribute: PSame = SameQuantity * Power.
Step 5: Repeat steps 1 to 4 for the next attribute to obtain its PSame, until PSame has been obtained for every attribute.
Step 6: Calculate and return the identical-content degree of the judged objects: SamePicPower = (sum of all PSame values) / InputQuantity.
Similarity is judged from the various attributes of the pictures and from the similarity assessment given by image recognition software.
Note:
1. The invention lies in the method of using "weight" values to express the relative importance of each attribute in the comparison, not only in the specific values listed in the table. The "weight" values in the table are only typical values; changing them according to actual needs still falls within the scope of the invention.
2. In practice some attribute values may be empty (Null); during the calculation, empty values must not be treated as equal attribute values.
Specific implementation of the "text content decision device"
The "text content decision device" can be implemented in software:
1. Input: text from multiple sources, as the "judged objects".
2. Processing: compare how closely the text contents match.
3. Return: the consistency value between the "judged objects": SameTextPower.
Implementation method:
Step 1: In the multiple input texts, find the total length of the parts that consist of identical words or sentences: SameLenth.
Step 2: Among the multiple input texts, find the length of the shortest one: MinLenth.
Step 3: Return the text similarity value: SameTextPower = SameLenth / MinLenth.
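The three steps can be sketched as follows, assuming sentence-level matching: SameLenth is taken as the total length of the sentences that occur in every input text, and each text's length is measured over its sentences so that identical texts score 1.0. The sentence splitter and the function names are assumptions; the text does not define how identical words or sentences are located.

```python
import re
from typing import List


def sentences(text: str) -> List[str]:
    """Split text into sentences on Latin and CJK end punctuation (assumption)."""
    return [s.strip() for s in re.split(r"[.!?。！？]+", text) if s.strip()]


def same_text_power(texts: List[str]) -> float:
    if not texts:
        return 0.0
    split = [sentences(t) for t in texts]
    # Step 1: total length of the sentences shared by every text (SameLenth).
    common = set(split[0]).intersection(*(set(s) for s in split[1:]))
    same_lenth = sum(len(s) for s in common)
    # Step 2: length of the shortest text, measured over its sentences (MinLenth).
    min_lenth = min(sum(len(s) for s in ss) for ss in split)
    # Step 3: SameTextPower = SameLenth / MinLenth.
    return same_lenth / min_lenth if min_lenth else 0.0
```

For example, an article and a copy of it with an extra advertising sentence appended score 1.0, because the ratio is taken against the shorter text.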
Among the texts found in this way, the longest text is usually the same article split into fewer pages or carrying a large amount of advertising and external hyperlinks, while the shortest text is usually the same article split into more pages or carrying the least advertising and external hyperlinks.
Specific implementation of the "link content decision device"
The "link content decision device" can be implemented in software; it compares whether the hyperlinks contained in several web pages have common features.
1. Input: the URL addresses of several groups of hyperlinks (each group is usually all of the hyperlinks obtained from one web page).
2. Processing: calculate the goodness of fit of the hyperlink URL addresses between the groups.
3. Return: the number of identical hyperlinks across the groups.
Implementation method:
Step 1: Receive the "judged objects": the URL addresses of several groups of hyperlinks.
Step 2: Calculate the similarity of the "judged objects": SameURLPower = the number of URL addresses that appear in every group of hyperlinks.
Step 3: Return SameURLPower.
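SameURLPower is the size of the intersection of the hyperlink groups. A short sketch, assuming each group is an iterable of URL strings compared exactly as written (no normalization of case, trailing slashes or query strings); `same_url_power` is a hypothetical name.

```python
from typing import Iterable, List


def same_url_power(groups: List[Iterable[str]]) -> int:
    """Number of URL addresses that appear in every group of hyperlinks."""
    if not groups:
        return 0
    sets = [set(g) for g in groups]
    return len(set.intersection(*sets))
```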
Specific implementation of the "software content decision device"
The "software content decision device" compares whether several input software items are the same software.
1. Input: software from multiple sources.
2. Processing: compare how closely the software contents match.
3. Return: a software content goodness-of-fit value.
Specific implementation method:
Step 1: Receive the "judged objects", i.e. the input files or directories, and record the number of judged objects: InputQuantity.
Step 2: Look up, in the table below, an attribute of the judged objects that can be compared, and record the number of judged objects that share the same value of this attribute: SameQuantity (for example, if 3 of 5 judged objects have the same value of this attribute, SameQuantity = 3 for this attribute).
Step 3: Look up the "weight" value of the current attribute in the table below: Power.
Step 4: Calculate the goodness of fit of all judged objects on the current attribute: PSame = SameQuantity * Power.
Step 5: Repeat steps 1 to 4 for the next attribute to obtain its PSame, until PSame has been obtained for every attribute.
Step 6: Calculate and return the sameness value of the judged objects: SameSoftPower = (sum of all PSame values) / InputQuantity.
[Table of software attributes and weights, provided as image BSA00000554541100311 in the original; not reproduced.]
Note:
1. The invention lies in the method of using "weight" values to express the relative importance of each attribute in the comparison, not only in the specific values listed in the table. The "weight" values in the table are only typical values; changing them according to actual needs still falls within the scope of the invention.
2. In practice some attribute values may be empty (Null); during the calculation, empty values must not be treated as equal attribute values.
Specific implementation of the "data or database content decision device"
Compare, record by record, whether the record contents of different database files are equal, and return the consistency value SameDBPower of the databases being compared, together with whether it exceeds a threshold.
SameDBPower = (number of records whose field names are identical and whose values are equal) / (smallest record count, for that field, among the databases being compared).
SameDBPower is the ratio of the number of records with identical content to the record count of the compared database with the fewest records, so its value lies between 0 and 1.
Implementation steps of the "data or database content decision device"
For data files, the following steps can be used:
Step 1: From the data files being compared, choose one file at random as the "comparison standard".
Step 2: Roughly compare each of the other files with the "comparison standard" for consistency of file size, file checksum, and file attribute information such as title, subject, version, author, category, keywords and remarks.
Step 3: If they are consistent, they are judged "roughly consistent"; this judgment can be used directly as the output of the "data or database content decision device".
Step 4: If a further comparison is needed, perform step 5 on the input files judged "roughly consistent".
Step 5: Fine comparison: compare the file attribute information and every byte of the files one by one. Files whose features all match are judged "completely consistent", which is the output of the "data or database content decision device".
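The data-file procedure can be sketched with the Python standard library, assuming the rough comparison uses only file size and a SHA-256 checksum (document metadata such as title or author is format-specific and left out) and that the fine comparison is a byte-by-byte `filecmp.cmp(..., shallow=False)`. As in step 1, the comparison standard is picked at random. The function names and result strings are hypothetical.

```python
import filecmp
import hashlib
import os
import random
from typing import List


def checksum(path: str) -> str:
    """SHA-256 of the file contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_data_files(paths: List[str], fine: bool = False) -> str:
    standard = random.choice(paths)                      # Step 1
    key = (os.path.getsize(standard), checksum(standard))
    others = [p for p in paths if p != standard]
    # Steps 2-3: rough comparison (size and checksum only in this sketch).
    if any((os.path.getsize(p), checksum(p)) != key for p in others):
        return "inconsistent"
    if not fine:
        return "roughly consistent"
    # Step 5: byte-by-byte comparison (shallow=False reads the contents).
    if all(filecmp.cmp(standard, p, shallow=False) for p in others):
        return "completely consistent"
    return "inconsistent"
```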
For database files, the following steps can be used:
Step 1: From the filename suffix and the file attributes, determine whether the input database files are in the same database format.
Step 2: If they are in the same format, go to step 3; if not, go directly to step 4.
Step 3: Rough comparison of databases in the same format: file size, file checksum, and file attribute information such as title, subject, version, author, category, keywords and remarks. Database files that match on all of these features go to step 4; files that do not match completely are output with the judgment "inconsistent".
Step 4: Fine comparison of the databases (this step works for every kind of database file taking part in the content comparison). According to the format of each database file, extract its "database tables" one by one and judge whether their structures are consistent: inconsistent structures produce the output "inconsistent"; database files with consistent structures go to step 5.
Step 5: Compare the content of every record of the participating database files one by one: each time two records have identical content, increment the counter SameRecNum ("number of records with identical field names and equal values") by 1.
Step 6: Calculate the database consistency value SameDBPower = SameRecNum / (smallest record count, for this field, among the databases being compared). (SameDBPower is the ratio of the number of identical records to the record count of the compared database with the fewest records; its value lies between 0 and 1.)
Step 7: Judge whether SameDBPower exceeds the threshold: if it does, output "consistent" as the judgment; otherwise output "inconsistent".
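Steps 4 to 7 for database files can be sketched with Python's built-in `sqlite3` module, assuming both databases are SQLite, that "consistent structure" means identical column names in the same order, and that duplicate records are matched at most as many times as they occur in both tables. The function names are hypothetical, and the threshold of 0.8 is an assumed default; the text gives no value for databases.

```python
import sqlite3
from collections import Counter
from typing import List, Tuple


def table_rows(conn: sqlite3.Connection, table: str) -> Tuple[Tuple[str, ...], List[tuple]]:
    """Column names and all records of one table (table name assumed trusted)."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = tuple(d[0] for d in cur.description)
    return columns, cur.fetchall()


def same_db_power(conn_a: sqlite3.Connection, conn_b: sqlite3.Connection,
                  table: str, threshold: float = 0.8) -> Tuple[float, str]:
    cols_a, rows_a = table_rows(conn_a, table)
    cols_b, rows_b = table_rows(conn_b, table)
    # Step 4: a differing table structure means "inconsistent".
    if cols_a != cols_b:
        return 0.0, "inconsistent"
    # Step 5: SameRecNum, the number of records equal field by field.
    same_rec_num = sum((Counter(rows_a) & Counter(rows_b)).values())
    # Step 6: divide by the smaller record count, giving a value in 0..1.
    min_records = min(len(rows_a), len(rows_b))
    power = same_rec_num / min_records if min_records else 0.0
    # Step 7: compare with the threshold.
    return power, ("consistent" if power > threshold else "inconsistent")
```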
"GIS information content decision device"
The "GIS information content decision device" can be implemented in software:
1. Input: digital maps from multiple sources, as the "judged objects".
2. Processing: compare how closely the coverage areas of the digital maps match.
3. Return: the consistency value between the "judged objects": SameMapPower (value 0 to 1).
Implementation method:
Step 1: Open each digital map file taking part in the comparison according to its map format.
Step 2: Find the longitude and latitude of the north-west corner and the south-east corner of each map (another pair of diagonal corners can also be used, depending on the map format).
Step 3: Compare the longitude and latitude of the north-west and south-east corners of the maps being compared, and calculate the consistency value SameMapPower of the map coverage areas:
Suppose that " Fig. 1 " and " Fig. 2 " participates in comparison:
:
The area of minimum map in the area of the secondary map of SameMapPower=two overlapping region/two secondary maps.
The 4th step: return to the SameMapPower value.
The 5th step: judge whether (for example: threshold value=0.8), be to be judged to be identical map, be not to be judged to be not identical map to SameMapPower over thresholding.
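A minimal Python sketch of the 3rd and 5th steps, assuming each map's coverage is an axis-aligned latitude/longitude rectangle defined by its north-west and south-east corners. Area is measured directly in square degrees, a planar simplification that ignores projection distortion and maps crossing the 180° meridian. The class and function names are hypothetical; 0.8 is the example threshold from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapExtent:
    north: float  # latitude of the north-west corner
    west: float   # longitude of the north-west corner
    south: float  # latitude of the south-east corner
    east: float   # longitude of the south-east corner

    def area(self) -> float:
        # Planar area in square degrees; an empty rectangle has area 0.
        return max(0.0, self.north - self.south) * max(0.0, self.east - self.west)


def same_map_power(a: MapExtent, b: MapExtent) -> float:
    """Overlap area divided by the area of the smaller map (0..1)."""
    overlap = MapExtent(
        north=min(a.north, b.north),
        west=max(a.west, b.west),
        south=max(a.south, b.south),
        east=min(a.east, b.east),
    )
    smaller = min(a.area(), b.area())
    if smaller == 0.0:
        return 0.0
    return overlap.area() / smaller


def same_map(a: MapExtent, b: MapExtent, threshold: float = 0.8) -> bool:
    # 5th step: identical map only when SameMapPower exceeds the threshold.
    return same_map_power(a, b) > threshold
```

Dividing by the smaller map (rather than the union) means a small map lying entirely inside a larger one scores 1.0, which matches the formula stated above.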
" network service content decision device "
FTP service content judgment by the "network service content decision device":
The 1st step: log in to each FTP service taking part in the comparison using the File Transfer Protocol, and retrieve the files it holds.
The 2nd step: after retrieving the files of the FTP service, first judge from the file name suffix whether the file types match. If they do not, return "inconsistent" as the output; if they do, proceed to the 3rd step.
The 3rd step: according to the file type, use the "content of multimedia decision device", "image content decision device", "word content decision device", "software content decision device", "data or data-base content decision device", or "GIS information content decision device" to judge whether the file contents are consistent, and return its judgment result.
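The 2nd and 3rd steps amount to a suffix check followed by dispatch to a type-specific decision device. A minimal Python sketch, assuming the files have already been downloaded (for example with the standard library's `ftplib`) and that each decision device is supplied as a callable keyed by lower-case suffix. The `deciders` mapping and the byte-equality fallback for unregistered types are hypothetical choices, not part of the text.

```python
from pathlib import PurePosixPath
from typing import Callable, Mapping

# A content decision device: takes two file contents, returns True if consistent.
Decider = Callable[[bytes, bytes], bool]


def ftp_file_consistent(name_a: str, data_a: bytes,
                        name_b: str, data_b: bytes,
                        deciders: Mapping[str, Decider]) -> bool:
    # 2nd step: compare file types by (case-insensitive) file name suffix.
    suffix_a = PurePosixPath(name_a).suffix.lower()
    suffix_b = PurePosixPath(name_b).suffix.lower()
    if suffix_a != suffix_b:
        return False  # "inconsistent"
    # 3rd step: hand both contents to the decision device for that type.
    decider = deciders.get(suffix_a)
    if decider is None:
        # Hypothetical fallback for types with no registered device.
        return data_a == data_b
    return decider(data_a, data_b)
```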
Mailbox service content judgment for services provided by e-mail websites:
Assume that the mailbox service information of e-mail websites is obtained mainly by software that searches each website's pages and parses from the page tags information such as mailbox size, charging (free or paid), and whether the POP protocol is supported.
The 1st step: divide mailbox sizes into grades (for example: 10MB~25MB, 25MB~100MB, 100MB~300MB, 300MB~1GB, 1GB~100GB), then judge whether the participating mailboxes fall in the same grade. If "no", return "inconsistent"; if "yes", proceed to the 2nd step.
The 2nd step: compare whether the "charging situation" is the same. If "no", return "inconsistent"; if "yes", proceed to the 3rd step.
The 3rd step: compare whether POP protocol support is the same. If "no", return "inconsistent"; if "yes", return "consistent".
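A minimal Python sketch of the three mailbox steps, assuming the size, charging, and POP fields have already been parsed from the site's pages. The grade boundaries come from the example in the 1st step (1GB taken as 1024MB); treating each grade as a half-open interval [lower, upper) is an assumption, since the text does not say which grade a boundary value belongs to. Names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass

# Grade boundaries in MB from the example grades in the 1st step.
# Size s falls in grade i when GRADE_BOUNDS_MB[i-1] <= s < GRADE_BOUNDS_MB[i].
GRADE_BOUNDS_MB = [10, 25, 100, 300, 1024, 100 * 1024]


def size_grade(size_mb: float) -> int:
    # bisect_right returns how many boundaries are <= size_mb.
    return bisect_right(GRADE_BOUNDS_MB, size_mb)


@dataclass(frozen=True)
class MailboxInfo:
    size_mb: float      # mailbox size parsed from the site's pages
    is_paid: bool       # "charging situation"
    supports_pop: bool  # POP protocol support


def mailbox_consistent(a: MailboxInfo, b: MailboxInfo) -> bool:
    if size_grade(a.size_mb) != size_grade(b.size_mb):  # 1st step
        return False
    if a.is_paid != b.is_paid:                           # 2nd step
        return False
    return a.supports_pop == b.supports_pop              # 3rd step
```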
" business information content decision device "
Judges whether product or service sales information published on web pages is identical, and whether it lies within the same physical geographic area, the same administrative region, or the same distance range.
The 1st step: compare whether the participating business information concerns the same product or service. If "no", return "inconsistent"; if "yes", proceed to the 2nd step.
The 2nd step: judge whether the participating business information is sensitive to geographic location (for example, personal consumer goods and on-site services such as ice cream or private tutoring are location-sensitive). If "no", return the judgment result "consistent"; if "yes", proceed to the 3rd step.
The 3rd step: judge whether the suppliers of the participating business information are in the same city or region. If "no", return the judgment result "inconsistent"; if "yes", return the judgment result "consistent".
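A minimal Python sketch of the three business-information steps. It assumes the product identity, location sensitivity, and supplier region have already been extracted and normalized (the text does not say how location sensitivity is determined, so here it is simply a flag). Treating a pair as location-sensitive if either side is flagged is an assumption; the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessInfo:
    product: str          # normalized product/service identifier (assumed)
    geo_sensitive: bool   # e.g. ice cream, private tutoring -> True
    region: str           # supplier's city or region


def business_consistent(a: BusinessInfo, b: BusinessInfo) -> bool:
    if a.product != b.product:                    # 1st step
        return False
    if not (a.geo_sensitive or b.geo_sensitive):  # 2nd step
        return True
    return a.region == b.region                   # 3rd step
```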
" obtain web page user attention rate subsystem "
Figure 12 is a structural diagram of the subsystem for obtaining web page user attention. This search engine works together with its companion web browser (or with any third-party browser compatible with the communication protocol between the search engine and that browser). Through the browser it collects each user's degree of attention to each web page and reports it to the search engine, which uses it as a basis for ranking search results or for selecting the "title search result". The method and device can also be used independently of a search engine, forming a standalone web query system that provides a "web page popularity ranking list", which can be operated for a fee or exchanged for other benefits.
The system consists mainly of two parts: the "PageFocus web server" and the "PageFocus web browser".
Structure of the "PageFocus web server"
The "PageFocus web server" obtains, through the "PageFocus web browser", users' degree of attention to each web page worldwide, and builds a database of each page's "attention score PageFocus" as a measure of the page's popularity.
The "PageFocus web server" consists of the following:
(1) "PageFocus browser ID registration server": assigns a globally unique ID number to each "PageFocus web browser" in use on the network.
(2) "PageFocusAccServer web page attention statistics server": receives the "attention score PageFocus" of one or more web pages contained in the "PageFocus packets" sent by running "PageFocus web browsers" worldwide. The browser ID is used to distinguish different browsing users.
(3) "PageFocus browser online upgrade server": provides online upgrade service to "PageFocus web browsers" worldwide.
(4) "Data encryption and decryption module": encrypts the data transmitted between the "PageFocus web server" and the "PageFocus web browser", to prevent attacks and information theft.
Structure of the "PageFocus web browser"
The "PageFocus web browser" reports the current user's degree of attention to a given web page to the "PageFocus web server" over the network.
The "PageFocus web browser" consists of the following:
(1) "Attention score PageFocus calculation module": calculates the user's degree of attention to a web page from the user's operations in the "PageFocus web browser", and forms a "PageFocus packet" that it reports to the "PageFocusAccServer web page attention statistics server".
(2) "PageFocus browser ID registration module": communicates with the "PageFocus browser ID registration server" to obtain a globally unique ID, used to distinguish different users.
(3) "PageFocus browser online upgrade module": communicates with the "PageFocus browser online upgrade server" to keep the "PageFocus browser" on the user's computer at the latest version.
This device comprises: " the PageFocus web browser " of the invention, " PageFocus browser ID registrar " and " webpage score server ", and concrete methods of realizing is as follows:
The 1st step: develop special " a PageFocus web browser ", each browser all possesses globally unique ID identification number when mounted, or initiatively seeks in use " PageFocus browser ID registrar " on network to obtain globally unique ID identification number.
The 2nd step: " PageFocus web browser " possesses and (for example: the repertoire IE browser of Microsoft) has the general networks browser.
The 3rd step: " PageFocus web browser " also possesses the user converted to " paying close attention to score value PageFocus " of webpage and forms " PageFocus packet " according to the listed weight of following table to the operation of browser with to the operation of webpage, be passed to by procotol with cipher mode " the PageFocusAccServer webpage is paid close attention to statistical server " of this search engine.
The 4th step: " paying close attention to score value PageFocus " that " the PageFocusAccServer webpage is paid close attention to statistical server " comprises its inside after " PageFocus packet " that each " PageFocus web browser " of receiving the whole world sent is added on corresponding webpage.
The 5th goes on foot: " paying close attention to score value PageFocus " of each webpage of the whole world that comprises on " PageFocusAccServer webpage concern statistical server ", these information can form by various disposal routes: search engine is selected to can be used as the foundation of " title search result ", also can directly be announced out conduct " the popular degree ranking list of webpage " webpage seniority among brothers and sisters foundation, search engine in having the identical content Search Results service.
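The 4th and 5th steps on the server side can be sketched as a per-URL running sum plus a sorted ranking. This is a minimal in-memory stand-in for the "PageFocusAccServer", assuming packets have already been decrypted and parsed into (browser ID, URL, score) tuples; the class name is hypothetical. The browser ID is carried but unused here, although a real server might use it to filter abuse.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One packet record: (browser ID, page URL, PageFocus score).
Record = Tuple[str, str, float]


class PageFocusAccumulator:
    """In-memory stand-in for the PageFocusAccServer (hypothetical)."""

    def __init__(self) -> None:
        self.scores: Dict[str, float] = defaultdict(float)

    def ingest(self, packet: Iterable[Record]) -> None:
        # 4th step: add each record's score to the corresponding page.
        for _browser_id, url, score in packet:
            self.scores[url] += score

    def ranking(self, top_n: int = 10) -> List[Tuple[str, float]]:
        # 5th step (one use): a "web page popularity ranking list".
        return sorted(self.scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```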
The method that " PageFocus web browser " calculating " is paid close attention to score value PageFocus ":
The repertoire that has generic browser due to " PageFocus web browser ", so can be when the user uses browser, gather its operation behavior according to following table, and according to " weight " of every kind of behavior, this webpage is carried out " paying close attention to score value PageFocus " score, and form a minute value record of " paying close attention to score value PageFocus " about this webpage when browser thoroughly cuts out this webpage, issue with the form of " PageFocus packet "
" the PageFocusAccServer webpage is paid close attention to statistical server ".
Note:
1. Although these scoring criteria may occasionally misjudge, statistical accuracy is obtained from the large number of operations on the network.
2. The specific "weight" values listed in the table are only representative values; the invention lies in scoring pages through the browser, and any change to the "weight items" or "weights" falls within the scope of the invention.
3. Letting users vote on web pages rests on full trust in netizens' social ethics; therefore each "weight" is applied to the overall score by mathematical multiplication rather than addition.
4. Because each web page may accumulate a very large number of PageFocus points, software variables could overflow; the "PageFocusAccServer web page attention statistics server" can therefore record scores as "mathematical logarithms" or in "scientific notation".
5. As an alternative to this method, besides forming the "PageFocus packet" when the browser finally closes the page, the packet may be formed at times determined by any other rule, for example periodically or when the score reaches a certain value; these methods all fall within the scope of the invention.
6. Detailed calculation of the "reading speed per line of text" in the table:
A. Mouse wheel scrolling: text reading speed = (display area width / set width) * lines of text scrolled per scroll / scrolling time interval.
B. Keyboard page turning: text reading speed = (display area width / set width) * lines of text per page turn / page-turn time interval.
C. Window scroll bar scrolling: text reading speed = (display area width / set width) * lines of text scrolled per scroll / scrolling time interval.
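The reading-speed formula of note 6 and the multiplicative weighting of notes 3 and 4 can be sketched together in Python. This is a minimal sketch under assumptions: the patent's weight table is not reproduced in this text, so `EXAMPLE_WEIGHTS` and its action names are hypothetical, and keeping the running score as a base-10 logarithm on the client side is one way to apply the overflow advice of note 4 (which the text places on the server). Weights must be positive for the logarithm to be defined.

```python
import math
from typing import Iterable, Mapping

# Hypothetical weights for illustration only.
EXAMPLE_WEIGHTS: Mapping[str, float] = {
    "open_page": 1.0,
    "copy_text": 2.0,
    "bookmark": 5.0,
}


def reading_speed(display_width: float, set_width: float,
                  lines_per_step: float, interval_s: float) -> float:
    """Note 6: (display width / set width) * lines per step / time interval.

    The same formula serves mouse-wheel scrolling (A), keyboard page turning
    (B) and scroll-bar scrolling (C); only the source of lines_per_step differs.
    """
    if set_width <= 0 or interval_s <= 0:
        raise ValueError("set_width and interval_s must be positive")
    return (display_width / set_width) * lines_per_step / interval_s


def page_focus_log10(actions: Iterable[str],
                     weights: Mapping[str, float] = EXAMPLE_WEIGHTS,
                     base_score: float = 1.0) -> float:
    """log10 of base_score multiplied by the weight of each recorded action.

    Note 3: weights multiply the score. Note 4: keeping the value as a
    logarithm avoids overflow when a page collects very many points.
    """
    log_score = math.log10(base_score)
    for action in actions:
        weight = weights.get(action)
        if weight is not None:              # unknown actions leave the score unchanged
            log_score += math.log10(weight)  # weights must be > 0
    return log_score
```

For example, opening a page, copying text and bookmarking it gives log10(1 * 2 * 5) = 1, i.e. a score of 10.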
The formation method of " PageFocus packet "
The content of " PageFocus packet ":
Figure BSA00000554541100381
Note: each " PageFocus packet " can comprise the call of a plurality of webpages.Every webpage call can also add other attribute, but in order to raise the efficiency, only lists most important content in table, adds other attributes and also belong to category of the present invention in table." PageFocus packet " sends the selection on opportunity:
Reduce to send the bandwidth that " PageFocus packet " take and the pressure that brings to server end, can take one of following several schemes:
When thoroughly being closed from browser, certain webpage sends " PageFocus packet ".
When thoroughly cutting out, browser sends " PageFocus packet ".
Browser is retained in local computer with " PageFocus packet " with document form, runs up to specific quantity or length-specific or special time and sends during the cycle again.
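The third scheme (accumulate, then send when a count, length, or age limit is reached) can be sketched as a small buffer. This is a minimal sketch, assuming the record fields listed in claim 7 (browser ID, page URL, PageFocus score) and JSON as the wire format; the limits are hypothetical values, the buffer lives in memory rather than in a local file, encryption is omitted, and the age limit is only checked when a record is added (a real client would also flush on a timer).

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PageRecord:
    browser_id: str    # fields follow claim 7: browser ID, URL, score
    url: str
    page_focus: float


@dataclass
class PacketBuffer:
    """Accumulates page records and releases a packet when a limit is hit."""
    max_records: int = 50          # "specific quantity" (hypothetical value)
    max_bytes: int = 64 * 1024     # "specific length" (hypothetical value)
    max_age_s: float = 600.0       # "specific time period" (hypothetical value)
    records: List[PageRecord] = field(default_factory=list)
    started_at: Optional[float] = None

    def add(self, record: PageRecord, now: Optional[float] = None) -> Optional[bytes]:
        now = time.monotonic() if now is None else now
        if self.started_at is None:
            self.started_at = now
        self.records.append(record)
        payload = self._encode()
        if (len(self.records) >= self.max_records
                or len(payload) >= self.max_bytes
                or now - self.started_at >= self.max_age_s):
            self.records = []
            self.started_at = None
            return payload  # caller would encrypt and send this packet
        return None

    def _encode(self) -> bytes:
        # JSON is an assumed wire format; the text does not specify one.
        return json.dumps([asdict(r) for r in self.records]).encode("utf-8")
```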
" title search result " selection algorithm
This algorithm is mainly used to decide which of the original search results within a group of "homologous search results" serves as the "title search result". It needs to solve the following problems:
1. Judge the content quality of web pages from network user behavior and page content, and display high-quality pages first.
2. Prevent a single search result from receiving so much click traffic after becoming the "title search result" that the website's processing slows down or even crashes.
3. Prevent a single search result from receiving so much click traffic after becoming the "title search result" that its service response slows down and visitors' experience and goodwill decline.
4. Make becoming the "title search result" a right that can be offered to websites that need it and that those websites can purchase.
5. Give every original "homologous search result" a chance, with a certain probability, of becoming the "title search result".
When selecting the "title search result" among the "homologous search results", the selection method considers three factors together: "search result content quality", "weighting value", and "service response delay". That is, results with high content quality are preferred, weighted results are preferred, and results with good network service are preferred. The same principle still applies when all the "homologous search results" are listed, and the "weighting value" can be purchased from the system operator of the present invention. The "title search result" is selected as follows:
The 1st step: calculating each " homology Search Results " becomes the probability weights PWn of " title search result " (this Search Results is the n bar):
PWn=TP*PageFocus/(RespDelay-K)
Note 1: less than or equal to zero the time, (RespDelay-K) answering value is 1 as (RespDelay-K).
Note 2: in formula, the variable implication is as follows
A.PageFocus webpage attention rate value: be this Search Results according to the present invention in " obtaining the method and apparatus of web page user attention rate " " PageFocus value " of obtaining.
B.RespDelay web service operating lag: be the operating lag of this Search Results when providing service access to the searchers.(experience the operating lag that depends on the website due to access, react slower, it is poorer to experience).
C.K service response constant: be the constant that can define, 50 milliseconds (ms) used in suggestion, will do not discovered lower than the service response delay of K value, do not affect experience, thereby can ignore.
The D.TP title search is power as a result: as a kind of weighting, anyone can obtain " the TP title search is power as a result " by various give-and-take conditions with the network operator of system of the present invention.
E. as other implementation algorithm of this formula, following other form can also be arranged:
a.PWn=(TP+PageFocus)/(RespDelay-K)
b.PWn=(TP+PageFocus)/RespDelay/K
c.PWn=TP*PageFocus/RespDelay/K
The 2nd step: the statistics summation is the summation of the probability weights PWn of original " homology Search Results " all: the whole probability weights of PWall.
The 3rd step: calculate the probability that every " homology Search Results " becomes " title search result ": Pn=PWn/PWall.
The 4th step: according to the probability of Pn value, along with searchers's access action, dynamically select at random " title search result ", present to the searchers.
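The four steps can be sketched in Python as weighted random (roulette-wheel) selection using the main formula PWn = TP*PageFocus/(RespDelay-K) with note 1 applied. This is a minimal sketch: the class and function names are hypothetical, and giving TP the value 1.0 for results whose owners have not purchased a "title search result right" is an assumption (with the multiplicative formula, TP = 0 would exclude a result entirely). `random.Random.choices` performs the weighted draw.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

K_MS = 50.0  # service response constant K, value suggested in the text


@dataclass(frozen=True)
class HomologousResult:
    url: str
    page_focus: float      # PageFocus attention value
    resp_delay_ms: float   # RespDelay, measured response delay
    tp: float = 1.0        # TP; 1.0 for "not purchased" is an assumption


def probability_weight(r: HomologousResult, k_ms: float = K_MS) -> float:
    # 1st step: PWn = TP * PageFocus / (RespDelay - K), with note 1 applied.
    denom = r.resp_delay_ms - k_ms
    if denom <= 0:
        denom = 1.0
    return r.tp * r.page_focus / denom


def selection_probabilities(results: Sequence[HomologousResult],
                            k_ms: float = K_MS) -> List[float]:
    # 2nd and 3rd steps: PWall = sum of PWn, Pn = PWn / PWall.
    weights = [probability_weight(r, k_ms) for r in results]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one result needs a positive weight")
    return [w / total for w in weights]


def pick_title_result(results: Sequence[HomologousResult],
                      rng: Optional[random.Random] = None,
                      k_ms: float = K_MS) -> HomologousResult:
    # 4th step: draw one result per searcher access with probability Pn.
    rng = random.Random() if rng is None else rng
    probs = selection_probabilities(results, k_ms)
    return rng.choices(list(results), weights=probs, k=1)[0]
```

For example, with PageFocus/RespDelay pairs (100, 150 ms), (100, 60 ms) and (50, 30 ms), the weights are 1, 10 and 50, so the fast third result becomes the title result about 50/61 of the time while the others still appear occasionally, which spreads click traffic as problems 2 and 5 require.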
Apparatus and method for adapting website content style
Content of the invention: use any obtainable information that helps determine the user's environment and state, so that users who are working or at leisure see different styles when they access the same page URL, without any operation, registration, setting, or Cookie setting on their part. This includes:
1. Use the user's IP address to determine the country or region the user is in, then convert this website's time to the visitor's local administrative time; from the local time it can be judged whether the visitor is working or at leisure.
2. The user's IP address can be used to look up the attributes of that address: home or workplace. Style and content suited to the user's environment are then provided according to where the user is.
3. The user's geographic location can be determined from the IP address; when the user queries business information, the supplier nearest to the user can automatically be ranked first.
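Item 3 can be sketched as sorting suppliers by great-circle (haversine) distance from the user. This is a minimal sketch that assumes the user's latitude and longitude have already been estimated from the IP address by some geolocation lookup (not shown); the haversine formula, the mean Earth radius of 6371 km, and the names used are choices made here, not taken from the text.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Supplier:
    name: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Great-circle distance between two points given in degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sort_by_distance(suppliers: Sequence[Supplier],
                     user_lat: float, user_lon: float) -> List[Supplier]:
    # Nearest supplier first.
    return sorted(suppliers, key=lambda s: haversine_km(user_lat, user_lon, s.lat, s.lon))
```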
For example:
At the same moment, different users accessing a web page with the same URL on this website see different content:
A. Users who are at work in a working environment see a serious, concise page that contains no leisure or entertainment information.
B. Users who are at leisure in a leisure environment see a lively page that can contain leisure and entertainment information and personal consumer advertising.
Applying the invention partly or wholly to website systems other than search engines also falls within the scope of the invention.
To handle heavy traffic, large websites today all use server clusters, and some even set up regional local service subsystems to distribute user access. A key characteristic of current server clusters, however, is that every cluster member provides identical content. As shown in Figure 13, an incoming user passes through the "website server cluster entry" device and is assigned directly, regardless of any user characteristics, to one of the member servers, all of which provide identical content.
As shown in Figure 14, the device of the invention partly modifies this structure: after the "website server cluster entry" receives a visitor, it uses the IP address and other user attribute information sent during the access to judge whether the visitor is in a working state, and provides information services of a corresponding style and content.
Method for automatically determining the user's state and providing suitable web page style and content
The 1st step: first divide the server cluster into two classes, "work style" and "personal and leisure style". Whether pages are static or dynamic, when identical content is updated on the two classes of servers, both styles are generated automatically, so that users who are working or at leisure see different styles when accessing the same page URL.
The 2nd step: when the "website server cluster entry" receives a user's first request for a page of this website, it first obtains the user's IP address from the access protocol (or the IP-layer protocol).
The 3rd step: look up the IP address in the "IP address attribute database" to find whether it is a "workplace IP address" or an "IP address of a personal or leisure location". For a "workplace IP address", go to the 4th step; for an "IP address of a personal or leisure location", go to the 5th step.
The 4th step: obtain the geographic location of the "workplace IP address" and the local administrative time of that region. If the region is within working hours (8:00~20:00, Monday to Friday), assign the access to a "work style server" in the cluster, which serves pages suited to workplace use; otherwise go to the 5th step.
The 5th step: assign the access to a "personal and leisure style server" in the cluster, which serves pages suited to personal and leisure use.
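The 3rd to 5th steps can be sketched as a routing function at the "website server cluster entry". This is a minimal sketch under assumptions: the "IP address attribute database" is modelled as a dict whose entries carry a hypothetical `kind` label and a fixed UTC offset for the region (which ignores daylight saving time), unknown addresses are sent to the leisure pool, and the pool names are hypothetical. Working hours are taken as 8:00 up to (not including) 20:00, Monday to Friday.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Mapping

WORK_START_HOUR = 8    # 8:00 local time
WORK_END_HOUR = 20     # 20:00 local time (exclusive)


@dataclass(frozen=True)
class IpAttributes:
    kind: str                # "workplace" or "personal" (hypothetical labels)
    utc_offset_hours: float  # region's local administrative time offset


def choose_server_pool(ip: str, ip_db: Mapping[str, IpAttributes],
                       now_utc: datetime) -> str:
    attrs = ip_db.get(ip)
    # 3rd step: only workplace addresses can reach the work style pool.
    if attrs is None or attrs.kind != "workplace":
        return "personal_leisure"   # 5th step
    # 4th step: convert to the region's local time and check working hours.
    local = now_utc.astimezone(timezone(timedelta(hours=attrs.utc_offset_hours)))
    is_weekday = local.weekday() < 5          # Monday=0 ... Friday=4
    in_hours = WORK_START_HOUR <= local.hour < WORK_END_HOUR
    return "work_style" if (is_weekday and in_hours) else "personal_leisure"
```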

Claims (10)

1. A system for obtaining web page user attention, based on an attention-based aggregation display method for homologous-information search engines, the system comprising a PageFocus web server and a PageFocus web browser, characterized in that:
The PageFocus web server comprises a PageFocus browser ID registration server, a PageFocusAccServer web page attention statistics server, a PageFocus browser online upgrade server, and a data encryption and decryption module;
The PageFocus web browser comprises a PageFocus browser ID registration module and an attention score PageFocus calculation module;
Each "PageFocus web browser" receives a globally unique ID number at installation time, or actively contacts the "PageFocus browser ID registration server" on the network during use to obtain a globally unique ID number;
The "PageFocus web browser" has the full functionality of an ordinary web browser; it converts the user's operations on the PageFocus web browser and on the web page, together with features of the page content, into the page's "attention score PageFocus" according to weights, forms a "PageFocus packet", and transmits it in encrypted form over a network protocol to the search engine's "PageFocusAccServer web page attention statistics server";
After receiving the "PageFocus packets" sent by "PageFocus web browsers" worldwide, the "PageFocusAccServer web page attention statistics server" adds each "attention score PageFocus" contained in them to the score of the corresponding web page;
The "attention scores PageFocus" of web pages worldwide held on the "PageFocusAccServer web page attention statistics server" are processed in various ways: as the basis for ranking web pages in the search engine, as the basis on which the search engine selects the "title search result" among search results with identical content, or published directly as a "web page popularity ranking list" service.
2. The system according to claim 1, characterized in that the PageFocus packet can be formed when the PageFocus web browser finally closes the web page, can be formed periodically, or can be formed when the accumulated score reaches a certain value, so as to reduce the computing load on the PageFocusAccServer web page attention statistics server.
3. The system according to claim 1, characterized in that the attention score PageFocus is formed according to the weights listed in the following table:
[Weight table (image, not reproduced)]
4. The system according to claim 3, characterized in that the attention score PageFocus is formed according to the weights listed in the following table:
[Weight table (image, not reproduced)]
5. The system according to claim 3, characterized in that the attention score PageFocus is formed according to the weights listed in the following table:
[Weight table (image, not reproduced)]
6. The system according to claim 3, characterized in that the attention score PageFocus is formed according to the weights listed in the following table:
7. The system according to claim 2, characterized in that the PageFocus packet comprises the fields PageFocus browser ID, web page URL, and web page PageFocus score.
8. The system according to claim 1, characterized in that, in the page ranking performed by the search engine, each web page that has "homologous web pages" uses the sum of the attention scores PageFocus obtained by its "homologous web pages" as its ranking basis, namely: A. the "title search result" of a set of "homologous web pages" can use the sum of the attention scores PageFocus obtained by all of those "homologous web pages" as its ranking basis when taking part in search engine result ranking; B. each web page among the "homologous web pages" can also use the sum of the attention scores PageFocus obtained by the "homologous web pages" subordinate to it as its ranking basis when taking part in search engine result ranking.
9. The system according to claim 1, characterized in that the PageFocus web browser further comprises a PageFocus browser online upgrade module.
10. The system according to any one of claims 1 to 9, characterized in that the system further comprises a web page score server.
CN 201110228853 2006-02-22 2006-02-22 System for obtaining page user focus degree PageFocus by method for aggregating and displaying same source information search engine based on focus degree Expired - Fee Related CN102298621B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110228853 CN102298621B (en) 2006-02-22 2006-02-22 System for obtaining page user focus degree PageFocus by method for aggregating and displaying same source information search engine based on focus degree

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN2006100079057A Division CN101025737B (en) 2006-02-22 2006-02-22 Attention degree based same source information search engine aggregation display method

Publications (2)

Publication Number Publication Date
CN102298621A CN102298621A (en) 2011-12-28
CN102298621B true CN102298621B (en) 2013-11-06

Family

ID=45359035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110228853 Expired - Fee Related CN102298621B (en) 2006-02-22 2006-02-22 System for obtaining page user focus degree PageFocus by method for aggregating and displaying same source information search engine based on focus degree

Country Status (1)

Country Link
CN (1) CN102298621B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103246680B (en) * 2012-02-13 2016-05-18 腾讯科技(深圳)有限公司 Method and device for aggregating and presenting web page content in a browser
KR101974867B1 (en) * 2012-08-24 2019-08-23 삼성전자주식회사 Apparatas and method fof auto storage of url to calculate contents of stay value in a electronic device
CN104750701A (en) * 2013-12-27 2015-07-01 中兴通讯股份有限公司 Search processing method, device and terminal
TWI587703B (en) * 2015-09-25 2017-06-11 禾聯碩股份有限公司 Displaying device and method thereof for generating chatting room with selected channels
WO2017117806A1 (en) * 2016-01-08 2017-07-13 马岩 Term search method and system for web information
CN114154027B (en) * 2021-12-06 2024-10-22 深圳市大数据资源管理中心 Non-homologous inconsistent data processing method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1207186A (en) * 1995-12-30 1999-02-03 时代线路股份有限公司 Data retrieval method and apparatus with multiple source capability
CN1254136A (en) * 1998-11-12 2000-05-24 英业达股份有限公司 Method for inquiring about index multi-media header data and its device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JP 2003-44484 A (Japanese unexamined patent application publication) 2003.02.14

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20131106

Termination date: 20150222

EXPY Termination of patent right or utility model