
CN101276377B - Method, system for acquiring resource related information and application in search engine - Google Patents


Info

Publication number
CN101276377B
Authority
CN
China
Prior art keywords
resource
relevant information
player
client
resource object
Legal status
Active
Application number
CN2008101117924A
Other languages
Chinese (zh)
Other versions
CN101276377A (en)
Inventor
尹卓 (Yin Zhuo)
孙桂林 (Sun Guilin)
Current Assignee
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN2008101117924A
Publication of CN101276377A
Application granted
Publication of CN101276377B
Legal status: Active
Anticipated expiration


Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method and system for collecting resource-related information, and their application in a search engine. It addresses two problems of traditional server-side collection: collection is not timely, and it is costly. The method comprises client-side collection and/or server-side collection. In client-side collection, the relevant information of a resource is collected when a user action triggers it; in server-side collection, the server collects the relevant information of a resource by obtaining a collection task. By using ordinary user behavior to collect information, the method spreads the collection cost across a large number of users and reduces hardware and bandwidth costs. Because it occupies few client resources, it does not affect the user experience. The method therefore minimizes the cost of traditional server-side background collection. At the same time, because a large number of users play resources online, the method can collect real-time information about resources promptly at almost zero cost, improving the timeliness of collection. Server-side collection alone can hardly achieve this improvement.

Description

Method and system for acquiring resource-related information, and application thereof in a search engine
Technical field
The present invention relates to the field of network technology, and in particular to a method and system for acquiring resource-related information, and their application in a search engine.
Background art
In a search engine, ranking search results sensibly so that high-quality resources appear near the top directly affects the user experience. For a music search engine, for example, one important ranking criterion is the quality of the audio resource, which mainly covers its sound quality and playing duration. Another very important criterion is the download and buffering speed of the resource.
These ranking criteria depend heavily on how resources are collected. The traditional approach uses a large server cluster to crawl and analyze resources and obtain their sound quality, duration, and speed. However, the download and buffering speed of a resource changes from moment to moment. Server-side collection therefore has difficulty learning a resource's current speed in time, and this in turn degrades the search engine's ranking of the resource.
Moreover, a professional music search engine must collect information on a massive number of links. To reach sufficient processing capacity, server-side collection requires a large number of servers. At the same time, the collection equipment must be deployed in a data center with a very good network environment in order to achieve high collection efficiency. As a result, the collection cost is very high.
Summary of the invention
The technical problem to be solved by the invention is to provide a method and system for acquiring resource-related information, and their application in a search engine. The aim is to overcome the untimely and costly collection of traditional server-side collection.
To solve the above technical problem, and according to the specific embodiments provided by the invention, the invention discloses the following technical solutions:
A method for acquiring resource-related information comprises:
sending a collection request to a background server by means of a hidden player embedded in a page;
receiving a collection task allocated by the background server according to the request, and determining from the collection task the resource object whose relevant information needs to be collected;
playing the resource object with the hidden player embedded in the page;
collecting, on the client, the relevant information of the resource object with the hidden player embedded in the page;
submitting the collected relevant information of the resource object to the background server.
Wherein the background server allocates collection tasks according to task priority.
Wherein the background server takes the website to which a resource belongs as the collection target and a sample resource under that website as the collection task. After the collected relevant information of the resource object is submitted to the background server, the method further comprises:
marking the collected relevant information of the sample resource under the website as the relevant information of all resources under that website.
Wherein the collected relevant information of the sample resource under the website comprises the download speed of the resource.
Wherein the download speed of the resource object is collected as follows:
the download speed is calculated from the bit rate, playing duration, and download ratio of the resource object.
Wherein collecting, on the client, the relevant information of the resource object with the hidden player embedded in the page comprises:
obtaining the relevant information of the resource object through the application programming interface (API) provided by the player's web-page plug-in.
The resource-related information comprises the download speed of the resource, which is calculated from the bit rate, playing duration, and download ratio of the resource.
Wherein the method further comprises processing the collected resource-related information to obtain results used for search engine ranking.
A method for acquiring resource-related information comprises:
when a user clicks to preview a resource, determining the clicked resource as the resource object whose relevant information needs to be collected;
playing the resource object with a player embedded in the page;
collecting, on the client, the relevant information of the resource object with the player embedded in the page;
submitting the collected relevant information of the resource object to a background server.
Wherein collecting, on the client, the relevant information of the resource object with the player embedded in the page comprises:
obtaining the relevant information of the resource object through the API provided by the player's web-page plug-in.
Wherein:
the resource-related information comprises the download speed of the resource, which is calculated from the bit rate, playing duration, and download ratio of the resource.
Wherein the method further comprises:
processing the collected resource-related information to obtain results used for search engine ranking.
A system for acquiring resource-related information comprises:
a unit configured to send a collection request to a background server by means of a hidden player embedded in a page;
a unit configured to receive the collection task allocated by the background server according to the request, and to determine from the collection task the resource object whose relevant information needs to be collected;
a unit configured to play the resource object with the hidden player embedded in the page;
a unit configured to collect, on the client, the relevant information of the resource object with the hidden player embedded in the page;
a unit configured to submit the collected relevant information of the resource object to the background server.
Wherein the background server takes the website to which a resource belongs as the collection target and a sample resource under that website as the collection task, and the system further comprises:
a unit configured to mark the collected relevant information of the sample resource under the website as the relevant information of all resources under that website. This marking occurs after the collected relevant information of the resource object has been submitted to the background server.
Wherein the unit that collects, on the client, the relevant information of the resource object with the player embedded in the page calculates the download speed from the bit rate, playing duration, and download ratio of the resource object.
Wherein the unit that collects, on the client, the relevant information of the resource object with the hidden player embedded in the page is specifically configured to obtain that information through the API provided by the player's web-page plug-in.
Wherein the system further comprises:
a unit configured to process the collected resource-related information to obtain results used for search engine ranking.
A system for acquiring resource-related information comprises:
a unit configured to determine the resource clicked by a user as the resource object whose relevant information needs to be collected, when the user clicks to preview a resource;
a unit configured to play the resource object with a player embedded in the page;
a unit configured to collect, on the client, the relevant information of the resource object with the player embedded in the page;
a unit configured to submit the collected relevant information of the resource object to a background server.
Wherein the unit that collects, on the client, the relevant information of the resource object with the player embedded in the page is specifically configured to obtain that information through the API provided by the player's web-page plug-in.
Wherein the system further comprises:
a unit configured to process the collected resource-related information to obtain results used for search engine ranking.
According to the specific embodiments provided by the invention, the invention achieves the following technical effects:
First, the resource-related information collection method provided by the invention uses client-side collection. This solves the high cost and untimeliness of the traditional approach.
Client-side collection uses the client's browser to collect and preprocess the relevant information (such as playing duration and download speed) of the resource (such as audio) that the user is playing, and sends the result back to the server immediately. Because the method collects information from ordinary user behavior, the collection cost is spread across many users and hardware and bandwidth costs fall. The method also occupies few client resources, so it does not affect the user experience. It thus minimizes the cost of traditional server-side background collection.
At the same time, because a large number of users play resources online, real-time information about many resources can be collected promptly at almost zero cost. This improves the timeliness of collection, which server-side collection alone can hardly achieve.
Moreover, a user's actions show that the resource is currently receiving attention and is a recent hot topic, so hot resources can be given more attention. This information is very valuable to the service provider, who can process the collected information to improve its own service quality.
Preferably, the method does not rely entirely on client browsers passively playing resources and feeding back information, because that alone cannot fully solve the problem: the resources that users actively preview cover only a small fraction of all resources. Server-side scheduling is therefore combined with client collection to allocate resources whose information must be collected to client browsers. The number of collection tasks allocated to any one client browser is strictly controlled, so the browser carries no significant burden.
Second, the invention provides a preferred collection method in which client-side and server-side collection are used together, and background scheduling further improves the timeliness of collection.
Third, the invention proposes a site-level speed measurement method for collecting resource download speeds, which change in real time. At any given moment, different links on the same website have consistent download speeds, so the invention uses site-level measurement to further improve the real-time nature of collection. In site-level measurement, the unit of measurement is no longer the individual link but the website to which the link belongs. Resources on the Internet number in the tens of millions, while the websites hosting them number only in the hundreds of thousands. The range of collection targets therefore shrinks greatly, and timeliness improves tens of times at the same cost. Site-level speed measurement thus improves collection efficiency by orders of magnitude.
In summary, the preferred process of the invention combines client-side collection, server-side collection, and site-level speed measurement. This greatly improves collection efficiency and timeliness and greatly reduces collection cost. Applied to a search engine, it allows resources to be ranked more sensibly, with high-quality resources placed nearer the top.
Description of drawings
Fig. 1 is a collection flowchart in the client passive collection mode in embodiment one of the invention;
Fig. 2 is a collection flowchart in the client active collection mode in embodiment one of the invention;
Fig. 3 is a scheduling flowchart of the background server in embodiment two of the invention;
Fig. 4 is a flowchart of information collection using site-level speed measurement in preferred embodiment three of the invention;
Fig. 5 is a structural diagram of a system for acquiring resource-related information according to an embodiment of the invention;
Fig. 6 is a structural diagram of a resource-related information acquisition system applied to a search engine according to an embodiment of the invention.
Embodiment
The invention is described in further detail below with reference to the drawings and specific embodiments, to make its objects, features, and advantages clearer.
Compared with traditional server-side collection, the invention provides a new way of collecting resource-related information: client-side collection combined with server-side collection. This overcomes the high cost and untimeliness of the traditional approach. Here, "resources" mainly means multimedia resources such as audio and video. The "relevant information" of a resource mainly means information such as its playing duration and download speed. The collection of audio resources is used as the example below.
The invention provides several embodiments, each described below.
Embodiment one:
Client-side collection is used, as follows:
Client-side collection:
Client-side collection has two modes: a passive collection mode and an active collection mode. In the passive mode, the client uses the player embedded in the page to collect the relevant information of the resource currently being played. Referring to Fig. 1, the collection flow in the client passive collection mode is as follows:
S101: The user actively clicks to preview a resource. When the preview page opens to play the song, the player embedded in that page is triggered. Here, the "embedded" player is the player the page uses to play the previewed resource.
S102: The player (for example, Windows Media Player) obtains the relevant information of the currently previewed resource, such as playing duration and bit rate, through the API (Application Programming Interface) provided by its web-page plug-in. It cannot, however, obtain the download speed directly.
The player in the client web page provides no API that returns the download speed directly, so the client cannot collect a resource's download speed directly the way server-side collection does. The client can, however, obtain the download speed indirectly. For example, the player can report the network bandwidth currently in use, which approximates the resource's download speed. The drawback is that the bandwidth must be sampled at regular intervals, such as once every 2 seconds, so no information is captured for a resource that downloads faster than that, for instance within 1 second. Another method uses a player interface that returns the number of packets received so far, from which the download speed can be roughly estimated. The drawback is that packet size is not fixed, so the resulting speed is inaccurate.
Preferably, this embodiment obtains the download speed by calculating it from the bit rate, playing duration, and download ratio, all of which can be obtained directly through the player's API. Testing shows that the result is very close to the resource's true download speed, so this method is more accurate than the methods described above.
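The paragraph above does not spell out the formula. Below is a minimal Python sketch that assumes a roughly constant bit rate and two player readings taken a known time apart. The elapsed time between readings is an extra assumption not named in the source, as are the field names and units (kbit/s, seconds, and a download ratio from 0 to 1). The file size is estimated as bit rate × duration ÷ 8, and the speed is the growth in downloaded bytes divided by the elapsed time.

```python
from dataclasses import dataclass


@dataclass
class PlayerSample:
    """One reading from the player API (hypothetical field names and units)."""
    timestamp_s: float     # time of the reading, seconds (assumed to be available)
    bitrate_kbps: float    # encoded bit rate reported by the player, kbit/s
    duration_s: float      # total playing duration of the resource, seconds
    download_ratio: float  # fraction of the file downloaded so far, 0.0 to 1.0


def estimated_size_bytes(sample: PlayerSample) -> float:
    # Constant-bit-rate assumption: size ~= bit rate x duration / 8 bits per byte.
    return sample.bitrate_kbps * 1000 * sample.duration_s / 8


def estimate_download_speed(first: PlayerSample, second: PlayerSample) -> float:
    """Approximate download speed in bytes per second between two readings."""
    elapsed = second.timestamp_s - first.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    grown = second.download_ratio - first.download_ratio
    # Negative growth (e.g. the player restarted the download) is treated as zero.
    return max(estimated_size_bytes(second) * grown, 0.0) / elapsed
```

Take a 128 kbit/s, 240 s track whose download ratio rises from 0.25 to 0.75 between two readings 2 s apart. The estimated size is 3,840,000 bytes and the estimated speed is 960,000 bytes/s. For variable-bit-rate files, both the size and the speed are only approximations.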
S103: The page sends the collected data to the background server in real time through a page script.
The "background server" here differs from the server in "server-side collection" mentioned above. This background server serves the client, which sends information to it in real time after collecting it. The server in "server-side collection", by contrast, performs the collection itself.
The active collection mode requires task scheduling by the background server. The client requests a collection task from the background server and executes the allocated task through a hidden player embedded in the page. Referring to Fig. 2, the collection flow in the client active collection mode is as follows:
S201: When the user visits certain pages, a hidden player is embedded in the page to collect information, and a collection request is sent to the background server;
These can be any pages, but they are usually pages the user does not close immediately, such as a preview page, so that there is enough time to collect information.
S202: The background server allocates a collection task according to the request and tells the client which resource object to collect;
S203: The hidden player in the client web page executes the collection task and collects the resource-related information;
As described above, the hidden player also obtains resource-related information through the API provided by the web-page plug-in. What the hidden player plays, however, is the resource that needs to be collected. If a player is hidden in a preview page, the resource it plays usually differs from the one played by the page's visible embedded player. The hidden player therefore collects information about the resource allocated by the server. The active collection mode mainly exploits the preview page, which the user opened with a click, to obtain resource-related information actively.
S204: After collection is complete, the collected data is sent to the background server through a page script.
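Steps S201 to S204 can be illustrated as a small Python loop. This is a sketch under stated assumptions, not the patent's implementation. `InMemoryServer`, `run_active_collection`, `play_hidden`, and `max_tasks` are all hypothetical names. The `max_tasks` cap stands in for the strict limit on how many tasks one client receives.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple

Info = Dict[str, float]


class InMemoryServer:
    """Hypothetical stand-in for the background server, for illustration only."""

    def __init__(self, urls: List[str]) -> None:
        self.pending: Deque[str] = deque(urls)
        self.reports: List[Tuple[str, str, Info]] = []

    def request_task(self, client_id: str) -> Optional[str]:
        # Hand out the next resource URL to collect, or None if there is none.
        return self.pending.popleft() if self.pending else None

    def submit(self, client_id: str, url: str, info: Info) -> None:
        self.reports.append((client_id, url, info))


def run_active_collection(
    server: InMemoryServer,
    client_id: str,
    play_hidden: Callable[[str], Info],
    max_tasks: int = 1,
) -> int:
    """Run up to max_tasks collection tasks during one page visit; return the count."""
    done = 0
    while done < max_tasks:                   # per-client cap keeps the user's load small
        url = server.request_task(client_id)  # S201/S202: request and receive a task
        if url is None:
            break
        info = play_hidden(url)               # S203: hidden player plays, reads player API
        server.submit(client_id, url, info)   # S204: report the results back
        done += 1
    return done
```

With three pending URLs and `max_tasks=2`, one visit collects the first two and leaves the third for another client.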
In the client passive collection mode, the resources that users actively preview cover only a small fraction of all resources. If the user does not click to preview, the client cannot collect information about resources that are not played. The active mode is therefore preferred. In the active mode, the client can actively collect the needed resource-related information through the collection tasks allocated to it. The number of tasks allocated to any one client is strictly controlled, so the client carries no significant burden.
In practice, the client may use the passive mode alone or the active mode alone. Preferably, the two modes are combined for better collection results.
In the client-side collection process above, whether active or passive, collection can be performed by client software or by the client browser. Client software collection suits applications with a large installed base of client software. Applications that lack the conditions for client software collection can use client browser collection instead.
The collection method of embodiment one collects information from client users' behavior, which spreads the collection cost across many users and reduces hardware and bandwidth costs. Because it occupies few client resources, it does not affect the user experience, and it therefore minimizes the cost of traditional server-side background collection.
At the same time, because a large number of users play resources online, real-time information about many resources can be collected promptly at almost zero cost. This improves the timeliness of collection to a degree that server-side collection alone can hardly achieve.
Moreover, a user's actions show that the resource is currently receiving attention and is a recent hot topic, so hot resources can be given more attention. This information is very valuable to the service provider, who can process the collected information to improve its own service quality.
Embodiment two:
Combining client-side collection with server-side collection further improves the timeliness of collection, as follows:
Server-side collection:
A group of servers fetches the resources to be collected and analyzes each fetched resource to obtain information such as its associated text and bit rate. During fetching, the servers also issue simulated network protocol requests, for example HTTP (Hypertext Transfer Protocol) requests. Counting how many bytes were fetched within a collection interval (such as 10 seconds) gives the resource's download speed directly. For an audio resource, collection and analysis yield information such as its playing duration, bit rate, song title, artist, album, genre, and download speed.
Client-side collection:
As described in embodiment one; the details are not repeated here.
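Below is a minimal Python sketch of the byte-counting idea. It assumes the collection server reads the response body in chunks and stops once the collection interval has elapsed (10 s by default, following the example above). `measure_download_speed` and its parameters are hypothetical names. The clock is injectable so the function can be tested without a network.

```python
import time
from typing import BinaryIO, Callable


def measure_download_speed(
    stream: BinaryIO,
    interval_s: float = 10.0,
    chunk_size: int = 64 * 1024,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Read `stream` for at most `interval_s` seconds and return bytes per second."""
    start = clock()
    received = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:                      # resource fully fetched before the interval ended
            break
        received += len(chunk)
        if clock() - start >= interval_s:  # collection interval used up
            break
    elapsed = max(clock() - start, 1e-6)   # guard against a zero-length measurement
    return received / elapsed
```

With the standard library, `stream` could be the response object returned by `urllib.request.urlopen(url, timeout=...)`, whose `read(n)` method returns up to `n` bytes. The interval caps how long one measurement occupies a collection server.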
When client-side and server-side collection are combined and the client uses the active mode, both client-side and server-side collection require task scheduling by the background server. If the client uses the passive mode, no background task scheduling is involved. The scheduling process of the background server for clients in the active collection mode is described in detail below.
Fig. 3 shows the scheduling flow of the background server. To distinguish the "background server" from the server meant by "server-side collection", the steps below call the latter the collection server. The specific steps are as follows:
S301: A collecting client or a collection server initiates a collection request;
As described above, when the user visits certain pages, the collecting client sends a collection request to the background server and embeds a hidden player in the page to collect information;
S302: The background server allocates a collection task in response to the request and sends the URL (Uniform Resource Locator) of the resource to be collected to the collecting client or collection server;
Preferably, the background server schedules by resource priority: higher-ranked resources are scheduled first, while every resource is still guaranteed to be scheduled.
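The source does not say how it guarantees that every resource gets scheduled. One common technique that fits is priority aging, in which a waiting resource gains effective priority on each tick until even low-ranked resources are chosen. The sketch below uses aging as a swapped-in assumption, not as the patent's method. `AgingScheduler` and `aging_rate` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AgingScheduler:
    """Pick the next resource URL by priority, with aging so none starves."""
    aging_rate: float = 1.0                      # priority gained per tick spent waiting
    base_priority: Dict[str, float] = field(default_factory=dict)
    waited: Dict[str, int] = field(default_factory=dict)

    def add(self, url: str, priority: float) -> None:
        self.base_priority[url] = priority
        self.waited[url] = 0

    def next_task(self) -> str:
        if not self.base_priority:
            raise LookupError("no resources registered")
        # Effective priority = rank-based priority + credit for time spent waiting.
        # On ties, max() keeps the first URL in insertion order.
        url = max(
            self.base_priority,
            key=lambda u: self.base_priority[u] + self.aging_rate * self.waited[u],
        )
        for other in self.waited:
            self.waited[other] += 1
        self.waited[url] = 0                     # the chosen resource starts waiting anew
        return url
```

Take a hot resource with priority 10, added first, and a cold one with priority 0, with `aging_rate=1.0`. The hot resource is chosen on the first eleven calls and the cold one on the twelfth. A larger aging rate shortens the wait for low-ranked resources at the expense of high-ranked ones.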
S303: After obtaining the allocated resource URL, the collecting client or collection server collects the relevant information of the resource;
As described above, the collecting client collects the information through the player hidden in the page. The collection server obtains it by fetching and analyzing the resource.
S304: After completing the collection task, the collecting client or collection server sends the collected resource-related information to the background server.
Preferably, the background server analyzes, verifies, and consolidates the collected resource-related information to produce data that applications can use directly. When the results are applied to a search engine, the processed data can serve as a basis for ranking. Using information such as playing duration and download speed, the search engine can rank resources sensibly and place high-quality resources nearer the top.
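The sketch below is a hypothetical illustration of the analyze, verify, and consolidate step. It merges speed reports from many clients into one value per resource by taking the median, discards negative readings, and orders resources fastest first. The median is a robustness choice assumed here, not one stated in the source, and the function names are invented for this example. A real ranking would combine speed with other signals such as playing duration.

```python
from statistics import median
from typing import Dict, Iterable, List, Tuple


def consolidate_speeds(reports: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Merge (url, bytes_per_second) reports into one value per URL (median)."""
    by_url: Dict[str, List[float]] = {}
    for url, speed in reports:
        if speed >= 0:                   # drop obviously invalid readings
            by_url.setdefault(url, []).append(speed)
    return {url: median(values) for url, values in by_url.items()}


def rank_by_speed(speeds: Dict[str, float]) -> List[str]:
    """Order URLs fastest first; sorted() is stable, so ties keep insertion order."""
    return sorted(speeds, key=lambda url: speeds[url], reverse=True)
```

A single outlying report from one slow or misbehaving client barely moves the median, which is why it is used here instead of the mean.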
In the collection process above, download speed is a special case. A resource's download speed changes in real time, and measuring it in real time on the server side costs far more than obtaining text, bit rate, and similar information. The invention therefore proposes a site-level speed measurement method for collecting download speed, which further improves the real-time nature of collection.
Preferred embodiment three:
Collection using site-level speed measurement proceeds as follows:
Download speed differs from other resource-related information such as playing duration in two respects. First, it changes in real time. Second, at any given moment, different links on the same website have consistent download speeds. Site-level speed measurement exploits this second property: instead of treating each link as a unit of measurement, it measures speed per website to which the links belong. Resources on the Internet number in the tens of millions, while the websites hosting them number only in the hundreds of thousands. The range of collection targets therefore shrinks greatly, and timeliness improves tens of times at the same cost. Site-level speed measurement thus improves collection efficiency by orders of magnitude.
For example, if information collection shows that the resource http://www.fun.com/a.mp3 is slow to access, then all resources under the website http://www.fun.com/ can be considered slow. In other words, a download speed that belongs to the site level can represent the download speed of every resource under the website to which the measured resource belongs. Besides download speed, some other resource-related information is also site-level, such as DNS (Domain Name System) resolution errors and connection timeouts. Such information can also be collected with site-level measurement.
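Reducing a resource URL to its site is the key operation here. Below is a minimal Python sketch using the standard `urllib.parse.urlsplit`. It assumes a "site" means scheme plus host (and port, if present). Grouping by registered domain would be another reasonable choice. `site_of` and `group_by_site` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Reduce a resource URL to the site root used as the speed-measurement unit."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}/"


def group_by_site(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Bucket resource URLs by site, preserving their original order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        groups[site_of(url)].append(url)
    return dict(groups)
```

`site_of("http://www.fun.com/a.mp3")` returns `"http://www.fun.com/"`, which matches the example above. Tens of millions of URLs collapse into one bucket per site.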
The following describes in detail how site-level speed measurement is performed during collection. Referring to Fig. 4, the flow of information collection using site-level speed measurement is as follows:
S401: A collecting client or a collection server initiates a collection request;
As described above, the collecting client here uses the active mode and actively requests a collection task.
S402: For each request, the background server allocates one resource URL, which is a sample URL of the website to which the resource belongs;
S403: The collecting client or collection server collects the relevant information for the allocated URL. This includes site-level information (such as download speed) and non-site-level information (such as playing duration);
S404: After completing the collection task, the collecting client or collection server sends the collected resource-related information to the background server;
S405: The background server marks the site-level information as the relevant information of all resources under the website to which the resource belongs. For example, if a resource downloads quickly, the download speed of the whole website can be marked as fast. If the connection to a resource times out, the whole website can be marked as timed out once the timeout has been confirmed.
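Steps S402 and S405 can be sketched as follows. The set of site-level fields, the uniform random choice of sample URL, and all names (`SITE_LEVEL_FIELDS`, `pick_sample`, `apply_sample_result`) are assumptions for illustration. Site-level fields measured on the sample are copied to every URL of the site. Link-level fields such as playing duration stay with the sample only.

```python
import random
from typing import Dict, List, Optional

# Hypothetical field names; which fields count as site-level is an assumption.
SITE_LEVEL_FIELDS = {"download_speed", "dns_error", "connect_timeout"}

Info = Dict[str, object]


def pick_sample(urls_of_site: List[str], rng: Optional[random.Random] = None) -> str:
    """S402: choose one sample URL to represent the site (uniform choice assumed)."""
    rng = rng if rng is not None else random.Random()
    return rng.choice(urls_of_site)


def apply_sample_result(
    sample_url: str,
    urls_of_site: List[str],
    measured: Info,
    resource_info: Dict[str, Info],
) -> None:
    """S405: mark site-level results on every URL of the site, link-level on the sample."""
    site_part = {k: v for k, v in measured.items() if k in SITE_LEVEL_FIELDS}
    link_part = {k: v for k, v in measured.items() if k not in SITE_LEVEL_FIELDS}
    for url in urls_of_site:
        resource_info.setdefault(url, {}).update(site_part)
    resource_info.setdefault(sample_url, {}).update(link_part)
```

After one measurement of `http://www.fun.com/a.mp3`, every other URL on that site inherits the measured download speed without being fetched itself.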
Non-site-level information may in turn belong to other levels, such as the link level. Under the idea of site speed testing, different kinds of resource information can belong to different levels: some are site-level, some are link-level, and so on. In short, site speed testing marks the collection result for a sample resource at the appropriate level. This shrinks the set of objects that must be collected and improves collection efficiency.
Preferably, the background server assigns collection tasks according to resource priority. This applies especially to speed collection tasks. Sites are first divided into several tiers according to their popularity and the number of resources they contain. Higher-tier sites are scheduled first for active speed collection, while every site is still guaranteed a chance to be measured. For example, if tens of millions of links reduce to several hundred thousand sites, those sites must be scheduled sensibly. Popular sites should then be refreshed within seconds, and unpopular sites should still be collected within a reasonably short time (for example, a few hours).
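The tiered scheduling above could be realized as an earliest-due-first queue in which each tier has its own re-collection interval. Hot sites then come round every few seconds, while cold sites are still revisited within hours. The following are illustrative assumptions, not values from the description: the tier thresholds, the score `popularity * resource_count`, the intervals, and the class `SpeedTaskScheduler`.

```python
import heapq

TIER_THRESHOLDS = [1_000_000, 10_000, 100]  # minimum score for tiers 0, 1, 2
TIER_INTERVAL_S = [5, 60, 3600, 4 * 3600]   # re-collection interval for tiers 0..3

def tier_of(popularity, resource_count):
    """Hypothetical tiering rule: larger, more popular sites get a higher tier (lower number)."""
    score = popularity * resource_count
    for tier, cutoff in enumerate(TIER_THRESHOLDS):
        if score >= cutoff:
            return tier
    return len(TIER_THRESHOLDS)  # lowest tier, but still scheduled

class SpeedTaskScheduler:
    def __init__(self, sites, now=0.0):
        """sites: iterable of (site, popularity, resource_count) tuples."""
        self._heap = []
        for site, popularity, count in sites:
            heapq.heappush(self._heap, (now, tier_of(popularity, count), site))

    def next_task(self, now):
        """Return the due site with the earliest deadline (ties go to the higher tier), or None."""
        if not self._heap or self._heap[0][0] > now:
            return None
        _, tier, site = heapq.heappop(self._heap)
        # Re-queue the site so it is measured again after its tier's interval.
        heapq.heappush(self._heap, (now + TIER_INTERVAL_S[tier], tier, site))
        return site
```

Ties at the same due time are broken by tier, so higher-tier sites are dispatched first. Every site stays in the queue and the earliest deadline is always served first, so no site is postponed indefinitely.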
Embodiment two combines client-side collection, server-side collection and site speed testing. This greatly improves collection efficiency and timeliness and greatly reduces collection cost. Site speed testing and the combination of client-side and server-side collection are two independent techniques. Embodiment two uses both, so it is the preferred embodiment of the present invention. Combining site speed testing with server-side collection alone also improves collection efficiency and timeliness and reduces collection cost, and this combination also falls within the scope of protection of the present invention. The scope of embodiment two further extends to passive client-side collection, that is, marking resource information passively collected by clients at the appropriate level.
The application in a search engine is illustrated below.
Suppose a user previews Jay Chou's "Blue and White Porcelain" and the preview fails. The client reports the failure to the background server. The background server initiates a verification and finds that the site hosting this URL cannot be reached. It marks the site as a dead site, and the search engine then ranks resources from that site lower.
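The ranking effect in this example can be sketched as a sort key that places resources from dead sites after all others. Relevance order is kept within each group. The `Result` type, the `relevance` score and the `dead_sites` set are hypothetical. The description does not say how the dead-site mark is combined with other ranking signals.

```python
from dataclasses import dataclass

@dataclass
class Result:
    url: str
    site: str
    relevance: float  # hypothetical relevance score from the rest of the ranker

def rank(results, dead_sites):
    """Sort by relevance, but place resources from sites marked dead after all others."""
    # False sorts before True, so live sites come first; -relevance gives descending order.
    return sorted(results, key=lambda r: (r.site in dead_sites, -r.relevance))
```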
Corresponding to the above collection method, the present invention also provides a system embodiment for acquiring resource-related information. Fig. 5 is a structural diagram of the system of this embodiment. The system comprises a first collecting unit U501 and a second collecting unit U502. U501 collects on the client side when triggered by a user. U502 collects on the server side by obtaining collection tasks.
The first collecting unit U501 supports two collection modes: passive and active. In passive mode, U501 is triggered when a user clicks on a client web page. It then collects information about the resource currently being played through the player embedded in the page.
In active mode, the system further comprises a scheduling unit U503, which assigns collection tasks to the first collecting unit U501 and/or the second collecting unit U502. When a user clicks on a client web page, U501 sends a collection request to U503 and embeds a hidden player in that page. The hidden player then executes the collection task assigned by U503 and obtains the resource information the task requires. The second collecting unit U502 can likewise send a collection request to U503, execute the assigned task and obtain the required resource information. The resource required by a collection task is the resource assigned by U503.
By combining client-side and server-side collection in this way, the system improves collection efficiency and timeliness and reduces collection cost.
Preferably, the system further comprises an information processing unit U504. It processes the resource information collected by U501 and/or U502 and produces results used for search-engine ranking.
Preferably, to handle the particular nature of download-speed collection, the system uses site speed testing to further improve the timeliness of collection. Specifically:
- When assigning collection tasks, the scheduling unit U503 takes the site a resource belongs to as the collection object. It assigns a sample resource of that site to U501 and/or U502 as the collection task.
- U501 and/or U502 collect as described above and submit the collected resource information, including site-level information, to the information processing unit U504.
- U504 marks the site-level information as relevant information for all resources under that site and notifies U503 that the collection task is complete.
The site-level resource information includes the download speed of the resource. The first collecting unit U501 calculates download speed from the bit rate, playing duration and download ratio of the resource. The second collecting unit U502 measures download speed directly by simulating a network protocol request, such as an HTTP request.
Preferably, the scheduling unit U503 assigns collection tasks according to task priority. The priority is set according to the popularity of the site and the number of resources it contains.
Client-side collection can be performed by client software or by the client browser. The first collecting unit U501 may therefore be either a client browser or client software.
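Below is a sketch of the client-side download-speed estimate. The description names only bit rate, playing duration and download ratio. Turning these into a speed also requires the wall-clock time over which the download ratio was reached, so `elapsed_s` is an added assumption. Reading "playing duration" as the total length of the track is also an assumption. Under that reading, speed ≈ bit rate × duration × download ratio ÷ elapsed time.

```python
def download_speed_kbps(bitrate_kbps, duration_s, download_ratio, elapsed_s):
    """Estimate download speed in kbit/s from player statistics.

    Assumed reading: the file holds about bitrate * duration kilobits, the player
    reports the fraction already downloaded, and elapsed_s is the wall-clock time
    the download has been running.
    """
    if not 0.0 <= download_ratio <= 1.0:
        raise ValueError("download_ratio must be in [0, 1]")
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    downloaded_kbit = bitrate_kbps * duration_s * download_ratio
    return downloaded_kbit / elapsed_s
```

For example, take a 128 kbit/s track lasting 240 s that is half downloaded after 60 s. The estimate is 128 × 240 × 0.5 / 60 = 256 kbit/s.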
Applying the system of Fig. 5 to a search engine, the present invention also provides an embodiment of a resource-related information collection system for a search engine. Fig. 6 is a structural diagram of that system.
The system comprises three units:
- an information acquisition unit U601, which acquires resource-related information;
- a scheduling unit U602, which assigns collection tasks to the information acquisition unit;
- an information processing unit U603, which processes the collected resource information and produces results used for search-engine ranking.
The units are connected as follows:
The scheduling unit U602 provides a unified interface for distributing and returning tasks. Whether collection happens on the server or the client, tasks are obtained through the same interface and returned through the same interface once data collection is finished. New collection sources can therefore be added easily. Specifically, U602 provides a first interface and a second interface. The first interface handles task distribution: it outputs collection tasks in response to requests from U601. The second interface handles task return: it receives the task-completion notifications sent by U603.
The information acquisition unit U601 provides an interface for obtaining collection tasks through the first interface of U602 and for submitting collection results to U603.
The information processing unit U603 provides an interface for receiving the collection results submitted by U601. It calls the second interface of U602 to report that the collection task is complete.
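The unified interfaces of U602, U601 and U603 can be sketched in-process as below. Everything here is an illustrative assumption: the class and method names, the in-memory queue, and modeling a collector as a plain function. A real deployment would expose these interfaces over the network to clients and collection servers.

```python
from collections import deque

class Scheduler:
    """Scheduling unit (cf. U602): one interface hands out tasks, the other takes them back."""

    def __init__(self, resources):
        self._pending = deque(resources)
        self._in_flight = {}  # resource URL -> id of the collector working on it
        self.done = set()

    def request_task(self, collector_id):
        """First interface: task distribution. Returns a resource URL or None."""
        if not self._pending:
            return None
        url = self._pending.popleft()
        self._in_flight[url] = collector_id
        return url

    def complete_task(self, url):
        """Second interface: task return, called once the results are processed."""
        self._in_flight.pop(url, None)
        self.done.add(url)


class InfoProcessor:
    """Information processing unit (cf. U603): stores results, then notifies the scheduler."""

    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.results = {}

    def submit(self, url, info):
        self.results[url] = info
        self.scheduler.complete_task(url)


def run_collector(collector_id, scheduler, processor, collect):
    """Information acquisition unit (cf. U601): pull tasks until none remain."""
    while (url := scheduler.request_task(collector_id)) is not None:
        processor.submit(url, collect(url))
```

A client collector and a server collector would both call only `request_task` and `submit`. Adding a new collection source therefore requires no change to the scheduler, which is the point the description makes about the unified interface.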
The information acquisition unit U601 comprises collection clients and collection servers. A collection client works in active and/or passive mode. In active mode, it requests collection tasks from U602 and obtains the information each task requires through a hidden player. In passive mode, it obtains information about the resource currently being played. Collection servers also work in active mode: they request tasks from U602, fetch and analyze the assigned resources, and obtain the required information.
Preferably, using site speed testing, U602 takes the site a resource belongs to as the collection object and assigns a sample resource of that site to U601 as the collection task. U602 also assigns collection tasks according to task priority, which is set according to the site's popularity and the number of resources it contains.
U601 executes the assigned tasks and submits the collected information, including site-level information, to U603. U603 marks the site-level information as relevant information for all resources under the site. The site-level information includes the download speed of the resource.
The system can rank resources more sensibly and place high-quality resources nearer the top.
For parts of the systems shown in Fig. 5 and Fig. 6 that are not described in detail, refer to the corresponding parts of the methods shown in Figs. 1 to 4. For brevity, they are not repeated here.
The method and system for acquiring resource-related information provided by the present invention, and their application in a search engine, have been described in detail above. Specific examples have been used to explain the principles and embodiments of the invention. The description of these embodiments is intended only to help readers understand the method of the invention and its core idea. A person of ordinary skill in the art may change the specific embodiments and scope of application in accordance with the idea of the invention. In summary, this description should not be construed as limiting the present invention.

Claims (20)

1. A method for acquiring resource-related information, characterized by comprising:
sending a collection request to a background server by using a hidden player embedded in a page;
receiving a collection task assigned by the background server according to the request, and determining, according to the collection task, the resource object whose relevant information is to be collected;
playing the resource object by using the hidden player embedded in the page;
collecting, on the client, relevant information of the resource object by using the hidden player embedded in the page;
submitting the collected relevant information of the resource object to the background server.
2. The method according to claim 1, characterized in that the background server assigns collection tasks according to task priority.
3. The method according to claim 1, characterized in that the background server takes the site to which a resource belongs as the collection object and a sample resource under the site as the collection task, and in that, after the collected relevant information of the resource object is submitted to the background server, the method further comprises:
marking the collected relevant information of the sample resource under the site as relevant information of all resources under the site.
4. The method according to claim 3, characterized in that the collected relevant information of the sample resource under the site comprises the download speed of the resource.
5. The method according to claim 4, characterized in that the download speed of the resource object is collected as follows:
calculating the download speed by collecting the bit rate, playing duration and download ratio of the resource object.
6. The method according to claim 1, characterized in that collecting, on the client, relevant information of the resource object by using the hidden player embedded in the page comprises:
obtaining the relevant information of the resource object through an application programming interface provided by the player's web-page plug-in.
7. The method according to claim 1, characterized in that:
the resource-related information comprises the download speed of the resource, and the download speed is calculated by collecting the bit rate, playing duration and download ratio of the resource.
8. The method according to claim 1, characterized by further comprising:
processing the collected resource-related information to obtain results used for search-engine ranking.
9. A method for acquiring resource-related information, characterized by comprising:
when a user clicks to preview a resource, determining the resource clicked by the user as the resource object whose relevant information is to be collected;
playing the resource object by using a player embedded in a page;
collecting, on the client, relevant information of the resource object by using the player embedded in the page;
submitting the collected relevant information of the resource object to a background server.
10. The method according to claim 9, characterized in that collecting, on the client, relevant information of the resource object by using the player embedded in the page comprises:
obtaining the relevant information of the resource object through an application programming interface provided by the player's web-page plug-in.
11. The method according to claim 9, characterized in that:
the resource-related information comprises the download speed of the resource, and the download speed is calculated by collecting the bit rate, playing duration and download ratio of the resource.
12. The method according to claim 9, characterized by further comprising:
processing the collected resource-related information to obtain results used for search-engine ranking.
13. A system for acquiring resource-related information, characterized by comprising:
a unit configured to send a collection request to a background server by using a hidden player embedded in a page;
a unit configured to receive a collection task assigned by the background server according to the request and to determine, according to the collection task, the resource object whose relevant information is to be collected;
a unit configured to play the resource object by using the hidden player embedded in the page;
a unit configured to collect, on the client, relevant information of the resource object by using the hidden player embedded in the page;
a unit configured to submit the collected relevant information of the resource object to the background server.
14. The system according to claim 13, characterized in that the background server takes the site to which a resource belongs as the collection object and a sample resource under the site as the collection task, and in that the system further comprises:
a unit configured to mark, after the collected relevant information of the resource object is submitted to the background server, the collected relevant information of the sample resource under the site as relevant information of all resources under the site.
15. The system according to claim 14, characterized in that the unit configured to collect, on the client, relevant information of the resource object by using the player embedded in the page calculates the download speed by collecting the bit rate, playing duration and download ratio of the resource object.
16. The system according to claim 13, characterized in that the unit configured to collect, on the client, relevant information of the resource object by using the hidden player embedded in the page is specifically configured to obtain the relevant information of the resource object through an application programming interface provided by the player's web-page plug-in.
17. The system according to claim 13, characterized by further comprising:
a unit configured to process the collected resource-related information to obtain results used for search-engine ranking.
18. A system for acquiring resource-related information, characterized by comprising:
a unit configured to determine, when a user clicks to preview a resource, the resource clicked by the user as the resource object whose relevant information is to be collected;
a unit configured to play the resource object by using a player embedded in a page;
a unit configured to collect, on the client, relevant information of the resource object by using the player embedded in the page;
a unit configured to submit the collected relevant information of the resource object to a background server.
19. The system according to claim 18, characterized in that the unit configured to collect, on the client, relevant information of the resource object by using the player embedded in the page is specifically configured to obtain the relevant information of the resource object through an application programming interface provided by the player's web-page plug-in.
20. The system according to claim 18, characterized by further comprising:
a unit configured to process the collected resource-related information to obtain results used for search-engine ranking.
CN2008101117924A 2008-05-16 2008-05-16 Method, system for acquiring resource related information and application in search engine Active CN101276377B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101117924A CN101276377B (en) 2008-05-16 2008-05-16 Method, system for acquiring resource related information and application in search engine

Publications (2)

Publication Number Publication Date
CN101276377A CN101276377A (en) 2008-10-01
CN101276377B true CN101276377B (en) 2011-09-14

Family

ID=39995818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101117924A Active CN101276377B (en) 2008-05-16 2008-05-16 Method, system for acquiring resource related information and application in search engine

Country Status (1)

Country Link
CN (1) CN101276377B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103870503B (en) * 2012-12-14 2017-11-24 北京音之邦文化科技有限公司 Search method and equipment in online broadcasting
CN103152651B (en) * 2013-01-31 2016-04-20 广东欧珀移动通信有限公司 Method and system for automatically adjusting the playback threshold of a streaming-media buffer
CN106294395B (en) * 2015-05-20 2019-11-29 无锡天脉聚源传媒科技有限公司 Method and device for task processing
CN107480181B (en) * 2017-07-05 2020-11-24 百度在线网络技术(北京)有限公司 Audio playing method, device, equipment and server

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20070060997A (en) * 2005-12-09 2007-06-13 한국전자통신연구원 System and method for providing media-contents of home network using web services

Also Published As

Publication number Publication date
CN101276377A (en) 2008-10-01

Similar Documents

Publication Publication Date Title
EP1887732B1 (en) A method and system for content charging
CN103686237B (en) Recommend the method and system of video resource
CN102957712B (en) Site resource loading method and system
CN101635826B (en) Method for acquiring addresses of network video programs
CN101322113B (en) Grid computing control method for testing application program capacity of server and service method thereof
US20110239103A1 (en) Detecting virality paths and supporting referral monetization
CN101408877B (en) System and method for loading tree node
CN102314455A (en) Method and system for calculating click flow of web page
CN1264477A (en) Monitoring of remot fill access on public computer network
CN101729288B (en) Method and device for counting network access behaviours of internet users
CN107885777A (en) A kind of control method and system of the crawl web data based on collaborative reptile
CN102497452B (en) Online streaming media service method based on embedded terminal
CN101222349A (en) Method and system for collecting web user action and performance data
JP2004164573A (en) Device and method for automated aggregation, device and method for delivering electronic personal information or data and transaction including electronic personal information or data
CN103246654A (en) Display processing method and display processing apparatus of search results
CN102129632A (en) Method, device and system for capturing webpage information
WO2010021479A2 (en) System and method for sharing profits with one or more content providers
CN101364995A (en) Web server system
CN101276377B (en) Method, system for acquiring resource related information and application in search engine
CN105872006A (en) Appointment reminding system and appointment reminding method
CN102043679A (en) Method and system for acquiring performance analysis data of application system
CN109891839A (en) System and method for the incoming network flow request that throttles
CN101699860A (en) Implement method for mixing network TV stream media server of peer-to-peer computing network
JP5001682B2 (en) Mining system and mining method
CN103718179A (en) Information processing apparatus, information processing method, information processing program, and storage medium having information processing program stored therein

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant