CN106777088A - Fast-iteration search engine ranking method and system - Google Patents

Fast-iteration search engine ranking method and system

Info

Publication number
CN106777088A
Authority
CN
China
Prior art keywords
user
ranking models
search
line
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611149705.5A
Other languages
Chinese (zh)
Inventor
张洪岩
黄永军
王金明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Feihu Information Technology Tianjin Co Ltd
Original Assignee
Feihu Information Technology Tianjin Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Feihu Information Technology Tianjin Co Ltd
Priority to CN201611149705.5A
Publication of CN106777088A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a fast-iteration search engine ranking method comprising offline steps and online steps. The offline steps include training multiple candidate ranking models for deployment, storing each ranking model and allocating it a traffic ratio, and periodically saving the ranking models and ratios to the search server's cache database for the online steps to read. The online steps include receiving a user request and assigning a ranking model according to the user's information, retrieving relevant files from the index, reading the ranking model from the search server's cache database, computing the ranking and returning it to the user, and recording the user's search behavior. The invention defines a ranking-model description format with which a ranking model can be described by a character string. The model is then stored first in a relational database and then in a key-value database by means of a graphical interface and a scheduled task, which both guarantees persistent data storage and lets the online service fetch the data quickly.

Description

Fast-iteration search engine ranking method and system
Technical field
The present invention relates to the technical field of search ranking, and in particular to a fast-iteration search engine ranking method and system.
Background art
With the rapid development of big data technology, search engine systems make increasingly deep use of features. Text relevance, a web page's PageRank value and URL link length are all good ranking features. The more features are selected, the more objectively the behavioral preferences of users can be reflected. Google's search engine ranking system uses more than 200 features, and these features are not simply added linearly but are combined through a complex neural network, which makes it possible not only to exploit each feature of a document fully but also to use the relationships between features. Manually fitting the weight of each feature, let alone a complex neural network model, has therefore become impractical, and learning-to-rank techniques emerged in response.
Learning to rank builds on traditional machine learning techniques. Whether a document is relevant, together with the document's value in each feature dimension, serves as a training sample. A loss function is defined by comparing the output of the model (for example, a neural network and its parameters) with the document's actual relevance, and an optimization technique such as gradient descent then minimizes the loss function. In this way, from a large amount of data, an optimized search engine ranking formula can be computed from each document's relevance to the query and its score on each feature.
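The learning-to-rank idea described above can be illustrated with a minimal sketch. It assumes a pointwise linear model trained by batch gradient descent on a mean squared error loss; the feature columns, toy data, learning rate and function names are hypothetical and are not taken from the patent, which also allows neural network models.

```python
from typing import List, Sequence


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear ranking score: weighted sum of a document's feature values."""
    return sum(w * x for w, x in zip(weights, features))


def mse(weights: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Mean squared error between predicted scores and relevance labels."""
    return sum((score(weights, x) - t) ** 2 for x, t in zip(X, y)) / len(X)


def train_pointwise(X: Sequence[Sequence[float]], y: Sequence[float],
                    lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Fit the weights by batch gradient descent on the MSE loss."""
    dim, n = len(X[0]), len(X)
    weights = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, t in zip(X, y):
            err = score(weights, x) - t
            for j in range(dim):
                grad[j] += 2.0 * err * x[j] / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Hypothetical toy data: columns are [text_relevance, pagerank]; label 1.0 = relevant.
X = [[1.0, 0.2], [0.1, 0.9], [0.8, 0.8], [0.0, 0.1]]
y = [1.0, 0.0, 1.0, 0.0]

w = train_pointwise(X, y)
# Indices of documents sorted by descending predicted score.
ranked = sorted(range(len(X)), key=lambda i: score(w, X[i]), reverse=True)
```

In this sketch the learned weights play the role of the "ranking formula"; a production system would more likely use a pairwise or listwise loss and a richer model.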
Algorithm training falls into two kinds: online training and offline training. In online training, the whole process is completed by computer programs: when training starts, user click records are read to generate a training set; a ranking model is then trained with a training algorithm written in advance; the online ranking algorithm is updated; and finally the algorithm's performance is evaluated manually from the computed evaluation metrics. This approach is highly automated, not prone to failure, and needs little manual intervention. However, the cross-validation process, which is an important part of training, has to be omitted, because it is difficult for a computer to derive a suitable adjustment from cross-validation results. In offline learning, people control the training time, parameters and so on; whether a model is suitable can be judged before it goes live, and the training parameters can be adjusted dynamically according to cross-validation results, which ensures the quality of the deployed algorithm. However, every update of the algorithm model in offline learning requires interrupting the service, and the process is cumbersome, which greatly lengthens the project's iteration cycle.
Summary of the invention
The purpose of the present invention is to address the technical deficiencies of the prior art by providing a fast-iteration search engine ranking method and system.
The technical solution adopted to achieve this purpose is as follows:
A fast-iteration search engine ranking method, comprising offline steps and online steps.
The offline steps include:
training multiple candidate ranking models for deployment;
storing each ranking model and allocating it a traffic ratio, and periodically saving the ranking models and ratios to the search server's cache database for the online steps to read.
The online steps include:
receiving a user request and assigning a ranking model according to the user's information;
retrieving relevant files from the index, reading the ranking model from the search server's cache database, computing the ranking and returning it to the user;
recording the search behavior of the user.
The ranking models and ratios are periodically saved to a key-value database of the search server.
Training multiple candidate ranking models includes the following sub-steps:
collecting user click records;
reconstructing the user's search scenario from the click records to generate training data;
training multiple candidate ranking models using different predetermined algorithms and training parameters.
In the online steps, a ranking model is assigned to the user according to the user's cookie, so that the same user is always assigned the same ranking model.
The search behavior includes the user's query terms, the files the user clicked, and the position of each clicked file in the output file list.
A fast-iteration search engine ranking system, comprising an offline module and an online module.
The offline module includes:
a training submodule, used to train multiple candidate ranking models for deployment;
a model management submodule, used to store each ranking model and allocate it a traffic ratio, and to periodically save the ranking models and ratios to the search server's cache database for the online module to read.
The online module includes:
an A/B test submodule, used to receive a user request and assign a ranking model according to the user's information;
an information retrieval submodule, used to retrieve relevant files from the index, compute the ranking according to the ranking model and return it to the user;
a statistics submodule, used to record the search behavior of the user.
The ranking models and ratios are periodically saved to a key-value database of the search server.
The training submodule includes:
a collection module, used to collect user click records;
an information processing module, used to reconstruct the user's search scenario from the click records and generate training data;
a generation module, used to train multiple candidate ranking models using different predetermined algorithms and training parameters.
The online module assigns a ranking model to the user according to the user's cookie, so that the same user is always assigned the same ranking model.
The search behavior includes the user's query terms, the files the user clicked, and the position of each clicked file in the output file list.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention defines a ranking-model description format with which a ranking model can be described by a character string. The model is then stored first in a relational database and then in a key-value database, by means of a graphical interface and a scheduled task; this both guarantees that the stored data is persistent and lets the online service fetch the data quickly. The online search service reads the key-value database periodically, restores the ranking models from the strings it reads, updates its existing ranking models and discards expired ones, so that online ranking can be controlled without updating code or interrupting the service. When a user issues a request, the restored ranking models and the ratio of each model are used to return correct ranking results to the user.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the fast-iteration search engine ranking system of the present invention;
Fig. 2 is a flow control diagram.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it.
As shown in the figures, the fast-iteration search engine ranking method of the present invention includes offline steps and online steps.
The offline steps include:
Step 101: train multiple candidate ranking models for deployment.
This step includes the following sub-steps:
Collect user click records. A collected click record includes the user's original search keywords and the documents the user clicked. Records are grouped by identical search keywords, and each document's relevance under the query is computed from the number of user clicks.
Reconstruct the user's search scenario from the click records and generate training data. The scores of both relevant and irrelevant documents in each dimension are computed so that the user's search scenario is restored completely and faithfully, and training uses this training data.
Using various training algorithms and training parameters, obtain several candidate ranking models.
Then use cross-validation to judge whether the trained ranking models meet the requirements.
If a model does not meet the requirements, adjust the earlier training parameters; the adjustable parameters include the vector dimension, the number of neural network layers, the algorithm used, and so on. For a ranking model to meet the requirements, its metrics on both the training set and the test set must reach a certain threshold, and the difference between the training-set and test-set metrics must be below a certain threshold.
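Two parts of step 101 can be sketched in code: deriving relevance labels from click counts grouped by query, and the acceptance rule applied after cross-validation. The patent does not give the relevance formula or the threshold values, so the click-share formula, the default thresholds and all names below are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def relevance_from_clicks(click_log: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Group (query, doc_id) click records by query.

    Relevance is taken here as the document's share of all clicks for that
    query (an assumed formula; the source only says it depends on click counts).
    """
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for query, doc_id in click_log:
        counts[query][doc_id] += 1
    relevance: Dict[str, Dict[str, float]] = {}
    for query, docs in counts.items():
        total = sum(docs.values())
        relevance[query] = {doc: n / total for doc, n in docs.items()}
    return relevance


def is_acceptable(train_metric: float, test_metric: float,
                  min_metric: float = 0.7, max_gap: float = 0.05) -> bool:
    """Accept a model only if both metrics reach min_metric and their gap
    stays below max_gap (both thresholds are hypothetical values)."""
    return (train_metric >= min_metric
            and test_metric >= min_metric
            and abs(train_metric - test_metric) < max_gap)


clicks = [("python", "d1"), ("python", "d1"), ("python", "d2"), ("golang", "d3")]
labels = relevance_from_clicks(clicks)
```

The gap check is what rejects overfitted models: a model scoring 0.95 on the training set and 0.75 on the test set passes the first two conditions but fails the third.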
Step 102: store each ranking model and allocate its traffic ratio, and periodically save the ranking models and ratios to the search server's cache database for the online steps to read.
In this step, the model's ratio and string description are first written to a relational database through an administration interface. The data in the relational database is then written to a key-value database periodically, for example by a scheduled task. The key-value database holds the models' ratios and string descriptions in order to speed up access to hot data. In the present invention the key-value database stores the ranking models and serves as the interface between the offline part and the online part: the offline part is responsible for writing and the online part for reading. This is the key step that allows online ranking models to be updated without interrupting the service.
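The write path of step 102 might look like the following sketch. It assumes SQLite as the relational database and a plain dictionary standing in for the key-value database; the table name, the key `ranking_models`, the JSON layout and the `linear:...` description strings are all hypothetical, since the patent does not specify the storage schema or the description format.

```python
import json
import sqlite3


def save_model(conn: sqlite3.Connection, name: str, description: str, ratio: float) -> None:
    """Persist one model's string description and traffic ratio (the admin-interface write)."""
    conn.execute(
        "INSERT OR REPLACE INTO ranking_models (name, description, ratio) VALUES (?, ?, ?)",
        (name, description, ratio),
    )
    conn.commit()


def sync_to_kv(conn: sqlite3.Connection, kv_store: dict) -> int:
    """Copy every model row into the key-value store; meant to run as a scheduled task."""
    rows = conn.execute(
        "SELECT name, description, ratio FROM ranking_models ORDER BY name"
    ).fetchall()
    kv_store["ranking_models"] = json.dumps(
        [{"name": n, "description": d, "ratio": r} for n, d, r in rows]
    )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ranking_models ("
    "name TEXT PRIMARY KEY, description TEXT NOT NULL, ratio REAL NOT NULL)"
)
kv_store: dict = {}  # stand-in for a real key-value database

save_model(conn, "linear_v1", "linear:0.7,0.2,0.1", 0.8)
save_model(conn, "linear_v2", "linear:0.5,0.4,0.1", 0.2)
synced = sync_to_kv(conn, kv_store)
```

Writing all models under a single key means the online side sees either the old set or the new set, never a partially updated mix; whether the patented system does this is not stated.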
The online steps include:
Step 201: receive a user request and assign a ranking model according to the user's information.
Whatever ranking algorithm and evaluation metrics are used, testing on the real system is required. Following the single-variable principle, an A/B test system assigns algorithm A to one group of users and algorithm B to another group, and compares the performance of A and B with all other variables held equal. The most critical part of an A/B test system is traffic splitting: assigning the two algorithms to users with equivalent conditions is the key to a successful test. Because a search engine must support paging through results, each user must always be assigned the same ranking algorithm. The present invention uses a cookie to give each user a unique ID and assigns the algorithm according to this ID. This ensures both that the assignment is uniform and that the algorithm assigned to a user stays fixed.
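A common way to implement this kind of fixed, ratio-based assignment is to hash the cookie ID into the interval [0, 1) and pick the model whose cumulative ratio range contains that point. The sketch below assumes this hashing scheme; the patent only states that the algorithm is assigned according to the cookie ID, so the hash function and the names used here are illustrative.

```python
import hashlib
import uuid
from typing import List, Tuple


def new_user_id() -> str:
    """Generate a unique ID to be stored in the user's cookie."""
    return uuid.uuid4().hex


def assign_model(user_id: str, ratios: List[Tuple[str, float]]) -> str:
    """Map a cookie ID to a model name.

    `ratios` is a list of (model_name, ratio) pairs whose ratios sum to 1.
    The same ID always hashes to the same point, so the assignment is stable.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") / 2 ** 64  # uniform in [0, 1)
    cumulative = 0.0
    for name, ratio in ratios:
        cumulative += ratio
        if point < cumulative:
            return name
    return ratios[-1][0]  # guard against floating-point rounding


ratios = [("model_a", 0.8), ("model_b", 0.2)]
assignments = [assign_model(f"user-{i}", ratios) for i in range(10000)]
share_a = assignments.count("model_a") / len(assignments)
```

Because the assignment depends only on the ID and the ratio table, every search server reaches the same decision without shared state, and a user paging through results keeps the same ranking model.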
Step 202: retrieve relevant files from the index, read the ranking model from the search server's cache database, compute the ranking and return it to the user. Relevant files are documents, video pages and the like.
In this step, the score of each file in each dimension is computed first, that is, the file is vectorized; the ranking model then computes each file's final score, which is the basis for ranking.
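Vectorizing files and scoring them can be sketched as follows. A linear model is assumed purely for illustration (the patent also allows neural networks), and the feature names, weights and document records are hypothetical.

```python
from typing import Dict, List, Sequence

FEATURES = ["text_relevance", "pagerank", "url_length"]  # hypothetical feature set


def vectorize(doc: Dict[str, float], feature_names: Sequence[str]) -> List[float]:
    """Turn a document record into a feature vector; missing features default to 0."""
    return [float(doc.get(name, 0.0)) for name in feature_names]


def rank(docs: List[dict], weights: Sequence[float], feature_names: Sequence[str]) -> List[str]:
    """Score each retrieved document with a linear model and return IDs, best first."""
    scored = []
    for doc in docs:
        vector = vectorize(doc, feature_names)
        final_score = sum(w * x for w, x in zip(weights, vector))
        scored.append((final_score, doc["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]


retrieved = [
    {"id": "c", "text_relevance": 0.1, "pagerank": 0.1, "url_length": 0.9},
    {"id": "a", "text_relevance": 0.9, "pagerank": 0.1, "url_length": 0.5},
    {"id": "b", "text_relevance": 0.4, "pagerank": 0.9, "url_length": 0.2},
]
weights = [0.7, 0.3, -0.1]  # assumed weights; longer URLs are penalized
order = rank(retrieved, weights, FEATURES)
```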
Step 203: record the search behavior of the user. The recorded content includes the user's query terms, the documents the user clicked, and the position of each clicked document in the output document list.
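The three recorded fields map naturally onto a small log record. The sketch below assumes a JSON-lines style log and 1-based positions; the class, field and function names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SearchEvent:
    query: str           # the user's query terms
    clicked_doc_id: str  # the document the user clicked
    position: int        # 1-based position of that document in the result list


def record_click(query: str, results: List[str], clicked_doc_id: str,
                 log: List[str]) -> SearchEvent:
    """Build the event for one click and append it to the log as a JSON line."""
    position = results.index(clicked_doc_id) + 1
    event = SearchEvent(query, clicked_doc_id, position)
    log.append(json.dumps(asdict(event), ensure_ascii=False))
    return event


click_log: List[str] = []
event = record_click("search ranking", ["d3", "d1", "d2"], "d1", click_log)
```

Records of this shape are exactly what the offline step needs as input when it collects user click records for the next round of training.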
The present invention defines a ranking-model description format with which a ranking model can be described by a character string. The model is then stored first in a relational database and then in a key-value database, by means of a graphical interface and a scheduled task; this both guarantees that the stored data is persistent and lets the online service fetch the data quickly. The online search service reads the key-value database periodically, restores the ranking models from the strings it reads, updates its existing ranking models and discards expired ones, so that online ranking can be controlled without updating code or interrupting the service. When a user issues a request, the restored ranking models and the ratio of each model are used to return correct ranking results to the user.
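The read path on the online side can be sketched as a small registry that rebuilds its models from the stored strings on every refresh. The description format `linear:w1,w2,...`, the key `ranking_models` and the JSON layout are assumptions made for this example; the patent does not disclose the actual string format.

```python
import json
from typing import Callable, Dict, Sequence

Scorer = Callable[[Sequence[float]], float]


def parse_model(description: str) -> Scorer:
    """Restore a scoring function from a string such as 'linear:0.7,0.2,0.1'
    (a hypothetical description format)."""
    kind, _, params = description.partition(":")
    if kind != "linear":
        raise ValueError(f"unsupported model type: {kind!r}")
    weights = [float(p) for p in params.split(",")]
    return lambda features: sum(w * x for w, x in zip(weights, features))


class ModelRegistry:
    """Holds the models currently served; refresh() is meant to run on a timer."""

    def __init__(self) -> None:
        self.models: Dict[str, Scorer] = {}
        self.ratios: Dict[str, float] = {}

    def refresh(self, kv_store: dict) -> None:
        entries = json.loads(kv_store.get("ranking_models", "[]"))
        models = {e["name"]: parse_model(e["description"]) for e in entries}
        ratios = {e["name"]: e["ratio"] for e in entries}
        # Replace both maps together; models absent from the store are discarded.
        self.models, self.ratios = models, ratios


kv_store = {"ranking_models": json.dumps([
    {"name": "a", "description": "linear:0.5,0.5", "ratio": 0.8},
    {"name": "b", "description": "linear:1.0,0.0", "ratio": 0.2},
])}
registry = ModelRegistry()
registry.refresh(kv_store)
```

Since the models are rebuilt from data rather than from code, changing the set of served models only requires a new write to the key-value store followed by the next scheduled refresh.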
Traditional offline-trained models have controllable quality but a long iteration cycle; online training has a shorter iteration cycle, but model quality cannot be controlled. The present invention uses offline training, so after training completes the quality of a model can be controlled manually through cross-validation. Updating the online models requires neither modifying code nor interrupting the service, so high-quality ranking models can be deployed with a faster iteration cycle.
The present invention further discloses a fast-iteration search engine ranking system, comprising an offline module and an online module.
The offline module includes:
a training submodule, used to train multiple candidate ranking models for deployment;
a model management submodule, used to store each ranking model and allocate it a traffic ratio, and to periodically save the ranking models and ratios to the search server's cache key-value database for the online module to read.
The training submodule includes:
a collection module, used to collect user click records;
an information processing module, used to reconstruct the user's search scenario from the click records and generate training data;
a generation module, used to train multiple candidate models for deployment using different predetermined algorithms and training parameters.
The online module includes:
an A/B test submodule, used to receive a user request and assign a ranking model according to the user's information;
an information retrieval submodule, used to retrieve relevant files from the index, compute the ranking according to the ranking model and return it to the user;
a statistics submodule, used to record the search behavior of the user.
A framework for a search engine ranking system is proposed in which, by using a relational database together with a key-value database, the online search engine periodically updates its own ranking models and ratios. The present invention allows the quality of the online ranking models to be controlled manually and the ratio of the deployed models to be adjusted dynamically, while updating a model requires neither modifying code nor interrupting the service, which achieves the goal of shortening the iteration cycle.
By using offline learning, the present invention allows the deployed models to be controlled manually, allows online models to be updated without modifying code or interrupting the service, and allows the ratio of each model to be adjusted dynamically, which achieves the goal of shortening the iteration cycle.
The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.

Claims (10)

1. A fast-iteration search engine ranking method, characterized by comprising offline steps and online steps,
the offline steps including:
training multiple candidate ranking models for deployment,
storing each ranking model and allocating it a ratio, and periodically saving the ranking models and ratios to the search server's cache database for the online steps to read;
the online steps including:
receiving a user request and assigning a ranking model according to the user's information,
retrieving relevant files from the index, reading the ranking model from the search server's cache database, computing the ranking and returning it to the user,
recording the search behavior of the user.
2. The search engine ranking method as claimed in claim 1, characterized in that the ranking models and ratios are periodically saved to a key-value database of the search server.
3. The search engine ranking method as claimed in claim 1, characterized in that training multiple candidate ranking models for deployment includes the following sub-steps:
collecting user click records,
reconstructing the user's search scenario from the click records to generate training data,
training multiple candidate ranking models for deployment using different predetermined algorithms and training parameters.
4. The search engine ranking method as claimed in claim 1, characterized in that the online steps assign a ranking model to the user according to the user's cookie, so that the same user is assigned a fixed ranking model.
5. The search engine ranking method as claimed in claim 1, characterized in that the search behavior includes the user's query terms, the files the user clicked, and the position of each clicked file in the output file list.
6. A fast-iteration search engine ranking system, characterized by comprising an offline module and an online module,
the offline module including:
a training submodule, used to train multiple candidate ranking models for deployment,
a model management submodule, used to store each ranking model and allocate it a ratio, and to periodically save the ranking models and ratios to the search server's cache database for the online module to read;
the online steps including:
an A/B test submodule, used to receive a user request and assign a ranking model according to the user's information,
an information retrieval submodule, used to retrieve relevant files from the index, compute the ranking according to the ranking model and return it to the user,
a statistics submodule, used to record the search behavior of the user.
7. The fast-iteration search engine ranking system as claimed in claim 6, characterized in that the ranking models and ratios are periodically saved to a key-value database of the search server.
8. The fast-iteration search engine ranking system as claimed in claim 6, characterized in that the training submodule includes:
a collection module, used to collect user click records,
an information processing module, used to reconstruct the user's search scenario from the click records and generate training data,
a generation module, used to train multiple candidate ranking models for deployment using different predetermined algorithms and training parameters.
9. The fast-iteration search engine ranking system as claimed in claim 6, characterized in that the online steps assign a ranking model to the user according to the user's cookie, so that the same user is assigned a fixed ranking model.
10. The search engine ranking method as claimed in claim 6, characterized in that the search behavior includes the user's query terms, the files the user clicked, and the position of each clicked file in the output file list.
CN201611149705.5A 2016-12-13 2016-12-13 Fast-iteration search engine ranking method and system Pending CN106777088A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611149705.5A CN106777088A (en) 2016-12-13 2016-12-13 Fast-iteration search engine ranking method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611149705.5A CN106777088A (en) 2016-12-13 2016-12-13 Fast-iteration search engine ranking method and system

Publications (1)

Publication Number Publication Date
CN106777088A true CN106777088A (en) 2017-05-31

Family

ID=58880959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611149705.5A Pending CN106777088A (en) 2016-12-13 2016-12-13 Fast-iteration search engine ranking method and system

Country Status (1)

Country Link
CN (1) CN106777088A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107528727A (en) * 2017-08-22 2017-12-29 上海幻电信息科技有限公司 Support the information state verification method and system that online and offline mode switches
CN111581546A (en) * 2020-05-13 2020-08-25 北京达佳互联信息技术有限公司 Method, device, server and medium for determining multimedia resource sequencing model
CN111797928A (en) * 2017-09-08 2020-10-20 第四范式(北京)技术有限公司 Method and system for generating combined features of machine learning samples
WO2021228264A1 (en) * 2020-05-15 2021-11-18 第四范式(北京)技术有限公司 Machine learning application method, device, electronic apparatus, and storage medium
CN115130008A (en) * 2022-08-31 2022-09-30 喀斯玛(北京)科技有限公司 Search ordering method based on machine learning model algorithm

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100325105A1 (en) * 2009-06-19 2010-12-23 Alibaba Group Holding Limited Generating ranked search results using linear and nonlinear ranking models
CN103744913A (en) * 2013-12-27 2014-04-23 高新兴科技集团股份有限公司 Database retrieval method based on search engine technology
CN104462293A (en) * 2014-11-27 2015-03-25 百度在线网络技术(北京)有限公司 Search processing method and method and device for generating search result ranking model
CN104615767A (en) * 2015-02-15 2015-05-13 百度在线网络技术(北京)有限公司 Searching-ranking model training method and device and search processing method
CN104636407A (en) * 2013-11-15 2015-05-20 腾讯科技(深圳)有限公司 Parameter choice training and search request processing method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100325105A1 (en) * 2009-06-19 2010-12-23 Alibaba Group Holding Limited Generating ranked search results using linear and nonlinear ranking models
CN104636407A (en) * 2013-11-15 2015-05-20 腾讯科技(深圳)有限公司 Parameter choice training and search request processing method and device
CN103744913A (en) * 2013-12-27 2014-04-23 高新兴科技集团股份有限公司 Database retrieval method based on search engine technology
CN104462293A (en) * 2014-11-27 2015-03-25 百度在线网络技术(北京)有限公司 Search processing method and method and device for generating search result ranking model
CN104615767A (en) * 2015-02-15 2015-05-13 百度在线网络技术(北京)有限公司 Searching-ranking model training method and device and search processing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
祝云凯: ""基于统计特征的语义搜索引擎的研究与实现"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107528727A (en) * 2017-08-22 2017-12-29 上海幻电信息科技有限公司 Support the information state verification method and system that online and offline mode switches
CN111797928A (en) * 2017-09-08 2020-10-20 第四范式(北京)技术有限公司 Method and system for generating combined features of machine learning samples
CN111581546A (en) * 2020-05-13 2020-08-25 北京达佳互联信息技术有限公司 Method, device, server and medium for determining multimedia resource sequencing model
CN111581546B (en) * 2020-05-13 2023-10-03 北京达佳互联信息技术有限公司 Method, device, server and medium for determining multimedia resource ordering model
WO2021228264A1 (en) * 2020-05-15 2021-11-18 第四范式(北京)技术有限公司 Machine learning application method, device, electronic apparatus, and storage medium
CN115130008A (en) * 2022-08-31 2022-09-30 喀斯玛(北京)科技有限公司 Search ordering method based on machine learning model algorithm
CN115130008B (en) * 2022-08-31 2022-11-25 喀斯玛(北京)科技有限公司 Search ordering method based on machine learning model algorithm

Similar Documents

Publication Publication Date Title
CN104166668B (en) News commending system and method based on FOLFM models
US20210209109A1 (en) Method, apparatus, device, and storage medium for intention recommendation
CN106777088A (en) The method for sequencing search engines and system of iteratively faster
CN109829104B (en) Semantic similarity based pseudo-correlation feedback model information retrieval method and system
JP6073345B2 (en) Method and apparatus for ranking search results, and search method and apparatus
CN105989040B (en) Intelligent question and answer method, device and system
KR101700352B1 (en) Generating improved document classification data using historical search results
US8145623B1 (en) Query ranking based on query clustering and categorization
CN109189990B (en) Search word generation method and device and electronic equipment
CN105005578A (en) Multimedia target information visual analysis system
CN110471939A (en) Data access method, device, computer equipment and storage medium
WO2014085776A2 (en) Web search ranking
CN107578292A (en) A kind of user's portrait constructing system
CN105320719A (en) Crowdfunding website project recommendation method based on project tag and graphical relationship
CN110795613B (en) Commodity searching method, device and system and electronic equipment
CN105760443A (en) Project recommending system, device and method
CN109359302A (en) A kind of optimization method of field term vector and fusion sort method based on it
CN104268142A (en) Meta search result ranking algorithm based on rejection strategy
CN114691986A (en) Cross-modal retrieval method based on subspace adaptive spacing and storage medium
CN110737432A (en) script aided design method and device based on root list
CN104834719A (en) Database system applied to real-time big data scene
CN111078944A (en) Video content heat prediction method and device
CN104077555A (en) Method and device for identifying badcase in image search
Sun et al. Research on question retrieval method for community question answering
CN103500219B (en) The control method that a kind of label is adaptively precisely matched

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170531

RJ01 Rejection of invention patent application after publication