CN106777088A - Search engine ranking method and system for rapid iteration
- Publication number: CN106777088A
- Application number: CN201611149705.5A
- Authority: CN (China)
- Prior art keywords: user, ranking models, search, online, model
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a search engine ranking method for rapid iteration, comprising offline steps and online steps. The offline steps include: training multiple candidate ranking models for deployment; storing each ranking model and allocating it a traffic ratio; and periodically writing the ranking models and ratios to the search server's cache database so that the online steps can read them. The online steps include: receiving a user request and assigning a ranking model according to user information; retrieving relevant files from the index; reading the ranking model from the search server's cache database, computing the ranking, and returning it to the user; and recording the user's search behavior. The invention defines a ranking model description format with which a ranking model can be described as a character string. The model is then stored first in a relational database and then in a key-value database by means of a graphical interface and a scheduled task, which both guarantees that the stored data persist and lets the online service obtain the data quickly.
Description
Technical field
The present invention relates to the technical field of search ranking, and in particular to a search engine ranking method and system for rapid iteration.
Background
With the rapid development of big data technology, search engine systems make ever deeper use of features. Text relevance, web page PageRank value, and URL length are all good ranking features. The more features are selected, the more objectively the ranking can reflect users' behavioral preferences. Google's search engine ranking system goes further and uses more than 200 features, and these features are not simply added linearly but combined through a complex neural network. This makes full use not only of each feature of a document but also of the relationships between features. Manually fitting the weight of each feature, let alone a complex neural network model, has therefore become impractical, and learning-to-rank techniques emerged in response.
Learning to rank builds on traditional machine learning. Whether each document is relevant, together with the document's value in each dimension, serves as the training sample. A loss function is defined by comparing the output produced by the model parameters (for example, those of a neural network) with the document's actual relevance, and an optimization technique such as gradient descent then minimizes the loss function. In this way an optimized search engine ranking formula can be computed from a large amount of data, based on each document's relevance to the query and each document's score on every feature.
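The idea of fitting ranking weights by minimizing a loss with gradient descent can be illustrated with a minimal sketch. It is not the patent's training algorithm: it assumes a pointwise logistic model, and the feature layout (text relevance, PageRank, bias term), the sample values, and the function names are all hypothetical.

```python
import math


def score(weights, features):
    """Linear score w . x of one document's feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, samples, labels):
    """Mean logistic loss between predicted and actual relevance (0 or 1)."""
    total = 0.0
    for x, y in zip(samples, labels):
        p = min(max(sigmoid(score(weights, x)), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(samples)


def train_pointwise_ranker(samples, labels, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic loss; returns the fitted weights."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(samples, labels):
            error = sigmoid(score(weights, x)) - y
            for j, xj in enumerate(x):
                grad[j] += error * xj
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad)]
    return weights


# Hypothetical samples: [text relevance, PageRank, bias term]
SAMPLES = [[0.9, 0.8, 1.0], [0.8, 0.3, 1.0], [0.2, 0.6, 1.0], [0.1, 0.1, 1.0]]
LABELS = [1, 1, 0, 0]  # 1 = relevant to the query, 0 = not relevant
```

Calling `train_pointwise_ranker(SAMPLES, LABELS)` returns weights whose loss is lower than that of the all-zero starting point; a production system would more likely use a pairwise or listwise objective and a dedicated library.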
Training of the algorithm falls into two kinds: online training and offline training. In online training the whole process is completed by a computer program. When training starts, the program reads user click records to generate a training set, trains a ranking model with a training algorithm written in advance, and updates the ranking algorithm online; finally, the algorithm's performance is evaluated manually from the computed evaluation metrics. This approach is highly automated, not prone to errors, and needs little manual intervention. However, the important cross-validation step of the training process has to be omitted, because it is hard for a computer to derive a suitable remedy from cross-validation results. In offline learning, people control the training time, parameters, and so on. They can judge whether a model is suitable before it goes online and can adjust the training parameters dynamically according to cross-validation results, which ensures the quality of the deployed algorithm. However, every model update in offline learning requires interrupting the service, the process is cumbersome, and the project's iteration cycle is greatly lengthened.
Summary of the invention
The object of the present invention is to address the technical deficiencies of the prior art by providing a search engine ranking method and system for rapid iteration.
The technical solution adopted to achieve this object is as follows:
A search engine ranking method for rapid iteration, comprising offline steps and online steps.
The offline steps include:
training multiple candidate ranking models for deployment;
storing each ranking model and allocating its ratio, and periodically writing the ranking models and ratios to the search server's cache database so that the online steps can read them.
The online steps include:
receiving a user request and assigning a ranking model according to user information;
retrieving relevant files from the index, reading the ranking model from the search server's cache database, computing the ranking, and returning it to the user;
recording the search behavior of the user.
The ranking models and ratios are periodically written to a key-value database of the search server.
Training multiple candidate ranking models for deployment includes the following sub-steps:
collecting user click records;
reconstructing the user's search scenario from the click records to generate training data;
training multiple candidate ranking models for deployment using different predetermined algorithms and training parameters.
The online steps assign a ranking model to the user according to the user's cookie, so that the same user is always assigned a fixed ranking model.
The search behavior includes the user's query terms, the file the user clicked, and that file's position in the output file list.
A search engine ranking system for rapid iteration, comprising an offline module and an online module.
The offline module includes:
a training submodule, used to train multiple candidate ranking models for deployment;
a model management submodule, used to store each ranking model and allocate its ratio, and to periodically write the ranking models and ratios to the search server's cache database so that the online module can read them.
The online module includes:
an A/B testing submodule, used to receive a user request and assign a ranking model according to user information;
an information retrieval submodule, used to retrieve relevant files from the index, compute the ranking according to the ranking model, and return it to the user;
a statistics submodule, used to record the search behavior of the user.
The ranking models and ratios are periodically written to a key-value database of the search server.
The training submodule includes:
a collection module, used to collect user click records;
an information processing module, used to reconstruct the user's search scenario from the click records and generate training data;
a generation module, used to train multiple candidate ranking models for deployment using different predetermined algorithms and training parameters.
The online module assigns a ranking model to the user according to the user's cookie, so that the same user is always assigned a fixed ranking model.
The search behavior includes the user's query terms, the file the user clicked, and that file's position in the output file list.
Compared with the prior art, the beneficial effects of the invention are as follows:
The invention defines a ranking model description format with which a ranking model can be described as a character string. The model is then stored first in a relational database and then in a key-value database by means of a graphical interface and a scheduled task, which both guarantees that the stored data persist and lets the online service obtain the data quickly. The online search service periodically reads the key-value database, restores ranking models from the strings it reads, updates its existing ranking models, and discards expired models. Online ranking can therefore be controlled without updating code or interrupting service. When a user issues a request, correct ranking results can be provided from the restored ranking models and the ratio of each model.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the search engine ranking system for rapid iteration of the present invention;
Fig. 2 is a flow control diagram.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
As shown in the figures, the search engine ranking method for rapid iteration of the present invention includes offline steps and online steps.
The offline steps include:
Step 101: train multiple candidate ranking models for deployment.
This step includes the following sub-steps.
Collect user click records. The collected click records include the user's original search keywords and the documents the user clicked. Records with the same search keyword are grouped together, and the relevance of each document under that query is computed from the number of user clicks.
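The grouping of click records by search keyword can be sketched as follows. The source only says that relevance is computed from the number of clicks; the choice here to use each document's share of the clicks for its query, along with the function name and record layout, is an assumption made for illustration.

```python
from collections import Counter, defaultdict


def relevance_from_clicks(click_log):
    """Group click records by query and derive a per-document relevance score.

    click_log: iterable of (query, doc_id) pairs, one pair per user click.
    Returns {query: {doc_id: relevance}}, where relevance is the document's
    share of all clicks recorded for that query (an assumed formula).
    """
    clicks_by_query = defaultdict(Counter)
    for query, doc_id in click_log:
        clicks_by_query[query][doc_id] += 1

    relevance = {}
    for query, counts in clicks_by_query.items():
        total = sum(counts.values())
        relevance[query] = {doc: n / total for doc, n in counts.items()}
    return relevance
```

For a log with two clicks on `d1` and one on `d2` for the same query, `d1` receives relevance 2/3 and `d2` receives 1/3.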
Reconstruct the user's search scenario from the click records and generate training data. Scores in each dimension are computed for both relevant and irrelevant documents, so that the user's search scenario is restored fully and faithfully, and the resulting training data are used for training.
Train with multiple training algorithms and training parameters to obtain several candidate ranking models for deployment.
Then use cross-validation to judge whether the trained ranking models meet the requirements.
If they do not, adjust the training parameters. Adjustable parameters include the vector dimension, the number of neural network layers, the choice of algorithm, and so on. A ranking model that meets the requirements must reach a certain threshold on the metrics for both the training set and the test set, and the difference between the training-set and test-set metrics must be below a certain threshold.
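The acceptance rule described above reduces to a small predicate. The sketch below assumes a single evaluation metric where higher is better; the threshold values and the function name are hypothetical, since the source does not give concrete numbers.

```python
def meets_requirements(train_metric, test_metric, min_metric=0.7, max_gap=0.05):
    """Return True if a candidate ranking model may go online.

    Both metrics must reach min_metric, and the gap between them must stay
    below max_gap so that an overfitted model is rejected.
    The default thresholds are illustrative assumptions.
    """
    both_good_enough = train_metric >= min_metric and test_metric >= min_metric
    gap_small_enough = abs(train_metric - test_metric) < max_gap
    return both_good_enough and gap_small_enough
```

A model scoring 0.95 on the training set but 0.72 on the test set passes the first condition and fails the second, which is the overfitting case the gap threshold is meant to catch.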
Step 102: store each ranking model and allocate its ratio, and periodically write the ranking models and ratios to the search server's cache database so that the online steps can read them.
In this step, the model's ratio and string description are first written to a relational database through a back-office interface. The data in the relational database are then written periodically to a key-value database by a mechanism such as a scheduled task. The key-value database holds the model ratios and string descriptions in order to speed up access to hot data. In the present invention the key-value database stores the ranking models and serves as the interface between the offline part and the online part: the offline part is responsible for writing and the online part for reading. This is the key step that allows online ranking models to be updated without interrupting service.
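The two-stage storage in step 102 might look like the sketch below. It uses an in-memory SQLite database as the relational store and a plain dict as a stand-in for the key-value database; the table name, the cache key `ranking_models`, the JSON snapshot layout, and the model description strings are all assumptions, not details given in the source.

```python
import json
import sqlite3


def open_model_store():
    """Create the relational store that the back-office interface writes to."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ranking_models ("
        "name TEXT PRIMARY KEY, description TEXT NOT NULL, ratio REAL NOT NULL)"
    )
    return conn


def save_model(conn, name, description, ratio):
    """Persist one model's string description and traffic ratio."""
    conn.execute(
        "INSERT OR REPLACE INTO ranking_models (name, description, ratio) "
        "VALUES (?, ?, ?)",
        (name, description, ratio),
    )
    conn.commit()


def sync_to_cache(conn, cache):
    """Body of the scheduled task: copy all models into the key-value cache.

    The whole snapshot is written under one key so the online side always
    reads a consistent set of models and ratios.
    """
    rows = conn.execute(
        "SELECT name, description, ratio FROM ranking_models"
    ).fetchall()
    snapshot = {
        name: {"description": description, "ratio": ratio}
        for name, description, ratio in rows
    }
    cache["ranking_models"] = json.dumps(snapshot)
    return len(rows)
```

In a deployment the dict would be replaced by a client for a real key-value store, and `sync_to_cache` would be triggered by a scheduler rather than called directly.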
The online steps include:
Step 201: receive a user request and assign a ranking model according to user information.
Whatever ranking algorithm and evaluation metric are used, testing on the real system is required. Following the single-variable principle, an A/B testing system assigns algorithm A to one group of users and algorithm B to another, and compares the performance of A and B with all other variables held equal. The most critical part of an A/B testing system is traffic splitting: whether the two algorithms are given users with equivalent conditions determines whether the test succeeds. Because search engine users page through results, each user must always be assigned the same ranking algorithm. The present invention uses a cookie to give each user a unique ID and assigns the algorithm according to that ID. This ensures both that the assignment is uniform and that the algorithm assigned to a user is fixed.
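One common way to obtain an assignment that is both uniform and fixed per user is to hash the cookie ID into the interval [0, 1) and walk the cumulative model ratios. The source does not say how the ID is mapped to a model, so the hashing scheme and the function name below are assumptions.

```python
import hashlib


def assign_model(user_id, ratios):
    """Deterministically map a cookie-derived user ID to a ranking model.

    ratios: {model_name: traffic share}; shares are normalized, so they
    do not have to sum to exactly 1.
    """
    total = sum(ratios.values())
    if total <= 0:
        raise ValueError("at least one model needs a positive ratio")

    # Hash the ID to a stable point in [0, 1).
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64

    cumulative = 0.0
    names = sorted(ratios)  # fixed order keeps the mapping stable
    for name in names:
        cumulative += ratios[name] / total
        if point < cumulative:
            return name
    return names[-1]  # guard against floating-point rounding
```

Because the result depends only on the ID and the ratios, a user who pages through results keeps the same model for as long as the ratios stay unchanged; changing the ratios moves some users to a different model.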
Step 202: retrieve relevant files from the index, read the ranking model from the search server's cache database, compute the ranking, and return it to the user. The relevant files are documents, videos, web pages, and so on.
In this step, the score of each file in every dimension is computed first, that is, the file is vectorized. The ranking model then computes each file's final score, which is the basis for ranking.
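Scoring and sorting the vectorized files can be sketched with a linear model. The source allows arbitrary ranking models, so the linear form, the feature values, and the function name are assumptions chosen to keep the example short.

```python
def rank_documents(documents, weights):
    """Return document IDs ordered by descending final score.

    documents: {doc_id: feature vector}, one score per dimension.
    weights: weight vector of an assumed linear ranking model.
    """
    scored = [
        (sum(w * x for w, x in zip(weights, features)), doc_id)
        for doc_id, features in documents.items()
    ]
    # Sort on the score only; documents with equal scores keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

With weights `[1.0, 0.5]` and documents `a = [0.9, 0.1]`, `b = [0.2, 0.9]`, `c = [0.5, 0.5]`, the scores are 0.95, 0.65, and 0.75, so the returned order is `a`, `c`, `b`.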
Step 203: record the search behavior of the user. The recorded content includes the user's query terms, the document the user clicked, and that document's position in the output document list.
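A record of the three items named in step 203 could be captured as shown below. The `model` field and the per-model average are additions for illustration (they make it possible to compare models in an A/B test) and are not listed in the source; all names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchEvent:
    query: str        # the user's query terms
    clicked_doc: str  # the document the user clicked
    position: int     # 1-based position of that document in the result list
    model: str        # assumed extra field: model that produced the list


def record_click(query, results, clicked_doc, model):
    """Build one event; raises ValueError if the document was not shown."""
    position = results.index(clicked_doc) + 1
    return SearchEvent(query, clicked_doc, position, model)


def mean_click_position(events):
    """Average clicked position per model; lower usually means better ranking."""
    positions = defaultdict(list)
    for event in events:
        positions[event.model].append(event.position)
    return {model: sum(p) / len(p) for model, p in positions.items()}
```

Events collected this way can also feed the next round of offline training, since they contain the same query and click information as the click records used in step 101.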
The invention defines a ranking model description format with which a ranking model can be described as a character string. The model is then stored first in a relational database and then in a key-value database by means of a graphical interface and a scheduled task, which both guarantees that the stored data persist and lets the online service obtain the data quickly. The online search service periodically reads the key-value database, restores ranking models from the strings it reads, updates its existing ranking models, and discards expired models. Online ranking can therefore be controlled without updating code or interrupting service. When a user issues a request, correct ranking results can be provided from the restored ranking models and the ratio of each model.
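The periodic refresh on the online side can be sketched as a small registry. The patent does not disclose the description format, so the `linear:w1,w2,...` string syntax, the cache key `ranking_models`, the JSON snapshot layout, and the class name are all assumptions; a dict stands in for the key-value database.

```python
import json


def parse_model(description):
    """Restore a model from its string description.

    Assumed format: 'linear:w1,w2,...' giving the weights of a linear model.
    """
    kind, _, params = description.partition(":")
    if kind != "linear":
        raise ValueError(f"unsupported model kind: {kind!r}")
    return [float(p) for p in params.split(",")]


class ModelRegistry:
    """Holds the ranking models and ratios currently used by the online service."""

    def __init__(self):
        self.models = {}
        self.ratios = {}

    def refresh(self, cache):
        """Reload everything from the cache; call this from a periodic timer.

        New dicts are built first and swapped in at the end, so a malformed
        description raises before the models in use are touched. Models that
        are missing from the snapshot are dropped, which discards expired ones.
        """
        snapshot = json.loads(cache.get("ranking_models", "{}"))
        models = {
            name: parse_model(entry["description"])
            for name, entry in snapshot.items()
        }
        ratios = {name: entry["ratio"] for name, entry in snapshot.items()}
        self.models, self.ratios = models, ratios
```

Since a model is only data in this scheme, putting a new model online or retiring an old one is a change to the cached snapshot and requires no code deployment or service restart.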
With traditional offline training, model quality is controllable but the iteration cycle is long; with online training, the iteration cycle is short but model quality is uncontrollable. The present invention adopts offline training: after a model has been trained, its quality can be controlled manually through cross-validation, and updating the model online requires neither code changes nor a service interruption, so high-quality ranking models can be put online with a shorter iteration cycle.
The present invention further discloses a search engine ranking system for rapid iteration, comprising an offline module and an online module.
The offline module includes:
a training submodule, used to train multiple candidate ranking models for deployment;
a model management submodule, used to store each ranking model and allocate its ratio, and to periodically write the ranking models and ratios to the search server's key-value cache database so that the online module can read them.
The training submodule includes:
a collection module, used to collect user click records;
an information processing module, used to reconstruct the user's search scenario from the click records and generate training data;
a generation module, used to train multiple candidate models for deployment using different predetermined algorithms and training parameters.
The online module includes:
an A/B testing submodule, used to receive a user request and assign a ranking model according to user information;
an information retrieval submodule, used to retrieve relevant files from the index, compute the ranking according to the ranking model, and return it to the user;
a statistics submodule, used to record the search behavior of the user.
The invention proposes an architecture for a search engine ranking system in which a relational database and a key-value database are used so that the online search engine periodically updates its own ranking models and ratios. With the invention, the quality of online ranking models can be controlled manually and the ratios of the deployed models can be adjusted dynamically, while updating a model requires neither code changes nor a service interruption, which shortens the iteration cycle.
By adopting offline learning, the invention allows the models that go online to be controlled manually, allows online models to be updated without code changes or service interruption, and allows the ratio of each model to be adjusted dynamically, thereby shortening the iteration cycle.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.
Claims (10)
1. A search engine ranking method for rapid iteration, characterized in that it comprises offline steps and online steps,
the offline steps comprising:
training multiple candidate ranking models for deployment,
storing each ranking model and allocating its ratio, and periodically writing the ranking models and ratios to a search server cache database for the online steps to read;
the online steps comprising:
receiving a user request and assigning a ranking model according to user information,
retrieving relevant files from an index, reading the ranking model from the search server cache database, computing the ranking, and returning it to the user,
recording the search behavior of the user.
2. The search engine ranking method according to claim 1, characterized in that the ranking models and ratios are periodically written to a key-value database of the search server.
3. The search engine ranking method according to claim 1, characterized in that training multiple candidate ranking models for deployment comprises the following sub-steps:
collecting user click records,
reconstructing the user's search scenario from the user click records to generate training data,
training multiple candidate ranking models for deployment using different predetermined algorithms and training parameters.
4. The search engine ranking method according to claim 1, characterized in that the online steps assign a ranking model to the user according to the user's cookie, so that the same user is always assigned a fixed ranking model.
5. The search engine ranking method according to claim 1, characterized in that the search behavior includes the user's query terms, the file the user clicked, and that file's position in the output file list.
6. A search engine ranking system for rapid iteration, characterized in that it comprises an offline module and an online module,
the offline module comprising:
a training submodule, used to train multiple candidate ranking models for deployment,
a model management submodule, used to store each ranking model and allocate its ratio, and to periodically write the ranking models and ratios to a search server cache database for the online module to read;
the online steps comprising:
an A/B testing submodule, used to receive a user request and assign a ranking model according to user information,
an information retrieval submodule, used to retrieve relevant files from an index, compute the ranking according to the ranking model, and return it to the user,
a statistics submodule, used to record the search behavior of the user.
7. The search engine ranking system for rapid iteration according to claim 6, characterized in that the ranking models and ratios are periodically written to a key-value database of the search server.
8. The search engine ranking system for rapid iteration according to claim 6, characterized in that the training submodule comprises:
a collection module, used to collect user click records,
an information processing module, used to reconstruct the user's search scenario from the user click records and generate training data,
a generation module, used to train multiple candidate ranking models for deployment using different predetermined algorithms and training parameters.
9. The search engine ranking system for rapid iteration according to claim 6, characterized in that the online steps assign a ranking model to the user according to the user's cookie, so that the same user is always assigned a fixed ranking model.
10. The search engine ranking method according to claim 6, characterized in that the search behavior includes the user's query terms, the file the user clicked, and that file's position in the output file list.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611149705.5A CN106777088A (en) | 2016-12-13 | 2016-12-13 | Search engine ranking method and system for rapid iteration |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611149705.5A CN106777088A (en) | 2016-12-13 | 2016-12-13 | Search engine ranking method and system for rapid iteration |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106777088A true CN106777088A (en) | 2017-05-31 |
Family
ID=58880959
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611149705.5A Pending CN106777088A (en) | Search engine ranking method and system for rapid iteration | 2016-12-13 | 2016-12-13 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106777088A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107528727A (en) * | 2017-08-22 | 2017-12-29 | 上海幻电信息科技有限公司 | Support the information state verification method and system that online and offline mode switches |
CN111581546A (en) * | 2020-05-13 | 2020-08-25 | 北京达佳互联信息技术有限公司 | Method, device, server and medium for determining multimedia resource sequencing model |
CN111797928A (en) * | 2017-09-08 | 2020-10-20 | 第四范式(北京)技术有限公司 | Method and system for generating combined features of machine learning samples |
WO2021228264A1 (en) * | 2020-05-15 | 2021-11-18 | 第四范式(北京)技术有限公司 | Machine learning application method, device, electronic apparatus, and storage medium |
CN115130008A (en) * | 2022-08-31 | 2022-09-30 | 喀斯玛(北京)科技有限公司 | Search ordering method based on machine learning model algorithm |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100325105A1 (en) * | 2009-06-19 | 2010-12-23 | Alibaba Group Holding Limited | Generating ranked search results using linear and nonlinear ranking models |
CN103744913A (en) * | 2013-12-27 | 2014-04-23 | 高新兴科技集团股份有限公司 | Database retrieval method based on search engine technology |
CN104462293A (en) * | 2014-11-27 | 2015-03-25 | 百度在线网络技术(北京)有限公司 | Search processing method and method and device for generating search result ranking model |
CN104615767A (en) * | 2015-02-15 | 2015-05-13 | 百度在线网络技术(北京)有限公司 | Searching-ranking model training method and device and search processing method |
CN104636407A (en) * | 2013-11-15 | 2015-05-20 | 腾讯科技(深圳)有限公司 | Parameter choice training and search request processing method and device |
- 2016-12-13 CN CN201611149705.5A patent/CN106777088A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100325105A1 (en) * | 2009-06-19 | 2010-12-23 | Alibaba Group Holding Limited | Generating ranked search results using linear and nonlinear ranking models |
CN104636407A (en) * | 2013-11-15 | 2015-05-20 | 腾讯科技(深圳)有限公司 | Parameter choice training and search request processing method and device |
CN103744913A (en) * | 2013-12-27 | 2014-04-23 | 高新兴科技集团股份有限公司 | Database retrieval method based on search engine technology |
CN104462293A (en) * | 2014-11-27 | 2015-03-25 | 百度在线网络技术(北京)有限公司 | Search processing method and method and device for generating search result ranking model |
CN104615767A (en) * | 2015-02-15 | 2015-05-13 | 百度在线网络技术(北京)有限公司 | Searching-ranking model training method and device and search processing method |
Non-Patent Citations (1)
Title |
---|
祝云凯: ""基于统计特征的语义搜索引擎的研究与实现"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107528727A (en) * | 2017-08-22 | 2017-12-29 | 上海幻电信息科技有限公司 | Support the information state verification method and system that online and offline mode switches |
CN111797928A (en) * | 2017-09-08 | 2020-10-20 | 第四范式(北京)技术有限公司 | Method and system for generating combined features of machine learning samples |
CN111581546A (en) * | 2020-05-13 | 2020-08-25 | 北京达佳互联信息技术有限公司 | Method, device, server and medium for determining multimedia resource sequencing model |
CN111581546B (en) * | 2020-05-13 | 2023-10-03 | 北京达佳互联信息技术有限公司 | Method, device, server and medium for determining multimedia resource ordering model |
WO2021228264A1 (en) * | 2020-05-15 | 2021-11-18 | 第四范式(北京)技术有限公司 | Machine learning application method, device, electronic apparatus, and storage medium |
CN115130008A (en) * | 2022-08-31 | 2022-09-30 | 喀斯玛(北京)科技有限公司 | Search ordering method based on machine learning model algorithm |
CN115130008B (en) * | 2022-08-31 | 2022-11-25 | 喀斯玛(北京)科技有限公司 | Search ordering method based on machine learning model algorithm |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104166668B (en) | News commending system and method based on FOLFM models | |
US20210209109A1 (en) | Method, apparatus, device, and storage medium for intention recommendation | |
CN106777088A (en) | The method for sequencing search engines and system of iteratively faster | |
CN109829104B (en) | Semantic similarity based pseudo-correlation feedback model information retrieval method and system | |
JP6073345B2 (en) | Method and apparatus for ranking search results, and search method and apparatus | |
CN105989040B (en) | Intelligent question and answer method, device and system | |
KR101700352B1 (en) | Generating improved document classification data using historical search results | |
US8145623B1 (en) | Query ranking based on query clustering and categorization | |
CN109189990B (en) | Search word generation method and device and electronic equipment | |
CN105005578A (en) | Multimedia target information visual analysis system | |
CN110471939A (en) | Data access method, device, computer equipment and storage medium | |
WO2014085776A2 (en) | Web search ranking | |
CN107578292A (en) | A kind of user's portrait constructing system | |
CN105320719A (en) | Crowdfunding website project recommendation method based on project tag and graphical relationship | |
CN110795613B (en) | Commodity searching method, device and system and electronic equipment | |
CN105760443A (en) | Project recommending system, device and method | |
CN109359302A (en) | A kind of optimization method of field term vector and fusion sort method based on it | |
CN104268142A (en) | Meta search result ranking algorithm based on rejection strategy | |
CN114691986A (en) | Cross-modal retrieval method based on subspace adaptive spacing and storage medium | |
CN110737432A (en) | script aided design method and device based on root list | |
CN104834719A (en) | Database system applied to real-time big data scene | |
CN111078944A (en) | Video content heat prediction method and device | |
CN104077555A (en) | Method and device for identifying badcase in image search | |
Sun et al. | Research on question retrieval method for community question answering | |
CN103500219B (en) | The control method that a kind of label is adaptively precisely matched |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170531 |