[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

US8751481B2 - Adaptive multi-channel content selection with behavior-aware query analysis - Google Patents

Adaptive multi-channel content selection with behavior-aware query analysis Download PDF

Info

Publication number
US8751481B2
US8751481B2 US12/148,190 US14819008A US8751481B2 US 8751481 B2 US8751481 B2 US 8751481B2 US 14819008 A US14819008 A US 14819008A US 8751481 B2 US8751481 B2 US 8751481B2
Authority
US
United States
Prior art keywords
channel
user
query
channels
web
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related, expires
Application number
US12/148,190
Other versions
US20090265325A1 (en
Inventor
Orhan Camoglu
Wahid Chrabakh
Lei Ding
Jingheo Miao
Divyakant Agrawal
Neeraj Bhatnagar
Tao Yang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
IAC Search and Media Inc
Original Assignee
IAC Search and Media Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by IAC Search and Media Inc filed Critical IAC Search and Media Inc
Priority to US12/148,190 priority Critical patent/US8751481B2/en
Assigned to IAC SEARCH & MEDIA, INC. reassignment IAC SEARCH & MEDIA, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: AGRAWAL, DIVYAKANT, BHATNAGAR, NEERAJ, CAMOGLU, ORHAN, CHRABAKH, WAHID, DING, LEI, MIAO, JINGHAO, YANG, TAO
Publication of US20090265325A1 publication Critical patent/US20090265325A1/en
Application granted granted Critical
Publication of US8751481B2 publication Critical patent/US8751481B2/en
Expired - Fee Related legal-status Critical Current
Adjusted expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation

Definitions

  • Embodiments of this invention relate to the field of search engines and, in particular, to systems and methods for adaptive multi-content channel selection.
  • the Internet is a global network of computer systems and websites. These computer systems include a variety of documents, files, databases, and the like, which include information covering a variety of topics. It can be difficult for users of the Internet to locate this information on the Internet.
  • Search engines are used by most people to locate this information on the Internet. Users also often use search engines to answer simple questions. Thus, search engines also desire to provide a service which provides answers to these simple questions.
  • search results are presented on a results page that aggregates information from multiple channels.
  • the web results associated with the search query are often presented along with advertisements and other similar information.
  • Embodiments of the invention relate to a method including receiving a search query from a user; identifying web search results corresponding to the search query; optimizing content selection from a plurality of channels based on the query and the user; and presenting the web search results and selected content from the plurality of channels to the user.
  • Optimizing content selection from a plurality of channels based on the query and an identification of the user may include optimizing content selection from a plurality of channels based on the query, the user, and a time range.
  • the time range may be an event.
  • Optimizing content selection from a plurality of channels may include analyzing historical data based on the query and the user.
  • the historical data may include past click results of the user.
  • the historical data may include past click results associated with the query.
  • the method may also include identifying query concepts from the received query; analyzing historical data based on each of the query concepts to determine a recommended channel selection; and aggregating the recommended channel selection for each of the query concepts.
  • Optimizing content selection from a plurality of channels based on the query and an identification of the user may include dynamically determining a recommended channel selection.
  • the method may also include determining a recommended amount of information from each recommended channel.
  • Embodiments of the invention also relate to a system including a web search engine; and a channel selection system coupled with the web search engine.
  • the system may also include a plurality of databases, the plurality of databases coupled with the web search engine and the channel selection system.
  • the system may also include a plurality of channels coupled with the channel selection system over a network.
  • the plurality of databases, the web search engine and the channel selection system may be located within a web search server.
  • At least one of the plurality of databases may include historical data about a plurality of users of the system and a plurality of queries received at the web search engine. At least one of the plurality of databases may include vertical information.
  • Embodiments of the invention also relate to a method including storing historical data including a plurality of queries and user data from a plurality of users in a database; analyzing the stored historical data to classify a relevancy of a channel to each of the plurality of queries and user data; and storing the classification.
  • the method may also include receiving a search query from a user and analyzing the received query.
  • Analyzing the received query may include partitioning the received query to identify concept blocks; classifying each concept block to determine a classification decision; aggregating the classification decision of each concept block; and computing a channel relevancy for the query.
  • Analyzing the received query may include deriving the probability of click for a channel based on the user and the received query; and computing a channel relevancy for the query.
  • Deriving the probability of click for a channel may include deriving the probability of click based on an access time range.
  • the method may also include determining a recommendation of channels based on the computed channel relevancy.
  • Analyzing the stored historical data to determine a recommendation of a channel selection may include determining an amount of information from each recommended channel.
  • FIG. 1 is a block diagram illustrating a system for natural language service searching in accordance with one embodiment of the invention
  • FIG. 2 is a schematic drawing of a user interface for entering a search query in accordance with one embodiment of the invention
  • FIGS. 3A-B are schematic drawings of a user interface for presenting search results in accordance with one embodiment of the invention.
  • FIG. 4 is a block diagram of a mining and classification system in accordance with one embodiment of the invention.
  • FIG. 5 is a block diagram of a method of mining and classifying in accordance with one embodiment of the invention.
  • FIG. 6 is a block diagram of a channel selection search engine in accordance with one embodiment of the invention.
  • FIG. 7 is a block diagram of a method for determining a channel relevancy for a query in accordance with one embodiment of the invention.
  • FIG. 8 is a block diagram of a method for query partitioning in accordance with one embodiment of the invention.
  • FIG. 9 is a block diagram of a method for computing a channel relevancy degree in accordance with one embodiment of the invention.
  • FIG. 10 is a block diagram of a method for aggregating channel click probability of components of a query in accordance with one embodiment of the invention.
  • FIG. 11 is a block diagram of a method of classifying channel paths in accordance with one embodiment of the invention.
  • FIG. 12 is a block diagram of a method of classifying user bias in accordance with one embodiment of the invention.
  • FIG. 13 is a schematic drawing of a user interface illustrating a search results page for a first exemplary query in accordance with one embodiment of the invention
  • FIG. 14 is a schematic drawing of a user interface illustrating a search results page for a second exemplary query in accordance with one embodiment of the invention.
  • FIG. 15 is a block diagram of an exemplary computer system in accordance with one embodiment of the invention.
  • FIG. 1 shows a network system 10 which can be used in accordance with one embodiment of the present invention.
  • the network system 10 includes a search system 12 , a search engine 14 , a network 16 , and a plurality of client systems 18 .
  • the search system 12 includes a server 20 , a database 22 , an indexer 24 , and a crawler 26 .
  • the plurality of client systems 18 includes a plurality of web search applications 28 a - f , located on each of the plurality of client systems 18 .
  • the server 20 includes a plurality of databases 30 a - d.
  • the server 20 is connected to the search engine 14 .
  • the search engine 14 is connected to the plurality of client systems 18 via the network 16 .
  • the server 20 is in communication with the database 22 which is in communication with the indexer 24 .
  • the indexer 24 is in communication with the crawler 26 .
  • the crawler 26 is capable of communicating with the plurality of client systems 18 via the network 16 as well.
  • the web search server 20 is typically a computer system, and may be an HTTP server. It is envisioned that the search engine 14 may be located at the web search server 20 .
  • the web search server 20 typically includes at least processing logic and memory.
  • the indexer 24 is typically a software program which is used to create an index, which is then stored in storage media.
  • the index is typically a table of alphanumeric terms with a corresponding list of the related documents or the location of the related documents (e.g., a pointer).
  • An exemplary pointer is a Uniform Resource Locator (URL).
  • the indexer 24 may build a hash table, in which a numerical value is attached to each of the terms.
  • the database 22 is stored in a storage media, which typically includes the documents which are indexed by the indexer 24 .
  • the index may be included in the same storage media as the database 22 or in a different storage media.
  • the storage media may be volatile or non-volatile memory that includes, for example, read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices and zip drives.
  • the crawler 26 is a software program or software robot, which is typically used to build lists of the information found on Web sites. Another common term for the crawler 26 is a spider.
  • the crawler 26 typically searches Web sites on the Internet and keeps track of the information located in its search and the location of the information.
  • the network 16 is a local area network (LAN), wide area network (WAN), a telephone network, such as the Public Switched Telephone Network (PSTN), an intranet, the Internet, or combinations thereof.
  • LAN local area network
  • WAN wide area network
  • PSTN Public Switched Telephone Network
  • intranet the Internet
  • Internet the Internet
  • the plurality of client systems 18 may be mainframes, minicomputers, personal computers, laptops, personal digital assistants (PDA), cell phones, and the like.
  • the plurality of client systems 18 are characterized in that they are capable of being connected to the network 16 .
  • Web sites may also be located on the client systems 18 .
  • the web search application 28 a - f is typically an Internet browser or other software.
  • the databases 30 a - d are stored in storage media located at the server 20 .
  • the storage media may be volatile or non-volatile memory that includes, for example, read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices and zip drives.
  • the crawler 26 crawls websites, such as the websites of the plurality of client systems 18 , to locate information on the web.
  • the crawler 26 employs software robots to build lists of the information.
  • the crawler 26 may include one or more crawlers to search the web.
  • the crawler 26 typically extracts the information and stores it in the database 22 .
  • the indexer 24 creates an index of the information stored in the database 22 .
  • the indexer 24 creates an index of the information and where the information is located in the Internet (typically a URL).
  • the search is communicated to the search engine 14 over the network 16 .
  • the search engine 14 communicates the search to the server 20 at the search system 12 .
  • the server 20 accesses the index and/or database to provide a search result, which is communicated to the user via the search engine 14 and network 16 .
  • the databases 30 a - d can be searched, as will be described hereinafter.
  • FIG. 2 illustrates an exemplary user interface 200 .
  • the illustrated user interface 200 includes a user input box 204 and a button 208 .
  • the user input box 204 is configured to receive textual user input.
  • the button 208 is designated by “Search” in FIG. 2 and is configured to be selectable by the user.
  • a user accesses the interface 200 to enter a search query in the user input box 204 .
  • the user presses the “enter” key on a keyboard or selects (e.g. mouse clicks, mouses over, etc.) the button 208 . Selection of the button 208 or pressing the “enter” key on a keyboard results in a communication of the text entered in the input box 204 from the client system 18 to the search engine 14 and the search system 12 .
  • FIGS. 3A and 3B illustrate an exemplary search results page 300 .
  • FIGS. 3A and 3B together form the same results page 300 , but views of the page from different scrolling points. It will be appreciated that the results pages of FIGS. 3A and 3B would typically appear as a single page on the user's web application.
  • the illustrated search results page 300 includes several regions including a suggested search term region 304 , an advertisements region 306 , a sponsored results region 308 , a web results region 312 , a news images region 316 , an images region 320 , an encyclopedia region 324 , a news region 326 and a video region 328 .
  • Each of the advertisements region 306 , sponsored results region 308 , news images region 316 , images region 320 , encyclopedia region 324 , news region 326 and video region 328 are representative of different channels of information (e.g., advertisements, sponsors, news images, images, news, encyclopedia, video, etc.). It will be appreciated that the channels presented are not limited to the above channels and may include a fewer number or greater number of channels. In addition, the amount of information from each channel may vary from that illustrated.
  • the different channels of information are sometimes referred to as vertical information and some or all of the channels of information may come from vertical databases.
  • This information may be located at the search system (e.g., one or more of databases 30 a - d ) or may be received from other servers (not shown) connected to the search system over a network.
  • the results of the web search engine, an advertisement engine and special information channel engines (or combined advertisement and special information channel engine) are combined to determine the channels to be used and the composition of the page presented to the user (e.g., search results page 300 of FIGS. 3A and 3B ).
  • each source of channel information can provide a large amount of information in responding to a query, the overall display space of a results page is limited.
  • information from each channel may not be relevant to each query.
  • the quality of results tends to degrade when an excessive amount of information is supplied from a specific channel. For example, the top three results of a web search may be more relevant to a query than the bottom three results.
  • the top result of an advertisement link is often more relevant than the second or third advertisement links.
  • Embodiments of the present invention relate to optimization in selecting content from multiple channels, even though the engines behind these channels may come from different organizations and service providers.
  • the search system combines information from independent information providers.
  • the search system determines the relationship among the different channels.
  • the search system can optimize channel selection even though the web search engine does not have direct access to information in backend channel providers.
  • the channel selection is a contextual optimization of multi-channel information selection using user behavior and/or query context analysis.
  • the web search engine increases the amount of content selected from one specific channel if such content is more attractive to users and optimizes the overall content acceptance of the composed query responses.
  • the web search engine suppresses content from a channel.
  • users tend to be more interested in commercial products; thus, the number of sponsored advertisement links can be increased for such users and reduced for a category that has less relation to money, while maintaining the overall advertisement inventory consumption, resulting in increased user satisfaction and, therefore, improved user retention.
  • the search system optimizes content selection from multiple channels adaptively based on a user's query, the user's recent clicks, and/or user's recent pick behavior on selected channels.
  • the search system identifies user-contexts in which a user is more or less likely to click on a specific channel.
  • the search system can then retrieve more or less content depending on the users to determine the relevancy of the information. The search system, therefore, increases user satisfaction towards the selected content.
  • FIG. 4 illustrates a system 400 for mining and classifying data.
  • the system 400 may be part of the search system 12 of FIG. 1 .
  • the illustrated system 400 is configured to track and mine user profiles and preferences for use of each channel based on a user's historical data, track and mine relevancy information appropriateness of each data channel for a given query and a user, or a group of users under different categories, track and mine click path among these channels, and optimize channel placement for best retention and pick rates for targeted channels.
  • the system 400 includes historical click-stream data 404 , user-centric click records 408 , query-centric click records 412 , a DMDB (Data Mining DataBase) 416 , a query-channel relevancy classifier 420 , a channel-path classifier 424 , a user bias classifier 428 and a recommendation aggregator 432 .
  • DMDB Data Mining DataBase
  • the historical click-stream data 404 is collected from a data log feed.
  • the data 404 is inserted into the DMDB 416 along with user-centric click records 408 and query-centric click records 412 .
  • the information in DMDB 416 is used as training data to develop independent classifiers (e.g., query-channel relevancy classifier 420 , channel path classifier 424 , user bias classifier 428 ).
  • the query-channel relevancy classifier 420 is configured to analyze the categorization of queries into groups, such as, for example, shopping, auto, finance, health, education, etc.
  • the queries can also be grouped into query sessions.
  • a query session includes any sequence of search engine actions (activities that can be recorded by the search engine) of a given user.
  • the query-channel relevancy classifier 420 is also configured to determine the access path from one channel to another. For example, a user that accesses information associated with a “Dictionary” channel to learn the meaning of words is less likely to read advertisement content from an advertisement channel.
  • the query classifier identifies the relevance degree of each channel's content to a query or a class of queries, which can potentially encourage the acceptance of its content for a user or a group of users.
  • the user bias classifier 428 is configured to analyze users classified by, for example, geolocations, a user's past preference for specified channels, whether the user accesses several channels, whether a user never accesses certain channels, whether the user is a new user, whether the user is a frequent user, whether the user is a casual user, and the like.
  • the user bias classifier 428 is also configured to analyze users based on their access times and events in selecting certain channels.
  • Exemplary access times include, for example, morning, evening, nights, weekends, weekdays, etc.
  • Exemplary events include holidays and major events, such as the Super Bowl, 9/11, Christmas, etc.
  • the recommendation aggregator 432 is configured to dynamically collect live data for instant recommendation channel resources based on the classification results from the query-channel relevancy classifier 420 , channel-path classifier 424 and user bias classifier 428 along with runtime behavior tracking.
  • the recommendation aggregator 432 computes a recommendation value F for increasing or decreasing use of content from a targeted channel under conditions Q, T and U (i.e., F(Q, T, U), where Q is a query, or a class of queries; T is time of access or an event; and, U is a user or class of users.
  • FIG. 5 is a flow diagram illustrating a method associated with the system of FIG. 4 in further detail.
  • the method 500 begins at block 504 by tracking and mining user profiles and preferences for use of each channel based on historical data of the user.
  • the method 500 continues by tracking and mining relevancy information appropriateness of each data channel for a given query and a user, or a group of users under different categories (block 508 ).
  • the method 500 continues by tracking and mining click, path among channels (block 512 ).
  • the method 500 continues by offline classification of query-channel relevancy, user past interests and channel usage patterns (block 516 ).
  • the method 500 continues by dynamic collection of live data for instant recommendation of channel sources based on offline classification results and runtime behavior tracking (block 520 ).
  • the method 500 continues by optimizing channel placement for best retention and pick rates for targeted channels (block 524 ).
  • FIG. 6 schematically illustrates interaction between the search engine 600 , the channel wise 604 (a channel selection system) and the user interfaces 608 , 612 .
  • the channel wise 604 is also coupled with channels 616 , 618 and 620 .
  • the information channel 1 , 616 may be an advertisement channel
  • the information channel 2 , 618 may be a dictionary channel
  • the information channel n, 620 may be any information channel, such as, for example, an encyclopedia channel, an images channel, a news images channel, a sponsored results channel, etc.
  • information channel 1 616 may be located in a database 30 a of the search system 12 of FIG. 1
  • information channels 618 and 620 may be coupled to the search system 12 through the network 16 .
  • the user interface 608 corresponds to an interface in which the user enters a search query for the search engine.
  • the search query is communicated to the search engine web site 600 .
  • the web site 600 communicates the user query to the channel wise 604 .
  • the channel wise 604 determines an optimized content channel selection based on the user and the query, and communicates the result to the web site 600 .
  • channel wise 604 may indicate that an increase or no change in the channel content should be performed by the web site 600 .
  • the web site 600 then provides the user with the search results and information from the selected channels with the user interface 612 .
  • the channel wise 604 is co-run within the web site 600 .
  • a web server front end captures the contextual information needed to trigger the recommendation from the channel wise 604 .
  • the contextual information may include one or more of query text, prior click history and user ID.
  • the channel wise 604 generates its recommendation based on, for example, the F(Q,T,U) function and provides the recommendation to the web site 600 .
  • the web site 600 provides the recommended amount of information to the user from the recommended channels.
  • FIG. 7 illustrates a method of contextual optimization of content selection 700 in accordance with one embodiment of the invention.
  • the method 700 begins by receiving a query (block 704 ).
  • the method 700 continues by computing a historical channel pick rate for the query (block 708 ). For example, if many users in the past click results presented from a specific channel, the past click results indicate relevancy of content from the selected channel(s) for the given query.
  • the aggregated behavior information is computed for the tuple (Q, U, T), in which Q is a query, U is a user and T is a time or event.
  • the method 700 continues by computing aggregated behavior information (block 712 ) and deriving the probability of click for a channel given a group of users (or a user), query category (or a query) and access time range (block 716 ). The method 700 continues by computing channel relevancy for the query (block 720 ).
  • the method 700 continues by partitioning the query to identify concept blocks (block 724 ). Although some queries may not be found in the historical data database, parts of a new query (i.e., sub-queries or concept blocks) may have been submitted in the past. The channel relevancy of these sub-queries or concept blocks can be used to estimate the channel relevancy for the new query as a whole.
  • the method 700 continues by classifying each concept block separately (block 728 ).
  • the concept blocks may include, for example, special phrases, words, bi-grams, tri-grams, k-grams, and the like. Stemming techniques, for example, can be applied in concept block extraction.
  • the method 700 continues by aggregating classification decisions (block 732 ).
  • the method 700 continues by computing channel relevancy for the query (block 720 ).
  • FIG. 8 illustrates the method of extracting concept blocks 800 in further detail.
  • the method 800 begins by receiving a query (block 804 ).
  • the method 800 continues by applying stemming techniques to the query to extract concept blocks from the query (block 808 ).
  • the method 800 continues by identifying relevancy properties of each concept block (block 812 ).
  • the method 800 continues by aggregating relevancy properties of each concept block (block 816 ).
  • FIG. 9 illustrates a method 900 for computing the channel relevancy degree of each concept block in a dataset for each user group and a time range in accordance with one embodiment of the invention.
  • the method 900 begins by for a channel c_i (block 904 ) and for each pair ⁇ q, n>(block 908 ), removing stop words from q to get q′ (block 912 ).
  • Channel c_i corresponds to a channel
  • q is a query
  • n is the number of clicks associated with a query
  • q is a query
  • q′ is a searchable query.
  • the method 900 continues by finding the set, U, of concept blocks from q′ (block 916 ).
  • the method 900 continues by for each u in U, incrementing NO_OCC(u) by 1 and incrementing NO_CL(u, c_i) by n (block 920 ), wherein NO_OCC(u) is the number of occurrences of u and NO_CL(u, c_i) is the number of clicks of u on c_i.
  • the method 900 continues by for any set of concept blocks, U, the probability of a channel click if a query contains all the concept blocks in U is P_CL(q, c_i) as SUM(P_CL (u, c_i) for all u in U (block 932 ).
  • the method 900 continues with block 932 when concept blocks are used.
  • Block 932 computes the channel relevancy degree of the query by aggregating the channel click probability of its components (e.g., concept blocks).
  • the SUM function used is a weighted function. For example, special phrases have very large weights and the tri-grams have larger weights than bi-grams and words; and bi-grams have larger weights than words.
  • FIG. 10 schematically illustrates the flow of the process within the system.
  • Historical data 1000 is used as training for user groups and access time 1004 .
  • Concept blocks 1008 and probability of click 1012 are determined.
  • a query group q 1016 is analyzed to determine the concept blocks of q 1020 .
  • the probability of the channel click for q 1024 is determined based on the concept blocks of q 1020 and the concept blocks 1008 and probability of click 1012 .
  • FIG. 11 illustrates a method 1100 for classifying channel paths in accordance with one embodiment of the invention.
  • the classifier computes the likelihood of clicking a specific channel after a user has accessed other channels. There is a strong preference among users in the order of using different channels. For example, generally, if a user chooses a channel, the user is more likely to choose the same channel again in the immediate future. In another example, some channels complement each other or offer similar content. For example, a channel that shows reviews of a product is related to a channel that shows the stores the product is sold. This classifier extracts the preferred user paths between channels using historical data.
  • the method 1100 begins by accumulating user records grouped by session and sorted by time (block 1104 ).
  • the method 1100 continues by computing the probability of a channel click following each channel path (block 1108 ).
  • the computation 1108 may include for consecutive queries with a session, considering the overall click rate of each channel (block 1112 ).
  • the ratio r(path, c_i) measures the tendency of a user who followed the path to select channel c_i on the next query.
  • the r values are stored, and when a user performs a new query, the path(s) corresponding to r are fetched.
  • FIG. 12 illustrates a method for user biased classification 1200 in accordance with one embodiment of the invention.
  • This classifier is to derive the aggregative behavior of a user or a user group at a certain time range or following a certain event.
  • the classifier determines if a user or a user group has a bias using or not using certain channels under such a temporal condition.
  • FIG. 13 is an exemplary user interface 1300 illustrating a search results page including optimized channel selection in accordance with one embodiment of the invention.
  • FIG. 13 illustrates a search results page for a search query for “credit report.”
  • the channels that are illustrated for a “credit report” query include: five sponsored results, six images, one encyclopedia entry and one news entry.
  • FIG. 14 is an exemplary user interface 1400 illustrating a search results page including optimized channel selection in accordance with one embodiment of the invention.
  • FIG. 14 illustrates search results for a search query that is “ucsb.”
  • the channels for the search “ucsb” include no sponsored results, one encyclopedia entry, two videos and one blog.
  • the channels are picked and the amount of content is determined based on an analysis of the query or query concept blocks, user history and time.
  • the web search engine determines how to display the results page based on the channel recommendation provided by the channel recommendation system and the web search results.
  • FIG. 15 is one embodiment of a computer system on which embodiments of the present invention may be implemented. It will be apparent to those of ordinary skill in the art, however, that other alternative systems of various system architectures may also be used.
  • the data processing system illustrated in FIG. 15 includes a bus or other internal communication means 1565 for communicating information, and a processor 1560 coupled to the bus 1565 for processing information.
  • the system further comprises a random access memory (RAM) or other volatile storage device 1550 (referred to as memory), coupled to bus 1565 for storing information and instructions to be executed by processor 1560 .
  • Main memory 1550 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 1560 .
  • the system also comprises a read only memory (ROM) and/or static storage device 1520 coupled to bus 1565 for storing static information and instructions for processor 1560 , and a data storage device 1525 such as a magnetic disk or optical disk and its corresponding disk drive.
  • Data storage device 1525 is coupled to bus 1565 for storing information and instructions.
  • the system may further be coupled to a display device 1570 , such as a cathode ray tube (CRT) or a liquid crystal display (LCD) coupled to bus 1565 through bus 1565 for displaying information to a computer user.
  • a display device 1570 such as a cathode ray tube (CRT) or a liquid crystal display (LCD) coupled to bus 1565 through bus 1565 for displaying information to a computer user.
  • An alphanumeric input device 1575 may also be coupled to bus 1565 through bus 1565 for communicating information and command selections to processor 1560 .
  • cursor control device 1580 such as a mouse, a trackball, stylus, or cursor direction keys coupled to bus 1565 through bus 1565 for communicating direction information and command selections to processor 1560 , and for controlling cursor movement on display device 1570 .
  • the communication device 1590 may include any of a number of commercially available networking peripheral devices such as those used for coupling to an Ethernet, token ring, Internet, or wide area network.
  • the communication device 1590 may further be a null-modem connection, or any other mechanism that provides connectivity between the computer system 1500 and the outside world. Note that any or all of the components of this system illustrated in FIG. 15 and associated hardware may be used in various embodiments of the present invention.
  • control logic or software implementing the present invention can be stored in main memory 1550 , mass storage device 1525 , or other storage medium locally or remotely accessible to processor 1560 .
  • the present invention may also be embodied in a handheld or portable device containing a subset of the computer hardware components described above.
  • the handheld device may be configured to contain only the bus 1565 , the processor 1560 , and memory 1550 and/or data storage device 1525 .
  • the handheld device may also be configured to include a set of buttons or input signaling components with which a user may select from a set of available options.
  • the handheld device may also be configured to include an output apparatus such as a liquid crystal display (LCD) or display element matrix for displaying information to a user of the handheld device. Conventional methods may be used to implement such a handheld device.
  • LCD liquid crystal display
  • Conventional methods may be used to implement such a handheld device.
  • the implementation of the present invention for such a device would be apparent to one of ordinary skill in the art given the disclosure of the present invention as provided herein.
  • the present invention may also be embodied in a special purpose appliance including a subset of the computer hardware components described above.
  • the appliance may include a processor 1560 , a data storage device 1525 , a bus 1565 , and memory 1550 , and only rudimentary communications mechanisms, such as a small touch-screen that permits the user to communicate in a basic manner with the device.
  • a processor 1560 the more special-purpose the device is, the fewer of the elements need be present for the device to function.
  • communications with the user may be through a touch-based screen, or similar mechanism.
  • a machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g. a computer).
  • a machine readable medium includes read-only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, electrical, optical, acoustical or other forms of propagated signals (e.g. infrared signals, digital signals, etc.).

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Preferred embodiments of the invention include systems and methods for selecting content from a plurality of channels in response to a received query are described. The invention embodiments include systems and methods that optimize content selection based on received queries and an identification of the user together with analysis of historical data. Embodiments of the invention also include methods for storing historical data including a plurality of queries and user data from a plurality of user databases, analyzing the stored historical data to classify the relevancy of a channel to each of the plurality of queries and user data, storing the classification, and displaying content to user based on such factors as user habits, queries, timing of searches, and content preferences.

Description

FIELD
Embodiments of this invention relate to the field of search engines and, in particular, to systems and methods for adaptive multi-content channel selection.
BACKGROUND
The Internet is a global network of computer systems and websites. These computer systems include a variety of documents, files, databases, and the like, which include information covering a variety of topics. It can be difficult for users of the Internet to locate this information on the Internet.
Search engines are used by most people to locate this information on the Internet. Users also often use search engines to answer simple questions. Thus, search engines also desire to provide a service which provides answers to these simple questions.
Often, the search results are presented on a results page that aggregates information from multiple channels. For example, the web results associated with the search query are often presented along with advertisements and other similar information.
SUMMARY
Embodiments of the invention relate to a method including receiving a search query from a user; identifying web search results corresponding to the search query; optimizing content selection from a plurality of channels based on the query and the user; and presenting the web search results and selected content from the plurality of channels to the user.
Optimizing content selection from a plurality of channels based on the query and an identification of the user may include optimizing content selection from a plurality of channels based on the query, the user, and a time range. The time range may be an event.
Optimizing content selection from a plurality of channels may include analyzing historical data based on the query and the user.
The historical data may include past click results of the user. The historical data may include past click results associated with the query.
The method may also include identifying query concepts from the received query; analyzing historical data based on each of the query concepts to determine a recommended channel selection; and aggregating the recommended channel selection for each of the query concepts.
Optimizing content selection from a plurality of channels based on the query and an identification of the user may include dynamically determining a recommended channel selection.
The method may also include determining a recommended amount of information from each recommended channel.
Embodiments of the invention also relate to a system including a web search engine; and a channel selection system coupled with the web search engine.
The system may also include a plurality of databases, the plurality of databases coupled with the web search engine and the channel selection system.
The system may also include a plurality of channels coupled with the channel selection system over a network.
The plurality of databases, the web search engine and the channel selection system may be located within a web search server.
At least one of the plurality of databases may include historical data about a plurality of users of the system and a plurality of queries received at the web search engine. At least one of the plurality of databases may include vertical information.
Embodiments of the invention also relate to a method including storing historical data including a plurality of queries and user data from a plurality of users in a database; analyzing the stored historical data to classify a relevancy of a channel to each of the plurality of queries and user data; and storing the classification.
The method may also include receiving a search query from a user and analyzing the received query.
Analyzing the received query may include partitioning the received query to identify concept blocks; classifying each concept block to determine a classification decision; aggregating the classification decision of each concept block; and computing a channel relevancy for the query.
Analyzing the received query may include deriving the probability of click for a channel based on the user and the received query; and computing a channel relevancy for the query.
Deriving the probability of click for a channel may include deriving the probability of click based on an access time range.
The method may also include determining a recommendation of channels based on the computed channel relevancy.
Analyzing the stored historical data to determine a recommendation of a channel selection may include determining an amount of information from each recommended channel.
Other features of the present invention will be apparent from the accompanying drawings and from the detailed description which follows.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention is described by way of example with reference to the accompanying drawings, wherein:
FIG. 1 is a block diagram illustrating a system for natural language service searching in accordance with one embodiment of the invention;
FIG. 2 is a schematic drawing of a user interface for entering a search query in accordance with one embodiment of the invention;
FIGS. 3A-B are schematic drawings of a user interface for presenting search results in accordance with one embodiment of the invention;
FIG. 4 is a block diagram of a mining and classification system in accordance with one embodiment of the invention;
FIG. 5 is a block diagram of a method of mining and classifying in accordance with one embodiment of the invention;
FIG. 6 is a block diagram of a channel selection search engine in accordance with one embodiment of the invention;
FIG. 7 is a block diagram of a method for determining a channel relevancy for a query in accordance with one embodiment of the invention;
FIG. 8 is a block diagram of a method for query partitioning in accordance with one embodiment of the invention;
FIG. 9 is a block diagram of a method for computing a channel relevancy degree in accordance with one embodiment of the invention;
FIG. 10 is a block diagram of a method for aggregating channel click probability of components of a query in accordance with one embodiment of the invention;
FIG. 11 is a block diagram of a method of classifying channel paths in accordance with one embodiment of the invention;
FIG. 12 is a block diagram of a method of classifying user bias in accordance with one embodiment of the invention;
FIG. 13 is a schematic drawing of a user interface illustrating a search results page for a first exemplary query in accordance with one embodiment of the invention;
FIG. 14 is a schematic drawing of a user interface illustrating a search results page for a second exemplary query in accordance with one embodiment of the invention; and
FIG. 15 is a block diagram of an exemplary computer system in accordance with one embodiment of the invention.
DETAILED DESCRIPTION
FIG. 1, of the accompanying drawings, shows a network system 10 which can be used in accordance with one embodiment of the present invention. The network system 10 includes a search system 12, a search engine 14, a network 16, and a plurality of client systems 18. The search system 12 includes a server 20, a database 22, an indexer 24, and a crawler 26. The plurality of client systems 18 includes a plurality of web search applications 28 a-f, located on each of the plurality of client systems 18. The server 20 includes a plurality of databases 30 a-d.
The server 20 is connected to the search engine 14. The search engine 14 is connected to the plurality of client systems 18 via the network 16. The server 20 is in communication with the database 22 which is in communication with the indexer 24. The indexer 24 is in communication with the crawler 26. The crawler 26 is capable of communicating with the plurality of client systems 18 via the network 16 as well.
The web search server 20 is typically a computer system, and may be an HTTP server. It is envisioned that the search engine 14 may be located at the web search server 20. The web search server 20 typically includes at least processing logic and memory.
The indexer 24 is typically a software program which is used to create an index, which is then stored in storage media. The index is typically a table of alphanumeric terms with a corresponding list of the related documents or the location of the related documents (e.g., a pointer). An exemplary pointer is a Uniform Resource Locator (URL). The indexer 24 may build a hash table, in which a numerical value is attached to each of the terms. The database 22 is stored in a storage media, which typically includes the documents which are indexed by the indexer 24. The index may be included in the same storage media as the database 22 or in a different storage media. The storage media may be volatile or non-volatile memory that includes, for example, read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices and zip drives.
The crawler 26 is a software program or software robot, which is typically used to build lists of the information found on Web sites. Another common term for the crawler 26 is a spider. The crawler 26 typically searches Web sites on the Internet and keeps track of the information located in its search and the location of the information.
The network 16 is a local area network (LAN), wide area network (WAN), a telephone network, such as the Public Switched Telephone Network (PSTN), an intranet, the Internet, or combinations thereof.
The plurality of client systems 18 may be mainframes, minicomputers, personal computers, laptops, personal digital assistants (PDA), cell phones, and the like. The plurality of client systems 18 are characterized in that they are capable of being connected to the network 16. Web sites may also be located on the client systems 18. The web search application 28 a-f is typically an Internet browser or other software.
The databases 30 a-d are stored in storage media located at the server 20. The storage media may be volatile or non-volatile memory that includes, for example, read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices and zip drives.
In use, the crawler 26 crawls websites, such as the websites of the plurality of client systems 18, to locate information on the web. The crawler 26 employs software robots to build lists of the information. The crawler 26 may include one or more crawlers to search the web. The crawler 26 typically extracts the information and stores it in the database 22. The indexer 24 creates an index of the information stored in the database 22. Alternatively, if a database 22 is not used, the indexer 24 creates an index of the information and where the information is located in the Internet (typically a URL).
When a user of one of the plurality of client systems 18 enters a search on the web search application 28, the search is communicated to the search engine 14 over the network 16. The search engine 14 communicates the search to the server 20 at the search system 12. The server 20 accesses the index and/or database to provide a search result, which is communicated to the user via the search engine 14 and network 16.
Alternatively or in addition to accessing the index and/or database to provide the search result, the databases 30 a-d can be searched, as will be described hereinafter.
FIG. 2 illustrates an exemplary user interface 200. The illustrated user interface 200 includes a user input box 204 and a button 208. The user input box 204 is configured to receive textual user input. The button 208 is designated by “Search” in FIG. 2 and is configured to be selectable by the user. A user accesses the interface 200 to enter a search query in the user input box 204. The user presses the “enter” key on a keyboard or selects (e.g. mouse clicks, mouses over, etc.) the button 208. Selection of the button 208 or pressing the “enter” key on a keyboard results in a communication of the text entered in the input box 204 from the client system 18 to the search engine 14 and the search system 12.
FIGS. 3A and 3B illustrate an exemplary search results page 300. FIGS. 3A and 3B together form the same results page 300, but views of the page from different scrolling points. It will be appreciated that the results pages of FIGS. 3A and 3B would typically appear as a single page on the user's web application.
The illustrated search results page 300 includes several regions including a suggested search term region 304, an advertisements region 306, a sponsored results region 308, a web results region 312, a news images region 316, an images region 320, an encyclopedia region 324, a news region 326 and a video region 328. Each of the advertisements region 306, sponsored results region 308, news images region 316, images region 320, encyclopedia region 324, news region 326 and video region 328 are representative of different channels of information (e.g., advertisements, sponsors, news images, images, news, encyclopedia, video, etc.). It will be appreciated that the channels presented are not limited to the above channels and may include a fewer number or greater number of channels. In addition, the amount of information from each channel may vary from that illustrated.
The different channels of information are sometimes referred to as vertical information and some or all of the channels of information may come from vertical databases. This information may be located at the search system (e.g., one or more of databases 30 a-d) or may be received from other servers (not shown) connected to the search system over a network. The results of the web search engine, an advertisement engine and special information channel engines (or combined advertisement and special information channel engine) are combined to determine the channels to be used and the composition of the page presented to the user (e.g., search results page 300 of FIGS. 3A and 3B).
While each source of channel information can provide a large amount of information in responding to a query, the overall display space of a results page is limited. In addition, information from each channel may not be relevant to each query. Similarly, the quality of results tends to degrade when an excessive amount of information is supplied from a specific channel. For example, the top three results of a web search may be more relevant to a query than the bottom three results. Similarly, in another example, the top result of an advertisement link is often more relevant than the second or third advertisement links.
Embodiments of the present invention relate to optimization in selecting content from multiple channels, even though the engines behind these channels may come from different organizations and service providers. The search system combines information from independent information providers. The search system determines the relationship among the different channels. The search system can optimize channel selection even though the web search engine does not have direct access to information in backend channel providers.
In one embodiment, the channel selection is a contextual optimization of multi-channel information selection using user behavior and/or query context analysis. For example, the web search engine increases the amount of content selected from one specific channel if such content is more attractive to users and optimizes the overall content acceptance of the composed query responses.
In another embodiment, the web search engine suppresses content from a channel. For example, in certain query categories, users tend to be more interested in commercial products; thus, the number of sponsored advertisement links can be increased for such users and reduced for a category that has less relation to money, while maintaining the overall advertisement inventory consumption, resulting in increased user satisfaction and, therefore, improved user retention.
The search system optimizes content selection from multiple channels adaptively based on a user's query, the user's recent clicks, and/or user's recent pick behavior on selected channels. The search system identifies user-contexts in which a user is more or less likely to click on a specific channel. The search system can then retrieve more or less content depending on the users to determine the relevancy of the information. The search system, therefore, increases user satisfaction towards the selected content.
FIG. 4 illustrates a system 400 for mining and classifying data. The system 400 may be part of the search system 12 of FIG. 1. The illustrated system 400 is configured to track and mine user profiles and preferences for use of each channel based on a user's historical data, track and mine relevancy information appropriateness of each data channel for a given query and a user, or a group of users under different categories, track and mine click path among these channels, and optimize channel placement for best retention and pick rates for targeted channels.
The system 400 includes historical click-stream data 404, user-centric click records 408, query-centric click records 412, a DMDB (Data Mining DataBase) 416, a query-channel relevancy classifier 420, a channel-path classifier 424, a user bias classifier 428 and a recommendation aggregator 432.
The historical click-stream data 404 is collected from a data log feed. The data 404 is inserted into the DMDB 416 along with user-centric click records 408 and query-centric click records 412. The information in DMDB 416 is used as training data to develop independent classifiers (e.g., query-channel relevancy classifier 420, channel path classifier 424, user bias classifier 428).
The query-channel relevancy classifier 420 is configured to analyze the categorization of queries into groups, such as, for example, shopping, auto, finance, health, education, etc. The queries can also be grouped into query sessions. A query session includes any sequence of search engine actions (activities that can be recorded by the search engine) of a given user.
The query-channel relevancy classifier 420 is also configured to determine the access path from one channel to another. For example, a user that accesses information associated with a “Dictionary” channel to learn the meaning of words is less likely to read advertisement content from an advertisement channel. The query classifier identifies the relevance degree of each channel's content to a query or a class of queries, which can potentially encourage the acceptance of its content for a user or a group of users.
The user bias classifier 428 is configured to analyze users classified by, for example, geolocations, a user's past preference for specified channels, whether the user accesses several channels, whether a user never accesses certain channels, whether the user is a new user, whether the user is a frequent user, whether the user is a casual user, and the like.
The user bias classifier 428 is also configured to analyze users based on their access times and events in selecting certain channels. Exemplary access times include, for example, morning, evening, nights, weekends, weekdays, etc. Exemplary events include holidays and major events, such as the Super Bowl, 9/11, Christmas, etc.
The recommendation aggregator 432 is configured to dynamically collect live data for instant recommendation channel resources based on the classification results from the query-channel relevancy classifier 420, channel-path classifier 424 and user bias classifier 428 along with runtime behavior tracking.
The recommendation aggregator 432 computes a recommendation value F for increasing or decreasing use of content from a targeted channel under conditions Q, T and U (i.e., F(Q, T, U), where Q is a query, or a class of queries; T is time of access or an event; and, U is a user or class of users.
FIG. 5 is a flow diagram illustrating a method associated with the system of FIG. 4 in further detail. The method 500 begins at block 504 by tracking and mining user profiles and preferences for use of each channel based on historical data of the user. The method 500 continues by tracking and mining relevancy information appropriateness of each data channel for a given query and a user, or a group of users under different categories (block 508). The method 500 continues by tracking and mining click, path among channels (block 512). The method 500 continues by offline classification of query-channel relevancy, user past interests and channel usage patterns (block 516). The method 500 continues by dynamic collection of live data for instant recommendation of channel sources based on offline classification results and runtime behavior tracking (block 520). The method 500 continues by optimizing channel placement for best retention and pick rates for targeted channels (block 524).
FIG. 6 schematically illustrates interaction between the search engine 600, the channel wise 604 (a channel selection system) and the user interfaces 608, 612. The channel wise 604 is also coupled with channels 616, 618 and 620. For example, the information channel 1, 616, may be an advertisement channel, the information channel 2, 618, may be a dictionary channel and the information channel n, 620, may be any information channel, such as, for example, an encyclopedia channel, an images channel, a news images channel, a sponsored results channel, etc. It will be appreciated that any number or combination of channels may be provided. It will also be appreciated that some of the channels may be located at the search system while others are located external the search system. For example, information channel 1 616 may be located in a database 30 a of the search system 12 of FIG. 1, while information channels 618 and 620 may be coupled to the search system 12 through the network 16.
The user interface 608 corresponds to an interface in which the user enters a search query for the search engine. The search query is communicated to the search engine web site 600. The web site 600 communicates the user query to the channel wise 604. The channel wise 604 determines an optimized content channel selection based on the user and the query, and communicates the result to the web site 600. For example, channel wise 604 may indicate that an increase or no change in the channel content should be performed by the web site 600. The web site 600 then provides the user with the search results and information from the selected channels with the user interface 612.
In one embodiment, the channel wise 604 is co-run within the web site 600. In one embodiment, a web server front end captures the contextual information needed to trigger the recommendation from the channel wise 604. For example, the contextual information may include one or more of query text, prior click history and user ID. The channel wise 604 generates its recommendation based on, for example, the F(Q,T,U) function and provides the recommendation to the web site 600. The web site 600, in turn, provides the recommended amount of information to the user from the recommended channels.
FIG. 7 illustrates a method of contextual optimization of content selection 700 in accordance with one embodiment of the invention. The method 700 begins by receiving a query (block 704).
The method 700 continues by computing a historical channel pick rate for the query (block 708). For example, if many users in the past click results presented from a specific channel, the past click results indicate relevancy of content from the selected channel(s) for the given query. The aggregated behavior information is computed for the tuple (Q, U, T), in which Q is a query, U is a user and T is a time or event.
If historical data is available, the method 700 continues by computing aggregated behavior information (block 712) and deriving the probability of click for a channel given a group of users (or a user), query category (or a query) and access time range (block 716). The method 700 continues by computing channel relevancy for the query (block 720).
If historical data is not available, the method 700 continues by partitioning the query to identify concept blocks (block 724). Although some queries may not be found in the historical data database, parts of a new query (i.e., sub-queries or concept blocks) may have been submitted in the past. The channel relevancy of these sub-queries or concept blocks can be used to estimate the channel relevancy for the new query as a whole.
The method 700 continues by classifying each concept block separately (block 728). The concept blocks may include, for example, special phrases, words, bi-grams, tri-grams, k-grams, and the like. Stemming techniques, for example, can be applied in concept block extraction. The method 700 continues by aggregating classification decisions (block 732). The method 700 continues by computing channel relevancy for the query (block 720).
Thus, for a set of query contexts (query or block concepts) in the historical data, the following tuples: <Query or Query Group, Number of past clicks, User Group, Time Range (or Event)> are extracted for each channel.
FIG. 8 illustrates the method of extracting concept blocks 800 in further detail. The method 800 begins by receiving a query (block 804). The method 800 continues by applying stemming techniques to the query to extract concept blocks from the query (block 808). The method 800 continues by identifying relevancy properties of each concept block (block 812). The method 800 continues by aggregating relevancy properties of each concept block (block 816).
FIG. 9 illustrates a method 900 for computing the channel relevancy degree of each concept block in a dataset for each user group and a time range in accordance with one embodiment of the invention. The method 900 begins by for a channel c_i (block 904) and for each pair <q, n>(block 908), removing stop words from q to get q′ (block 912). Channel c_i corresponds to a channel, q is a query, n is the number of clicks associated with a query, q is a query and q′ is a searchable query.
The method 900 continues by finding the set, U, of concept blocks from q′ (block 916).
The method 900 continues by for each u in U, incrementing NO_OCC(u) by 1 and incrementing NO_CL(u, c_i) by n (block 920), wherein NO_OCC(u) is the number of occurrences of u and NO_CL(u, c_i) is the number of clicks of u on c_i.
In one embodiment, the method 900 continues by for any concept block, u, the probability of a channel click if a query contains u is P_CL(u, c_i)=NO_CL(u, c_i)/NO_OCC(u) (block 924).
In another embodiment, the method 900 continues by for any concept block, u, the probability of a channel click if a query contains u is P_CL (u, c_i)=NO_CL(u, c_i)+BASE_CLICK(c_i)/NO_OCC(u)+BASE_OCC, where BASE_CLICK(c_i) defines the base number of clicks on engine c_i on BASE_OCC (block 928). Queries with low frequency tend to have large swings in their values. For example, the probability of a concept block with three occurrences is not as reliable as the probability of a concept block with 300 occurrences. To solve this problem, the method 900 using block 928 can be used. BASE_OCC may be chosen based on the data distribution and may be uniform across all the channels. BASE_CLICK(c_i) is computed based on the historical click data.
In yet another embodiment, the method 900 continues by for any set of concept blocks, U, the probability of a channel click if a query contains all the concept blocks in U is P_CL(q, c_i) as SUM(P_CL (u, c_i) for all u in U (block 932). The method 900 continues with block 932 when concept blocks are used. Block 932 computes the channel relevancy degree of the query by aggregating the channel click probability of its components (e.g., concept blocks). The SUM function used is a weighted function. For example, special phrases have very large weights and the tri-grams have larger weights than bi-grams and words; and bi-grams have larger weights than words.
FIG. 10 schematically illustrates the flow of the process within the system. Historical data 1000 is used as training for user groups and access time 1004. Concept blocks 1008 and probability of click 1012 are determined. A query group q 1016 is analyzed to determine the concept blocks of q 1020. The probability of the channel click for q 1024 is determined based on the concept blocks of q 1020 and the concept blocks 1008 and probability of click 1012.
FIG. 11 illustrates a method 1100 for classifying channel paths in accordance with one embodiment of the invention. The classifier computes the likelihood of clicking a specific channel after a user has accessed other channels. There is a strong preference among users in the order of using different channels. For example, generally, if a user chooses a channel, the user is more likely to choose the same channel again in the immediate future. In another example, some channels complement each other or offer similar content. For example, a channel that shows reviews of a product is related to a channel that shows the stores the product is sold. This classifier extracts the preferred user paths between channels using historical data.
The method 1100 begins by accumulating user records grouped by session and sorted by time (block 1104). The method 1100 continues by computing the probability of a channel click following each channel path (block 1108). The computation 1108 may include for consecutive queries with a session, considering the overall click rate of each channel (block 1112). The computation 1108 may also include for each channel c_i, finding consecutive records in format <path, New_query, c_i>, and computing the ratio r(path, c_i)=COUNT(<path, New_query, c_i>) s.t. channel c_i is selected/COUNT<path, New_query>) (block 1116). The ratio r(path, c_i) measures the tendency of a user who followed the path to select channel c_i on the next query. The r values are stored, and when a user performs a new query, the path(s) corresponding to r are fetched.
FIG. 12 illustrates a method for user biased classification 1200 in accordance with one embodiment of the invention. This classifier is to derive the aggregative behavior of a user or a user group at a certain time range or following a certain event. The classifier determines if a user or a user group has a bias using or not using certain channels under such a temporal condition.
The method 1200 begins by accumulating user records grouped by session and sorted by time (block 1204). The method 1200 continues by for each channel c_i and user or user group u, computing bias b with the following expression: B(u, c_i)=COUNT (c_i selection by user u)/COUNT (New query results by user u) (block 1208). These bias b values are stored for users and channels and the bias b values are then used to make channel information determinations.
FIG. 13 is an exemplary user interface 1300 illustrating a search results page including optimized channel selection in accordance with one embodiment of the invention. FIG. 13 illustrates a search results page for a search query for “credit report.” The channels that are illustrated for a “credit report” query include: five sponsored results, six images, one encyclopedia entry and one news entry.
FIG. 14 is an exemplary user interface 1400 illustrating a search results page including optimized channel selection in accordance with one embodiment of the invention. FIG. 14 illustrates search results for a search query that is “ucsb.” For example, the channels for the search “ucsb” include no sponsored results, one encyclopedia entry, two videos and one blog.
When the user interfaces 1300 and 1400 of FIGS. 13 and 14, respectively, are compared, it can be seen that different amounts and types of information content from different channels are presented for different search queries.
As described above, the channels are picked and the amount of content is determined based on an analysis of the query or query concept blocks, user history and time. The web search engine determines how to display the results page based on the channel recommendation provided by the channel recommendation system and the web search results.
FIG. 15 is one embodiment of a computer system on which embodiments of the present invention may be implemented. It will be apparent to those of ordinary skill in the art, however, that other alternative systems of various system architectures may also be used.
The data processing system illustrated in FIG. 15 includes a bus or other internal communication means 1565 for communicating information, and a processor 1560 coupled to the bus 1565 for processing information. The system further comprises a random access memory (RAM) or other volatile storage device 1550 (referred to as memory), coupled to bus 1565 for storing information and instructions to be executed by processor 1560. Main memory 1550 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 1560. The system also comprises a read only memory (ROM) and/or static storage device 1520 coupled to bus 1565 for storing static information and instructions for processor 1560, and a data storage device 1525 such as a magnetic disk or optical disk and its corresponding disk drive. Data storage device 1525 is coupled to bus 1565 for storing information and instructions.
The system may further be coupled to a display device 1570, such as a cathode ray tube (CRT) or a liquid crystal display (LCD) coupled to bus 1565 through bus 1565 for displaying information to a computer user. An alphanumeric input device 1575, including alphanumeric and other keys, may also be coupled to bus 1565 through bus 1565 for communicating information and command selections to processor 1560. An additional user input device is cursor control device 1580, such as a mouse, a trackball, stylus, or cursor direction keys coupled to bus 1565 through bus 1565 for communicating direction information and command selections to processor 1560, and for controlling cursor movement on display device 1570.
Another device, which may optionally be coupled to computer system 1500, is a communication device 1590 for accessing other nodes of a distributed system via a network. The communication device 1590 may include any of a number of commercially available networking peripheral devices such as those used for coupling to an Ethernet, token ring, Internet, or wide area network. The communication device 1590 may further be a null-modem connection, or any other mechanism that provides connectivity between the computer system 1500 and the outside world. Note that any or all of the components of this system illustrated in FIG. 15 and associated hardware may be used in various embodiments of the present invention.
It will be appreciated by those of ordinary skill in the art that any configuration of the system may be used for various purposes according to the particular implementation. The control logic or software implementing the present invention can be stored in main memory 1550, mass storage device 1525, or other storage medium locally or remotely accessible to processor 1560.
It will be apparent to those of ordinary skill in the art that the system, method, and process described herein can be implemented as software stored in main memory 1550 or read only memory 1520 and executed by processor 1560. This control logic or software may also be resident on an article of manufacture comprising a computer readable medium having computer readable program code embodied therein and being readable by the mass storage device 1525 and for causing the processor 1560 to operate in accordance with the methods and teachings herein.
The present invention may also be embodied in a handheld or portable device containing a subset of the computer hardware components described above. For example, the handheld device may be configured to contain only the bus 1565, the processor 1560, and memory 1550 and/or data storage device 1525. The handheld device may also be configured to include a set of buttons or input signaling components with which a user may select from a set of available options. The handheld device may also be configured to include an output apparatus such as a liquid crystal display (LCD) or display element matrix for displaying information to a user of the handheld device. Conventional methods may be used to implement such a handheld device. The implementation of the present invention for such a device would be apparent to one of ordinary skill in the art given the disclosure of the present invention as provided herein.
The present invention may also be embodied in a special purpose appliance including a subset of the computer hardware components described above. For example, the appliance may include a processor 1560, a data storage device 1525, a bus 1565, and memory 1550, and only rudimentary communications mechanisms, such as a small touch-screen that permits the user to communicate in a basic manner with the device. In general, the more special-purpose the device is, the fewer of the elements need be present for the device to function. In some devices, communications with the user may be through a touch-based screen, or similar mechanism.
It will be appreciated by those of ordinary skill in the art that any configuration of the system may be used for various purposes according to the particular implementation. The control logic or software implementing the present invention can be stored on any machine-readable medium locally or remotely accessible to processor 1560. A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g. a computer). For example, a machine readable medium includes read-only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, electrical, optical, acoustical or other forms of propagated signals (e.g. infrared signals, digital signals, etc.).
The foregoing description with attached drawings is only illustrative of possible embodiments of the described method and should only be construed as such. Other persons of ordinary skill in the art will realize that many other specific embodiments are possible that fall within the scope and spirit of the present idea. The scope of the invention is indicated by the following claims rather than by the foregoing description. Any and all modifications which come within the meaning and range of equivalency of the following claims are to be considered within their scope.

Claims (10)

The invention claimed is:
1. A method implemented by at least a computer system for adaptive multi-channel content selection with behavior-aware query analysis comprising:
receiving a search query from a user;
determining, for each of a plurality of channels wherein each channel has a plurality of web results, a probability that the user will select a web result of the channel for the search query;
making at least one selection decision among the plurality of channels based on corresponding probabilities that the user will select a respective web result of at least a respective channel based on historical data of the user, wherein
the historical data comprises past click results and access time range associated with the query, and
the selection decision comprises at least selecting channel placement of the at least respective channel for presentation to the user;
determining an optimal number of web results to display for each of the plurality of channels based at least in part on the corresponding probabilities that the user will select the respective web result of the at least respective channel based on the historical data for each channel and the search query; and
presenting a number of web results to the user, a wherein the number of web results displayed for each of the plurality of channels being based on the optimal number determined for each of the plurality of channels.
2. The method of claim 1, wherein the time range comprises an event.
3. The method of claim 1, further comprising:
storing historical data of queries and selections; and
analyzing the historical data based on the search query and the user to determine the probability for each channel.
4. The method of claim 1, further comprising:
identifying query concepts from the search query;
analyzing the historical data based on each of the query concepts to determine a recommended channel selection; and
aggregating the recommended channel selection for each of the query concepts.
5. The method of claim 1, wherein the selection decision comprises dynamically determining a recommended channel selection.
6. The method of claim 1, further comprising:
if the historical data cannot be analyzed then the determining the probability for each channel is implemented by:
partitioning the received search query to identify concept blocks;
classifying each concept block to determine a classification decision;
aggregating the classification decision of each concept block; and
computing a channel relevancy for the received search query.
7. The method of claim 1, further comprising:
tracking and mining historical user data associated with a plurality of data channels;
tracking and mining relevancy information appropriate of each data channel based on historical data of each user;
tracking and mining relevancy information appropriate of each data channel for a given query and user or group of queries and users;
tracking and mining click paths among the plurality of channels;
classifying query-channel relevancy, user past interests, and channel usage patterns;
collecting data for instant recommendation of channel sources based on classification results; and
optimizing channel placement for best retention and pick rates for targeted.
8. A computer implemented adaptive multi-channel content selection system comprising:
a computer memory for storing a set of instructions;
a computer processor connected to the memory;
a web search engine module stored in the computer memory and executable by the computer processor to receive a search query from a user and present web results corresponding to the search query to the user; and
a channel selection system module stored in the computer memory and executable by the computer processor to:
(i) determine, for each of a plurality of channels wherein each channel has a plurality of web results, a probability that the user will select a web result of the channel for the search query,
(ii) make at least one selection decision among the plurality of channels based on corresponding probabilities that the user will select a respective web result of at least a respective channel based on historical data of the user, wherein
the historical data comprises past click results and access time range associated with the query, and
the selection decision comprises at least selecting channel placement of the at least respective channel for presentation to the user and
(iii) determine an optimal number of the web results to display for each of the plurality of channels based at least in part on the corresponding probabilities that the user will select the respective web result of the at least respective channel based on the historical data for each channel and the search query, and
(iv) display a number of the web results to the user, wherein the number being based on the optimal number determined for each of the plurality of channels.
9. The system of claim 8, further comprising a plurality of databases located in the computer memory, the plurality of databases coupled with the web search engine module.
10. The system of claim 8, further comprising:
a plurality of database records consisting of multi-channel content;
a user-centric database, consisting of user centric click records, the click records used as training data to develop independent classifiers;
a query-channel relevancy classifier configured to analyze categorization of queries into groups and identify a relevancy degree of each channel's content to a the search query or class of queries;
a channel-path classifier configured to determine an access path from one channel to another;
a user-bias classifier configured to analyze user search habits pertaining to the multi-channel content; and
a recommendations aggregator configured to collect database records from multi-channel content sources based on the classification results from the query-channel classifier, channel-path classifier and user-bias classifier.
US12/148,190 2008-04-16 2008-04-16 Adaptive multi-channel content selection with behavior-aware query analysis Expired - Fee Related US8751481B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/148,190 US8751481B2 (en) 2008-04-16 2008-04-16 Adaptive multi-channel content selection with behavior-aware query analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12/148,190 US8751481B2 (en) 2008-04-16 2008-04-16 Adaptive multi-channel content selection with behavior-aware query analysis

Publications (2)

Publication Number Publication Date
US20090265325A1 US20090265325A1 (en) 2009-10-22
US8751481B2 true US8751481B2 (en) 2014-06-10

Family

ID=41201970

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/148,190 Expired - Fee Related US8751481B2 (en) 2008-04-16 2008-04-16 Adaptive multi-channel content selection with behavior-aware query analysis

Country Status (1)

Country Link
US (1) US8751481B2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100211565A1 (en) * 2008-10-20 2010-08-19 Facility Italia S.P.A. Method for searching for multimedia content items on the internet
US20130246385A1 (en) * 2012-03-13 2013-09-19 Microsoft Corporation Experience recommendation system based on explicit user preference
US20150046354A1 (en) * 2013-08-07 2015-02-12 20/20 Profiles, LLC Pet matching system and method
US20170083584A1 (en) * 2015-09-23 2017-03-23 Motorola Solutions, Inc. Apparatus, system, and method for responding to a user-initiated query with a context-based response
US10965573B1 (en) * 2014-09-09 2021-03-30 Wells Fargo Bank, N.A. Systems and methods for online user path analysis

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8266135B2 (en) * 2009-01-05 2012-09-11 International Business Machines Corporation Indexing for regular expressions in text-centric applications
US20110320424A1 (en) * 2010-06-29 2011-12-29 Intuit Inc. Assessing and adapting component parameters
US8577973B2 (en) 2010-06-30 2013-11-05 International Business Machines Corporation Accelerated micro blogging using correlated history and targeted item actions
CN102591880B (en) * 2011-01-14 2015-02-18 阿里巴巴集团控股有限公司 Information providing method and device
US8977629B2 (en) * 2011-05-24 2015-03-10 Ebay Inc. Image-based popularity prediction
CN102890683B (en) * 2011-07-21 2016-01-20 阿里巴巴集团控股有限公司 Information providing method and device
US20130246415A1 (en) * 2012-03-13 2013-09-19 Microsoft Corporation Searching based on others' explicitly preferred sources
CN105117383A (en) * 2015-08-14 2015-12-02 百度在线网络技术(北京)有限公司 Search result providing method and apparatus
CN112507213B (en) * 2020-11-26 2022-09-30 杭州讯酷科技有限公司 Method for recommending optimized system scheme based on behavior big data analysis

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050004889A1 (en) * 1999-12-08 2005-01-06 Bailey David R. Search engine system and associated content analysis methods for locating web pages with product offerings
US20060036581A1 (en) * 2004-08-13 2006-02-16 Microsoft Corporation Automatic categorization of query results
US20060224496A1 (en) * 2005-03-31 2006-10-05 Combinenet, Inc. System for and method of expressive sequential auctions in a dynamic environment on a network
US20070192309A1 (en) * 2005-10-12 2007-08-16 Gordon Fischer Method and system for identifying sentence boundaries
US20070233671A1 (en) * 2006-03-30 2007-10-04 Oztekin Bilgehan U Group Customized Search
US20070260624A1 (en) * 2006-03-29 2007-11-08 Chung Christina Y Incremental update of long-term and short-term user profile scores in a behavioral targeting system
US20070288473A1 (en) * 2006-06-08 2007-12-13 Rajat Mukherjee Refining search engine data based on client requests
US20070294225A1 (en) * 2006-06-19 2007-12-20 Microsoft Corporation Diversifying search results for improved search and personalization
US20080016040A1 (en) * 2006-07-14 2008-01-17 Chacha Search Inc. Method and system for qualifying keywords in query strings
US20080033982A1 (en) * 2006-08-04 2008-02-07 Yahoo! Inc. System and method for determining concepts in a content item using context
US20080154870A1 (en) * 2006-12-26 2008-06-26 Voice Signal Technologies, Inc. Collection and use of side information in voice-mediated mobile search
US20080172374A1 (en) * 2007-01-17 2008-07-17 Google Inc. Presentation of Local Results
US20080201413A1 (en) * 2005-05-24 2008-08-21 Sullivan Alan T Enhanced Features for Direction of Communication Traffic
US20080222283A1 (en) * 2007-03-08 2008-09-11 Phorm Uk, Inc. Behavioral Networking Systems And Methods For Facilitating Delivery Of Targeted Content
US20080250035A1 (en) * 2007-02-05 2008-10-09 Smith Daniel C Systems and methods for organizing content for mobile media services

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050004889A1 (en) * 1999-12-08 2005-01-06 Bailey David R. Search engine system and associated content analysis methods for locating web pages with product offerings
US20060036581A1 (en) * 2004-08-13 2006-02-16 Microsoft Corporation Automatic categorization of query results
US20060224496A1 (en) * 2005-03-31 2006-10-05 Combinenet, Inc. System for and method of expressive sequential auctions in a dynamic environment on a network
US20080201413A1 (en) * 2005-05-24 2008-08-21 Sullivan Alan T Enhanced Features for Direction of Communication Traffic
US20070192309A1 (en) * 2005-10-12 2007-08-16 Gordon Fischer Method and system for identifying sentence boundaries
US20070260624A1 (en) * 2006-03-29 2007-11-08 Chung Christina Y Incremental update of long-term and short-term user profile scores in a behavioral targeting system
US20070233671A1 (en) * 2006-03-30 2007-10-04 Oztekin Bilgehan U Group Customized Search
US20070288473A1 (en) * 2006-06-08 2007-12-13 Rajat Mukherjee Refining search engine data based on client requests
US20070294225A1 (en) * 2006-06-19 2007-12-20 Microsoft Corporation Diversifying search results for improved search and personalization
US20080016040A1 (en) * 2006-07-14 2008-01-17 Chacha Search Inc. Method and system for qualifying keywords in query strings
US20080033982A1 (en) * 2006-08-04 2008-02-07 Yahoo! Inc. System and method for determining concepts in a content item using context
US20080154870A1 (en) * 2006-12-26 2008-06-26 Voice Signal Technologies, Inc. Collection and use of side information in voice-mediated mobile search
US20080172374A1 (en) * 2007-01-17 2008-07-17 Google Inc. Presentation of Local Results
US20080250035A1 (en) * 2007-02-05 2008-10-09 Smith Daniel C Systems and methods for organizing content for mobile media services
US20080222283A1 (en) * 2007-03-08 2008-09-11 Phorm Uk, Inc. Behavioral Networking Systems And Methods For Facilitating Delivery Of Targeted Content

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100211565A1 (en) * 2008-10-20 2010-08-19 Facility Italia S.P.A. Method for searching for multimedia content items on the internet
US9519713B2 (en) * 2008-10-20 2016-12-13 Facilitylive S.R.L. Method for searching for multimedia content items on the internet
US20130246385A1 (en) * 2012-03-13 2013-09-19 Microsoft Corporation Experience recommendation system based on explicit user preference
US20150046354A1 (en) * 2013-08-07 2015-02-12 20/20 Profiles, LLC Pet matching system and method
US9972056B2 (en) * 2013-08-07 2018-05-15 20/20 Profiles, LLC Pet matching system and method
US10965573B1 (en) * 2014-09-09 2021-03-30 Wells Fargo Bank, N.A. Systems and methods for online user path analysis
US11368383B1 (en) 2014-09-09 2022-06-21 Wells Fargo Bank, N.A. Systems and methods for online user path analysis
US11695675B1 (en) 2014-09-09 2023-07-04 Wells Fargo Bank, N.A. Systems and methods for online user path analysis
US20170083584A1 (en) * 2015-09-23 2017-03-23 Motorola Solutions, Inc. Apparatus, system, and method for responding to a user-initiated query with a context-based response
US11868354B2 (en) * 2015-09-23 2024-01-09 Motorola Solutions, Inc. Apparatus, system, and method for responding to a user-initiated query with a context-based response

Also Published As

Publication number Publication date
US20090265325A1 (en) 2009-10-22

Similar Documents

Publication Publication Date Title
US8751481B2 (en) Adaptive multi-channel content selection with behavior-aware query analysis
US11023513B2 (en) Method and apparatus for searching using an active ontology
US7685091B2 (en) System and method for online information analysis
US8494897B1 (en) Inferring profiles of network users and the resources they access
US9047341B2 (en) Method, apparatus and system of intelligent navigation
US9251279B2 (en) Methods and systems for using community defined facets or facet values in computer networks
Shokouhi et al. From queries to cards: Re-ranking proactive card recommendations based on reactive search history
CN100401292C (en) Systems and methods for search query processing using trend analysis
US8015065B2 (en) Systems and methods for assigning monetary values to search terms
US20090100015A1 (en) Web-based workspace for enhancing internet search experience
Billsus et al. Improving proactive information systems
US20110320437A1 (en) Infinite Browse
US20080244429A1 (en) System and method of presenting search results
US20090327913A1 (en) Using web revisitation patterns to support web interaction
WO2001025947A1 (en) Method of dynamically recommending web sites and answering user queries based upon affinity groups
US20090313217A1 (en) Systems and methods for classifying search queries
US8712999B2 (en) Systems and methods for online search recirculation and query categorization
US7831474B2 (en) System and method for associating an unvalued search term with a valued search term
TW201447797A (en) Method and system for multi-phase ranking for content personalization
US20200294071A1 (en) Determining user intents related to websites based on site search user behavior
US20190205465A1 (en) Determining document snippets for search results based on implicit user interactions
Gasparetti Modeling user interests from web browsing activities
WO2021179481A1 (en) Cold start method and apparatus for personalizing and pushing data content, device and storage medium
US20100169316A1 (en) Search query concept based recommendations
US20140181096A1 (en) Entity name disambiguation

Legal Events

Date Code Title Description
AS Assignment

Owner name: IAC SEARCH & MEDIA, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CAMOGLU, ORHAN;CHRABAKH, WAHID;DING, LEI;AND OTHERS;REEL/FRAME:020860/0065;SIGNING DATES FROM 20080326 TO 20080411

Owner name: IAC SEARCH & MEDIA, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CAMOGLU, ORHAN;CHRABAKH, WAHID;DING, LEI;AND OTHERS;SIGNING DATES FROM 20080326 TO 20080411;REEL/FRAME:020860/0065

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551)

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20220610