CN106815228B - Method and device for selecting class name of search keyword - Google Patents

Method and device for selecting class name of search keyword

Info

Publication number
CN106815228B
Authority
CN
China
Prior art keywords
search
keywords
target website
search keywords
search keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510850384.0A
Other languages
Chinese (zh)
Other versions
CN106815228A (en)
Inventor
贺达
冯鸳鹤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201510850384.0A
Publication of CN106815228A
Application granted
Publication of CN106815228B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method and a device for selecting class names for search keywords. The method comprises: clustering the search keywords of a target website to obtain multiple classes of search keywords, where the search keywords are the keywords used for in-site searches of the target website and the target website divides its pages into columns; querying the landing pages reached when the search keywords are used for in-site search on the target website, and determining the column names of the columns containing those landing pages; and, for each class of search keywords, selecting one column name from the column names of the columns containing the landing pages of the search keywords in that class, and using it as the class name of that class. The method and the device solve the technical problem that a class name chosen by existing selection methods fails to reflect the characteristics of its class.

Description

Method and device for selecting a class name for search keywords

Technical field

The present application relates to the field of the Internet and, in particular, to a method and device for selecting a class name for search keywords.

Background

In the Internet field, websites are an important platform for providing information to users. Most websites offer in-site search so that users can look for relevant information within the site. Recording what users search for reveals the information they care about and what they need. To understand these interests and needs better, website operators usually categorize the keywords that users enter in in-site search: a set of related search keywords is grouped into one class, and each class is given a class name.

However, existing methods for choosing the class name of a class of search keywords usually pick the word that, within a certain scope, has the most associations with the other words in the class. A class name chosen this way is typically a word that is related to most of the other words but does not reflect the characteristics of the class. For example, for the class of search keywords [real estate, housing, landed property, commercial housing, property management, purchase tax], the existing method would choose "purchase tax" as the class name, because "purchase tax" is associated with every other word, whereas the other words are near-synonyms that can substitute for one another and therefore have fewer associations among themselves. Analysis of these search keywords, however, makes it clear that "real estate" would be a better class name.

No effective solution to the above problem has been proposed so far.

Summary of the invention

The embodiments of the present application provide a method and device for selecting a class name for search keywords, so as to at least solve the technical problem that a class name chosen by existing selection methods fails to reflect the characteristics of its class.

According to one aspect of the embodiments of the present application, a method for selecting a class name for search keywords is provided, comprising: clustering the search keywords of a target website to obtain multiple classes of search keywords, where the search keywords are the keywords used for in-site searches of the target website and the target website divides its pages into columns; querying the landing pages reached when the search keywords are used for in-site search on the target website, and determining the column names of the columns containing the landing pages corresponding to the search keywords; and, for each class of search keywords among the multiple classes, selecting one column name from the column names of the columns containing the landing pages corresponding to the search keywords in that class, as the class name of that class.

Further, selecting one column name from the column names of the columns containing the landing pages corresponding to the search keywords in each class, as the class name of that class, comprises: counting, for each class of search keywords, the number of occurrences of the column names of the columns containing the landing pages corresponding to its search keywords; and, for each class of search keywords, selecting the column name with the most occurrences as the class name of that class.

Further, before clustering the search keywords of the target website to obtain multiple classes of search keywords, the method further comprises: acquiring historical access data of the target website; and parsing the historical access data to obtain the search keywords of the target website and their corresponding landing pages.

Further, after parsing the historical access data to obtain the search keywords of the target website and their corresponding landing pages, the method further comprises: establishing a correspondence between the search keywords and the landing pages; where querying the landing pages reached when the search keywords are used for in-site search on the target website comprises: using a search keyword as the index and looking up its landing pages through the correspondence.

Further, clustering the search keywords of the target website to obtain multiple classes of search keywords comprises: clustering the search keywords of the target website with the K-means clustering algorithm to obtain the multiple classes of search keywords.

According to another aspect of the embodiments of the present application, a device for selecting a class name for search keywords is also provided, comprising: a clustering unit, configured to cluster the search keywords of a target website to obtain multiple classes of search keywords, where the search keywords are the keywords used for in-site searches of the target website and the target website divides its pages into columns; a query unit, configured to query the landing pages reached when the search keywords are used for in-site search on the target website, and to determine the column names of the columns containing the landing pages corresponding to the search keywords; and a selection unit, configured to select, for each class of search keywords among the multiple classes, one column name from the column names of the columns containing the landing pages corresponding to the search keywords in that class, as the class name of that class.

Further, the selection unit comprises: a statistics module, configured to count, for each class of search keywords, the number of occurrences of the column names of the columns containing the landing pages corresponding to its search keywords; and a selection module, configured to select, for each class of search keywords, the column name with the most occurrences as the class name of that class.

Further, the device also comprises: an acquisition unit, configured to acquire historical access data of the target website before the search keywords of the target website are clustered into multiple classes; and a parsing unit, configured to parse the historical access data to obtain the search keywords of the target website and their corresponding landing pages.

Further, the device also comprises: an establishing unit, configured to establish a correspondence between the search keywords and the landing pages after the historical access data has been parsed to obtain the search keywords of the target website and their corresponding landing pages; where the query unit is specifically configured to use a search keyword as the index and look up its landing pages through the correspondence.

Further, the clustering unit is specifically configured to cluster the search keywords of the target website with the K-means clustering algorithm to obtain the multiple classes of search keywords.

According to the embodiments of the present application, the search keywords of a target website are clustered to obtain multiple classes of search keywords, where the search keywords are the keywords used for in-site searches of the target website and the target website divides its pages into columns; the landing pages reached when the search keywords are used for in-site search on the target website are queried, and the column names of the columns containing those landing pages are determined; and, for each class of search keywords, one column name is selected from the column names of the columns containing the landing pages of the search keywords in that class and used as the class name of that class. This solves the technical problem that a class name chosen by existing selection methods fails to reflect the characteristics of its class, and achieves the effect that the selected class name reflects the characteristics of the class to which the search keywords belong.

Description of the drawings

The drawings described here provide a further understanding of the present application and form part of it. The illustrative embodiments of the present application and their descriptions serve to explain the application and do not unduly limit it. In the drawings:

FIG. 1 is a flowchart of a method for selecting a class name for search keywords according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a device for selecting a class name for search keywords according to an embodiment of the present application.

Detailed description

To help those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort shall fall within the scope of protection of the present application.

It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be practiced in orders other than those illustrated or described. Furthermore, the terms "comprise" and "have", and any variations of them, are intended to cover non-exclusive inclusion: for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.

According to an embodiment of the present application, a method embodiment of a method for selecting a class name for search keywords is provided. It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order.

FIG. 1 is a flowchart of a method for selecting a class name for search keywords according to an embodiment of the present application. As shown in FIG. 1, the method comprises the following steps:

Step S102: cluster the search keywords of the target website to obtain multiple classes of search keywords, where the search keywords are the keywords used for in-site searches of the target website and the target website divides its pages into columns.

In the embodiments of the present application, the target website may be any website whose pages are divided into columns, each column having its own column name. For example, a news website divides its news into columns such as "Technology", "Finance" and "Sports". The search keywords that users enter in in-site search on the target website are recorded and then clustered into multiple classes. For example, suppose the recorded search keywords of a website are: real estate, housing, landed property, commercial housing, property management, purchase tax, snow disaster, smog, heavy rainfall, high-temperature climate. Clustering these search keywords yields a first class [real estate, housing, landed property, commercial housing, property management, purchase tax] and a second class [snow disaster, smog, heavy rainfall, high-temperature climate].

Step S104: query the landing pages reached when the search keywords are used for in-site search on the target website, and determine the column names of the columns containing the landing pages corresponding to the search keywords.

When a user performs an in-site search on the target website, the user usually clicks through to a relevant page, which is the landing page the user finally lands on. For example, if a user searches the target website with the keyword "smog" and opens a page reviewing smog in Beijing over the past year, then for that search the landing page corresponding to the keyword "smog" is that page.

It should be noted that the landing pages of a search keyword may refer to the page landed on in each individual search with that keyword; that is, one search keyword may correspond to multiple landing pages. For example, when different users search the site with the same keyword and end up on different landing pages, the keyword corresponds to multiple landing pages, all of which are retrieved. If different users land on the same page, each occurrence may still be recorded as a separate correspondence; that is, the landing page is determined per search, counting each use of the search keyword as one unit.

In this embodiment, once the landing page of each search keyword has been retrieved for each in-site search, the column name of the column containing each landing page is determined. In the "smog" search above, for example, the landing page is the page reviewing smog in Beijing over the past year; that page belongs to the column named "Weather", so the column name determined for the landing page is "Weather".
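The column lookup in step S104 can be sketched as follows. The source does not say how a page is tied to its column; this sketch assumes, purely for illustration, that the first path segment of the landing-page URL identifies the column. The mapping table, the example domain and the function name are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical mapping from the first URL path segment to a column name.
# The source does not specify how pages are tied to columns; this is an assumption.
COLUMN_BY_PATH_PREFIX = {
    "weather": "Weather",
    "disaster": "Natural disasters",
    "finance": "Finance",
}


def column_name_for(landing_url, default="Unknown"):
    """Return the column name for a landing-page URL, or `default` if none matches."""
    path = urlparse(landing_url).path
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        return default
    return COLUMN_BY_PATH_PREFIX.get(segments[0], default)


print(column_name_for("https://news.example.com/weather/beijing-smog-review.html"))
```

A site that stores the column in page metadata or in a content-management database would replace the lookup table with a query against that store; the rest of the method is unaffected.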

Step S106: for each class of search keywords among the multiple classes, select one column name from the column names of the columns containing the landing pages corresponding to the search keywords in that class, as the class name of that class.

The search keywords have already been clustered into multiple classes. Every search keyword in a class has its corresponding landing pages, and each landing page has the column name of the column that contains it, so each class of search keywords corresponds to multiple landing pages and their column names. Websites usually have a well-organized column structure in which every column has a name. The information a user wants from an in-site search is the information on the pages of some column, and the column name is itself a summarizing classification of that information, so it captures the content of such pages well. Choosing the class name from the column names therefore reflects the characteristics of the class of search keywords well. For example, for the class [snow disaster, smog, heavy rainfall, high-temperature climate] obtained by the clustering above, the column names of the columns containing the landing pages of its in-site searches include "Weather" and "Natural disasters"; "Weather" can be chosen from these column names as the class name of the class, and the class name of every other class is chosen in the same way.

According to the embodiments of the present application, the search keywords of a target website are clustered to obtain multiple classes of search keywords, where the search keywords are the keywords used for in-site searches of the target website and the target website divides its pages into columns; the landing pages reached when the search keywords are used for in-site search on the target website are queried, and the column names of the columns containing those landing pages are determined; and, for each class of search keywords, one column name is selected from the column names of the columns containing the landing pages of the search keywords in that class and used as the class name of that class. This solves the technical problem that a class name chosen by existing selection methods fails to reflect the characteristics of its class, and achieves the effect that the selected class name reflects the characteristics of the class to which the search keywords belong.

In the embodiments of the present application, the class name of each class of search keywords may be selected, according to a preset rule, from the column names of the columns containing the landing pages corresponding to that class; alternatively, the column name with the most occurrences may be used directly as the class name of the class.

Preferably, selecting one column name from the column names of the columns containing the landing pages corresponding to the search keywords in each class, as the class name of that class, comprises: counting, for each class of search keywords, the number of occurrences of the column names of the columns containing the landing pages corresponding to its search keywords; and, for each class of search keywords, selecting the column name with the most occurrences as the class name of that class.

In this embodiment, the number of occurrences of a column name is the number of times that the landing pages reached through in-site searches with the search keywords fall in the column of that name. The more often a column name occurs, the more often searches with that class of keywords land in that column, so that column name better reflects the characteristics of the class.

Take the class [snow disaster, smog, heavy rainfall, high-temperature climate] obtained by the clustering above as an example. The keywords "snow disaster", "smog", "heavy rainfall" and "high-temperature climate" were searched 8, 4, 3 and 5 times respectively. Among the searches for "snow disaster", the column name "Natural disasters" occurred 3 times and "Weather" occurred 5 times; all searches for the other keywords landed in "Weather". The tally is therefore 3 occurrences of "Natural disasters" and 17 occurrences of "Weather" (5 + 4 + 3 + 5), so "Weather" is used as the class name of this class.
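The tally in this example can be reproduced with a few lines of Python. This is a minimal sketch: the record layout (one pair per in-site search) and the function name are assumptions made for illustration, and the source does not say how ties between column names should be broken.

```python
from collections import Counter

# One entry per in-site search: (search keyword, column name of the landing page).
# The per-keyword counts follow the worked example: 8, 4, 3 and 5 searches.
searches = (
    [("snow disaster", "Natural disasters")] * 3
    + [("snow disaster", "Weather")] * 5
    + [("smog", "Weather")] * 4
    + [("heavy rainfall", "Weather")] * 3
    + [("high-temperature climate", "Weather")] * 5
)


def pick_class_name(records):
    """Return the most frequent column name together with the full tally."""
    tally = Counter(column for _keyword, column in records)
    name, _count = tally.most_common(1)[0]
    return name, tally


class_name, tally = pick_class_name(searches)
print(class_name, dict(tally))
```

Because each search contributes one vote, a keyword that is searched often weighs more heavily in the choice than a rare one, which matches the per-search counting described for the landing pages.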

Preferably, before clustering the search keywords of the target website to obtain multiple classes of search keywords, the method further comprises: acquiring historical access data of the target website; and parsing the historical access data to obtain the search keywords of the target website and their corresponding landing pages.

In this embodiment, the keywords users enter in in-site search on the target website, together with their access behavior, are recorded in the access data of the target website. When selecting class names for search keywords, the historical access data of the target website is acquired first, and the search keywords used by users and the landing page of each search are parsed from it, ready for the subsequent clustering of the search keywords and the counting of column-name occurrences.
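The parsing step can be sketched as below. The source does not describe the format of the access data, so everything about the record layout here is an assumption: records are taken to be time-ordered (session id, URL) pairs, a search is a request to a hypothetical `/search` path carrying the keyword in a `q` parameter, and the landing page of a search is the next non-search page viewed in the same session.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical access records, already ordered by time: (session id, requested URL).
ACCESS_LOG = [
    ("s1", "/search?q=smog"),
    ("s1", "/weather/beijing-smog-review.html"),
    ("s2", "/search?q=snow+disaster"),
    ("s2", "/disaster/snowstorm-relief.html"),
    ("s1", "/finance/markets.html"),
    ("s3", "/search?q=smog"),
]


def extract_search_landings(records, search_path="/search", query_param="q"):
    """Return (keyword, landing_url) pairs, one per search that led to a page view."""
    pending = {}  # session id -> keyword still waiting for its landing page
    pairs = []
    for session_id, url in records:
        parsed = urlparse(url)
        if parsed.path == search_path:
            values = parse_qs(parsed.query).get(query_param)
            if values:
                pending[session_id] = values[0]
        elif session_id in pending:
            # First non-search page after a search counts as its landing page.
            pairs.append((pending.pop(session_id), url))
    return pairs


pairs = extract_search_landings(ACCESS_LOG)
print(pairs)
```

Searches that are never followed by a page view (session `s3` above) produce no pair, and page views that do not follow a search are ignored.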

Further, after parsing the historical access data to obtain the search keywords of the target website and their corresponding landing pages, the method further comprises: establishing a correspondence between the search keywords and the landing pages; where querying the landing pages reached when the search keywords are used for in-site search on the target website comprises: using a search keyword as the index and looking up its landing pages through the correspondence.

In this embodiment, after the search keywords used by users and their landing pages have been parsed out, each search keyword is associated with the landing page of each in-site search made with it, establishing the correspondence. Once the search keywords have been clustered, the landing pages of any search keyword can then be looked up by keyword.
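One simple way to hold such a correspondence is a dictionary keyed by search keyword, sketched below. The function name and sample URLs are hypothetical; repeated landing pages are deliberately kept so that the number of entries per keyword equals the number of searches, in line with the per-search counting described earlier.

```python
from collections import defaultdict


def build_keyword_index(pairs):
    """Map each search keyword to the landing pages of its searches.

    Repeated pages are kept, so each list has one entry per search.
    """
    index = defaultdict(list)
    for keyword, landing_url in pairs:
        index[keyword].append(landing_url)
    return dict(index)


# Hypothetical (keyword, landing page) pairs parsed from access data.
sample_pairs = [
    ("smog", "/weather/beijing-smog-review.html"),
    ("smog", "/weather/beijing-smog-review.html"),
    ("snow disaster", "/disaster/snowstorm-relief.html"),
]
index = build_keyword_index(sample_pairs)
print(index.get("smog", []))
```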

Preferably, clustering the search keywords of the target website to obtain multiple classes of search keywords comprises: clustering the search keywords of the target website with the K-means clustering algorithm to obtain the multiple classes of search keywords.

In the embodiments of the present application, the K-means clustering algorithm is preferably used to cluster the search keywords into multiple classes.
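A bare-bones K-means run on keywords might look like the sketch below. The source names the algorithm but not how keywords are turned into vectors, so the two-dimensional feature vectors here are invented for illustration. Seeding the centroids with the first k points is also a simplification that keeps the run deterministic; practical implementations usually use random or k-means++ initialization.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-means (Lloyd's algorithm); the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels


# Hypothetical 2-D feature vectors; the source does not say how keywords are vectorized.
KEYWORD_VECTORS = {
    "real estate": (0.9, 0.1),
    "snow disaster": (0.1, 0.9),
    "commercial housing": (0.8, 0.2),
    "purchase tax": (0.7, 0.3),
    "smog": (0.2, 0.8),
    "heavy rainfall": (0.0, 1.0),
}

keywords = list(KEYWORD_VECTORS)
labels = kmeans([KEYWORD_VECTORS[word] for word in keywords], k=2)
clusters = {}
for word, label in zip(keywords, labels):
    clusters.setdefault(label, []).append(word)
print(clusters)
```

With these toy vectors the housing-related and weather-related keywords fall into separate clusters; each cluster would then be named by the column vote described above.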

In summary, the embodiments of the present application link the clustered search keywords to column names and use, as the class name of each keyword cluster, the name of the column that users actually reached through their searches, which reflects the characteristics of the clustering result well.

The embodiments of the present application also provide a device for selecting a class name for search keywords, which can be used to carry out the method for selecting a class name for search keywords of the embodiments of the present application. As shown in FIG. 2, the device comprises a clustering unit 10, a query unit 20 and a selection unit 30.

The clustering unit 10 is configured to cluster the search keywords of the target website to obtain multiple classes of search keywords, where the search keywords are the keywords used for in-site search on the target website, and the target website divides its in-site pages into columns.

In the embodiments of the present application, the target website may be any website whose in-site pages are divided into columns, each column having its own column name. For example, a news website divides its news into columns such as "Technology", "Finance", and "Sports". The search keywords that users enter for in-site search on the target website are recorded and then clustered into multiple classes. For example, suppose the search keywords recorded for a website are: real estate, housing, property, commercial housing, property management, purchase tax, snow disaster, smog, heavy rainfall, and hot weather. Clustering these keywords yields a first class [real estate, housing, property, commercial housing, property management, purchase tax] and a second class [snow disaster, smog, heavy rainfall, hot weather].

The query unit 20 is configured to query the landing pages reached when the search keywords are used for in-site search on the target website, and to determine the column name of the column in which each landing page corresponding to a search keyword is located.

When a user performs an in-site search on the target website, the user usually clicks through to a relevant page, which is the landing page the user finally reaches. For example, if a user searches the target website with the keyword "smog" and opens a page describing Beijing's smog over the past year, then the landing page corresponding to that search for "smog" is the page describing Beijing's smog over the past year.

It should be noted that the landing page reached when a search keyword is used for in-site search may refer to the page reached on each individual search with that keyword. In other words, one search keyword may correspond to several landing pages. For example, when different users search the site with the same keyword and end up on different landing pages, the keyword corresponds to several landing pages, and all of them are returned by the query. If different users end up on the same page, each visit can still be recorded as a separate correspondence. That is, the unit of counting is one use of the search keyword: for each in-site search, the landing page corresponding to the search keyword is determined.

In this embodiment, after the landing page for each in-site search with each search keyword has been found, the column name of the column in which each landing page is located is determined. In the "smog" example above, the landing page is the page describing Beijing's smog over the past year. That page belongs to the column named "Weather", so the column name determined for the landing page is "Weather".
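The source does not specify how a landing page is mapped to its column. As one hedged possibility, the sketch below assumes that the first path segment of the landing page URL identifies the column; the `COLUMN_BY_PATH` table, the function name `column_name`, and the example URLs are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical site structure: first path segment -> column name.
COLUMN_BY_PATH = {
    "weather": "Weather",
    "disasters": "Natural Disasters",
    "finance": "Finance",
}


def column_name(landing_page_url):
    """Return the column name of a landing page, or None if it is unknown."""
    path = urlparse(landing_page_url).path
    segments = [s for s in path.split("/") if s]
    if not segments:
        return None  # e.g. the site home page, which belongs to no column here
    return COLUMN_BY_PATH.get(segments[0])


smog_column = column_name("https://news.example.com/weather/2015/beijing-smog.html")
home_column = column_name("https://news.example.com/")
print(smog_column, home_column)
```

A site whose URLs do not encode the column would need a different lookup, for example a table maintained by the content management system.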

The selection unit 30 is configured to, for each class of search keywords among the multiple classes, select one column name from the column names of the columns in which the landing pages corresponding to the search keywords in that class are located, and to use it as the class name of that class of search keywords.

Because the search keywords have already been clustered, multiple classes of search keywords are available. Every search keyword in a class has its corresponding landing pages and the column names of the columns containing those pages, so each class of search keywords corresponds to several landing pages and several column names. Websites usually have a well-organized column structure in which every column has a name. The information a user hopes to find through an in-site search is the information on the pages of a column, and the column name is itself a classification and summary of that information that captures the content of such pages well. Selecting the class name from the column names therefore reflects the characteristics of the class to which the search keywords belong. For example, for the class [snow disaster, smog, heavy rainfall, hot weather] obtained by the clustering above, the column names of the landing pages reached by in-site searches with these keywords include "Weather" and "Natural Disasters". "Weather" can be selected from these column names as the class name of this class, and the class name of every other class of search keywords is selected in the same way.

According to the embodiments of the present application, the search keywords of the target website are clustered to obtain multiple classes of search keywords, where the search keywords are the keywords used for in-site search on the target website and the target website divides its in-site pages into columns. The landing pages reached when the search keywords are used for in-site search are queried, and the column name of the column containing each landing page is determined. For each class of search keywords, one column name is selected from the column names of the landing pages corresponding to the keywords in that class and used as the class name of the class. This solves the technical problem that class names chosen by existing selection methods fail to reflect the characteristics of the class, and achieves the effect that the selected class name reflects the characteristics of the class to which the search keywords belong.

In the embodiments of the present application, the class name of each class of search keywords may be selected from the column names of the corresponding landing pages according to a preset rule, or the column name with the most occurrences may be used directly as the class name of that class.

Preferably, the selection unit includes: a statistics module configured to count, for each class of search keywords, the number of occurrences of the column names of the columns in which the landing pages corresponding to the search keywords are located; and a selection module configured to select, for each class of search keywords, the column name with the most occurrences as the class name of that class.

In this embodiment, the number of occurrences of a column name is the number of times that in-site searches with the search keywords landed on a page in that column. The more often a column name occurs, the more often searches with that class of keywords landed in the column, so that column's name better reflects the characteristics of the class of search keywords.

Take the class of search keywords [snow disaster, smog, heavy rainfall, hot weather] obtained by the clustering above as an example. The keywords "snow disaster", "smog", "heavy rainfall", and "hot weather" were searched 8, 4, 3, and 5 times respectively. For "snow disaster", the column name "Natural Disasters" occurred 3 times and "Weather" occurred 5 times; for the other keywords, every landing page was in "Weather". The tally is therefore 3 occurrences of "Natural Disasters" and 17 occurrences of "Weather" (5 + 4 + 3 + 5), so "Weather" is used as the class name of this class of search keywords.
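The tally in this example can be reproduced with a short Python sketch. The data layout and the function name `pick_class_name` are assumptions for illustration; the source does not say how ties between column names should be broken, and this sketch does not address that case either.

```python
from collections import Counter

# Observations from the worked example:
# (search keyword, column name, number of searches that landed in that column).
observations = [
    ("snow disaster", "Natural Disasters", 3),
    ("snow disaster", "Weather", 5),
    ("smog", "Weather", 4),
    ("heavy rainfall", "Weather", 3),
    ("hot weather", "Weather", 5),
]


def pick_class_name(observations):
    """Return the most frequent column name and the full tally."""
    tally = Counter()
    for _keyword, column, count in observations:
        tally[column] += count
    name, _occurrences = tally.most_common(1)[0]
    return name, tally


class_name, tally = pick_class_name(observations)
print(class_name, dict(tally))
```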

Preferably, the device further includes: an acquisition unit configured to acquire historical access data of the target website before the search keywords of the target website are clustered into multiple classes; and a parsing unit configured to parse the historical access data to obtain the search keywords of the target website and their corresponding landing pages.

In this embodiment, the keywords that users enter for in-site search on the target website and their access behavior are both recorded in the access data of the target website. When selecting class names for search keywords, the historical access data of the target website is acquired first, and the search keywords used by users and the landing page of each search are parsed from it, in preparation for the subsequent clustering of the search keywords and the counting of column name occurrences.
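The source does not describe the format of the historical access data. As one hedged illustration, the sketch below assumes that each page-view entry records the visited URL and its referrer, and that in-site searches appear as a referrer of the form `/search?q=...`; the entry layout, the `/search` path, and the `q` parameter are all hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def parse_access_log(entries, search_path="/search", query_param="q"):
    """Extract (search keyword, landing page) pairs from page-view entries.

    A page view counts as a landing page when its referrer is the site's
    search results page; the keyword is read from the referrer's query string.
    """
    pairs = []
    for entry in entries:
        referrer = urlparse(entry.get("referrer", ""))
        if referrer.path != search_path:
            continue  # this page view did not come from an in-site search
        keywords = parse_qs(referrer.query).get(query_param)
        if keywords:
            pairs.append((keywords[0], entry["url"]))
    return pairs


# Hypothetical access data.
entries = [
    {"url": "/weather/beijing-smog-review.html",
     "referrer": "https://news.example.com/search?q=smog"},
    {"url": "/finance/index.html",
     "referrer": "https://news.example.com/"},
    {"url": "/disasters/snowstorm-relief.html",
     "referrer": "https://news.example.com/search?q=snow+disaster"},
]
pairs = parse_access_log(entries)
print(pairs)
```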

Further, the device also includes: an establishing unit configured to establish a correspondence between the search keywords and the landing pages after the historical access data has been parsed to obtain the search keywords of the target website and their corresponding landing pages. The query unit is specifically configured to use the search keyword as an index and to look up the landing pages corresponding to the search keyword through the correspondence.

In this embodiment, after the search keywords used by users and their landing pages have been parsed out, each search keyword is associated with the landing page reached each time it was used for an in-site search, establishing a correspondence. Once the search keywords have been clustered, the landing pages of any search keyword can then be looked up by keyword.

Preferably, the clustering unit is specifically configured to cluster the search keywords of the target website with the K-means clustering algorithm to obtain multiple classes of search keywords.

In summary, the embodiments of the present application link the clustered search keywords to column names: the name of the column that users actually reach through a search keyword is used as the class name of the keyword cluster, which reflects the characteristics of the clustering result well.

The device for selecting a class name for search keywords includes a processor and a memory. The clustering unit 10, the query unit 20, the selection unit 30, and the other units described above are stored in the memory as program units, and the processor executes these program units stored in the memory.

The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels may be provided, and the class name of each class of search keywords is selected by adjusting kernel parameters.

The memory may include volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.

The present application also provides an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: clustering the search keywords of a target website to obtain multiple classes of search keywords, where the search keywords are the keywords used for in-site search on the target website and the target website divides its in-site pages into columns; querying the landing pages reached when the search keywords are used for in-site search on the target website, and determining the column name of the column in which each landing page corresponding to a search keyword is located; and, for each class of search keywords among the multiple classes, selecting one column name from the column names of the columns in which the landing pages corresponding to the search keywords in that class are located, as the class name of that class of search keywords.

The serial numbers of the embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.

In the above embodiments of the present application, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections between units or modules through some interfaces, and may be electrical or take other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

The above are only preferred embodiments of the present application. It should be noted that those of ordinary skill in the art may make improvements and refinements without departing from the principles of the present application, and such improvements and refinements shall also be regarded as falling within the scope of protection of the present application.

Claims (8)

1. A method for selecting a class name for search keywords, characterized by comprising:
clustering search keywords of a target website to obtain multiple classes of search keywords, wherein the search keywords are keywords used for in-site search on the target website, and the target website divides its in-site pages into columns;
querying the landing pages reached when the search keywords are used for in-site search on the target website, and determining the column name of the column in which each landing page corresponding to the search keywords is located;
for each class of search keywords among the multiple classes, counting the number of occurrences of the column names of the columns in which the landing pages corresponding to the search keywords in that class are located; and, for each class of search keywords, selecting the column name with the most occurrences after counting as the class name of that class of search keywords.

2. The method according to claim 1, characterized in that, before the search keywords of the target website are clustered to obtain the multiple classes of search keywords, the method further comprises:
acquiring historical access data of the target website;
parsing the historical access data to obtain the search keywords of the target website and their corresponding landing pages.

3. The method according to claim 2, characterized in that, after the historical access data is parsed to obtain the search keywords of the target website and their corresponding landing pages, the method further comprises:
establishing a correspondence between the search keywords and the landing pages;
wherein querying the landing pages reached when the search keywords are used for in-site search on the target website comprises: using the search keyword as an index, and looking up the landing pages corresponding to the search keyword through the correspondence.

4. The method according to claim 1, characterized in that clustering the search keywords of the target website to obtain the multiple classes of search keywords comprises:
clustering the search keywords of the target website with a K-means clustering algorithm to obtain the multiple classes of search keywords.

5. A device for selecting a class name for search keywords, characterized by comprising:
a clustering unit configured to cluster search keywords of a target website to obtain multiple classes of search keywords, wherein the search keywords are keywords used for in-site search on the target website, and the target website divides its in-site pages into columns;
a query unit configured to query the landing pages reached when the search keywords are used for in-site search on the target website, and to determine the column name of the column in which each landing page corresponding to the search keywords is located;
a selection unit configured to, for each class of search keywords among the multiple classes, count the number of occurrences of the column names of the columns in which the landing pages corresponding to the search keywords in that class are located; and, for each class of search keywords, select the column name with the most occurrences after counting as the class name of that class of search keywords.

6. The device according to claim 5, characterized in that the device further comprises:
an acquisition unit configured to acquire historical access data of the target website before the search keywords of the target website are clustered to obtain the multiple classes of search keywords;
a parsing unit configured to parse the historical access data to obtain the search keywords of the target website and their corresponding landing pages.

7. The device according to claim 6, characterized in that the device further comprises:
an establishing unit configured to establish a correspondence between the search keywords and the landing pages after the historical access data is parsed to obtain the search keywords of the target website and their corresponding landing pages;
wherein the query unit is specifically configured to use the search keyword as an index and to look up the landing pages corresponding to the search keyword through the correspondence.

8. The device according to claim 5, characterized in that the clustering unit is specifically configured to cluster the search keywords of the target website with a K-means clustering algorithm to obtain the multiple classes of search keywords.
CN201510850384.0A 2015-11-27 2015-11-27 Method and device for selecting class name of search keyword Expired - Fee Related CN106815228B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510850384.0A CN106815228B (en) 2015-11-27 2015-11-27 Method and device for selecting class name of search keyword

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510850384.0A CN106815228B (en) 2015-11-27 2015-11-27 Method and device for selecting class name of search keyword

Publications (2)

Publication Number Publication Date
CN106815228A CN106815228A (en) 2017-06-09
CN106815228B true CN106815228B (en) 2020-03-03

Family

ID=59155510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510850384.0A Expired - Fee Related CN106815228B (en) 2015-11-27 2015-11-27 Method and device for selecting class name of search keyword

Country Status (1)

Country Link
CN (1) CN106815228B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107992568B (en) * 2017-11-29 2020-05-05 政和科技股份有限公司 Searching method, device and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216826A (en) * 2007-01-05 2008-07-09 鸿富锦精密工业(深圳)有限公司 Information search system and method
CN102456058A (en) * 2010-11-02 2012-05-16 阿里巴巴集团控股有限公司 Method and device for providing category information
CN103365902A (en) * 2012-03-31 2013-10-23 北大方正集团有限公司 Method and device for evaluating Internet News

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8468445B2 (en) * 2005-03-30 2013-06-18 The Trustees Of Columbia University In The City Of New York Systems and methods for content extraction

Also Published As

Publication number Publication date
CN106815228A (en) 2017-06-09

Similar Documents

Publication Publication Date Title
US7840538B2 (en) Discovering query intent from search queries and concept networks
TWI512506B (en) Sorting method and device for search results
US9448999B2 (en) Method and device to detect similar documents
CN101241512B (en) Search method for redefining enquiry word and device therefor
US9317550B2 (en) Query expansion
US9448992B2 (en) Natural language search results for intent queries
US20150074289A1 (en) Detecting error pages by analyzing server redirects
EP2812815B1 (en) Web page retrieval method and device
CN101329687B (en) Method for positioning news web page
CN103729362B (en) The determination method and apparatus of navigation content
US10621187B2 (en) Methods, systems, and media for providing a media search engine
CN104537065A (en) Search result pushing method and system
CN102622445A (en) User interest perception based webpage push system and webpage push method
CN106933893B (en) multi-dimensional data query method and device
CN108241692B (en) Data query method and device
US9977816B1 (en) Link-based ranking of objects that do not include explicitly defined links
JP2012533819A (en) Method and system for document indexing and data querying
EP2933734A1 (en) Method and system for the structural analysis of websites
CN106815228B (en) Method and device for selecting class name of search keyword
CN108268522A (en) Website column content shows method and device
CN106933909A (en) The querying method and device of multi-dimensional data
CN108268552B (en) Website information processing method and device
CN106933923B (en) Method and apparatus for screening sessions
CN106874310B (en) Website column name monitoring method and device
TWI647578B (en) Search engine based document indexing method, data query method and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.

GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200303