CN114896480B

CN114896480B - Top-K space keyword query method based on road network index

Info

Publication number: CN114896480B
Application number: CN202210356274.9A
Authority: CN
Inventors: 曾志新; 唐洁
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2022-04-06
Filing date: 2022-04-06
Publication date: 2024-09-24
Anticipated expiration: 2042-04-06
Also published as: CN114896480A

Abstract

The invention discloses a Top-K space keyword query method based on road network indexes, which comprises the following steps: 1) Defining related concepts of the road network; 2) Weighting calculation is carried out by using the shortest transit time and the TF-IDF model, so that a space text scoring function capable of measuring the space adjacency of the object and the similarity of the text is obtained; 3) Constructing a road network index TKG-tree by using edge and vertex information contained in the road network, wherein the road network index comprises a traffic time matrix recorded with space index information and an inverted document recorded with text index information; 4) Using the space text score as an index of Top-K result sequencing, and using the constructed road network index TKG-tree to find a space keyword object of ranking Top-K; the road network index is constructed in a sub-graph dividing mode, the problem that the index storage cost is too high is solved, the space text score of the node is used as an upper bound pruning object, and the speed of searching the Top-K space keyword object can be improved.

Description

Top-K spatial keyword query method based on road network index

技术领域Technical Field

本发明涉及路网索引和Top-K查询的技术领域，尤其是指一种基于路网索引的Top-K空间关键字查询方法。The present invention relates to the technical field of road network index and Top-K query, and in particular to a Top-K spatial keyword query method based on road network index.

背景技术Background Art

Top-k空间关键字查询问题作为空间数据库的一大研究热点，其目的是查询出离用户位置距离较近并且满足用户查询偏好的若干个对象，而其查询意图则是由关键词作为表达。Top-k空间关键字查询问题不仅考虑了对象与关键字的匹配程度，还考虑了用户与查询对象之间的空间距离，根据空间距离的衡量方式上的区别，可以分为欧式空间和路网空间两种不同的距离。欧式空间表示的是两点距离用直线距离来度量，而路网空间两点距离是用道路网上的最短路程来度量。The Top-k spatial keyword query problem is a hot topic in spatial database research. Its purpose is to query several objects that are close to the user's location and meet the user's query preferences, and its query intent is expressed by keywords. The Top-k spatial keyword query problem not only considers the matching degree between the object and the keyword, but also considers the spatial distance between the user and the query object. According to the difference in the measurement method of spatial distance, it can be divided into two different distances: Euclidean space and road network space. Euclidean space represents that the distance between two points is measured by straight-line distance, while the distance between two points in road network space is measured by the shortest distance on the road network.

不过，支持路网上高效的Top-K查询是非常困难的。其主要的瓶颈在于，计算点到点最短路的代价特别高，不像计算两点欧式直线距离那么简单。此外，即使能很快得到道路最短道路距离，如果没有高效的剪枝算法和策略，要计算出Top-K结果也是非常消耗时间的。虽然现在有大量的工作研究路网上的Top-K查询处理问题。然而，现有的方法都没办法支持大规模的道路网络。这些方法的主要问题大多是索引会导致过高的存储代价，或着太高的预处理时间代价。However, it is very difficult to support efficient Top-K queries on road networks. The main bottleneck is that the cost of calculating the shortest path from point to point is extremely high, which is not as simple as calculating the Euclidean straight-line distance between two points. In addition, even if the shortest road distance can be obtained quickly, it is very time-consuming to calculate the Top-K results without efficient pruning algorithms and strategies. Although there is a lot of work studying the Top-K query processing problem on road networks. However, the existing methods cannot support large-scale road networks. The main problem with these methods is that the index will lead to too high storage cost or too high preprocessing time cost.

发明内容Summary of the invention

本发明的目的在于克服现有技术的缺点与不足，提出了一种基于路网索引的Top-K空间关键字查询方法，以划分子图的方式构建路网索引，缓解了索引存储代价太高的问题，将节点的空间文本得分为上界剪枝对象，能够提高查找Top-K空间关键字对象的速度。The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art, and proposes a Top-K spatial keyword query method based on road network index. The road network index is constructed by dividing the subgraph, which alleviates the problem of high index storage cost. The spatial text score of the node is divided into upper bound pruning objects, which can improve the speed of searching for Top-K spatial keyword objects.

为实现上述目的，本发明所提供的技术方案为：基于路网索引的Top-K空间关键字查询方法，包括以下步骤：To achieve the above object, the technical solution provided by the present invention is: a Top-K spatial keyword query method based on road network index, comprising the following steps:

1)对路网的相关概念进行定义，其中包括空间关键字对象o、路网图G和两个顶点间的最短通行时间函数τ^*(t)；1) Define the relevant concepts of the road network, including the spatial keyword object o, the road network graph G and the shortest travel time function τ ^* (t) between two vertices;

2)使用步骤1)得到的最短通行时间函数τ^*(t)衡量空间关键字对象o的空间邻近性，得到空间得分函数，记为f_s(o)；使用TF-IDF模型衡量空间关键字对象o的文本相似性，得到文本得分函数，记为f_d(o)；使用文本得分函数f_d(o)和空间得分函数f_s(o)加权计算得到空间文本得分函数，记为 2) Use the shortest travel time function τ ^* (t) obtained in step 1) to measure the spatial proximity of the spatial keyword object o, and obtain the spatial score function, denoted as f _s (o); use the TF-IDF model to measure the text similarity of the spatial keyword object o, and obtain the text score function, denoted as f _d (o); use the text score function f _d (o) and the spatial score function f _s (o) to perform weighted calculation to obtain the spatial text score function, denoted as

3)根据步骤1)定义的路网，使用其包含的边和顶点信息对路网索引TKG-tree进行构造；TKG-tree是一颗由图切割而来的平衡树结构，其根节点对应整个路网图G，并有着以下性质：根节点的每个后代节点n_i都对应一个子图G_i，把最下面一层节点称为叶子节点；TKG-tree有两个超参数f和μ，分别代表非叶子节点的分支数和叶子节点的顶点数量上限；每个节点n_i均包含了一个通行时间矩阵M_i和一个记录了对象关键字信息o.key的倒排文档D_i；3) Based on the road network defined in step 1), the road network index TKG-tree is constructed using the edge and vertex information it contains; TKG-tree is a balanced tree structure obtained by graph cutting, whose root node corresponds to the entire road network graph G, and has the following properties: each descendant node n _i of the root node corresponds to a subgraph G _i , and the bottom layer of nodes are called leaf nodes; TKG-tree has two hyperparameters f and μ, which represent the number of branches of non-leaf nodes and the upper limit of the number of vertices of leaf nodes respectively; each node n _i contains a travel time matrix M _i and an inverted document D _i that records the object keyword information o.key;

4)以步骤2)得到的空间文本得分函数作为Top-K结果排序的指标，从查询点v_q出发，利用步骤3)构建的路网索引TKG-tree找到排名Top-K的空间关键字对象o。4) Using the spatial text score function obtained in step 2) As the indicator for sorting the Top-K results, starting from the query point _vq , the road network index TKG-tree constructed in step 3) is used to find the spatial keyword object o ranked Top-K.

进一步，在步骤1)中，对以下概念进行了定义：Furthermore, in step 1), the following concepts are defined:

a、在路网的边中，用通行时间来表示边的长度，而边的通行时间会随着时刻t的不同而改变，因此记边的通行时间函数为ω(t)；在路网顶点中，其中的部分顶点携带了以关键字形式存在的文本信息，将拥有空间位置信息和文本关键字信息的顶点称为空间关键字对象o，表示为：a. In the edge of the road network, the travel time is used to represent the length of the edge, and the travel time of the edge will change with the time t, so the travel time function of the edge is recorded as ω(t); in the vertices of the road network, some of the vertices carry text information in the form of keywords. The vertex with spatial position information and text keyword information is called spatial keyword object o, which is expressed as:

o＝(v_o,o.key)o＝(v _o ,o.key)

式中，v_o表示对象所在的顶点位置，o.key表示对象关键字信息；In the formula, v _o represents the vertex position of the object, and o.key represents the object keyword information;

b、将路网建模为一个无向图，表示为：b. Model the road network as an undirected graph, expressed as:

式中，G为路网图，代表一个由若干条边交叉组成的无向图路网结构；v₁表示第1个顶点，v_n表示第n个顶点，V＝{v₁,v₂,...v_n}代表路网中的所有顶点的集合；e₁表示第1条边，e_n表示第n条边，E＝{e₁,e₂,...e_n}表示路网中边的集合；而ω₁(t)表示第1条边的通行时间函数，ω_n(t)表示第n条边的通行时间函数，W＝{ω₁(t),ω₂(t),...ω_n(t)}则表示与对应边相关联的通行时间函数集合；Wherein, G is a road network graph, representing an undirected graph road network structure composed of several intersecting edges; v ₁ represents the first vertex, v _n represents the nth vertex, V = {v ₁ ,v ₂ ,...v _n } represents the set of all vertices in the road network; e ₁ represents the first edge, e _n represents the nth edge, E = {e ₁ ,e ₂ ,...e _n } represents the set of edges in the road network; ω ₁ (t) represents the travel time function of the first edge, ω _n (t) represents the travel time function of the nth edge, and W = {ω ₁ (t),ω ₂ (t),...ω _n (t)} represents the set of travel time functions associated with the corresponding edges;

c、在路网中，起点到终点的一条路径ρ由一系列相邻的连通顶点序列表示<v_i,...v_j>，其中v_i表示第i个顶点，v_j表示第j个顶点，而P则用来表示从起点到终点的所有路径集合；记路径ρ的通行时间函数为τ(t)，那么将从起点到终点的最短通行时间函数τ^*(t)定义如下：c. In a road network, a path ρ from the starting point to the end point is represented by a series of adjacent connected vertex sequences < _vi ,... _vj >, where _vi represents the i-th vertex, _vj represents the j-th vertex, and P is used to represent the set of all paths from the starting point to the end point; let the travel time function of the path ρ be τ(t), then the shortest travel time function from the starting point to the end point τ ^* (t) is defined as follows:

τ^*(t)＝min{τ(t)|ρ∈P}τ ^* (t) = min{τ(t)|ρ∈P}

式中，τ(t)表示在时刻t出发，路径ρ的通行时间函数；τ^*(t)则表示在时刻t出发，从起点到终点的所有路径中最短的通行时间函数。In the formula, τ(t) represents the travel time function of path ρ starting at time t; τ ^* (t) represents the shortest travel time function among all paths from the starting point to the end point starting at time t.

进一步，所述步骤2)包括以下步骤：Further, the step 2) comprises the following steps:

2.1)定义了基于路网空间的Top-K空间关键字的查询参数query：2.1) Define the query parameter query based on the Top-K spatial keyword of the road network space:

query＝<v_q,q.key,t,k>query = <v _q ,q.key,t,k>

式中，v_q表示查询点，q.key表示查询的关键字词组，t代表查找的时刻，k表示返回的空间关键字对象o的数量；In the formula, v _q represents the query point, q.key represents the query keyword phrase, t represents the search time, and k represents the number of spatial keyword objects o returned;

2.2)使用最短通行时间函数τ^*(t)来衡量空间关键字对象o与查询点v_q的空间邻近性，定义空间得分函数f_s(o)如下：2.2) Use the shortest travel time function τ ^* (t) to measure the spatial proximity between the spatial keyword object o and the query point v _q , and define the spatial score function f _s (o) as follows:

式中，表示空间关键字对象o所在顶点v_o到查询点v_q之间的最短通行时间；In the formula, Represents the shortest travel time between the vertex v _o where the spatial keyword object o is located and the query point v _q ;

2.3)使用TF-IDF模型来衡量空间关键字对象o与查询的关键字词组q.key的文本相似性，定义文本得分函数f_d(o)如下：2.3) Use the TF-IDF model to measure the text similarity between the spatial keyword object o and the query keyword phrase q.key, and define the text score function f _d (o) as follows:

式中，key和q.key分别表示关键字和查询的关键字词组，tf(key)表示关键字key在空间关键字对象o中出现的次数，idf(key)表示关键字key在所有空间关键字对象o中出现的频率倒数；关键字key的重要性随着它在空间关键字对象o中出现的次数成正比增加，但同时会随着它在所有空间关键字对象o中出现的频率成反比下降；In the formula, key and q.key represent the keyword and the keyword phrase of the query respectively, tf(key) represents the number of times the keyword key appears in the spatial keyword object o, and idf(key) represents the inverse frequency of the keyword key in all spatial keyword objects o. The importance of the keyword key increases in direct proportion to the number of times it appears in the spatial keyword object o, but at the same time it decreases in inverse proportion to the frequency of its appearance in all spatial keyword objects o.

2.4)使用文本得分函数f_d(o)和空间得分函数f_s(o)得到空间文本得分函数其计算公式如下：2.4) Use the text score function f _d (o) and the space score function f _s (o) to get the space text score function The calculation formula is as follows:

式中，α表示权重因子，表示空间文本得分函数，它由空间得分函数f_s(o)和文本得分函数f_d(o)加权计算得到。In the formula, α represents the weight factor, represents the spatial text score function, which is obtained by weighted calculation of the spatial score function _fs (o) and the text score function _fd (o).

进一步，在步骤3)中，路网索引TKG-tree是通过图切割的方法来构造的，首先将整个路网图G当做根节点，然后把G切割成f个大小相同的子图，并把它们作为根节点的孩子节点，接着继续递归地切割这些子图，直到最后子图包含的顶点数量不超过μ个；根据路网的拓扑关系，运行Floyd算法得到节点n_i的通行时间矩阵M_i，矩阵M_i记录了该节点n_i对应子图G_i内各个顶点之间的最短通行时间函数τ^*(t)；对该节点n_i对应子图G_i内所有空间关键字对象o的关键字信息o.key取并集，得到节点n_i的倒排文档D_i。Furthermore, in step 3), the road network index TKG-tree is constructed by a graph cutting method. First, the entire road network graph G is taken as the root node, and then G is cut into f subgraphs of the same size, and they are taken as child nodes of the root node. Then, these subgraphs are cut recursively until the number of vertices contained in the final subgraph does not exceed μ; according to the topological relationship of the road network, the Floyd algorithm is run to obtain the travel time matrix _Mi of the node n _i . The matrix _Mi records the shortest travel time function τ ^* (t) between each vertex in the subgraph _Gi corresponding to the node n _i ; the keyword information o.key of all spatial keyword objects o in the subgraph _Gi corresponding to the node n _i is taken as the union, and the inverted document _Di of the node n _i is obtained.

进一步，在步骤4)中，输入查询参数后，以空间文本得分函数作为Top-K结果排序的指标进行查找；先定义一个最大优先队列Q和结果集R，从查询点v_q所在子图开始搜索，将子图内所有的空间关键字对象o加入优先队列Q并按照空间文本得分函数排序；使用n_x记录TKG-tree上当前访问到的最高层节点，标记当前的搜索范围为该节点对应的子图G_x，同时，定义节点的空间文本得分作为上界进行剪枝：Further, in step 4), after entering the query parameters, the spatial text score function As the index for sorting Top-K results, we search for it; first, we define a maximum priority queue Q and result set R, start the search from the subgraph where the query point _vq is located, add all spatial keyword objects o in the subgraph to the priority queue Q, and then search according to the spatial text score function. Sorting; use n _x to record the highest level node currently visited on the TKG-tree, mark the current search range as the subgraph G _x corresponding to the node, and define the spatial text score of the node Prune as an upper bound:

式中，f_s(node)表示查询点v_q到当前子图G_x的最短通行时间，使用路网索引TKG-tree的通行时间矩阵M_i计算得到；表示空间关键字对象o能达到的最大文本得分，使用路网索引TKG-tree的倒排文档D_i计算得到；将队列Q中的空间关键字对象o依次出队，筛选出空间文本得分比上界高的空间关键字对象o，加入结果集R；若队列Q为空且结果集R未满k个，则将n_x更新为原来节点的父节点，并扩大当前的搜索范围为n_x节点更新后对应的子图，同时将改为n_x节点更新后的空间文本得分，重复上述操作，直到结果集R有k个结果，算法自动结束；最大优先队列Q和上界保证了从查询点v_q到第k个结果是全局最优的，因此，能正确的找到Top-K空间关键字对象o。Where _fs (node) represents the shortest travel time from the query point _vq to the current subgraph _Gx , which is calculated using the travel time matrix _Mi of the road network index TKG-tree; represents the maximum text score that the spatial keyword object o can achieve, which is calculated using the inverted document _Di of the road network index TKG-tree; the spatial keyword objects o in the queue Q are dequeued one by one, and the spatial text scores are greater than the upper bound. The high spatial keyword object o is added to the result set R; if the queue Q is empty and the result set R is less than k, n _x is updated to the parent node of the original node, and the current search range is expanded to the subgraph corresponding to the updated n _x node. Change to the updated spatial text score of n _x nodes, repeat the above operation until the result set R has k results, and the algorithm ends automatically; the maximum priority queue Q and the upper bound It is guaranteed that the query point v _q to the kth result is globally optimal, so the Top-K space keyword object o can be found correctly.

本发明与现有技术相比，具有如下优点与有益效果：Compared with the prior art, the present invention has the following advantages and beneficial effects:

1、本发明考虑了时间因素对Top-K路网距离的影响，更加具有实际应用价值。1. The present invention takes into account the influence of time factors on the Top-K road network distance and has more practical application value.

2、本发明与其它路网空间的Top-K查询方法相比，预处理时间更短，存储代价更小。2. Compared with other Top-K query methods in road network space, the present invention has shorter preprocessing time and lower storage cost.

3、本发明在不同规模的数据集上均有着毫秒级的查询响应时间，具有良好的通用性和可拓展性。3. The present invention has a millisecond query response time on data sets of different sizes and has good versatility and scalability.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1为本发明方法的逻辑流程示意图。FIG1 is a schematic diagram of a logic flow of the method of the present invention.

图2为本发明所使用的路网索引TKG-tree示意图。FIG. 2 is a schematic diagram of a road network index TKG-tree used in the present invention.

具体实施方式DETAILED DESCRIPTION

下面结合实施例及附图对本发明作进一步详细的描述，但本发明的实施方式不限于此。The present invention is further described in detail below in conjunction with embodiments and drawings, but the embodiments of the present invention are not limited thereto.

参见图1所示，本实施例提供了一种基于路网索引的Top-K空间关键字查询方法，其具体情况如下：As shown in FIG1 , this embodiment provides a Top-K spatial keyword query method based on a road network index, and the specific details are as follows:

1)对空间关键字对象、路网图和最短通行时间进行了定义：1) The spatial keyword object, road network diagram and shortest travel time are defined:

o＝(v_o,o.key)o＝(v _o ,o.key)

式中，v_o表示对象所在的顶点位置，o.key表示对象的关键字信息；In the formula, v _o represents the vertex position of the object, and o.key represents the keyword information of the object;

c、在路网中，起点到终点的一条路径ρ可由一系列相邻的连通顶点序列表示<v_i,...v_j>，其中v_i表示第i个顶点，v_j表示第j个顶点，而P则可用来表示从起点到终点的所有路径集合。记路径ρ的通行时间函数为τ(t)，那么将从起点到终点的最短通行时间函数τ^*(t)定义如下：c. In a road network, a path ρ from the starting point to the end point can be represented by a series of adjacent connected vertex sequences <v _i ,...v _j >, where v _i represents the i-th vertex, v _j represents the j-th vertex, and P can be used to represent the set of all paths from the starting point to the end point. Let the travel time function of the path ρ be τ(t), then the shortest travel time function from the starting point to the end point τ ^* (t) is defined as follows:

τ^*(t)＝min{τ(t)|ρ∈P}τ ^* (t) = min{τ(t)|ρ∈P}

2)使用步骤1)得到的最短通行时间函数τ^*(t)衡量空间关键字对象o的空间邻近性；使用TF-IDF模型衡量空间关键字对象o的文本相似性；使用文本得分函数和空间得分函数加权计算得到空间文本得分函数，其中包括以下步骤：2) Using the shortest travel time function τ ^* (t) obtained in step 1) to measure the spatial proximity of the spatial keyword object o; using the TF-IDF model to measure the text similarity of the spatial keyword object o; using the text score function and the spatial score function to perform weighted calculation to obtain the spatial text score function, which includes the following steps:

query＝<v_q,q.key,t,k>query = <v _q ,q.key,t,k>

式中，表示空间关键字对象o所在顶点v_o到查询点v_q之间的最短通行时间。In the formula, Represents the shortest travel time between the vertex v _o where the spatial keyword object o is located and the query point v _q .

式中，key和q.key分别表示关键字和查询的关键字词组，tf(key)表示关键字key在空间关键字对象o中出现的次数，idf(key)表示关键字key在所有空间关键字对象o中出现的频率倒数；关键字key的重要性随着它在空间关键字对象o中出现的次数成正比增加，但同时会随着它在所有空间关键字对象o中出现的频率成反比下降。In the formula, key and q.key represent the keyword and the keyword phrase of the query respectively, tf(key) represents the number of times the keyword key appears in the spatial keyword object o, and idf(key) represents the inverse frequency of the keyword key in all spatial keyword objects o; the importance of the keyword key increases in direct proportion to the number of times it appears in the spatial keyword object o, but at the same time it decreases in inverse proportion to the frequency of its appearance in all spatial keyword objects o.

3)使用步骤1)定义路网所包含的边和顶点信息对路网索引TKG-tree进行构造；路网索引TKG-tree是通过图切割的方法来构造的，如图2所示，我们把非叶子节点的分支数f设置为2，把叶子节点的最大顶点数量μ设为4；首先将整个路网图G当做根节点，然后把G切割成f个大小相同的子图，并把它们作为根节点的孩子节点，接着继续递归地切割这些子图，直到最后子图包含的顶点数量不超过μ个；根据路网的拓扑关系，运行Floyd算法得到节点n_i的通行时间矩阵M_i，矩阵M_i记录了该节点n_i对应子图G_i内各个顶点之间的最短通行时间函数τ^*(t)；对该节点n_i对应子图G_i内所有空间关键字对象o的关键字信息o.key取并集，得到节点n_i的倒排文档D_i。3) Use the edge and vertex information defined in step 1) to construct the road network index TKG-tree; the road network index TKG-tree is constructed by graph cutting, as shown in Figure 2, we set the number of branches f of non-leaf nodes to 2, and the maximum number of vertices μ of leaf nodes to 4; first, take the entire road network graph G as the root node, then cut G into f subgraphs of the same size, and use them as child nodes of the root node, and then continue to recursively cut these subgraphs until the number of vertices contained in the final subgraph does not exceed μ; according to the topological relationship of the road network, run the Floyd algorithm to obtain the travel time matrix _Mi of the node n _i , the matrix _Mi records the shortest travel time function τ ^* (t) between each vertex in the subgraph _Gi corresponding to the node n _i ; take the union of the keyword information o.key of all spatial keyword objects o in the subgraph Gi corresponding to the node _n _i , and obtain the inverted document _Di of the node n _i .

4)以步骤2)得到的空间文本得分函数作为Top-K结果排序的指标，从查询点v_q出发，利用步骤3)构建的路网索引TKG-tree找到排名Top-K的空间关键字对象；先定义一个最大优先队列Q和结果集R，从查询点v_q所在子图开始搜索，将子图内所有的空间关键字对象o加入优先队列Q并按照空间文本得分函数排序；使用n_x记录TKG-tree上当前访问到的最高层节点，标记当前的搜索范围为该节点对应的子图G_x，同时，定义节点的空间文本得分作为上界进行剪枝：4) Using the spatial text score function obtained in step 2) As the index for sorting the Top-K results, starting from the query point _vq , the road network index TKG-tree constructed in step 3) is used to find the spatial keyword objects ranked Top-K; first define a maximum priority queue Q and result set R, start the search from the subgraph where the query point _vq is located, add all the spatial keyword objects o in the subgraph to the priority queue Q and sort them according to the spatial text score function Sorting; use n _x to record the highest level node currently visited on the TKG-tree, mark the current search range as the subgraph G _x corresponding to the node, and define the spatial text score of the node Prune as an upper bound:

式中，f_s(node)表示查询点v_q到当前子图G_x的最短通行时间，使用路网索引TKG-tree的通行时间矩阵M_i计算得到；表示空间关键字对象o能达到的最大文本得分，使用路网索引TKG-tree的倒排文档D_i计算得到；将队列Q中的空间关键字对象o依次出队，筛选出空间文本得分比上界高的空间关键字对象o，加入结果集R；若队列Q为空且结果集R未满k个，则将n_x更新为原来节点的父节点，并扩大当前的搜索范围为n_x节点更新后对应的子图，同时将改为n_x节点更新后的空间文本得分，重复上述操作，直到正确地找到Top-K空间关键字对象o。Where _fs (node) represents the shortest travel time from the query point _vq to the current subgraph _Gx , which is calculated using the travel time matrix _Mi of the road network index TKG-tree; represents the maximum text score that the spatial keyword object o can achieve, which is calculated using the inverted document _Di of the road network index TKG-tree; the spatial keyword objects o in the queue Q are dequeued one by one, and the spatial text scores are greater than the upper bound. The high spatial keyword object o is added to the result set R; if the queue Q is empty and the result set R is less than k, n _x is updated to the parent node of the original node, and the current search range is expanded to the subgraph corresponding to the updated n _x node. Change to the updated spatial text score of n _x nodes and repeat the above operation until the Top-K spatial keyword object o is found correctly.

上述实施例为本发明较佳的实施方式，但本发明的实施方式并不受上述实施例的限制，其他的任何未背离本发明的精神实质与原理下所作的改变、修饰、替代、组合、简化，均应为等效的置换方式，都包含在本发明的保护范围之内。The above embodiments are preferred implementation modes of the present invention, but the implementation modes of the present invention are not limited to the above embodiments. Any other changes, modifications, substitutions, combinations, and simplifications that do not deviate from the spirit and principles of the present invention should be equivalent replacement methods and are included in the protection scope of the present invention.

Claims

1. A Top-K spatial keyword query method based on a road network index, characterized in that it comprises the following steps:

1) Define the relevant concepts of the road network, including the spatial keyword object o, the road network graph G and the shortest travel time function τ ^* (t) between two vertices;

2) Use the shortest travel time function τ ^* (t) obtained in step 1) to measure the spatial proximity of the spatial keyword object o, and obtain the spatial score function, denoted as f _s (o); use the TF-IDF model to measure the text similarity of the spatial keyword object o, and obtain the text score function, denoted as f _d (o); use the text score function f _d (o) and the spatial score function f _s (o) to perform weighted calculation to obtain the spatial text score function, denoted as

3) Based on the road network defined in step 1), the road network index TKG-tree is constructed using the edge and vertex information it contains; TKG-tree is a balanced tree structure obtained by graph cutting, whose root node corresponds to the entire road network graph G, and has the following properties: each descendant node n _i of the root node corresponds to a subgraph G _i , and the bottom layer of nodes are called leaf nodes; TKG-tree has two hyperparameters f and μ, which represent the number of branches of non-leaf nodes and the upper limit of the number of vertices of leaf nodes respectively; each node n _i contains a travel time matrix M _i and an inverted document D _i that records the object keyword information o.key;

4) Using the spatial text score function obtained in step 2) As the indicator for sorting the Top-K results, starting from the query point _vq , the road network index TKG-tree constructed in step 3) is used to find the spatial keyword object o ranked Top-K.

2. The Top-K spatial keyword query method based on road network index according to claim 1 is characterized in that in step 1), the following concepts are defined:

a. In the edge of the road network, the travel time is used to represent the length of the edge, and the travel time of the edge will change with the time t, so the travel time function of the edge is recorded as ω(t); in the vertices of the road network, some of the vertices carry text information in the form of keywords. The vertex with spatial position information and text keyword information is called spatial keyword object o, which is expressed as:

o＝(v _o ,o.key)

In the formula, v _o represents the vertex position of the object, and o.key represents the object keyword information;

b. Model the road network as an undirected graph, expressed as:

Wherein, G is a road network graph, representing an undirected graph road network structure composed of several intersecting edges; v ₁ represents the first vertex, v _n represents the nth vertex, V = {v ₁ ,v ₂ ,...v _n } represents the set of all vertices in the road network; e ₁ represents the first edge, e _n represents the nth edge, E = {e ₁ ,e ₂ ,...e _n } represents the set of edges in the road network; ω ₁ (t) represents the travel time function of the first edge, ω _n (t) represents the travel time function of the nth edge, and W = {ω ₁ (t),ω ₂ (t),...ω _n (t)} represents the set of travel time functions associated with the corresponding edges;

c. In a road network, a path ρ from the starting point to the end point is represented by a series of adjacent connected vertex sequences < _vi ,... _vj >, where _vi represents the i-th vertex, _vj represents the j-th vertex, and P is used to represent the set of all paths from the starting point to the end point; let the travel time function of the path ρ be τ(t), then the shortest travel time function from the starting point to the end point τ ^* (t) is defined as follows:

τ ^* (t) = min{τ(t)|ρ∈P}

In the formula, τ(t) represents the travel time function of path ρ starting at time t; τ ^* (t) represents the shortest travel time function among all paths from the starting point to the end point starting at time t.

3. The Top-K spatial keyword query method based on road network index according to claim 1, characterized in that the step 2) comprises the following steps:

2.1) Define the query parameter query based on the Top-K spatial keyword of the road network space:

query＝<v _q ,q.key,t,k＞

In the formula, v _q represents the query point, q.key represents the query keyword phrase, t represents the search time, and k represents the number of spatial keyword objects o returned;

2.2) Use the shortest travel time function τ ^* (t) to measure the spatial proximity between the spatial keyword object o and the query point v _q , and define the spatial score function f _s (o) as follows:

In the formula, Represents the shortest travel time between the vertex v _o where the spatial keyword object o is located and the query point v _q ;

2.3) Use the TF-IDF model to measure the text similarity between the spatial keyword object o and the query keyword phrase q.key, and define the text score function f _d (o) as follows:

In the formula, key and q.key represent the keyword and the keyword phrase of the query respectively, tf(key) represents the number of times the keyword key appears in the spatial keyword object o, and idf(key) represents the inverse frequency of the keyword key in all spatial keyword objects o. The importance of the keyword key increases in direct proportion to the number of times it appears in the spatial keyword object o, but at the same time it decreases in inverse proportion to the frequency of its appearance in all spatial keyword objects o.

2.4) Use the text score function f _d (o) and the space score function f _s (o) to get the space text score function The calculation formula is as follows:

In the formula, α represents the weight factor, represents the spatial text score function, which is obtained by weighted calculation of the spatial score function _fs (o) and the text score function _fd (o).

4. The Top-K spatial keyword query method based on road network index according to claim 1 is characterized in that: in step 3), the road network index TKG-tree is constructed by a graph cutting method, firstly taking the entire road network graph G as the root node, then cutting G into f subgraphs of the same size, and taking them as child nodes of the root node, and then continuing to recursively cut these subgraphs until the number of vertices contained in the final subgraph does not exceed μ; according to the topological relationship of the road network, running the Floyd algorithm to obtain the transit time matrix _Mi of the node n _i , the matrix _Mi records the shortest transit time function τ ^* (t) between each vertex in the subgraph _Gi corresponding to the node n _i ; taking the union of the keyword information o.key of all spatial keyword objects o in the subgraph _Gi corresponding to the node n _i , and obtaining the inverted document _Di of the node n _i .

5. The Top-K spatial keyword query method based on road network index according to claim 1 is characterized in that: in step 4), after the query parameters are input, the spatial text score function is used to As the index for sorting Top-K results, we search for it; first, we define a maximum priority queue Q and result set R, start the search from the subgraph where the query point _vq is located, add all spatial keyword objects o in the subgraph to the priority queue Q, and then search according to the spatial text score function. Sorting; use n _x to record the highest level node currently visited on the TKG-tree, mark the current search range as the subgraph G _x corresponding to the node, and define the spatial text score of the node Prune as an upper bound:

Where _fs (node) represents the shortest travel time from the query point _vq to the current subgraph _Gx , which is calculated using the travel time matrix _Mi of the road network index TKG-tree; represents the maximum text score that the spatial keyword object o can achieve, which is calculated using the inverted document _Di of the road network index TKG-tree; the spatial keyword objects o in the queue Q are dequeued one by one, and the spatial text scores are greater than the upper bound. The high spatial keyword object o is added to the result set R; if the queue Q is empty and the result set R is less than k, n _x is updated to the parent node of the original node, and the current search range is expanded to the subgraph corresponding to the updated n _x node. Change to the updated spatial text score of n _x nodes, repeat the above operation until the result set R has k results, and the algorithm ends automatically; the maximum priority queue Q and the upper bound It is guaranteed that the query point v _q to the kth result is globally optimal, so the Top-K space keyword object o can be found correctly.