
CN110865992A - A retrieval library management method, retrieval method, device and medium - Google Patents


Info

Publication number
CN110865992A
Authority
CN
China
Prior art keywords
data
retrieval
hot
cold
library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911044479.8A
Other languages
Chinese (zh)
Other versions
CN110865992B (en)
Inventor
李明耀
韦跃明
严石伟
蒋楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Cloud Computing Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Cloud Computing Beijing Co Ltd
Priority to CN201911044479.8A
Publication of CN110865992A
Application granted
Publication of CN110865992B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a retrieval library management method, a retrieval method, a device and a medium. The retrieval library management method comprises: when the total number of hot data stored in the hot retrieval library is greater than a first preset scale threshold, acquiring attribute parameters of each piece of hot data; migrating hot data whose attribute parameters satisfy a hot data management condition to a cold retrieval library; calculating a first matching degree between each piece of cold data stored in the cold retrieval library and each piece of hot data; and migrating cold data that satisfies a cold data management condition to the hot retrieval library. By separating the cold and hot data of the retrieval library and triggering the promotion and demotion logic between the hot and cold libraries, the method keeps the size of the hot retrieval library within the threshold range, which improves the validity of the retrieval data as well as retrieval reliability, accuracy and efficiency.

Description

A retrieval library management method, retrieval method, device and medium

Technical Field

The present application relates to the field of computer technology, and in particular to a retrieval library management method, a retrieval method, a device and a medium.

Background

At present, identity recognition and retrieval based on artificial intelligence technology has been deployed and promoted to some extent in application scenarios such as smart retail, smart communities, smart finance and public security.

In these application scenarios, to improve retrieval efficiency, the retrieval library used for identity recognition is generally kept resident in a processor with better concurrency and memory access speed. Taking face retrieval as an example, the face database is usually stored in the video memory of a graphics processing unit (GPU). However, as large numbers of users with new identities are added, more and more of these high-performance and expensive processor resources (for example GPUs) are consumed, which greatly increases the storage cost of retrieval. In addition, the reliability, accuracy and efficiency of existing retrieval methods need further improvement.

Summary of the Invention

The present application provides a retrieval library management method, a retrieval method, a device and a medium to solve at least one of the above technical problems.

In one aspect, the present application provides a retrieval library management method, where the retrieval library includes a hot retrieval library and a cold retrieval library, and the method includes:

when the total number of hot data stored in the hot retrieval library is greater than a first preset scale threshold, acquiring attribute parameters of each piece of hot data stored in the hot retrieval library;

if the attribute parameters of hot data are determined to satisfy a hot data management condition, migrating the hot data that satisfies the hot data management condition to the cold retrieval library;

calculating a first matching degree between each piece of cold data stored in the cold retrieval library and each piece of hot data in the hot retrieval library;

if the first matching degree is determined to satisfy a cold data management condition, migrating the cold data that satisfies the cold data management condition to the hot retrieval library.
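The four steps above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation described in the application: the `Record` type, the `manage` function and the use of cosine similarity as the matching degree are hypothetical, and the hot and cold data management conditions are passed in as caller-supplied predicates because this passage does not define them.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Record:
    feature: List[float]  # feature vector of the object (assumed representation)
    record_time: float    # attribute parameter: time the record was written
    activity: int         # attribute parameter: visits within a preset period


def matching_degree(a: List[float], b: List[float]) -> float:
    """Cosine similarity, an assumed stand-in for the 'first matching degree'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def manage(
    hot: Dict[str, Record],
    cold: Dict[str, Record],
    size_threshold: int,
    hot_condition: Callable[[Record], bool],
    cold_condition: Callable[[float], bool],
) -> Tuple[List[str], List[str]]:
    """Run one management pass; return (demoted keys, promoted keys)."""
    demoted: List[str] = []
    promoted: List[str] = []
    # Management is only triggered once the hot library exceeds the threshold.
    if len(hot) <= size_threshold:
        return demoted, promoted
    # Demote hot data whose attribute parameters satisfy the hot data condition.
    for key in [k for k, rec in hot.items() if hot_condition(rec)]:
        cold[key] = hot.pop(key)
        demoted.append(key)
    # Snapshot the remaining hot features so promotions do not affect later comparisons.
    hot_features = [rec.feature for rec in hot.values()]
    # Promote cold data whose best matching degree satisfies the cold data condition.
    for key in list(cold):
        degrees = [matching_degree(cold[key].feature, f) for f in hot_features]
        if degrees and cold_condition(max(degrees)):
            hot[key] = cold.pop(key)
            promoted.append(key)
    return demoted, promoted
```

Passing the two conditions as predicates keeps the sketch neutral about how they are defined; a caller might, for example, demote records with `lambda r: r.activity < 1` and promote on `lambda m: m >= 0.99`.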

In another aspect, the present application further provides a retrieval method that performs retrieval using a retrieval library that has undergone data management, where the retrieval library includes a hot retrieval library and a cold retrieval library, and the method includes:

acquiring feature data to be retrieved of an object;

retrieving the feature data to be retrieved in the hot retrieval library that has undergone retrieval library management, to obtain a retrieval result;

returning the retrieval result;

where the retrieval library undergoes data management through the following retrieval library management method, which includes:

when the total number of hot data stored in the hot retrieval library is greater than a first preset scale threshold, acquiring attribute parameters of each piece of hot data in the hot retrieval library;

if the attribute parameters of hot data are determined to satisfy a hot data management condition, migrating the hot data that satisfies the hot data management condition to the cold retrieval library;

calculating a first matching degree between each piece of cold data stored in the cold retrieval library and each piece of hot data in the hot retrieval library;

if the first matching degree is determined to satisfy a cold data management condition, migrating the cold data that satisfies the cold data management condition to the hot retrieval library.

In another aspect, the present application further provides a retrieval library management apparatus, where the retrieval library includes a hot retrieval library and a cold retrieval library, and the apparatus includes:

an attribute acquisition module, configured to acquire attribute parameters of each piece of hot data stored in the hot retrieval library when the total number of hot data stored in the hot retrieval library is greater than a first preset scale threshold;

a first migration module, configured to migrate hot data that satisfies a hot data management condition to the cold retrieval library if the attribute parameters of the hot data are determined to satisfy the hot data management condition;

a first calculation module, configured to calculate a first matching degree between each piece of cold data stored in the cold retrieval library and each piece of hot data in the hot retrieval library;

a second migration module, configured to migrate cold data that satisfies a cold data management condition to the hot retrieval library if the first matching degree is determined to satisfy the cold data management condition.

In another aspect, the present application further provides a retrieval apparatus, which includes:

a feature management module, configured to acquire feature data to be retrieved of an object;

the retrieval library management apparatus described above, configured to perform data management on a retrieval library, where the retrieval library includes a hot retrieval library and a cold retrieval library;

a retrieval module, configured to retrieve the feature data to be retrieved in the hot retrieval library that has undergone data management, to obtain a retrieval result;

a return module, configured to return the retrieval result.

In another aspect, a retrieval library management device is further provided. The device includes a processor and a memory, the memory stores at least one instruction, at least one program, a code set or an instruction set, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to implement any of the retrieval library management methods described above.

In another aspect, a retrieval device is further provided. The device includes a processor and a memory, the memory stores at least one instruction, at least one program, a code set or an instruction set, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to implement any of the retrieval methods described above.

In another aspect, a computer-readable storage medium is further provided. The storage medium stores at least one instruction, at least one program, a code set or an instruction set, and the at least one instruction, the at least one program, the code set or the instruction set is loaded by a processor to execute any of the retrieval library management methods described above.

In another aspect, a computer-readable storage medium is further provided. The storage medium stores at least one instruction, at least one program, a code set or an instruction set, and the at least one instruction, the at least one program, the code set or the instruction set is loaded by a processor to execute any of the retrieval methods described above.

The retrieval library management method, retrieval method, device and medium provided by the present application have the following technical effects:

The retrieval library in the embodiments of the present application includes a hot retrieval library and a cold retrieval library. When the total number of hot data stored in the hot retrieval library is greater than a first preset scale threshold, attribute parameters of each piece of hot data stored in the hot retrieval library are acquired; if the attribute parameters of hot data are determined to satisfy a hot data management condition, the hot data that satisfies the condition is migrated to the cold retrieval library; a first matching degree between each piece of cold data stored in the cold retrieval library and each piece of hot data in the hot retrieval library is calculated; and if the first matching degree is determined to satisfy a cold data management condition, the cold data that satisfies the condition is migrated to the hot retrieval library. Thus, by separating the cold and hot data of the retrieval library and triggering the promotion and demotion logic between the hot and cold libraries whenever the total number of hot data exceeds the first preset scale threshold, the size of the hot retrieval library is kept within the threshold range, the validity of the hot data in the hot retrieval library is improved, and the reliability, accuracy and efficiency of retrieval based on the managed hot retrieval library are improved.

Brief Description of the Drawings

To explain the technical solutions and advantages of the embodiments of the present application or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application;

FIG. 2 is a schematic flowchart of a retrieval library management method provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of the steps of migrating hot data to the cold retrieval library provided by an embodiment of the present application;

FIG. 4 is a schematic flowchart of migrating hot data to the cold retrieval library provided by an embodiment of the present application;

FIG. 5 is a schematic flowchart of the steps of performing data management on the hot retrieval library provided by an embodiment of the present application;

FIG. 6 is a schematic flowchart of the library clearing steps of the cold retrieval library provided by an embodiment of the present application;

FIG. 7 is a schematic flowchart of the steps of performing data management on the cold retrieval library provided by an embodiment of the present application;

FIG. 8 is a structural block diagram of a retrieval library management apparatus provided by an embodiment of the present application;

FIG. 9 is a schematic flowchart of a retrieval method provided by an embodiment of the present application;

FIG. 10 is a schematic diagram of a framework of the present application for implementing a retrieval method in a face retrieval scenario;

FIG. 11 is a structural block diagram of a retrieval apparatus provided by an embodiment of the present application;

FIG. 12 is a schematic diagram of the hardware structure of a device provided by the present application for implementing the method provided by the embodiments of the present application.

Detailed Description

Machine learning (ML) is a multi-disciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It studies how computers can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in every field of artificial intelligence. Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from demonstration.

To help those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the scope of protection of the present application.

First, the technical terms that may be involved in the present invention are briefly described:

Face recognition: the process of extracting face features and comparing them for similarity. Depending on the application scenario, face recognition is mainly divided into three categories: 1:1 face verification (Face Verification), 1:N face recognition (Face Recognition) and face retrieval (Face Retrieval).

1:N face recognition: finding, in a large-scale face database, the one or more faces most similar to the face to be retrieved.

SDK: a software development kit (SDK) is generally a collection of development tools used by software engineers to build application software for a specific software package, software framework, hardware platform, business system and the like.

CPU/GPU: the CPU is the central processing unit and the GPU is the graphics processing unit; both are computing resources for video and multimedia processing.

At present, identity recognition and retrieval has been deployed and promoted to some extent in application scenarios such as smart retail, smart communities, smart finance and public security. In these scenarios, to improve retrieval efficiency, the retrieval library used for identity recognition is generally kept resident in a processor with better concurrency and memory access speed. Existing retrieval methods are mainly suited to scenarios with a fixed retrieval base library, for example scenarios where the base library is registered in advance (such as access control and attendance). In scenarios where the base library is not fixed (such as smart retail and smart communities), the base library keeps growing as large numbers of users with new identities are added, which poses a major challenge to existing retrieval methods.

Taking face retrieval as an example, the face database is usually stored in the video memory of a graphics processing unit (GPU). However, in these scenarios, as large numbers of users with new identities are added, more and more high-performance processor resources (for example GPUs) are consumed, which greatly increases the storage cost of retrieval. In addition, the reliability, accuracy and efficiency of existing retrieval methods need further improvement.

The inventors have also found that, during identity recognition, a retrieval algorithm is usually used to search for an identity in the retrieval library, and that the accuracy of such retrieval depends on the size of the retrieval library. The larger the retrieval library, the higher the demands on the model precision and computing performance of the retrieval algorithm. In face retrieval, for example, the accuracy of a typical face recognition algorithm drops markedly as the number of user identities in the retrieval library grows. Table 1 below shows, for one and the same retrieval algorithm model at an error rate of one in a thousand, the top-1 face hit rate for retrieval libraries of different sizes.

Table 1: Relationship between retrieval accuracy and retrieval library size

Retrieval library size (persons)    100    500     1000
Accuracy (%)                        100    99.2    78.4

To solve at least one of the above technical problems, the present application provides a retrieval library management method, a retrieval method, a device and a medium.

To make the objectives, technical solutions and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the drawings.

Please refer to FIG. 1, which shows a schematic diagram of an implementation environment provided by an embodiment of the present application. The implementation environment may include a terminal 10 and a server 20 connected to the terminal 10 through a network.

The terminal 10 may include software running on a physical device, such as an application installed on the device, and may also include at least one type of physical device with applications installed, such as a smartphone, desktop computer, tablet computer, notebook computer, digital assistant or smart wearable device. Specifically, an operating system runs on the terminal 10. It may be a desktop operating system such as Windows, Linux or Mac OS (Apple's desktop operating system), or a mobile operating system such as iOS (Apple's mobile operating system) or Android.

The server 20 may be an independent server, a server cluster composed of multiple independent servers, a distributed server, or a cloud server that provides basic cloud computing services such as cloud computing servers, cloud databases and cloud storage. The distributed server may specifically have a blockchain structure, in which any node can execute, or take part in executing, the retrieval library management method or the retrieval method.

It should be understood that the implementation environment shown in FIG. 1 is merely one application environment related to the solution of the present application and does not limit the application environment of the solution. Other application environments may include more or fewer computer devices than shown in the figure, or different network connections between the computer devices.

A specific embodiment of a retrieval library management method of the present application is described below. FIG. 2 is a schematic flowchart of a retrieval library management method provided by an embodiment of the present application. The present application provides the operation steps of the method as described in the embodiment or flowchart, but more or fewer operation steps may be included on the basis of routine or non-creative work. The order of steps listed in the embodiment is only one of many possible execution orders and does not represent the only execution order. As shown in FIG. 2, the method may be executed by the server in the above application environment, the retrieval library includes a hot retrieval library and a cold retrieval library, and the method may include:

S201: when the total number of hot data stored in the hot retrieval library is greater than a first preset scale threshold, acquiring attribute parameters of each piece of hot data stored in the hot retrieval library.

In this embodiment, the hot retrieval library is the retrieval base library against which retrieval is performed. It stores a large amount of hot data used for retrieval and comparison, including but not limited to the user's identity identifier and the user's basic data. The user's basic data includes object data such as a face image, a face feature image, a voiceprint or an iris.

For example, in a face detection scenario, when user A enters a smart retail store, a capture device collects a face image of user A. The hot retrieval library then receives the face image of user A and, based on the features of the face image, determines whether corresponding face image features already exist in the hot retrieval library. If they exist, user A has visited before. If they do not exist, user A is a new visitor; an identity ID is registered for the user, and the registered identity ID and the corresponding face image features are stored in association in the hot retrieval library as newly added hot data.
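The lookup-or-register behavior in this example can be sketched in Python as follows. This is a hedged illustration only: the function name `identify_or_register`, the `match_threshold` parameter, the use of cosine similarity to decide whether "corresponding features exist", and the use of a random UUID as the identity ID are all assumptions that do not come from the application.

```python
import math
import uuid
from typing import Dict, List, Tuple


def similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two feature vectors (assumed matching measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_or_register(
    feature: List[float],
    hot_library: Dict[str, List[float]],
    match_threshold: float = 0.9,  # hypothetical value
) -> Tuple[str, bool]:
    """Return (identity_id, is_new) for a captured face feature."""
    best_id, best_score = None, match_threshold
    # Look for the stored feature that matches best and reaches the threshold.
    for identity, stored in hot_library.items():
        score = similarity(feature, stored)
        if score >= best_score:
            best_id, best_score = identity, score
    if best_id is not None:
        return best_id, False  # returning visitor
    # New visitor: register an identity ID and store it with the feature as hot data.
    new_id = uuid.uuid4().hex
    hot_library[new_id] = list(feature)
    return new_id, True
```

A first call with an unseen feature registers a new ID; a later call with a sufficiently similar feature returns the same ID with `is_new` set to `False`.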

冷检索库为用于存储待管理的冷数据。在满足预设管理条件下,对在冷检索库中的冷数据和热检索库中的热数据进行管理,该管理包括但不限于为将数据迁移、数据删除、数据合并、数据转存至磁盘、数据同步等。The cold retrieval library is used to store cold data to be managed. Manage the cold data in the cold retrieval library and the hot data in the hot retrieval library under the pre-set management conditions, including but not limited to data migration, data deletion, data merging, and data transfer to disk , data synchronization, etc.

在智慧零售、智慧社区、智慧金融、公安安防等应用场景,会有大量的不断来访的用户。对不断大量来访的用户进行身份识别,通常采用检索算法在数据库中进行检索。由于发明人发现,检索算法与数据库的规模大小有关。因此,为了提高检索准确性,在检索底库超过第一预设规模阈值时,则检索底库的数量已经超出检索算法的计算能力,存在检索结果准确性较低的问题,因此需要触发对热检索库的管理策略。该第一预设规模阈值根据检索算法的理论准确率、检索模型理论精度和理论计算能力进行适配性调整。In smart retail, smart community, smart finance, public security security and other application scenarios, there will be a large number of users visiting constantly. To identify the users who visit constantly and in large numbers, the retrieval algorithm is usually used to search in the database. As the inventors discovered, the retrieval algorithm is related to the size of the database. Therefore, in order to improve the retrieval accuracy, when the retrieval base exceeds the first preset size threshold, the number of retrieval bases has exceeded the computing power of the retrieval algorithm, and there is a problem that the accuracy of retrieval results is low. Retrieve the management policy of the library. The first preset scale threshold is adaptively adjusted according to the theoretical accuracy rate of the retrieval algorithm, the theoretical accuracy of the retrieval model, and the theoretical computing capability.

When the total number of hot data items stored in the hot retrieval library is greater than the first preset scale threshold, the management condition for the hot retrieval library is triggered. The attribute parameters of each hot data item stored in the hot retrieval library are then obtained so that the items requiring management can be selected. The attribute parameters include at least one of the recording time of the hot data and the activity of the user associated with the hot data. For example, the recording time may be the visit time of the user associated with the recorded hot data, and the activity may be determined from the number of visits by the user within a preset period (for example one week, one month, half a year, or one year).

Note that the value of the first preset scale threshold is not fixed. It can be configured to suit the retrieval algorithm in use, so that different retrieval algorithms are accommodated and retrieval performance improves.

In practice, taking face recognition as an example, if the rated base library size at which a given face retrieval algorithm meets its accuracy requirement is 10,000, the first preset scale threshold can be set to 10,000. When the total number of hot data items currently stored in the hot retrieval library is detected to exceed this threshold, database management of the hot retrieval library is triggered. The management work can run outside normal working hours (for example at night, in the early morning, or during other periods with few retrieval requests): the attribute parameters of each hot data item (such as its storage time and activity) are obtained, and hot data meeting the predetermined condition is then managed based on those parameters. This avoids disturbing the normal create, delete, update, and query operations on face features during working hours and makes full use of machine resources at night.

In other embodiments, when the total number of hot data items stored in the hot retrieval library is determined to be less than the first preset scale threshold k1, deep data management of the hot retrieval library need not be triggered at first. Instead, it is further determined whether the total is greater than a third preset scale threshold k3 (k3 is less than k1); if so, data management of the hot retrieval library is triggered. Trimming the total amount of data in the hot retrieval library in this way further improves the efficiency and accuracy of retrieval against it.

S203: If the attribute parameters of a hot data item are determined to satisfy the hot data management condition, migrate the hot data that satisfies the condition to the cold retrieval library.

Once data management of the hot retrieval library has been triggered and the attribute parameters of each hot data item have been obtained, the hot data requiring management is selected from all the hot data, and the selected hot data is then managed accordingly.

The attribute parameters of the hot data include at least one of the most recent recording timestamp of the hot data and the activity of the user associated with the hot data.

In this embodiment, as shown in FIGS. 3 and 4, the attribute parameters of the hot data include both the most recent recording timestamp of the hot data and the activity of the associated user. Migrating the hot data that satisfies the hot data management condition to the cold retrieval library includes:

S301: If the interval between the most recent recording timestamp of the hot data and a first preset time is determined to be greater than or equal to a first preset threshold, and the activity of the user associated with the hot data is determined to be less than a second preset threshold, determine that the attribute parameters of the hot data satisfy the hot data management condition.

Here, the most recent recording timestamp is the time recorded at the latest visit of the user identity associated with the hot data. The first preset time includes, but is not limited to, the current management time or some other fixed moment, and can be configured according to how users visit in the actual application scenario. For example, where visits are frequent, such as a smart community that users enter and leave regularly, the first preset time can be set to a smaller value, such as half a month, one month, a quarter, or half a year. Where visits are infrequent, such as a smart retail store that users enter only occasionally, it can be set to a larger value, such as a quarter, half a year, or one year.

Object activity is an index of how active the user corresponding to the hot data is within a second preset period. Specifically, it can be determined from the number of times the user is active within that period and the length of the period. The active count may be the cumulative number of visits, the cumulative number of uses, and so on. Both the second preset period and the active count can be configured according to how users visit in the actual application scenario. For example, where visits are frequent, such as a smart community that users enter and leave regularly, the second preset period can be set to a smaller value and the active count to a larger value: the period might be the last three days, the last week, or the last half month, with an active count chosen to suit, for example 5 to 20. Where visits are infrequent, such as a smart retail store that users enter only occasionally, the second preset period can be set to a larger value, such as the last half month or the last month, with an active count chosen to suit, for example 3 to 8.

For example, in a smart retail scenario, within a preset management cycle, the most recent recording timestamp in the hot base library of the hot data corresponding to user B's user ID is found to be t1, and the cumulative number of visits by user B to the smart retail store over the last month is found to be m. If the interval between t1 and the current query time is greater than the first preset threshold (for example one year), and m is less than the second preset threshold (for example 5), the attribute parameters of the hot data corresponding to user B are determined to satisfy the hot data management condition.
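The check in S301 can be sketched as a small predicate. This is a minimal illustration, not the patented implementation: the `HotRecord` type, its field names, and the function name are hypothetical, and the idle comparison uses "greater than or equal to" as worded in S301.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class HotRecord:
    """Hypothetical hot data item; the field names are illustrative only."""
    user_id: str
    last_seen: datetime      # most recent recording timestamp
    visits_in_window: int    # activity: visits within the second preset period


def meets_hot_management_condition(
    record: HotRecord,
    reference_time: datetime,   # the "first preset time", e.g. the current management time
    min_idle: timedelta,        # the "first preset threshold"
    min_activity: int,          # the "second preset threshold"
) -> bool:
    """Return True when the record is both idle long enough and insufficiently active."""
    idle = reference_time - record.last_seen
    return idle >= min_idle and record.visits_in_window < min_activity


now = datetime(2019, 10, 30)
user_b = HotRecord("B", last_seen=datetime(2018, 9, 1), visits_in_window=2)
regular = HotRecord("C", last_seen=datetime(2019, 10, 29), visits_in_window=12)

print(meets_hot_management_condition(user_b, now, timedelta(days=365), 5))   # True
print(meets_hot_management_condition(regular, now, timedelta(days=365), 5))  # False
```

Both conditions are joined with a logical AND, matching the embodiment in which both attribute parameters are used; an embodiment using only one parameter would drop the other clause.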

In other embodiments, the attribute parameters of the hot data may include only the most recent recording timestamp of the hot data, or only the activity of the associated user. For example, in a security scenario, the attribute parameters may include only the most recent recording timestamp.

The hot retrieval library of this application screens the identities corresponding to hot data by the most recent recording timestamp of the hot data and/or the activity of the associated user, which ensures the timeliness and reliability of retrieval results.

S303: Determine the hot data that satisfies the hot data management condition as the hot data to be managed.

S305: Migrate the hot data to be managed to the cold retrieval library.

When the preset management cycle is reached, the hot data to be managed is migrated to the cold retrieval library. The preset management cycle includes, but is not limited to, one day, one week, or one month.

In one embodiment, migrating the hot data to be managed to the cold retrieval library includes:

S3051: Calculate the second matching degree between the hot data to be managed and each cold data item in the cold retrieval library.

Here, the second matching degree characterizes how similar the hot data to be managed is to each cold data item in the cold retrieval library. There are multiple second matching degrees, which can be determined by calculating the similarity between the hot data to be managed and every cold data item. The similarity includes, but is not limited to, cosine similarity.

S3053: Among the multiple second matching degrees corresponding to the hot data to be managed, determine whether the maximum second matching degree is greater than or equal to a third preset threshold.

S3055: If so, merge the hot data to be managed, whose maximum second matching degree is greater than or equal to the third preset threshold, with the matching cold data.

S3057: If not, store the hot data to be managed, whose maximum second matching degree is less than the third preset threshold, in the cold retrieval library.

Here, the third preset threshold is a preset value that can be configured as, but is not limited to, any value from 0.8 to 1. Data merging means merging the hot data to be managed into the storage location of the corresponding cold data. If the maximum second matching degree is determined to be greater than or equal to the third preset threshold, the cold retrieval library already holds cold data similar to the hot data to be managed, so that hot data is merged with the cold data and then deleted from the hot retrieval library. Conversely, if the maximum second matching degree is determined to be less than the third preset threshold, the hot data to be managed is new cold data and is stored directly in the cold retrieval library.
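Steps S3051 to S3057 can be sketched as follows. This is an illustration under stated assumptions: both libraries are modeled as plain dictionaries mapping an identity ID to a feature vector, the function names are hypothetical, and "merging" is assumed to mean that the newer hot feature replaces the stored cold feature, a detail the text leaves open.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def demote_to_cold(hot, cold, user_id, merge_threshold=0.8):
    """Move one hot record into the cold library and return the cold ID it ends up under."""
    feature = hot[user_id]

    # S3051 / S3053: score against every cold record and keep the best match.
    best_id, best_score = None, -1.0
    for cold_id, cold_feature in cold.items():
        score = cosine_similarity(feature, cold_feature)
        if score > best_score:
            best_id, best_score = cold_id, score

    if best_id is not None and best_score >= merge_threshold:
        # S3055: a similar cold record exists, so merge into its storage location.
        # Assumption: the newer hot feature overwrites the older cold feature.
        cold[best_id] = feature
        target = best_id
    else:
        # S3057: no similar cold record, so store as a new cold record.
        cold[user_id] = feature
        target = user_id

    del hot[user_id]  # the record leaves the hot library in both branches
    return target


hot = {"h1": [1.0, 0.0], "h2": [0.0, 1.0]}
cold = {"c1": [0.9, 0.1]}
print(demote_to_cold(hot, cold, "h1"))  # merges into "c1"
print(demote_to_cold(hot, cold, "h2"))  # stored as new cold record "h2"
```

The linear scan is the simplest way to obtain the maximum second matching degree; a production system would more plausibly use a batched or indexed similarity search.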

This application uses the first preset scale threshold of the hot retrieval library to trigger a hot library demotion policy that sinks identities with poor timeliness into the cold library. This keeps the hot retrieval library bounded in size and its identities current, which improves retrieval accuracy and speed. The demotion policy ensures that the hot data in the retrieval base library consists only of identities that remain after demotion by time and activity, so that every returned retrieval result carries the most recent object information for the queried identity.

S205: Calculate the first matching degree between each cold data item stored in the cold retrieval library and each hot data item in the hot retrieval library.

Here, the first matching degree characterizes how similar each cold data item stored in the cold retrieval library is to each hot data item in the hot retrieval library. There may be multiple first matching degrees, which can be determined by calculating the similarity between all cold data and all hot data. The similarity includes, but is not limited to, cosine similarity.

S207: If the first matching degree is determined to satisfy the cold data management condition, migrate the cold data that satisfies the condition to the hot retrieval library.

The identities in the cold retrieval library are used to query the hot retrieval library. After the first matching degrees between all cold data and all hot data have been calculated, any cold data that satisfies the cold data management condition is promoted to the hot library.

In this embodiment, as shown in FIG. 5, migrating the cold data that satisfies the cold data management condition to the hot retrieval library includes:

S401: Among the multiple first matching degrees corresponding to each cold data item, if the maximum first matching degree is determined to be greater than or equal to a fourth preset threshold, determine that the first matching degree satisfies the cold data management condition.

S403: Determine the cold data that satisfies the cold data management condition as the cold data to be managed.

S405: Migrate the cold data to be managed to the hot retrieval library.

Here, the fourth preset threshold may be a preset value that can be configured as, but is not limited to, any value from 0.8 to 1. The identities in the cold retrieval library are used to query the hot retrieval library. Among the multiple first matching degrees corresponding to each cold data item, if the maximum first matching degree is determined to be greater than or equal to the fourth preset threshold, the hot retrieval library already holds hot data similar to that cold data. In other words, the user corresponding to the cold identity has recently visited again and has therefore been registered in the hot retrieval library. The cold data satisfying the cold data management condition is determined as the cold data to be managed and is migrated to the hot retrieval library. For example, in one case, the user ID in the hot retrieval library can be overwritten directly with the user ID from the cold retrieval library, and the cold data for that identity is deleted from the cold retrieval library. In another case, the cold data for that identity can simply be deleted from the cold retrieval library.

Conversely, if the maximum first matching degree is determined to be less than the fourth preset threshold, the user corresponding to the cold identity has not visited again recently. The cold data is not migrated and remains stored in the cold retrieval library.
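The promotion pass in S401 to S405 can be sketched in the same style. The dictionary representation and the function names are assumptions for illustration. The sketch implements the first of the two cases described above: the matching hot record is re-keyed under the original cold ID and the cold record is deleted.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def promote_from_cold(hot, cold, promote_threshold=0.8):
    """Promote cold identities that reappear in the hot library.

    Returns a list of (cold_id, replaced_hot_id) pairs.
    """
    promoted = []
    for cold_id in list(cold):  # copy the keys because cold is mutated below
        cold_feature = cold[cold_id]

        # S401: maximum first matching degree for this cold record.
        best_id, best_score = None, -1.0
        for hot_id, hot_feature in hot.items():
            score = cosine_similarity(cold_feature, hot_feature)
            if score > best_score:
                best_id, best_score = hot_id, score

        if best_id is not None and best_score >= promote_threshold:
            # S403 / S405: the user has returned. Keep the fresh hot feature but
            # restore the original identity, then drop the cold copy.
            hot[cold_id] = hot.pop(best_id)
            del cold[cold_id]
            promoted.append((cold_id, best_id))
        # Otherwise the cold record stays where it is.
    return promoted


hot = {"new-17": [0.6, 0.8], "u2": [1.0, 0.0]}
cold = {"u9": [0.6, 0.8], "u5": [0.0, -1.0]}
print(promote_from_cold(hot, cold))
print(sorted(hot), sorted(cold))
```

Restoring the cold ID means a returning visitor keeps a single identity across both libraries instead of accumulating a new ID on every reactivation.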

The retrieval library in the embodiments of this application includes a hot retrieval library and a cold retrieval library. When the total number of hot data items stored in the hot retrieval library is greater than the first preset scale threshold, the attribute parameters of each hot data item are obtained; if the attribute parameters of a hot data item satisfy the hot data management condition, that hot data is migrated to the cold retrieval library; the first matching degree between each cold data item in the cold retrieval library and each hot data item in the hot retrieval library is calculated; and if the first matching degree satisfies the cold data management condition, the corresponding cold data is migrated to the hot retrieval library. By separating hot and cold data and triggering the promotion and demotion logic whenever the hot data count exceeds the first preset scale threshold, the size of the hot retrieval library is kept within the threshold, the hot data in it stays valid, and retrieval against the managed hot retrieval library becomes more reliable, accurate, and efficient.

In some embodiments, as shown in FIG. 6, the method may further include:

S501: Detect the total number of cold data items stored in the cold retrieval library.

S503: If the detected total is greater than a second preset scale threshold, take the difference between the detected total and the second preset scale threshold as the number N of items to be dumped, where N is a positive integer.

S505: Sort all cold data in the cold retrieval library in chronological order of most recent recording time.

S507: Dump the first N cold data items in the sorted order to disk.

Here, the second preset scale threshold can be determined according to the storage performance of the cold retrieval library. For example, the second preset scale threshold includes, but is not limited to, 100,000, 500,000, or 1,000,000.

Specifically, after the cold-to-hot promotion process above has completed, the total number of cold data items currently in the cold retrieval library is checked. If it reaches the second preset scale threshold (configurable, for example 1,000,000), a cold library cleanup module is triggered: all remaining identities in the cold retrieval library are sorted by most recent visit time, and the cold retrieval library is cleaned. The cleanup writes the cold data with the oldest visit times (for example the top N) from memory to disk and deletes it from memory, which keeps the size of the cold retrieval library under control. The cold library information written to disk serves mainly as a backup; the business can select the required identity IDs by time or other dimensions for offline processing, or reactivate selected identities and load them back into the hot library.
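The cleanup in S501 to S507 can be sketched as below. The record layout, the JSON Lines backup file, and the function name are assumptions chosen for illustration; the text only requires that the N oldest records be written to disk and removed from memory.

```python
import json
import tempfile
from pathlib import Path


def spill_oldest_to_disk(cold, max_size, out_path):
    """Dump the oldest excess cold records to disk and drop them from memory.

    cold maps identity ID -> {"last_seen": <epoch seconds>, "feature": [...]}.
    Returns the IDs that were spilled, oldest first.
    """
    # S501 / S503: N is the amount by which the library exceeds its limit.
    excess = len(cold) - max_size
    if excess <= 0:
        return []

    # S505: order identities from oldest to newest last-seen time.
    oldest_first = sorted(cold, key=lambda identity: cold[identity]["last_seen"])
    spilled = oldest_first[:excess]

    # S507: append one JSON object per line, then free the in-memory copies.
    with open(out_path, "a", encoding="utf-8") as backup:
        for identity in spilled:
            backup.write(json.dumps({"id": identity, **cold[identity]}) + "\n")
    for identity in spilled:
        del cold[identity]
    return spilled


cold = {
    "u1": {"last_seen": 1_560_000_000.0, "feature": [0.1, 0.2]},
    "u2": {"last_seen": 1_500_000_000.0, "feature": [0.3, 0.4]},
    "u3": {"last_seen": 1_570_000_000.0, "feature": [0.5, 0.6]},
}
with tempfile.TemporaryDirectory() as tmp:
    spilled = spill_oldest_to_disk(cold, max_size=2, out_path=Path(tmp) / "cold_backup.jsonl")
    print(spilled, sorted(cold))
```

Records are deleted from memory only after the file has been written and closed, so a failed write leaves the in-memory cold library unchanged.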

In some embodiments, the hot retrieval library is stored in the memory of the image processor or sound processor used to match the object to be retrieved, and the cold retrieval library is stored in the memory of the central processing unit.

The object to be retrieved includes, but is not limited to, at least one of the following: a face, audio, and an iris.

For example, in a face retrieval scenario, if the retrieval library contains N face identities, the M face identities that meet the timeliness requirement can be stored in the video memory of the GPU, and the remaining N-M face identities can be stored as cold data in the main memory of the CPU. Storing hot and cold data separately greatly reduces the dependence on high-performance GPUs and lowers the storage cost of the retrieval library.

This application separates the retrieval library into hot and cold parts that do not interfere with each other, so database management does not affect online requests. Time is used efficiently: during the day the hot retrieval library serves online retrieval, and at night the cold retrieval library is used to query the hot retrieval library. The frequently queried hot retrieval library is stored in GPU video memory, and the cold retrieval library of inactive identities is stored in CPU main memory. Both libraries are kept within their specified thresholds: when the hot library reaches its upper limit, the promotion and demotion process between the cold and hot libraries is triggered, and when the cold library reaches its upper limit, the cleanup logic is triggered. This makes the fullest use of the machine's CPU, GPU, and disk resources, reduces the demand for costly GPUs, and lowers retrieval cost.

The following are apparatus embodiments of this application, which can be used to carry out the method embodiments. For details not disclosed in the apparatus embodiments, refer to the method embodiments of this application.

Refer to FIG. 8, which shows a structural block diagram of a retrieval library management apparatus provided by an embodiment of this application. The apparatus implements the server-side functions of the method examples above; the functions may be implemented in hardware, or by hardware executing corresponding software. The retrieval library includes a hot retrieval library and a cold retrieval library, and the apparatus 60 may include:

an attribute acquisition module 601, configured to obtain the attribute parameters of each hot data item stored in the hot retrieval library when the total number of hot data items stored in the hot retrieval library is greater than a first preset scale threshold;

a first migration module 602, configured to migrate the hot data that satisfies the hot data management condition to the cold retrieval library if the attribute parameters of the hot data are determined to satisfy the hot data management condition;

a first calculation module 603, configured to calculate the first matching degree between each cold data item stored in the cold retrieval library and each hot data item in the hot retrieval library;

a second migration module 604, configured to migrate the cold data that satisfies the cold data management condition to the hot retrieval library if the first matching degree is determined to satisfy the cold data management condition.

In some embodiments, the attribute parameters of the hot data include at least one of the most recent recording timestamp of the hot data and the activity of the user associated with the hot data.

In some embodiments, where the attribute parameters of the hot data include both the most recent recording timestamp of the hot data and the activity of the associated user, the first migration module includes:

a first determining unit, configured to determine that the attribute parameters of the hot data satisfy the hot data management condition if the interval between the most recent recording timestamp of the hot data and a first preset time is determined to be greater than or equal to a first preset threshold and the activity of the user associated with the hot data is determined to be less than a second preset threshold;

a second determining unit, configured to determine the hot data that satisfies the hot data management condition as the hot data to be managed;

a first migration unit, configured to migrate the hot data to be managed to the cold retrieval library.

In some embodiments, the first migration unit includes:

a calculation subunit, configured to calculate the second matching degree between the hot data to be managed and each cold data item in the cold retrieval library;

a judging unit, configured to determine, among the multiple second matching degrees corresponding to the hot data to be managed, whether the maximum second matching degree is greater than or equal to a third preset threshold;

a data merging subunit, configured to merge the hot data to be managed, whose maximum second matching degree is greater than or equal to the third preset threshold, with the matching cold data if the determination result is yes;

a storage subunit, configured to store the hot data to be managed, whose maximum second matching degree is less than the third preset threshold, in the cold retrieval library if the determination result is no.

In some embodiments, the second migration module includes:

a third determining unit, configured to determine that the first matching degree satisfies the cold data management condition if, among the multiple first matching degrees corresponding to each cold data item, the maximum first matching degree is determined to be greater than or equal to a fourth preset threshold;

a fourth determining unit, configured to determine the cold data that satisfies the cold data management condition as the cold data to be managed;

a second migration unit, configured to migrate the cold data to be managed to the hot retrieval library.

In some embodiments, the apparatus further includes:

a detection module, configured to detect the total number of cold data items stored in the cold retrieval library;

a quantity determination module, configured to take the difference between the detected total and a second preset scale threshold as the number N of items to be dumped if the detected total is greater than the second preset scale threshold, where N is a positive integer;

a sorting module, configured to sort all cold data in the cold retrieval library in chronological order of most recent recording time;

a dump module, configured to dump the first N cold data items in the sorted order to disk.

In some embodiments, the hot retrieval library is stored in the memory of the image processor or sound processor used to match the object to be retrieved, and the cold retrieval library is stored in the memory of the central processing unit.

The object to be retrieved includes at least one of the following: a face, audio, and an iris.

An embodiment of this application provides a retrieval library management device. The device may include a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set that is loaded and executed by the processor to implement the retrieval library management method provided by the method embodiments above.

An embodiment of this application further provides a computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set that is loaded by a processor to execute any of the retrieval library management methods described above.

A specific embodiment of a retrieval method of the present application is described below. FIG. 9 is a schematic flowchart of a retrieval method provided by an embodiment of the present application. The present application provides the method operation steps as described in the embodiments or flowcharts, but more or fewer operation steps may be included on the basis of routine or non-inventive effort. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only one. As shown in FIG. 9, the method may be executed by the server in the above application environment, and retrieval is performed using a data-managed retrieval library that includes a hot retrieval library and a cold retrieval library. The method may include:

S701: Acquire the feature data to be retrieved of an object.

In this embodiment, the object may be a user's face, audio, iris, or the like.

For example, in face retrieval, a capture device collects an image of the user's face, and the face image is then processed to obtain the feature data to be retrieved.

The captured input includes, but is not limited to, face snapshots taken by a camera, ID-card selfies, and raw video frames. The image processing includes, but is not limited to, image preprocessing, face detection and registration, and face feature extraction. Image preprocessing includes image noise reduction and the like. Face detection obtains the face frame, and face registration obtains the five-point face information. Face feature extraction may use a face feature extraction SDK to compute features from the five face points, yielding the feature data to be retrieved.

S703: Retrieve the feature data to be retrieved in the data-managed hot retrieval library to obtain a retrieval result.

S705: Return the retrieval result.

The retrieval library is data-managed by at least one of the retrieval library management methods described above.

In some embodiments, the retrieval method may further include a step of managing the database, which may include:

when the total number of hot data items stored in the hot retrieval library is greater than a first preset size threshold, acquiring the attribute parameters of each hot data item in the hot retrieval library;

if it is determined that the attribute parameters of hot data satisfy the hot data management condition, migrating the hot data that satisfies the hot data management condition to the cold retrieval library;

calculating a first matching degree between each cold data item stored in the cold retrieval library and each hot data item in the hot retrieval library;

if it is determined that the first matching degree satisfies the cold data management condition, migrating the cold data that satisfies the cold data management condition to the hot retrieval library.
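By way of a hedged illustration only, one pass of the four management steps above might look like the following Python sketch. It assumes that each data item is a dictionary with `last_recorded`, `activity`, and `feature` keys, that the matching degree is cosine similarity, and that the hot data management condition combines an idle interval with a user activity level; the function and parameter names (`manage_libraries`, `max_hot`, `min_idle`, `min_activity`, `promote_at`) are hypothetical. Merging of matched records is omitted.

```python
import math


def cosine(a, b):
    """Cosine similarity, used here as an assumed matching degree."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def manage_libraries(hot, cold, now, max_hot, min_idle, min_activity, promote_at):
    # Demotion runs only once the hot library exceeds its size threshold.
    if len(hot) > max_hot:
        stale = [r for r in hot
                 if now - r["last_recorded"] >= min_idle
                 and r["activity"] < min_activity]
        for r in stale:
            hot.remove(r)
            cold.append(r)
    # Promotion: a cold record moves up when its best match against the
    # hot library reaches the (assumed) promotion threshold.
    for r in list(cold):
        best = max((cosine(r["feature"], h["feature"]) for h in hot), default=0.0)
        if best >= promote_at:
            cold.remove(r)
            hot.append(r)
```

In this sketch the two halves run one after the other for simplicity; nothing here is meant to prescribe how the two strategies are scheduled in practice.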

It should be noted that the specific content and beneficial effects of the database management here are similar to those of the above embodiments and are not repeated here.

For example, once the feature data to be retrieved has been determined, a conventional retrieval algorithm model trained by machine learning performs a similarity search for the feature data in the data-managed hot retrieval library. If hot data whose similarity reaches a preset threshold is found, that hot data is returned as the retrieval result. If no such hot data is found, the user corresponding to the feature data to be retrieved is a new-identity user; a new identity and its corresponding feature data are then inserted into the hot retrieval library for that user, and the new identity is added to the library.
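The example above can be sketched as follows, purely for illustration. The sketch substitutes a brute-force cosine-similarity scan for the trained retrieval model, represents the hot retrieval library as a dictionary from identity to feature vector, and uses hypothetical names (`retrieve_or_enroll`, `threshold`, `new_identity`) that are not taken from the present application.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_or_enroll(hot, feature, threshold, new_identity):
    """Search the hot library; enroll the feature as a new identity on a miss."""
    best_id, best_score = None, -1.0
    for identity, stored in hot.items():
        score = cosine(feature, stored)
        if score > best_score:
            best_id, best_score = identity, score
    if best_id is not None and best_score >= threshold:
        # Hit: return the matching hot data as the retrieval result.
        return best_id, best_score
    # Miss: treat the query as a new identity and add it to the hot library.
    hot[new_identity] = feature
    return new_identity, None
```

Only the hot library is consulted; the cold library plays no part in answering a query in this sketch.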

It should be noted that data management of the retrieval library using any of the retrieval library management methods described above is performed periodically.

When the size of the hot retrieval library reaches the set threshold, the data management strategy is triggered and hot data in the hot retrieval library is sunk into the cold retrieval library, keeping the size of the hot retrieval library controllable and its hot data up to date. The cold retrieval library receives the hot data sunk from the hot retrieval library, searches the hot retrieval library at a fixed (configurable) time every day, and merges with the hot retrieval library those items whose retrieval match meets a high threshold. When the size of the cold retrieval library reaches its set threshold, the library-clearing logic is triggered, keeping the size of the cold retrieval library controllable.

During retrieval, the present application searches the data-managed hot retrieval library rather than the cold retrieval library, which can greatly improve retrieval efficiency. The cold retrieval library serves as a transition library and is stored in the same retrieval library as the hot retrieval library, which facilitates reading and writing data between the two and supports efficient management of the hot data in the hot retrieval library. During database management, the hot and cold data of the retrieval library are separated, and when the total number of hot data items stored in the hot retrieval library exceeds the first preset size threshold, the promotion and demotion logic between the hot and cold libraries is triggered. This keeps the size of the hot retrieval library, which serves as the retrieval base library, within the threshold range, improving both the validity of the hot data in the hot retrieval library and the reliability, accuracy, and efficiency of retrieval based on the managed hot retrieval library.

The retrieval method of the present application is described in detail below using a face retrieval scenario as an example. FIG. 10 is a schematic diagram of a framework with which the present application implements the retrieval method in a face retrieval scenario. As shown in FIG. 10, the framework may include three parts: face input, face preprocessing, and face retrieval.

1) In the face input part, face pictures can be obtained through a capture module. The captured input includes, but is not limited to, face snapshots taken by a camera, ID-card selfies, and raw video frames.

2) The face preprocessing part can be implemented by a face picture preprocessing module, a face detection/registration module, and a face feature extraction module. The face picture preprocessing module preprocesses the acquired face picture; the preprocessing includes, but is not limited to, noise reduction. The face detection/registration module detects the preprocessed face picture to obtain a face frame, and performs face registration based on the face frame to obtain five-point face information. The face feature extraction module computes face features from the acquired five-point face information. In practical applications, face preprocessing can be performed by running a software development kit (SDK).

3) The face retrieval part may include a retrieval library comprising a hot retrieval library and a cold retrieval library. The hot retrieval library is stored in the video memory of the GPU and may include a face feature management module, a retrieval module, and a hot retrieval library management module. The cold retrieval library is stored in the main memory of the CPU and may include a cold retrieval library management module and a cold retrieval library cleaning module. The hot retrieval library management module, the cold retrieval library management module, and the cold retrieval library cleaning module are independent of one another and can run concurrently, so that the face retrieval service can make full use of the computing resources of both the GPU and the CPU for deep learning computation, greatly improving the overall computing capability of the device.

The face feature management module performs feature management operations on the retrieval library (for example, adding, deleting, modifying, and querying features). The hot retrieval library serves as the retrieval base library and stores the hot data used for face retrieval. The hot retrieval library management module manages the hot data stored in the hot retrieval library; when the above hot data management condition is satisfied, it migrates hot data to the cold retrieval library, implementing the strategy of demoting data from the hot library to the cold library. The retrieval module searches the hot retrieval library for hot data similar to the face features to be retrieved and returns a retrieval result based on the retrieval similarity.

The cold retrieval library management module manages the cold data stored in the cold retrieval library; when the above cold data management condition is satisfied, it migrates cold data to the hot retrieval library, implementing the strategy of promoting data from the cold library to the hot library. The cold retrieval library cleaning module cleans up the cold data stored in the cold retrieval library; when the above data cleaning condition is satisfied, it dumps cold data to disk.

The retrieval method of the present application can be applied to the mainstream 1:N face retrieval framework in the field of computer vision (CV). It plays an important role in identity record creation, identity retrieval, and identity merging in scenarios such as smart retail, smart communities, and security, and ensures the timeliness of retrieved identities and retrieval performance.

The retrieval method proposed in the present application, based on a promotion and demotion strategy for cold and hot retrieval libraries, solves a series of problems caused by an excessively large retrieval base library in current retrieval. The strategy mainly includes a timeliness-based strategy for sinking hot-library data into the cold library and a hot-library-retrieval-based strategy for raising cold-library data into the hot library. The two strategies run side by side, each according to its own rules, keeping the size of the hot retrieval library controllable and the information in it timely. A retrieval base library designed on this strategy helps improve retrieval algorithm performance and resource utilization.

The following are apparatus embodiments of the present application, which can be used to execute the method embodiments of the present application. For details not disclosed in the apparatus embodiments, refer to the method embodiments of the present application.

Refer to FIG. 11, which shows a structural block diagram of a retrieval apparatus provided by an embodiment of the present application. The apparatus has the function of implementing the server side in the above method examples; the function may be implemented by hardware, or by hardware executing corresponding software. The apparatus 80 may include:

a feature management module 801, configured to acquire the feature data to be retrieved of an object;

a retrieval library management apparatus 802, configured to perform data management on the retrieval library, the retrieval library including a hot retrieval library and a cold retrieval library;

a retrieval module 803, configured to retrieve the feature data to be retrieved in the data-managed hot retrieval library to obtain a retrieval result;

a returning module 804, configured to return the retrieval result.

In one embodiment, the retrieval library management apparatus is any one of the retrieval library management apparatuses described above, and may include:

an attribute acquisition module, configured to acquire the attribute parameters of each hot data item stored in the hot retrieval library when the total number of hot data items stored in the hot retrieval library is greater than a first preset size threshold;

a first migration module, configured to, if it is determined that the attribute parameters of hot data satisfy the hot data management condition, migrate the hot data that satisfies the hot data management condition to the cold retrieval library;

a first calculation module, configured to calculate a first matching degree between each cold data item stored in the cold retrieval library and each hot data item in the hot retrieval library;

a second migration module, configured to, if it is determined that the first matching degree satisfies the cold data management condition, migrate the cold data that satisfies the cold data management condition to the hot retrieval library.

In some embodiments, the retrieval library management apparatus includes a hot retrieval library management module and a cold retrieval library management module;

the hot retrieval library management module includes the attribute acquisition module and the first migration module;

the cold retrieval library management module includes the first calculation module and the second migration module.

In one feasible embodiment, the retrieval module may also be used to calculate the first matching degree between each cold data item stored in the cold retrieval library and each hot data item in the hot retrieval library, and to calculate the second matching degree between the hot data to be managed and each cold data item in the cold retrieval library. The above calculation subunit and first calculation module in the retrieval library management apparatus can then be omitted. This makes full use of the retrieval module's functionality, requires no additional feature-matching function modules, and reduces retrieval cost.

In some embodiments, the hot retrieval library is stored in an image processor or a sound processor used to match objects to be retrieved, and the cold retrieval library is stored in a central processing unit;

the object to be retrieved includes at least one of the following: a human face, audio, and an iris.

An embodiment of the present application provides a retrieval device. The device may include a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the retrieval method provided by the above method embodiments.

An embodiment of the present application further provides a computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded by a processor to execute any of the retrieval methods described above.

Further, FIG. 12 shows a schematic diagram of the hardware structure of a device for implementing the retrieval library management method or the retrieval method provided by the embodiments of the present application. The device may be a computer terminal, a mobile terminal, or another device, and may also form part of or include the apparatus provided by the embodiments of the present application. As shown in FIG. 12, the computer terminal 10 may include one or more processors 102 (shown in the figure as 102a, 102b, ..., 102n; a processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions. It may additionally include a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply, and/or a camera. Those of ordinary skill in the art will understand that the structure shown in FIG. 12 is only illustrative and does not limit the structure of the above electronic device. For example, the computer terminal 10 may include more or fewer components than shown in FIG. 12, or have a configuration different from that shown in FIG. 12.

It should be noted that the one or more processors 102 and/or other data processing circuits described above may generally be referred to herein as "data processing circuits". A data processing circuit may be embodied in whole or in part as software, hardware, firmware, or any combination thereof. In addition, the data processing circuit may be a single stand-alone processing module, or may be incorporated in whole or in part into any of the other elements of the computer terminal 10 (or mobile device). As referred to in the embodiments of the present application, the data processing circuit serves as a form of processor control (for example, selection of a variable-resistance termination path connected to an interface).

The memory 104 may be used to store software programs and modules of application software, such as the program instructions/data storage corresponding to the methods described in the embodiments of the present application. By running the software programs and modules stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the method described above. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

The transmission device 106 is used to receive or send data via a network. A specific example of such a network is a wireless network provided by the communication provider of the computer terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 may be a radio frequency (RF) module, which communicates with the Internet wirelessly.

The display may be, for example, a touch-screen liquid crystal display (LCD) that enables a user to interact with the user interface of the computer terminal 10 (or mobile device).

It should be noted that the order of the above embodiments of the present application is for description only and does not indicate the relative merits of the embodiments. Specific embodiments of the present application have been described above; other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the figures do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.

The embodiments in the present application are described in a progressive manner; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus and server embodiments are substantially similar to the method embodiments, they are described relatively briefly; for related details, refer to the corresponding description of the method embodiments.

Those of ordinary skill in the art will understand that all or some of the steps of the above embodiments can be completed by hardware, or by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.

The above are only preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (10)

1. A retrieval library management method, characterized in that the retrieval library includes a hot retrieval library and a cold retrieval library, the method comprising:

when the total number of hot data items stored in the hot retrieval library is greater than a first preset size threshold, acquiring the attribute parameters of each hot data item in the hot retrieval library;

if it is determined that the attribute parameters of hot data satisfy a hot data management condition, migrating the hot data that satisfies the hot data management condition to the cold retrieval library;

calculating a first matching degree between each cold data item stored in the cold retrieval library and each hot data item in the hot retrieval library;

if it is determined that the first matching degree satisfies a cold data management condition, migrating the cold data that satisfies the cold data management condition to the hot retrieval library.

2. The method according to claim 1, characterized in that the attribute parameters of the hot data include at least one of a most recent recording timestamp of the hot data and a user activity level associated with the hot data;

where the attribute parameters of the hot data include both the most recent recording timestamp of the hot data and the user activity level associated with the hot data, the step of migrating the hot data that satisfies the hot data management condition to the cold retrieval library if it is determined that the attribute parameters of the hot data satisfy the hot data management condition comprises:

if it is determined that the time interval between the most recent recording timestamp of the hot data and a first preset time is greater than or equal to a first preset threshold, and that the user activity level associated with the hot data is less than a second preset threshold, determining that the attribute parameters of the hot data satisfy the hot data management condition;

determining the hot data that satisfies the hot data management condition as hot data to be managed;

migrating the hot data to be managed to the cold retrieval library.

3. The method according to claim 2, characterized in that migrating the hot data to be managed to the cold retrieval library comprises:

calculating a second matching degree between the hot data to be managed and each cold data item in the cold retrieval library;

among the plurality of second matching degrees corresponding to the hot data to be managed, determining whether the maximum second matching degree is greater than or equal to a third preset threshold;

if so, merging the hot data to be managed whose maximum second matching degree is greater than or equal to the third preset threshold with the cold data;

if not, storing the hot data to be managed whose maximum second matching degree is less than the third preset threshold in the cold retrieval library.

4. The method according to claim 1, characterized in that migrating the cold data that satisfies the cold data management condition to the hot retrieval library if it is determined that the first matching degree satisfies the cold data management condition comprises:

among the plurality of first matching degrees corresponding to each cold data item, if it is determined that the maximum first matching degree is greater than or equal to a fourth preset threshold, determining that the first matching degree satisfies the cold data management condition;

determining the cold data that satisfies the cold data management condition as cold data to be managed;

migrating the cold data to be managed to the hot retrieval library.

5. The method according to claim 1, characterized in that the method further comprises:

detecting the total number of cold data items stored in the cold retrieval library;

if the total number of detected cold data items is greater than a second preset size threshold, taking the difference between the total number of detected cold data items and the second preset size threshold as the number N to be dumped, where N is a positive integer;

sorting all cold data in the cold retrieval library in chronological order of most recent recording time;

dumping the first N cold data items in the sorted order to disk.

6. The method according to claim 1, characterized in that the hot retrieval library is stored in the memory corresponding to an image processor or a sound processor used to match objects to be retrieved, and the cold retrieval library is stored in the memory corresponding to a central processing unit;

the object to be retrieved includes at least one of the following: a human face, audio, and an iris.

7. A retrieval method, characterized in that retrieval is performed using a data-managed retrieval library, the retrieval library including a hot retrieval library and a cold retrieval library, the method comprising:

acquiring the feature data to be retrieved of an object;

retrieving the feature data to be retrieved in the hot retrieval library that has undergone retrieval library management, to obtain a retrieval result;

returning the retrieval result;

wherein the retrieval library is data-managed by the following retrieval library management method, the database management method comprising:

when the total number of hot data items stored in the hot retrieval library is greater than a first preset size threshold, acquiring the attribute parameters of each hot data item in the hot retrieval library;

if it is determined that the attribute parameters of hot data satisfy a hot data management condition, migrating the hot data that satisfies the hot data management condition to the cold retrieval library;

calculating a first matching degree between each cold data item stored in the cold retrieval library and each hot data item in the hot retrieval library;

if it is determined that the first matching degree satisfies a cold data management condition, migrating the cold data that satisfies the cold data management condition to the hot retrieval library.

8. A retrieval library management apparatus, characterized in that the retrieval library includes a hot retrieval library and a cold retrieval library, the apparatus comprising:

an attribute acquisition module, configured to acquire the attribute parameters of each hot data item stored in the hot retrieval library when the total number of hot data items stored in the hot retrieval library is greater than a first preset size threshold;

a first migration module, configured to, if it is determined that the attribute parameters of hot data satisfy a hot data management condition, migrate the hot data that satisfies the hot data management condition to the cold retrieval library;

a first calculation module, configured to calculate a first matching degree between each cold data item stored in the cold retrieval library and each hot data item in the hot retrieval library;

a second migration module, configured to, if it is determined that the first matching degree satisfies a cold data management condition, migrate the cold data that satisfies the cold data management condition to the hot retrieval library.

9.
A retrieval device, characterized in that a retrieval database managed by data is used for retrieval, the retrieval database includes a hot retrieval database and a cold retrieval database, and the device comprises: 特征管理模块,用于获取对象的待检索特征数据;The feature management module is used to obtain the feature data to be retrieved of the object; 权利要求8所述的检索库管理装置,用于对检索库进行数据管理,所述检索库包括热检索库和冷检索库;The retrieval library management device according to claim 8, which is used for performing data management on the retrieval library, and the retrieval library includes a hot retrieval library and a cold retrieval library; 检索模块,用于在经过数据管理的热检索库中对所述待检索特征数据进行检索,得到检索结果;a retrieval module, used for retrieving the feature data to be retrieved in the hot retrieval database managed by the data to obtain retrieval results; 返回模块,用于返回所述检索结果。The return module is used to return the retrieval result. 10.一种计算机可读存储介质,其特征在于,所述存储介质中存储有至少一条指令、至少一段程序、代码集或指令集,所述至少一条指令、至少一段程序、代码集或指令集由处理器加载并执行如权利要求1至7任一所述的检索库管理方法、或如权利要求8所述的检索方法。10. A computer-readable storage medium, wherein the storage medium stores at least one instruction, at least one program, code set or instruction set, the at least one instruction, at least one program, code set or instruction set The retrieval library management method according to any one of claims 1 to 7 or the retrieval method according to claim 8 is loaded and executed by the processor.
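The migration steps recited in claims 3 to 5 can be illustrated with a short Python sketch. It is a sketch under stated assumptions, not the patented implementation: the `Record` type, the use of cosine similarity as the "matching degree", the merge rule (keeping the later recording time), and all function names are hypothetical choices made only for illustration. The claims do not fix any of them. Claim 5's sort order is read here as oldest first, which is also an interpretation.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class Record:
    key: str              # hypothetical identifier
    feature: List[float]  # feature vector of the object (face, audio, iris, ...)
    last_seen: float      # most recent recording time, e.g. a Unix timestamp


def matching_degree(a: Record, b: Record) -> float:
    """Cosine similarity, used here as an assumed stand-in for the matching degree."""
    dot = sum(x * y for x, y in zip(a.feature, b.feature))
    norm = sqrt(sum(x * x for x in a.feature)) * sqrt(sum(y * y for y in b.feature))
    return dot / norm if norm else 0.0


def migrate_hot_to_cold(item: Record, cold: List[Record], third_threshold: float) -> None:
    """Claim 3: merge into the best-matching cold record, otherwise store as new."""
    best = max(cold, key=lambda c: matching_degree(item, c), default=None)
    if best is not None and matching_degree(item, best) >= third_threshold:
        # Assumed merge rule: keep the cold record, refresh its recording time.
        best.last_seen = max(best.last_seen, item.last_seen)
    else:
        cold.append(item)


def promote_cold_to_hot(cold: List[Record], hot: List[Record], fourth_threshold: float) -> None:
    """Claim 4: move cold records whose maximum first matching degree is high enough."""
    if not hot:
        return
    current_hot = list(hot)  # compare only against hot data present before promotion
    staying, promoted = [], []
    for c in cold:
        best = max(matching_degree(c, h) for h in current_hot)
        (promoted if best >= fourth_threshold else staying).append(c)
    cold[:] = staying
    hot.extend(promoted)


def spill_cold_to_disk(cold: List[Record], second_scale_threshold: int) -> List[Record]:
    """Claim 5: remove and return the N oldest cold records (caller writes them to disk)."""
    n = len(cold) - second_scale_threshold
    if n <= 0:
        return []
    cold.sort(key=lambda r: r.last_seen)  # oldest first (assumed reading of the claim)
    spilled = cold[:n]
    del cold[:n]
    return spilled
```

In this sketch the two libraries are plain Python lists. In the arrangement of claim 6 the hot library would instead reside in GPU or audio-processor memory and the cold library in CPU memory.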
CN201911044479.8A 2019-10-30 2019-10-30 A search library management method, search method, device and medium Active CN110865992B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911044479.8A CN110865992B (en) 2019-10-30 2019-10-30 A search library management method, search method, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911044479.8A CN110865992B (en) 2019-10-30 2019-10-30 A search library management method, search method, device and medium

Publications (2)

Publication Number Publication Date
CN110865992A true CN110865992A (en) 2020-03-06
CN110865992B CN110865992B (en) 2024-10-18

Family

ID=69652998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911044479.8A Active CN110865992B (en) 2019-10-30 2019-10-30 A search library management method, search method, device and medium

Country Status (1)

Country Link
CN (1) CN110865992B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111459939A (en) * 2020-03-31 2020-07-28 中国银行股份有限公司 Data processing method and device
CN111858520A (en) * 2020-07-21 2020-10-30 杭州溪塔科技有限公司 Method and device for separately storing block chain node data
CN111858604A (en) * 2020-07-24 2020-10-30 平安证券股份有限公司 Data storage method and device, electronic equipment and storage medium
CN112380217A (en) * 2020-11-17 2021-02-19 安徽鸿程光电有限公司 Data processing method, device, equipment and medium
CN112416929A (en) * 2020-11-17 2021-02-26 四川长虹电器股份有限公司 Retrieval library management and data retrieval method based on mysql and java
CN114385668A (en) * 2022-01-13 2022-04-22 中国平安人寿保险股份有限公司 Cold data cleaning method, device, equipment and storage medium
CN116401212A (en) * 2023-06-07 2023-07-07 东营市第二人民医院 Personnel file quick searching system based on data analysis
CN118012851A (en) * 2024-04-08 2024-05-10 浪潮通信信息系统有限公司 Scene data management method and device, electronic equipment and storage medium
CN118282785A (en) * 2024-06-04 2024-07-02 杭州宇泛智能科技股份有限公司 High-reliability low-delay transmission and processing method for large-scale multi-source multi-mode data

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090100050A1 (en) * 2006-07-31 2009-04-16 Berna Erol Client device for interacting with a mixed media reality recognition system
CN104536904A (en) * 2014-12-29 2015-04-22 杭州华为数字技术有限公司 Data management method, equipment and system
US20150199129A1 (en) * 2014-01-14 2015-07-16 Lsi Corporation System and Method for Providing Data Services in Direct Attached Storage via Multiple De-clustered RAID Pools
CN105701028A (en) * 2014-11-28 2016-06-22 国际商业机器公司 Method and device for managing disks in distributed storage system
CN107844269A (en) * 2017-10-17 2018-03-27 华中科技大学 A kind of layering mixing storage system and method based on uniformity Hash
CN109491618A (en) * 2018-11-20 2019-03-19 上海科技大学 Data management system, method, terminal and medium based on mixing storage
CN109857737A (en) * 2019-01-03 2019-06-07 平安科技(深圳)有限公司 A kind of cold and hot date storage method and device, electronic equipment
CN110134723A (en) * 2019-05-22 2019-08-16 网易(杭州)网络有限公司 A kind of method and database of storing data
CN110196851A (en) * 2019-05-09 2019-09-03 腾讯科技(深圳)有限公司 A kind of date storage method, device, equipment and storage medium

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090100050A1 (en) * 2006-07-31 2009-04-16 Berna Erol Client device for interacting with a mixed media reality recognition system
US20150199129A1 (en) * 2014-01-14 2015-07-16 Lsi Corporation System and Method for Providing Data Services in Direct Attached Storage via Multiple De-clustered RAID Pools
CN105701028A (en) * 2014-11-28 2016-06-22 国际商业机器公司 Method and device for managing disks in distributed storage system
CN104536904A (en) * 2014-12-29 2015-04-22 杭州华为数字技术有限公司 Data management method, equipment and system
CN107844269A (en) * 2017-10-17 2018-03-27 华中科技大学 A kind of layering mixing storage system and method based on uniformity Hash
CN109491618A (en) * 2018-11-20 2019-03-19 上海科技大学 Data management system, method, terminal and medium based on mixing storage
CN109857737A (en) * 2019-01-03 2019-06-07 平安科技(深圳)有限公司 A kind of cold and hot date storage method and device, electronic equipment
CN110196851A (en) * 2019-05-09 2019-09-03 腾讯科技(深圳)有限公司 A kind of date storage method, device, equipment and storage medium
CN110134723A (en) * 2019-05-22 2019-08-16 网易(杭州)网络有限公司 A kind of method and database of storing data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KIM SUNHO,KANG SANG H.: "Scheduling Data Broadcast: An Efficient Cut-Off Point Between Periodic and On-Demand Data", IEEE COMMUNICATIONS LETTERS, 1 December 2010 (2010-12-01), pages 1176 - 1178, XP011337142, DOI: 10.1109/LCOMM.2010.101210.101228 *
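GE WEI (葛微): "Research on Big Data Indexing and Query Optimization Techniques and Systems" (大数据索引和查询优化技术与系统研究), China Doctoral Dissertations Full-text Database, Information Science and Technology Series (中国博士学位论文全文数据库 信息科技辑), 15 June 2019 (2019-06-15), pages 138 - 19 *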

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111459939A (en) * 2020-03-31 2020-07-28 中国银行股份有限公司 Data processing method and device
CN111459939B (en) * 2020-03-31 2023-09-19 中国银行股份有限公司 Data processing method and device
CN111858520A (en) * 2020-07-21 2020-10-30 杭州溪塔科技有限公司 Method and device for separately storing block chain node data
CN111858520B (en) * 2020-07-21 2024-03-22 杭州溪塔科技有限公司 Method and device for separately storing block chain node data
CN111858604B (en) * 2020-07-24 2022-11-04 平安证券股份有限公司 Data storage method and device, electronic equipment and storage medium
CN111858604A (en) * 2020-07-24 2020-10-30 平安证券股份有限公司 Data storage method and device, electronic equipment and storage medium
CN112416929A (en) * 2020-11-17 2021-02-26 四川长虹电器股份有限公司 Retrieval library management and data retrieval method based on mysql and java
CN112380217A (en) * 2020-11-17 2021-02-19 安徽鸿程光电有限公司 Data processing method, device, equipment and medium
CN112380217B (en) * 2020-11-17 2024-04-12 安徽鸿程光电有限公司 Data processing method, device, equipment and medium
CN114385668A (en) * 2022-01-13 2022-04-22 中国平安人寿保险股份有限公司 Cold data cleaning method, device, equipment and storage medium
CN116401212A (en) * 2023-06-07 2023-07-07 东营市第二人民医院 Personnel file quick searching system based on data analysis
CN116401212B (en) * 2023-06-07 2023-08-11 东营市第二人民医院 Personnel file quick searching system based on data analysis
CN118012851A (en) * 2024-04-08 2024-05-10 浪潮通信信息系统有限公司 Scene data management method and device, electronic equipment and storage medium
CN118282785A (en) * 2024-06-04 2024-07-02 杭州宇泛智能科技股份有限公司 High-reliability low-delay transmission and processing method for large-scale multi-source multi-mode data

Also Published As

Publication number Publication date
CN110865992B (en) 2024-10-18

Similar Documents

Publication Publication Date Title
CN110865992B (en) A search library management method, search method, device and medium
US10372723B2 (en) Efficient query processing using histograms in a columnar database
US10169485B2 (en) Dynamic partitioning of graph databases based on edge sampling
US20150039473A1 (en) Near-duplicate filtering in search engine result page of an online shopping system
CN113836131B (en) Big data cleaning method and device, computer equipment and storage medium
WO2015081915A1 (en) File recommendation method and device
CN110674144A (en) User portrait generation method and device, computer equipment and storage medium
CN106933511B (en) Space data storage organization method and system considering load balance and disk efficiency
KR102141083B1 (en) Optimization methods, systems, electronic devices and storage media of database systems
EP2864906A2 (en) Searching for events by attendants
CN110309143A (en) Data similarity determines method, apparatus and processing equipment
CN111427920B (en) Data acquisition method, device, system, computer equipment and storage medium
WO2021027331A1 (en) Graph data-based full relationship calculation method and apparatus, device, and storage medium
EP3620932A1 (en) Method and system for merging data
CN114155578A (en) Portrait clustering method, device, electronic equipment and storage medium
CN115982346A (en) Question-answer library construction method, terminal device and storage medium
CN116569194A (en) Joint learning
CN111797175B (en) Data storage method and device, storage medium and electronic equipment
CN117806556A (en) Low-energy-consumption full flash memory method and system
CN111061916A (en) Video sharing system based on multi-target library image recognition
CN113553320B (en) Data quality monitoring method and device
CN115904238A (en) Storage method and device based on data integration, computer equipment and storage medium
CN116628042A (en) Data processing method, device, equipment and medium
Kaur et al. Comparison study of big data processing systems for IoT cloud environment
Wei et al. A Highly Accurate Data Synchronization and Full-text Search Algorithm for Canal and Elasticsearch

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40021537; Country of ref document: HK)
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TG01 Patent term adjustment