
CN115567412A - Traffic deduplication method, device, electronic equipment and storage medium - Google Patents

Publication number
CN115567412A
Authority
CN
China
Prior art keywords
identifier
traffic
sub
recording
call
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211268231.1A
Other languages
Chinese (zh)
Other versions
CN115567412B (en
Inventor
周官宝
陈吉
吴广贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Shizhuang Information Technology Co ltd
Original Assignee
Shanghai Shizhuang Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Shizhuang Information Technology Co ltd
Priority to CN202211268231.1A
Publication of CN115567412A
Application granted
Publication of CN115567412B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Prevention of errors by analysis, debugging or testing of software
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application provides a traffic deduplication method, a device, electronic equipment, and a storage medium. The method includes: acquiring traffic data to be deduplicated, which comprises multiple pieces of traffic data, where each piece of traffic data includes a first identifier, a second identifier, and recorded traffic; the first identifier is generated from the recording interface corresponding to the recorded traffic, and the second identifier is generated from the sub-call set and the service label corresponding to the recorded traffic; and deduplicating the traffic data to be deduplicated according to the first identifier and the second identifier of each piece of traffic data. By deduplicating traffic with a first identifier derived from the recording interface and a second identifier generated from the service label and the sub-call set, the application improves the efficiency and accuracy of traffic deduplication.

Figure 202211268231

Description

Traffic deduplication method, device, electronic equipment and storage medium

Technical Field

The present application relates to the technical field of software development and testing, and in particular to a traffic deduplication method, a device, electronic equipment, and a storage medium.

Background

After a business service is updated, it needs to be tested through traffic recording and playback. During testing, test results are generated by comparing the traffic data recorded on the live network with the playback traffic data in the playback environment. Because the recorded traffic contains a large amount of duplicate traffic, testing efficiency is low.

To address this problem, recorded traffic is currently deduplicated by manually labeling the traffic, which is inefficient.

Summary of the Invention

The purpose of the embodiments of the present application is to provide a traffic deduplication method, a device, electronic equipment, and a storage medium that improve the efficiency of traffic deduplication.

In a first aspect, an embodiment of the present application provides a traffic deduplication method, including:

acquiring traffic data to be deduplicated, the traffic data to be deduplicated comprising multiple pieces of traffic data, where each piece of traffic data includes a first identifier, a second identifier, and recorded traffic, the first identifier is generated from the recording interface corresponding to the recorded traffic, and the second identifier is generated from the sub-call set and the service label corresponding to the recorded traffic;

deduplicating the traffic data to be deduplicated according to the first identifier and the second identifier corresponding to each piece of traffic data.

By deduplicating traffic with the first identifier corresponding to the recording interface and the second identifier generated from the service label and the sub-call set, the embodiment of the present application improves the efficiency and accuracy of traffic deduplication.

In any embodiment, deduplicating the traffic data to be deduplicated according to the first identifier and the second identifier corresponding to each piece of traffic data includes:

determining identical traffic data in the recorded traffic according to the first identifier and the second identifier;

removing all but one piece of the identical traffic data, so that only one piece is retained, thereby deduplicating the traffic data to be deduplicated.
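The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout and the field names `first_id`, `second_id`, and `traffic` are hypothetical, and keeping the first record seen is one possible choice of which piece to retain.

```python
from typing import Iterable


def deduplicate(records: Iterable[dict]) -> list[dict]:
    """Keep the first record seen for each (first_id, second_id) pair."""
    seen: set[tuple[str, str]] = set()
    kept: list[dict] = []
    for record in records:
        # Two records count as identical when both identifiers match.
        key = (record["first_id"], record["second_id"])
        if key in seen:
            continue  # an identical record was already kept
        seen.add(key)
        kept.append(record)
    return kept


# Hypothetical sample data: field names and values are illustrative only.
records = [
    {"first_id": "a1", "second_id": "s1", "traffic": "order #1"},
    {"first_id": "a1", "second_id": "s1", "traffic": "order #2"},
    {"first_id": "a1", "second_id": "s2", "traffic": "refund #1"},
    {"first_id": "b7", "second_id": "s1", "traffic": "order #3"},
]
unique = deduplicate(records)
```

In this sample, "order #2" is dropped because it shares both identifiers with "order #1", while "order #3" is kept even though its second identifier matches, because its recording interface differs.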

In the embodiments of the present application, the first identifier can represent the uniqueness of the recording interface, and the second identifier can represent the uniqueness of the sub-calls. Deduplicating recorded traffic with both identifiers therefore improves the accuracy of deduplication.

In any embodiment, before acquiring the traffic data to be deduplicated, the method further includes:

obtaining recorded traffic through the jvm sandbox repeater tool, the recorded traffic including a sub-call set;

extracting the recording interface of the recorded traffic and generating the first identifier according to the recording interface;

generating a service label corresponding to the recorded traffic, and generating the second identifier according to the service label and the sub-call set.

By deduplicating traffic with the first identifier corresponding to the recording interface and the second identifier generated from the service label and the sub-call set, the embodiment of the present application improves the efficiency and accuracy of traffic deduplication.

In any embodiment, the sub-call set includes at least one sub-call, and the call type of the at least one sub-call is determined in advance by the service server according to the hash value of the thread corresponding to the sub-call; the call types include main-thread sub-call and asynchronous-thread sub-call.

By dividing sub-calls into main-thread sub-calls and asynchronous-thread sub-calls, the embodiment of the present application avoids many unnecessary scenarios and improves the accuracy of deduplicating recorded traffic.
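The text says only that the call type is derived from the hash value of the sub-call's thread; it does not give the rule. The sketch below assumes one plausible rule, namely that a sub-call running on the same thread as the entry call is a main-thread sub-call and any other is an asynchronous-thread sub-call. The `SubCall` type, the sub-call names, and the hash values are all hypothetical.

```python
from dataclasses import dataclass

MAIN = "main_thread"
ASYNC = "async_thread"


@dataclass(frozen=True)
class SubCall:
    name: str         # hypothetical sub-call name, e.g. "redis.get"
    thread_hash: int  # hash value of the thread that executed the sub-call


def call_type(sub_call: SubCall, entry_thread_hash: int) -> str:
    # Assumption: same thread as the entry call means main-thread sub-call.
    return MAIN if sub_call.thread_hash == entry_thread_hash else ASYNC


entry_thread_hash = 1001  # illustrative value
sub_calls = [
    SubCall("mybatis.selectOrder", 1001),
    SubCall("redis.get", 1001),
    SubCall("mq.send", 2002),
]
types = {c.name: call_type(c, entry_thread_hash) for c in sub_calls}
```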

In any embodiment, generating the second identifier according to the service label and the sub-call set of the recorded traffic includes:

generating a main-thread sub-call set from the sub-calls whose call type is main-thread sub-call;

generating an asynchronous-thread sub-call set from the sub-calls whose call type is asynchronous-thread sub-call;

generating a target string from the main-thread sub-call set, the asynchronous-thread sub-call set, and the service label according to a preset format;

computing the second identifier from the target string using a preset algorithm.
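A minimal sketch of these four steps follows. The preset format and the preset algorithm are not specified at this point, so the `main=[...];async=[...];labels=[...]` layout is a hypothetical format and MD5 stands in for the preset algorithm. Sorting each set before joining is also an assumption, made so that recording order does not change the identifier.

```python
import hashlib
from typing import Iterable


def second_identifier(
    main_calls: Iterable[str],
    async_calls: Iterable[str],
    labels: Iterable[str] = (),
) -> str:
    """Build a target string from the two sub-call sets and the labels, then hash it."""
    # Assumption: treat each input as a set and sort it for a stable string.
    main_part = ",".join(sorted(set(main_calls)))
    async_part = ",".join(sorted(set(async_calls)))
    label_part = ",".join(sorted(set(labels)))
    # Hypothetical preset format; the source does not define the real one.
    target = f"main=[{main_part}];async=[{async_part}];labels=[{label_part}]"
    # MD5 is used here as a stand-in for the preset algorithm.
    return hashlib.md5(target.encode("utf-8")).hexdigest()


example_id = second_identifier(
    ["mybatis.selectOrder", "redis.get"],  # hypothetical main-thread sub-calls
    ["mq.send"],                           # hypothetical asynchronous sub-calls
    ["presale_order"],                     # hypothetical service label
)
```

With this construction, two recordings that made the same sub-calls on the same kinds of threads and carry the same labels receive the same identifier, while a change in any of the three parts produces a different one.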

By dividing sub-calls into main-thread sub-calls and asynchronous-thread sub-calls, the embodiment of the present application avoids many unnecessary scenarios; combined with the service label, this improves the accuracy of deduplicating recorded traffic.

In any embodiment, after the second identifier is generated, the method further includes:

storing the first identifier, the second identifier, and the recorded traffic corresponding to each piece of traffic data in a search server as one piece of traffic data;

correspondingly, acquiring the traffic data to be deduplicated includes:

obtaining, from the search server, the traffic data to be deduplicated within a preset time period.

Because the volume of recorded traffic is large, the preprocessed traffic data is stored in the search server, and the recorded traffic to be deduplicated is then read from the search server, which reduces the load on the recording platform.

In any embodiment, generating the service label corresponding to the recorded traffic includes:

extracting preset fields from each piece of traffic data;

determining, from a pre-stored field-to-label correspondence, the service label that matches the preset fields.
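The lookup described above can be sketched as a table keyed by field name and field value. Everything concrete here is hypothetical: the preset fields `orderType` and `payChannel`, the label names, and the flat dictionary used to represent a piece of traffic data.

```python
# Hypothetical pre-stored correspondence between (field, value) and service label.
FIELD_LABELS = {
    ("orderType", "presale"): "presale_order",
    ("orderType", "normal"): "normal_order",
    ("payChannel", "wallet"): "wallet_payment",
}

# Hypothetical preset fields to extract from each piece of traffic data.
PRESET_FIELDS = ("orderType", "payChannel")


def service_labels(traffic: dict) -> list[str]:
    """Return the labels whose (field, value) pair appears in the traffic data."""
    labels = []
    for field in PRESET_FIELDS:
        label = FIELD_LABELS.get((field, traffic.get(field)))
        if label is not None:
            labels.append(label)
    return labels


labels_for_order = service_labels({"orderType": "presale", "payChannel": "wallet"})
labels_for_ping = service_labels({"path": "/ping"})
```

In this sketch a piece of traffic data may match several labels, exactly one, or none, depending on which preset fields it carries.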

By generating a service label for each piece of traffic data, the embodiment of the present application uses the label to indicate whether pieces of traffic data correspond to the same scenario, which improves the accuracy of deduplicating recorded traffic.

In a second aspect, an embodiment of the present application provides a traffic deduplication device, including:

a data acquisition module, configured to acquire traffic data to be deduplicated, the traffic data to be deduplicated comprising multiple pieces of traffic data, where each piece of traffic data includes a first identifier, a second identifier, and recorded traffic, the first identifier is generated from the recording interface corresponding to the recorded traffic, and the second identifier is generated from the sub-call set and the service label corresponding to the recorded traffic;

a deduplication module, configured to deduplicate the traffic data to be deduplicated according to the first identifier and the second identifier corresponding to each piece of traffic data.

In a third aspect, an embodiment of the present application provides electronic equipment, including a processor, a memory, and a bus, where

the processor and the memory communicate with each other through the bus;

the memory stores program instructions executable by the processor, and the processor can execute the method of the first aspect by invoking the program instructions.

In a fourth aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium, where

the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause a computer to execute the method of the first aspect.

Other features and advantages of the present application are set forth in the description that follows and in part become apparent from the description or are learned by practicing the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description, the claims, and the accompanying drawings.

Brief Description of the Drawings

To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. The following drawings show only some embodiments of the present application and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.

FIG. 1 is a schematic flowchart of a traffic deduplication method provided in an embodiment of the present application;

FIG. 2 is a schematic flowchart of another traffic deduplication method provided in an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a traffic deduplication device provided in an embodiment of the present application;

FIG. 4 is a schematic structural diagram of electronic equipment provided in an embodiment of the present application.

Detailed Description

Embodiments of the technical solutions of the present application are described in detail below with reference to the accompanying drawings. The following embodiments serve only to illustrate the technical solutions more clearly; they are examples and do not limit the scope of protection of the present application.

Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by those skilled in the technical field of the present application. The terms used herein are only for describing specific embodiments and are not intended to limit the present application. The terms "comprising" and "having" and any variations thereof in the specification, the claims, and the above description of the drawings are intended to cover non-exclusive inclusion.

In the description of the embodiments of the present application, the technical terms "first", "second", and the like are used only to distinguish different objects and are not to be understood as indicating or implying relative importance, or as implicitly specifying the number, specific order, or primary-secondary relationship of the indicated technical features. In the description of the embodiments of the present application, "plurality" means two or more, unless otherwise expressly and specifically defined.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. Occurrences of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.

In the description of the embodiments of the present application, the term "and/or" merely describes an association between associated objects and indicates that three relationships are possible. For example, A and/or B can mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.

In the description of the embodiments of the present application, the term "multiple" means two or more (including two); similarly, "multiple groups" means two or more groups (including two groups), and "multiple pieces" means two or more pieces (including two pieces).

In the description of the embodiments of the present application, orientations or positional relationships indicated by the technical terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", "circumferential", and the like are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the embodiments and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be understood as limiting the embodiments of the present application.

In the description of the embodiments of the present application, unless otherwise expressly specified and limited, technical terms such as "mounted", "coupled", "connected", and "fixed" are to be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediary; and it may be internal communication between two elements or an interaction between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in the embodiments of the present application according to the specific situation.

The traffic generated in a production environment every day is massive and highly repetitive, and a large number of repeated business requests cannot test the business system under test effectively. A solution is therefore needed that can deduplicate the large volume of system business requests from the production environment.

At present, traffic can be controlled by manually labeling traffic data: a label is set manually for each piece of traffic, and traffic with the same label is removed so that only one piece per label is retained, which deduplicates the traffic. With a large volume of traffic, however, manual labeling makes deduplication slow.

To solve the above technical problem, the inventors of the present application provide a traffic deduplication method that distinguishes pieces of traffic data by the recording interface, the service label corresponding to each piece of traffic data, and the sub-call set corresponding to each piece of traffic data, and thereby deduplicates the recorded traffic.

Before the specific solution of the present application is introduced, the related concepts are explained to aid understanding:

Entry call: a call formed by a request initiated externally. The types currently supported by the traffic playback platform include http, dubbo, mq (rocketMQ), and so on.

Sub-call: a call formed by one out-of-process invocation or by one Java method that requires enhancement, such as an out-of-process call (redis, mybatis), an enhanced Java method, or a custom sub-call. Sub-calls can be divided into main-thread sub-calls and asynchronous-thread sub-calls.

jvm sandbox repeater: a general-purpose server-side recording and playback solution based on Jvm-Sandbox.

Recording: the process of serializing and storing the input parameters, output parameters, downstream RPC, DB, cache, and other data of a request.

Playback: the process of restoring the recorded data, re-initiating the request once or N times, and applying MOCK to specific downstream nodes.

MOCK: during playback, an intercepted sub-call is not actually invoked; the process intervention capability of Sandbox is used to return the value recorded at recording time directly.

Elasticsearch: a search server; a distributed, highly scalable, near-real-time search and data analysis engine. It makes large amounts of data easy to search, analyze, and explore, and its horizontal scalability makes data more valuable in a production environment. Its working principle consists mainly of the following steps: the user submits data to the Elasticsearch database; a word segmentation controller segments the corresponding text, and the weights are stored together with the segmentation results; when the user searches, the results are ranked and scored according to the weights and then returned to the user.

The traffic deduplication method provided in the embodiments of the present application can be applied to electronic equipment, which may be a terminal or a server. The terminal may be a smartphone, a tablet computer, a computer, a personal digital assistant (PDA), or the like; the server may be an application server or a Web server. A traffic recording platform runs on the electronic equipment; it can communicate with the service server and receive the recorded traffic that the service server sends.

FIG. 1 is a schematic flowchart of a traffic deduplication method provided in an embodiment of the present application. As shown in FIG. 1, the method includes:

Step 101: acquire traffic data to be deduplicated, the traffic data to be deduplicated comprising multiple pieces of traffic data, where each piece of traffic data includes a first identifier, a second identifier, and recorded traffic, the first identifier is generated from the recording interface corresponding to the recorded traffic, and the second identifier is generated from the sub-call set and the service label corresponding to the recorded traffic;

Step 102: deduplicate the traffic data to be deduplicated according to the first identifier and the second identifier corresponding to each piece of traffic data.

In step 101, the traffic data to be deduplicated may be stored in advance in a preset location, for example in a search server or in a database. The electronic equipment may fetch the traffic data within a preset time period from the search server at a preset interval to form the traffic data to be deduplicated. For example, the preset interval may be one day, and the electronic equipment may fetch all of the previous day's traffic data at 10 a.m. each day.
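The daily example can be sketched as a time-window computation followed by a filter. This is an in-memory stand-in, not a search-server query: the `recorded_at` field is a hypothetical timestamp, and treating "the previous day" as the half-open interval from midnight to midnight is an assumption.

```python
from datetime import datetime, time, timedelta


def previous_day_window(now: datetime) -> tuple[datetime, datetime]:
    """Return [start, end) covering the whole calendar day before `now`."""
    today_start = datetime.combine(now.date(), time.min)
    return today_start - timedelta(days=1), today_start


def select_window(records: list[dict], start: datetime, end: datetime) -> list[dict]:
    # Keep records whose hypothetical `recorded_at` timestamp falls in [start, end).
    return [r for r in records if start <= r["recorded_at"] < end]


# Illustrative run time: a job firing at 10 a.m.
run_time = datetime(2022, 10, 18, 10, 0)
start, end = previous_day_window(run_time)
records = [
    {"traffic": "a", "recorded_at": datetime(2022, 10, 16, 23, 59)},
    {"traffic": "b", "recorded_at": datetime(2022, 10, 17, 0, 0)},
    {"traffic": "c", "recorded_at": datetime(2022, 10, 17, 18, 30)},
    {"traffic": "d", "recorded_at": datetime(2022, 10, 18, 9, 0)},
]
to_deduplicate = select_window(records, start, end)
```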

Because a large amount of traffic data is generated in a day and much of it is repeated, the repeated traffic data needs to be deduplicated. Repeated traffic data here means traffic data that corresponds to the same scenario under the same recording interface. Taking online shopping as an example: when two users submit orders in the same promotion on the same online shopping platform, the traffic data corresponding to the two orders is repeated traffic data.

The first identifier represents the global uniqueness of the recording interface corresponding to the recorded traffic. It may be computed from the recording interface with the MD5 algorithm, with another hash algorithm, or with some other algorithm; the embodiments of the present application do not specifically limit this.
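As a minimal sketch of the MD5 option, the first identifier can be taken as the hex digest of the recording interface string. The interface strings below are hypothetical, and encoding them as UTF-8 before hashing is an assumption.

```python
import hashlib


def first_identifier(recording_interface: str) -> str:
    """Return the MD5 hex digest of the recording interface string."""
    return hashlib.md5(recording_interface.encode("utf-8")).hexdigest()


# Hypothetical recording interfaces.
submit_id = first_identifier("POST /api/order/submit")
cancel_id = first_identifier("POST /api/order/cancel")
```

The same interface always yields the same 32-character identifier, so records can be grouped by interface without comparing the full interface strings.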

录制流量为用户终端向业务服务器发送的业务请求,业务服务器响应该业务请求的过程中所产生的业务流量,并将业务流量通过预设的请求方式发送给电子设备,由电子设备中的流量录制平台对其进行录制而成。可以理解的是,一个业务请求可以对应一条录制流量。业务服务器可以通过http请求的方式将业务流量发送给电子设备。该http请求中包含有录制接口,因此,可以该http请求中提取录制接口。Recorded traffic is the business request sent by the user terminal to the business server, and the business server responds to the business traffic generated in the process of the business request, and sends the business traffic to the electronic device through the preset request method, and is recorded by the traffic in the electronic device It is recorded by the platform. It can be understood that one service request can correspond to one recording traffic. The service server can send the service flow to the electronic device through an http request. The http request includes a recording interface, so the recording interface can be extracted from the http request.

第二标识用于表征录制流量全局唯一性的标识,为对录制流量对应的子调用集合和业务标签按照预设算法对其计算获得,该预设算法可以为单向函数算法,该单向函数算法难以有函数输出的结果,反推输入的具体数据。其中,单向函数算法可以为密码散列函数,具体可以为MD5算法、哈希算法等。其中,录制流量中包含了多个子调用集合,其中多个子调用集合中可以包含主线程子调用,还可以包含异步线程子调用。The second identifier is used to characterize the globally unique identifier of the recorded traffic, which is obtained by calculating the sub-call set and service label corresponding to the recorded traffic according to a preset algorithm. The preset algorithm can be a one-way function algorithm, and the one-way function It is difficult for the algorithm to have the result of the function output, and reverse the input specific data. Wherein, the one-way function algorithm may be a cryptographic hash function, specifically an MD5 algorithm, a hash algorithm, or the like. Wherein, the recording traffic includes multiple sub-call sets, wherein the multiple sub-call sets may include main thread sub-calls, and may also include asynchronous thread sub-calls.

业务标签用于表征录制流量的类型,其可以通过录制流量中的一些特定字段确定。可以理解的是,录制流量可以对应一个业务标签,也可以对应多个业务标签,还可能没有对应的业务标签。对于没有业务标签的情况,电子设备可根据子调用集合生成第二标识。The service label is used to represent the type of recorded traffic, which can be determined by some specific fields in the recorded traffic. It can be understood that the recorded traffic may correspond to one service label, or may correspond to multiple service labels, or there may be no corresponding service label. For the case of no service tag, the electronic device may generate the second identifier according to the sub-call set.

In step 102, a single recording interface may carry traffic data for multiple scenarios, and a single scenario may produce multiple pieces of traffic data, so the pieces of traffic data that belong to the same scenario under the same recording interface need to be deduplicated. Traffic from different recording interfaces is treated as different traffic even when the traffic data it contains is identical. Therefore, to improve the precision of deduplication, the first identifier is used to distinguish recording interfaces and the second identifier is used to distinguish traffic data, and the traffic data to be deduplicated is deduplicated according to both identifiers.

By deduplicating traffic with the first identifier corresponding to the recording interface and the second identifier generated from the service label and the sub-call set, this embodiment of the present application improves the efficiency and precision of traffic deduplication.

On the basis of the above embodiment, deduplicating the traffic data to be deduplicated according to the first identifier and the second identifier corresponding to each piece of traffic data includes:

determining identical traffic data in the recorded traffic according to the first identifier and the second identifier;

removing all but one piece of the identical traffic data, so that the traffic data to be deduplicated is deduplicated.

In a specific implementation, deduplication can be performed with Spark. Spark offers several ways to deduplicate, for example distinct, group by, and the row_number window function. With distinct, one or more fields serve as the deduplication key: when two records have the same values in those fields, they are considered the same record and only one of them is kept. With group by, the deduplication columns serve as the aggregation fields and deduplication is achieved through aggregation. For example, the first identifier and the second identifier are used as the aggregation fields, the traffic data is aggregated by them, and only one piece of traffic data is kept per group. This completes the deduplication of the traffic data to be deduplicated.
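The keep-one-per-key behaviour described above can be illustrated without a Spark cluster. The following is a minimal sketch in plain Python, not Spark code; the record layout and the field names `first_id`, `second_id` and `traffic` are assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Tuple


def deduplicate(records: Iterable[dict]) -> List[dict]:
    """Keep the first record seen for each (first_id, second_id) pair."""
    kept: Dict[Tuple[str, str], dict] = {}
    for record in records:
        key = (record["first_id"], record["second_id"])
        # setdefault stores the record only when the key has not been seen yet.
        kept.setdefault(key, record)
    return list(kept.values())


# Hypothetical sample data: two records share both identifiers.
sample = [
    {"first_id": "a1", "second_id": "s1", "traffic": "req-1"},
    {"first_id": "a1", "second_id": "s1", "traffic": "req-2"},  # same key as req-1
    {"first_id": "b2", "second_id": "s1", "traffic": "req-3"},  # different interface
]
unique = deduplicate(sample)
```

In Spark itself, the same effect would typically be obtained by deduplicating on the two identifier columns, for example with a DataFrame's `dropDuplicates` over those columns; which record of a group survives is then up to the engine unless an ordering is imposed.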

In this embodiment of the present application, the first identifier characterizes the uniqueness of the recording interface and the second identifier characterizes the uniqueness of the sub-calls, so deduplicating the recorded traffic by both identifiers improves the precision of deduplication.

On the basis of the above embodiment, before the traffic data to be deduplicated is acquired, the method further includes:

obtaining recorded traffic through the jvm sandbox repeater tool, the recorded traffic including a sub-call set;

extracting the recording interface of the recorded traffic and generating the first identifier according to the recording interface;

generating the service label corresponding to the recorded traffic, and generating the second identifier according to the service label and the sub-call set.

In a specific implementation, the recorded traffic is obtained by the jvm sandbox repeater tool running on the electronic device, which records service requests. The recorded traffic is explained in the above embodiments and is not described again here. When responding to a service request sent by a client, the service server may make multiple calls, so the recorded traffic includes traffic data for multiple sub-calls, and this traffic data forms the sub-call set.

When the service server reports recorded traffic to the electronic device, the report request carries the recording interface. For example, the recording interface can be extracted from the HTTP request of the entry call. Once the recording interface is obtained, the first identifier corresponding to it can be generated with a preset algorithm, as described in the above embodiments.
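As a rough illustration of this step, the sketch below derives a first identifier from an entry HTTP request. It assumes that the recording interface is represented by the HTTP method plus the request path and that the preset algorithm is MD5; the text does not fix either choice, and the function name is hypothetical.

```python
import hashlib


def first_identifier(http_method: str, path: str) -> str:
    """Hash a textual form of the recording interface with MD5."""
    # Assumption: the interface is identified by "METHOD path".
    interface = f"{http_method.upper()} {path}"
    return hashlib.md5(interface.encode("utf-8")).hexdigest()


order_id = first_identifier("post", "/api/order/create")
```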

After obtaining the recorded traffic, the electronic device determines the corresponding service label from preset fields in the recorded traffic. The electronic device may store the correspondence between fields and service labels in advance. After obtaining the recorded traffic, it matches each field in the recorded traffic against the preset fields; a field that matches is a preset field, and its service label is looked up in the pre-stored correspondence.
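The lookup can be sketched as a dictionary match. The field names and labels below are invented for illustration, and treating the recorded traffic as a flat dictionary of fields is an assumption.

```python
from typing import Dict, List

# Hypothetical pre-stored correspondence between preset fields and service labels.
FIELD_TO_LABEL: Dict[str, str] = {
    "couponId": "coupon",
    "orderNo": "order",
}


def service_labels(recorded_traffic: dict) -> List[str]:
    """Return the labels of all preset fields present in the traffic, sorted."""
    labels = {FIELD_TO_LABEL[name] for name in recorded_traffic if name in FIELD_TO_LABEL}
    return sorted(labels)
```

An empty result corresponds to the case in which the recorded traffic has no service label.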

After obtaining the service label, the electronic device applies the preset algorithm to the service label and the sub-call set to generate the second identifier. The preset algorithm is described in the above embodiments and is not repeated here.

By deduplicating traffic with the first identifier corresponding to the recording interface and the second identifier generated from the service label and the sub-call set, this embodiment of the present application improves the efficiency and precision of traffic deduplication.

On the basis of the above embodiment, the call type of a sub-call can be determined as follows:

The electronic device stores in advance the hash value corresponding to the main thread. It computes the hash value of the thread corresponding to the sub-call and compares it with the stored hash value of the main thread. If the two match, the sub-call is determined to be a main thread sub-call; otherwise, it is determined to be an asynchronous thread sub-call.
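The comparison can be sketched as follows. The text does not say what the thread hash is computed from, so hashing the thread name with MD5 is an assumption, and the thread names are hypothetical.

```python
import hashlib


def thread_hash(thread_name: str) -> str:
    # Assumption: the hash is taken over the thread name.
    return hashlib.md5(thread_name.encode("utf-8")).hexdigest()


# Stored in advance for the main (request-handling) thread.
MAIN_THREAD_HASH = thread_hash("http-nio-8080-exec-1")


def call_type(sub_call_thread: str) -> str:
    """Return "main" when the hashes match, otherwise "async"."""
    return "main" if thread_hash(sub_call_thread) == MAIN_THREAD_HASH else "async"
```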

On the basis of the above embodiment, generating the second identifier according to the service label and the sub-call set of the recorded traffic includes:

generating a main thread sub-call set from the sub-calls whose call type is main thread sub-call;

generating an asynchronous thread sub-call set from the sub-calls whose call type is asynchronous thread sub-call;

generating a target string from the main thread sub-call set, the asynchronous thread sub-call set and the service label according to a preset format;

applying a preset algorithm to the target string to obtain the second identifier.

In a specific implementation, after determining the call type of each sub-call, the electronic device groups the main thread sub-calls into the main thread sub-call set and the asynchronous thread sub-calls into the asynchronous thread sub-call set. It then combines the main thread sub-call set, the asynchronous thread sub-call set and the service label into one large set, which becomes the target string. Specifically, the three parts can be separated by ASCII (half-width) commas, that is: target string = main thread sub-call set,asynchronous thread sub-call set,service label.

After the target string is obtained, a preset algorithm such as MD5 or another hash algorithm is applied to it to obtain the second identifier.
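A minimal sketch of building the target string and hashing it is given below. The comma between the three parts and the use of MD5 follow the description; how each set is written out is not specified, so sorting the members and joining them with `|` is an assumption made here so that the same set always yields the same string.

```python
import hashlib
from typing import Iterable


def serialize_set(items: Iterable[str]) -> str:
    # Assumption: sort and de-duplicate so member order does not affect the result.
    return "|".join(sorted(set(items)))


def second_identifier(main_calls: Iterable[str],
                      async_calls: Iterable[str],
                      labels: Iterable[str] = ()) -> str:
    """MD5 of "main set,async set,service label" as a hex string."""
    target = ",".join([
        serialize_set(main_calls),
        serialize_set(async_calls),
        serialize_set(labels),
    ])
    return hashlib.md5(target.encode("utf-8")).hexdigest()
```

Keeping the main thread part and the asynchronous thread part in separate comma-delimited positions means that moving a sub-call from one set to the other changes the identifier.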

By dividing sub-calls into main thread sub-calls and asynchronous thread sub-calls, this embodiment avoids many unnecessary scenarios and, combined with the service label, improves the precision of deduplicating recorded traffic.

On the basis of the above embodiment, after the second identifier is generated, the method further includes:

storing the first identifier, the second identifier and the recorded traffic in a search server as one piece of traffic data;

correspondingly, acquiring the traffic data to be deduplicated includes:

acquiring the traffic data to be deduplicated within a preset time period from the search server.

In a specific implementation, the service server can generate a large volume of service traffic in a short time, so the electronic device can receive a large volume of recorded traffic in real time. Deduplicating the recorded traffic in real time would increase the load on the electronic device. To reduce that load, after obtaining the first identifier and the second identifier corresponding to the recorded traffic, this embodiment combines the recorded traffic with its first identifier and second identifier into one piece of traffic data and stores that traffic data in the search server. The traffic data may also include other information, such as a timestamp for the recorded traffic. The timestamp may be the time the recorded traffic was generated, the time it was stored in the search server, or the time the electronic device received it.

The electronic device can obtain the traffic data within a preset time period from the search server; this traffic data is referred to as the traffic data to be deduplicated. The preset time period may be entered by a tester, for example October 1 to 2, 2022, or it may be derived from the current time, for example the day before the current date.
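The "day before the current date" window can be computed as shown in this sketch. The half-open interval convention and the `timestamp` field name are assumptions; a real deployment would pass the two bounds to the search server as a range filter rather than filter in memory.

```python
from datetime import datetime, time, timedelta
from typing import Iterable, List, Tuple


def previous_day_window(now: datetime) -> Tuple[datetime, datetime]:
    """Return [start, end) covering the calendar day before `now`."""
    today_start = datetime.combine(now.date(), time.min)
    return today_start - timedelta(days=1), today_start


def select_window(records: Iterable[dict], start: datetime, end: datetime) -> List[dict]:
    """Keep records whose timestamp falls inside [start, end)."""
    return [r for r in records if start <= r["timestamp"] < end]
```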

To reduce storage pressure on the search server, the traffic data in the search server can be cleaned up periodically, for example by removing traffic data older than two weeks.
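The retention rule amounts to a cutoff comparison, sketched below on an in-memory list. The `timestamp` field name is an assumption, and an actual cleanup would be issued against the search server rather than performed in application memory.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def purge_old(records: Iterable[dict], now: datetime,
              max_age: timedelta = timedelta(weeks=2)) -> List[dict]:
    """Drop records whose timestamp is more than `max_age` before `now`."""
    cutoff = now - max_age
    return [r for r in records if r["timestamp"] >= cutoff]
```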

Because the volume of recorded traffic is large, the preprocessed traffic data is stored in the search server and the recorded traffic to be deduplicated is then read from the search server, which reduces the pressure on the recording platform.

Fig. 2 is a schematic flowchart of another traffic deduplication method provided by an embodiment of the present application. As shown in Fig. 2, the method includes:

The service server determines whether each sub-call runs on an asynchronous thread: after generating service traffic from the service request sent by the client, the service server determines, for each sub-call contained in the service traffic, whether it runs on an asynchronous thread, and marks the sub-call accordingly. The method for determining whether the thread corresponding to a sub-call is an asynchronous thread is described in the above embodiments and is not repeated here. For example, "0" may indicate that the thread corresponding to the sub-call is the main thread and "1" that it is an asynchronous thread. Other marking schemes may also be used; this embodiment of the present application does not limit this.

The service server sends the service traffic to the traffic recording platform (jvm sandbox repeater) of the electronic device, specifically by HTTP request (jvm sandbox repeater module). After receiving the service traffic, the traffic recording platform controller (jvm sandbox repeater console) in the electronic device records it to obtain the recorded traffic.

The electronic device obtains the recording interface from the recorded traffic sent by the service server and generates the entry MD5 from the recording interface.

The electronic device adds the main thread sub-call set, the asynchronous thread sub-call set and the service label to one large set, joins them with ASCII commas into a string (the target string described above), and computes the MD5 of that string to obtain the sub-MD5.

The entry MD5, the sub-MD5 and the recorded traffic are stored in Elasticsearch as one piece of traffic data.

Multiple pieces of traffic data are read from Elasticsearch, and Spark is used to deduplicate the recorded traffic according to the entry MD5 and the sub-MD5.

The deduplicated traffic data is stored in a preset database, for example a MySQL database.

By using the service label to control the traffic, combined with the main thread sub-call set and the asynchronous thread sub-call set to preserve the basic skeleton of the recorded service request, this embodiment improves the precision of deduplication while also improving the efficiency of deduplicating recorded traffic.

Fig. 3 is a schematic structural diagram of a traffic deduplication device provided by an embodiment of the present application. The device may be a module, program segment or code on an electronic device. The device corresponds to the method embodiment of Fig. 1 and can perform the steps of that method embodiment. Its specific functions are described above, and the detailed description is omitted here where appropriate to avoid repetition. The device includes a data acquisition module 301 and a deduplication module 302, wherein:

the data acquisition module 301 is configured to acquire traffic data to be deduplicated, the traffic data to be deduplicated including multiple pieces of traffic data; each piece of traffic data includes a first identifier, a second identifier and recorded traffic, the first identifier being generated according to the recording interface corresponding to the recorded traffic, and the second identifier being generated according to the sub-call set and the service label corresponding to the recorded traffic;

the deduplication module 302 is configured to deduplicate the traffic data to be deduplicated according to the first identifier and the second identifier corresponding to each piece of traffic data.

On the basis of the above embodiment, the deduplication module 302 is specifically configured to:

determine identical traffic data in the recorded traffic according to the first identifier and the second identifier;

remove all but one piece of the identical traffic data, so that the traffic data to be deduplicated is deduplicated.

On the basis of the above embodiment, the device further includes a traffic preprocessing module configured to:

obtain recorded traffic through the jvm sandbox repeater tool, the recorded traffic including a sub-call set;

extract the recording interface of the recorded traffic and generate the first identifier according to the recording interface;

generate the service label corresponding to the recorded traffic, and generate the second identifier according to the service label and the sub-call set.

On the basis of the above embodiment, the sub-call set includes at least one sub-call, and the call type corresponding to the at least one sub-call is determined in advance by the service server according to the hash value of the thread corresponding to the sub-call; the call types include main thread sub-call and asynchronous thread sub-call.

On the basis of the above embodiment, the traffic preprocessing module is specifically configured to:

generate a main thread sub-call set from the sub-calls whose call type is main thread sub-call;

generate an asynchronous thread sub-call set from the sub-calls whose call type is asynchronous thread sub-call;

generate a target string from the main thread sub-call set, the asynchronous thread sub-call set and the service label according to a preset format;

apply a preset algorithm to the target string to obtain the second identifier.

On the basis of the above embodiment, the device further includes a data storage module configured to:

store the first identifier, the second identifier and the recorded traffic in a search server as one piece of traffic data;

correspondingly, the traffic preprocessing module is specifically configured to:

acquire the traffic data to be deduplicated within a preset time period from the search server.

On the basis of the above embodiment, the traffic preprocessing module is specifically configured to:

extract preset fields from each piece of traffic data;

determine, from the pre-stored correspondence between fields and labels, the service label that matches the preset fields.

Fig. 4 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present application. As shown in Fig. 4, the electronic device includes a processor 401, a memory 402 and a bus 403, wherein:

the processor 401 and the memory 402 communicate with each other through the bus 403;

the processor 401 is configured to call program instructions in the memory 402 to perform the methods provided by the above method embodiments, for example: acquiring traffic data to be deduplicated, the traffic data to be deduplicated including multiple pieces of traffic data, wherein each piece of traffic data includes a first identifier, a second identifier and recorded traffic, the first identifier being generated according to the recording interface corresponding to the recorded traffic, and the second identifier being generated according to the sub-call set and the service label corresponding to the recorded traffic; and deduplicating the traffic data to be deduplicated according to the first identifier and the second identifier corresponding to each piece of traffic data.

The processor 401 may be an integrated circuit chip with signal processing capability. It may be a general-purpose processor, including a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 402 may include, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), and electrically erasable programmable read-only memory (EEPROM).

This embodiment discloses a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions. When the program instructions are executed by a computer, the computer can perform the methods provided by the above method embodiments, for example: acquiring traffic data to be deduplicated, the traffic data to be deduplicated including multiple pieces of traffic data, wherein each piece of traffic data includes a first identifier, a second identifier and recorded traffic, the first identifier being generated according to the recording interface corresponding to the recorded traffic, and the second identifier being generated according to the sub-call set and the service label corresponding to the recorded traffic; and deduplicating the traffic data to be deduplicated according to the first identifier and the second identifier corresponding to each piece of traffic data.

This embodiment provides a non-transitory computer-readable storage medium that stores computer instructions. The computer instructions cause a computer to perform the methods provided by the above method embodiments, for example: acquiring traffic data to be deduplicated, the traffic data to be deduplicated including multiple pieces of traffic data, wherein each piece of traffic data includes a first identifier, a second identifier and recorded traffic, the first identifier being generated according to the recording interface corresponding to the recorded traffic, and the second identifier being generated according to the sub-call set and the service label corresponding to the recorded traffic; and deduplicating the traffic data to be deduplicated according to the first identifier and the second identifier corresponding to each piece of traffic data.

It should be understood that the devices and methods disclosed in the embodiments of the present application may be implemented in other ways. The device embodiments described above are only illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through communication interfaces, devices or units, and may be electrical, mechanical or of another form.

In addition, units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Furthermore, the functional modules in the embodiments of the present application may be integrated into one independent part, may each exist separately, or two or more of them may be integrated into one independent part.

In this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations.

The above are only embodiments of the present application and are not intended to limit its scope of protection. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (10)

1. A method for traffic deduplication, comprising:
acquiring data of a flow to be deduplicated, wherein the data of the flow to be deduplicated comprises a plurality of pieces of flow data; each piece of traffic data comprises a first identifier, a second identifier and recording traffic, wherein the first identifier is generated according to a recording interface corresponding to the recording traffic, and the second identifier is generated according to a sub-call set and a service label corresponding to the recording traffic;
and according to the first identifier and the second identifier corresponding to each piece of flow data, carrying out deduplication on the flow data to be deduplicated.
2. The method according to claim 1, wherein the performing deduplication on the to-be-deduplicated flow data according to the first identifier and the second identifier corresponding to each flow data includes:
determining the same stream data in the recording stream according to the first identifier and the second identifier;
and performing partial elimination on the same flow data, and only reserving one flow data to realize the deduplication of the flow data to be deduplicated.
3. The method of claim 1, wherein prior to obtaining the data of the stream to be deduplicated, the method further comprises:
acquiring a recording flow through a jvm sandbox repeater tool, wherein the recording flow comprises a sub-call set;
extracting a recording interface for recording the flow and generating a first identifier according to the recording interface;
and generating a service label corresponding to the recording flow, and generating a second identifier according to the service label and the sub-call set.
4. The method of claim 3, wherein the set of sub-calls comprises at least one sub-call; the calling type corresponding to the at least one sub-call is determined by the service server in advance according to the hash value of the thread corresponding to the sub-call; the calling types comprise a main thread sub-call and an asynchronous thread sub-call.
5. The method of claim 4, wherein generating the second identifier according to the service label and the set of sub-calls of the recording traffic comprises:
generating a main thread sub-call set for the sub-call of the main thread sub-call according to the call type;
generating an asynchronous thread sub-call set for the sub-call of the asynchronous thread sub-call according to the call type;
generating a target character string from the main thread sub-call set, the asynchronous thread sub-call set and the service label according to a preset format;
and calculating the target character string by using a preset algorithm to obtain the second identifier.
6. The method according to claim 3, wherein the generating the service label corresponding to the recording traffic includes:
extracting a preset field from each piece of flow data;
and determining the service label matched with the preset field from the corresponding relation of the prestored field labels.
7. The method according to any one of claims 3-6, wherein after generating the second identifier, the method further comprises:
storing the first identifier, the second identifier and the recorded traffic in a search server as one piece of traffic data;
correspondingly, acquiring the to-be-deduplicated traffic data comprises:
acquiring, from the search server, the to-be-deduplicated traffic data within a preset time period.
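As a non-limiting illustration, the store-then-query flow might be sketched as follows. An in-memory list stands in for the search server, whose actual product and query interface the claim does not name; the class, method and field names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List


class InMemoryTrafficStore:
    """Stand-in for the search server (assumption: any store with time-range queries)."""

    def __init__(self) -> None:
        self._records: List[Dict] = []

    def save(self, first_id: str, second_id: str, traffic: str,
             recorded_at: datetime) -> None:
        # One piece of traffic data = both identifiers plus the recorded traffic.
        self._records.append({
            "first_id": first_id,
            "second_id": second_id,
            "traffic": traffic,
            "recorded_at": recorded_at,
        })

    def query(self, start: datetime, end: datetime) -> List[Dict]:
        """Return the pieces recorded in the half-open period [start, end)."""
        return [r for r in self._records if start <= r["recorded_at"] < end]


store = InMemoryTrafficStore()
store.save("a1", "b1", "req-1", datetime(2022, 10, 17, 9, 0))
store.save("a1", "b1", "req-2", datetime(2022, 10, 17, 9, 30))
store.save("a2", "b7", "req-3", datetime(2022, 10, 18, 9, 0))
to_deduplicate = store.query(datetime(2022, 10, 17), datetime(2022, 10, 18))
```

The records returned for the preset time period are then the to-be-deduplicated traffic data on which deduplication by the two identifiers is performed.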
8. A traffic deduplication device, comprising:
a data acquisition module, configured to acquire to-be-deduplicated traffic data, the to-be-deduplicated traffic data comprising a plurality of pieces of traffic data; each piece of traffic data comprises a first identifier, a second identifier and recorded traffic, wherein the first identifier is generated according to the recording interface corresponding to the recorded traffic, and the second identifier is generated according to the sub-call set and the service label corresponding to the recorded traffic;
and a deduplication module, configured to deduplicate the to-be-deduplicated traffic data according to the first identifier and the second identifier corresponding to each piece of traffic data.
9. An electronic device, comprising: a processor, a memory, and a bus, wherein
the processor and the memory communicate with each other through the bus;
the memory stores program instructions executable by the processor, the program instructions being invoked by the processor to perform the method of any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium storing computer instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 1-7.
CN202211268231.1A 2022-10-17 2022-10-17 Traffic deduplication method, device, electronic device and storage medium Active CN115567412B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211268231.1A CN115567412B (en) 2022-10-17 2022-10-17 Traffic deduplication method, device, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211268231.1A CN115567412B (en) 2022-10-17 2022-10-17 Traffic deduplication method, device, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN115567412A (en) 2023-01-03
CN115567412B (en) 2025-02-14

Family

ID=84746954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211268231.1A Active CN115567412B (en) 2022-10-17 2022-10-17 Traffic deduplication method, device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN115567412B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104322039A (en) * 2012-12-31 2015-01-28 华为技术有限公司 System architecture, subsystem, and method for opening of telecommunication network capability
US20150135325A1 (en) * 2013-11-13 2015-05-14 ProtectWise, Inc. Packet capture and network traffic replay
CN112214395A (en) * 2020-09-02 2021-01-12 浙江大搜车融资租赁有限公司 Interface testing method based on flow data, electronic device and storage medium
CN112637005A (en) * 2020-12-08 2021-04-09 广州品唯软件有限公司 Flow playback method and device, computer equipment and storage medium
CN114710562A (en) * 2022-03-31 2022-07-05 珠海市鸿瑞信息技术股份有限公司 Big data-based equipment application log correlation analysis system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tang Zhibin (唐志斌): "网络数据采集及安全审计技术研究综述" [Survey of research on network data acquisition and security audit technology], 网络新媒体技术, no. 01, 15 January 2020 (2020-01-15), pages 15 - 24 *

Also Published As

Publication number Publication date
CN115567412B (en) 2025-02-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: Room 6416, Building 13, No. 723 Tongxin Road, Hongkou District, Shanghai 200080

Applicant after: Shanghai Dewu Information Group Co.,Ltd.

Address before: Room B6-2005, No. 121 Zhongshan North 1st Road, Hongkou District, Shanghai

Applicant before: SHANGHAI SHIZHUANG INFORMATION TECHNOLOGY Co.,Ltd.

Country or region before: China

GR01 Patent grant