CN113010542A - Service data processing method and device, computer equipment and storage medium - Google Patents
- Publication number
- CN113010542A (application number CN202110271789.4A)
- Authority
- CN
- China
- Prior art keywords
- service data
- data
- real
- kafka
- time service
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/55—Push-based network services
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the application belong to the field of big data and relate to a service data processing method, which comprises: obtaining real-time service data; pushing the real-time service data to Kafka; acquiring the real-time service data in Kafka through Flink, and adding corresponding offline service data to the real-time service data to obtain a service data body; and pushing the service data body to Druid through Kafka, so that Druid performs summary calculation on the service data body to obtain inventory service data. The application also provides a service data processing apparatus, a computer device and a storage medium. In addition, the application relates to blockchain technology, and the inventory service data can be stored in a blockchain. The application improves service data processing efficiency.
Description
Technical Field
The present application relates to the field of big data technologies, and in particular, to a method and an apparatus for processing service data, a computer device, and a storage medium.
Background
In a service system associated with real-time transaction, it is often necessary to perform summary statistics on a large amount of offline service data, and the offline service data is usually associated with real-time service data, so that the service system needs to process a large amount of real-time service data and offline service data at the same time.
However, the data computing power of the business system is limited, and in the face of mass data, the business system often presents the problem of insufficient computing power, resulting in low business data processing efficiency. In order to solve the above problems, the conventional solution is usually read-write separation, that is, a service system is built on a read server and a write server to isolate reading of real-time service data from calculation of offline service data, but the architecture has high cost and large resource consumption, does not really improve data processing capability, and cannot effectively solve the problem of low processing efficiency of service data.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for processing service data, a computer device, and a storage medium, so as to solve the problem of low efficiency of processing service data.
In order to solve the foregoing technical problem, an embodiment of the present application provides a service data processing method, which adopts the following technical solutions:
acquiring real-time service data;
pushing the real-time service data to Kafka;
acquiring the real-time service data in Kafka through Flink, and adding corresponding offline service data to the real-time service data to obtain a service data body;
and pushing the service data body to Druid through Kafka, so that Druid performs summary calculation on the service data body to obtain inventory service data.
Further, before the step of acquiring the real-time service data in Kafka through Flink, and adding the corresponding offline service data to the real-time service data to obtain a service data body, the method further includes:
inquiring a link transmission mode of the real-time service data;
when the link transmission mode is a first transmission mode, executing the step of acquiring the real-time service data in Kafka through Flink, and adding corresponding offline service data to the real-time service data to obtain a service data body;
and when the link transmission mode is a second transmission mode, determining the real-time service data as the service data body, and executing the step of pushing the service data body to Druid through Kafka, so that Druid performs summary calculation on the service data body to obtain inventory service data.
Further, the step of obtaining the real-time service data in Kafka through the flink, and adding corresponding offline service data to the real-time service data to obtain a service data body includes:
pushing the real-time service data to Flink through Kafka;
acquiring keywords in the real-time service data;
acquiring offline service data corresponding to the real-time service data from an offline database according to the keywords;
and combining the offline service data and the real-time service data through the flink to obtain a service data body.
Further, after the step of pushing the real-time service data to the flink by the kafka, the method further includes:
acquiring a data identifier of the real-time service data;
determining repeated data in the real-time service data according to the acquired data identifier;
and performing duplicate removal processing on the repeated data in the real-time service data to obtain the duplicate-removed real-time service data.
Further, the step of pushing the service data body to Druid through Kafka, so that Druid performs summary calculation on the service data body to obtain inventory service data, includes:
pushing the service data body to Druid through Kafka;
acquiring a processing strategy corresponding to the service data body;
instructing Druid to reorganize the data in the service data body according to the processing strategy to obtain a data wide table;
calculating the data wide table according to the processing strategy to obtain a calculation result;
and obtaining inventory service data according to the calculation result and the data wide table.
Further, after the step of pushing the service data body to Druid through Kafka, so that Druid performs summary calculation on the service data body to obtain inventory service data, the method further includes:
receiving a service data query instruction sent by a terminal;
querying inventory service data in Druid according to the service data query instruction;
and displaying the inquired inventory business data through the terminal.
Further, the step of querying inventory service data in Druid according to the service data query instruction includes:
extracting the query statement from the service data query instruction;
converting the query statement into a Druid query statement conforming to Druid syntax;
and running the druid query statement to query inventory business data from the druid.
In order to solve the above technical problem, an embodiment of the present application further provides a service data processing apparatus, which adopts the following technical solutions:
the data acquisition module is used for acquiring real-time service data;
the data pushing module is used for pushing the real-time service data to Kafka;
the offline adding module is used for acquiring the real-time service data in the Kafka through the flink, and adding corresponding offline service data to the real-time service data to obtain a service data body;
and the data calculation module is used for pushing the service data body to Druid through Kafka, so that Druid performs summary calculation on the service data body to obtain inventory service data.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
acquiring real-time service data;
pushing the real-time service data to Kafka;
acquiring the real-time service data in Kafka through Flink, and adding corresponding offline service data to the real-time service data to obtain a service data body;
and pushing the service data body to Druid through Kafka, so that Druid performs summary calculation on the service data body to obtain inventory service data.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
acquiring real-time service data;
pushing the real-time service data to Kafka;
acquiring the real-time service data in Kafka through Flink, and adding corresponding offline service data to the real-time service data to obtain a service data body;
and pushing the service data body to Druid through Kafka, so that Druid performs summary calculation on the service data body to obtain inventory service data.
Compared with the prior art, the embodiments of the application mainly have the following beneficial effects: service data is split into real-time service data and offline service data; after the real-time service data is obtained from the service system, it is pushed to Flink, and Flink adds the corresponding offline service data to the real-time service data to obtain a service data body, which reduces the processing that the service system performs on the service data; Flink can push the service data body to Druid through Kafka, where Kafka is a high-throughput data transmission pipeline that increases the speed of data flow; Druid has strong data calculation capability and can perform summary calculation on the service data body according to the configured processing logic to obtain inventory service data, which further improves service data processing efficiency.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a business data processing method according to the present application;
FIG. 3 is a schematic block diagram of an embodiment of a business data processing apparatus according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103. The service system, Flink and Druid in the present application may be hosted in the server 105. The server 105 may be one server or a plurality of servers. When the server 105 is a plurality of servers, the service system, Flink and Druid may each be installed in a separate server.
It should be noted that, the service data processing method provided in the embodiment of the present application is generally executed by a server, and accordingly, the service data processing apparatus is generally disposed in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continuing reference to FIG. 2, a flow diagram of one embodiment of a business data processing method in accordance with the present application is shown. The business data processing method comprises the following steps:
step S201, acquiring real-time service data.
In this embodiment, the electronic device (e.g., the server shown in fig. 1) on which the service data processing method runs may communicate with the terminal through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra wideband) connection, and other wireless connection means now known or developed in the future.
The real-time service data is a part of the service data, and the real-time service data may be data that changes in real time in the service transaction. For example, the real-time business data may be a premium paid by the client on a policy every month.
Specifically, the real-time service data comes from a service system, the service system is used for processing service transactions, and when a service associated with the service system changes, the service system generates the real-time service data. The server acquires real-time service data from the service system.
Step S202, pushing the real-time service data to Kafka.
Specifically, Kafka is an open source stream processing platform, written in Scala and Java. Kafka is a high-throughput distributed publish-subscribe messaging system, which can be used as an information transmission pipeline to efficiently transmit information. Real-time service data acquired from the service system will be pushed to kafka.
In one embodiment, the server monitors the service system, and when it is monitored that the service system generates new real-time service data, the real-time service data is pushed to the kafka in a multithreading asynchronous mode.
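The multithreaded asynchronous push can be sketched as below. This is a minimal illustration under stated assumptions: `InMemoryTopic` is a hypothetical stand-in for a Kafka topic (a real deployment would use a Kafka producer client), and the record fields `policy_no` and `premium` are invented for the example.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryTopic:
    """Hypothetical stand-in for a Kafka topic; stores (key, value) messages."""

    def __init__(self):
        self._lock = threading.Lock()
        self.messages = []

    def send(self, key, value):
        # The lock keeps concurrent appends from worker threads consistent.
        with self._lock:
            self.messages.append((key, value))


def push_async(records, topic, workers=4):
    """Serialize each record and send it from a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(topic.send, rec["policy_no"], json.dumps(rec, sort_keys=True))
            for rec in records
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker thread
    return len(futures)


topic = InMemoryTopic()
new_records = [{"policy_no": f"P{i:03d}", "premium": 100 + i} for i in range(10)]
sent = push_async(new_records, topic)
```

Because the sends run concurrently, the order of messages in the topic is not guaranteed to match the order in which the records were generated.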
Step S203, acquiring the real-time service data in Kafka through the flink, and adding corresponding offline service data to the real-time service data to obtain a service data body.
Flink is an open source stream processing framework whose core is a distributed streaming dataflow engine written in Java and Scala. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner, and its pipelined runtime can execute both batch and stream processing programs. A given service may have both real-time service data and offline service data, and the whole formed by the real-time service data and the offline service data is a service data body.
The offline service data is a part of the service data, is low in timeliness and does not change frequently in service transaction. For example, in an insurance service scenario, the username in the policy may be offline service data.
Specifically, the real-time service data in kafka can be acquired by flink. The Flink is used as a data processing party, and can extract off-line service data associated with the received real-time service data from a database for storing the off-line service data, and add the off-line service data to the real-time service data to obtain a complete service data body.
Step S204, pushing the service data body to Druid through Kafka, so that Druid performs summary calculation on the service data body to obtain inventory service data.
Druid is a distributed data processing system that supports real-time multidimensional OLAP (Online Analytical Processing) analysis. It supports high-speed real-time data ingestion as well as real-time, flexible multidimensional data analysis and query, so it can be used for flexible and fast multidimensional OLAP analysis of big data. In addition, Druid supports pre-aggregated ingestion and aggregation analysis of data based on timestamps, so it is also used in some scenarios involving the processing and analysis of timestamped data.
Specifically, after Flink generates the service data body, it can push the service data body to Druid through Kafka. The user can process the service data body through Druid, for example by calculation, statistics and wide-table reconstruction, to obtain inventory service data, thereby implementing one-stop processing of the service data. The resulting inventory service data may be stored in Druid.
It is emphasized that, in order to further ensure the privacy and security of the inventory service data, the inventory service data may also be stored in a node of a block chain.
The block chain referred by the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
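The idea of data blocks associated by a cryptographic method can be illustrated with a simple hash chain. This is only a sketch of the linking and verification idea: it omits consensus, networking and encryption, and the block layout and record fields are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def make_block(prev_hash, records):
    """Build a block whose hash covers the previous hash and its records."""
    payload = json.dumps({"prev": prev_hash, "records": records}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"prev": prev_hash, "records": records, "hash": digest}


def verify(chain):
    """Check that every block links to its predecessor and is unmodified."""
    prev = GENESIS_HASH
    for block in chain:
        expected = make_block(block["prev"], block["records"])["hash"]
        if block["prev"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True


block1 = make_block(GENESIS_HASH, [{"user_name": "Zhang San", "total_premium": 300}])
block2 = make_block(block1["hash"], [{"user_name": "Li Si", "total_premium": 50}])
chain = [block1, block2]
chain_ok = verify(chain)
```

Changing any stored record after the fact changes the recomputed hash of its block, so `verify` returns `False` for the tampered chain.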
In this embodiment, the service data is split into real-time service data and offline service data. After the real-time service data is obtained from the service system, it is pushed to Flink, and Flink adds the corresponding offline service data to the real-time service data to obtain a service data body, which reduces the processing that the service system performs on the service data. Flink can push the service data body to Druid through Kafka, where Kafka is a high-throughput data transmission pipeline that increases the speed of data flow. Druid has strong data calculation capability and can perform summary calculation on the service data body according to the configured processing logic to obtain inventory service data, which further improves service data processing efficiency.
Further, before step S203, the method may further include: querying the link transmission mode of the real-time service data; when the link transmission mode is the first transmission mode, acquiring the real-time service data in Kafka through Flink, and adding corresponding offline service data to the real-time service data to obtain a service data body; and when the link transmission mode is the second transmission mode, determining the real-time service data as the service data body, and pushing the service data body to Druid through Kafka, so that Druid performs summary calculation on the service data body to obtain inventory service data.
The link transmission mode may be a transmission sequence of the service data.
Specifically, real-time service data from a service system may have more than one link transmission mode. Typically, when offline service data corresponding to the real-time service data also needs to be processed, the first transmission mode can be selected: the real-time service data is pushed to Kafka, Kafka pushes it to Flink, and Flink adds the corresponding offline service data to obtain a service data body. When the real-time service data is already complete and meets the processing requirement, the second transmission mode can be selected: the real-time service data is taken as the service data body and pushed to Druid through Kafka, and Druid performs summary calculation on the service data body to obtain inventory service data.
The real-time service data is related to a service, and a link transmission mode of the real-time service data from a certain service can be predefined. And inquiring the service generating the real-time service data, namely determining the link transmission mode of the real-time service data.
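The selection between the two link transmission modes can be sketched as a lookup followed by a branch. The service names, the `MODE_BY_SERVICE` table and the `enrich` callable are hypothetical; `enrich` stands in for the Flink step that adds offline service data.

```python
from enum import Enum


class LinkMode(Enum):
    ENRICH = 1  # first transmission mode: Kafka -> Flink -> Kafka -> Druid
    DIRECT = 2  # second transmission mode: Kafka -> Druid


# Hypothetical predefined mapping from the originating service to its mode.
MODE_BY_SERVICE = {
    "policy_premium": LinkMode.ENRICH,
    "claim_event": LinkMode.DIRECT,
}


def route(record, enrich):
    """Return the service data body for a real-time record."""
    mode = MODE_BY_SERVICE[record["service"]]
    if mode is LinkMode.ENRICH:
        return enrich(record)  # offline service data is added first
    return dict(record)  # the real-time data already is the data body


def add_user_name(record):
    """Toy enrichment used only for this example."""
    return {**record, "user_name": "Zhang San"}


enriched = route({"service": "policy_premium", "policy_no": "P001"}, add_user_name)
direct = route({"service": "claim_event", "claim_no": "C009"}, add_user_name)
```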
In the embodiment, different link transmission modes can be selected for the transmission of the real-time service data, so that various processing modes of the real-time service data are met.
Further, the step S203 may include: pushing real-time service data to the flink through the kafka; acquiring keywords in real-time service data; acquiring offline service data corresponding to the real-time service data from an offline database according to the keywords; and combining the offline service data and the real-time service data through the flink to obtain a service data body.
The keyword may be a field in the service data that identifies the service transaction generating the real-time service data and the offline service data; that is, the real-time service data and the offline service data may include the same keyword. Specifically, Flink acquires the real-time service data in Kafka so as to process it.
The real-time service data is real-time data generated in a service transaction, and when the service data is processed, corresponding offline service data, such as a name, an organization name, and the like, may be further added. If the service system merges the offline service data, the processing efficiency of the service data is affected; in order to reduce the pressure of the service system, offline service data corresponding to the real-time service data can be acquired from the offline database by the flink, and the combination of the real-time service data and the offline service data is completed by the flink.
In one embodiment, offline service data may be stored in Hive. Hive is a data warehouse tool built on Hadoop that is used for data extraction, transformation and loading, and provides a mechanism for storing, querying and analyzing large-scale data stored in Hadoop. Hive can map structured data files to database tables, provide an SQL query function, and convert SQL statements into MapReduce tasks for execution.
Flink can query the offline database for the keywords in the real-time service data, and merge the offline service data stored under the matched keywords with the real-time service data to obtain a service data body.
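The keyword lookup and merge can be sketched in plain Python as follows. A dictionary stands in for the offline database, and the field names (`policy_no`, `user_name`, `org_name`, `premium`) are hypothetical; in the described system this logic would run inside Flink.

```python
# Hypothetical offline database, indexed by the keyword (here a policy number).
OFFLINE_DB = {
    "P001": {"user_name": "Zhang San", "org_name": "Branch A"},
    "P002": {"user_name": "Li Si", "org_name": "Branch B"},
}


def build_data_body(realtime, offline_db, key_field="policy_no"):
    """Merge a real-time record with the offline record sharing its keyword."""
    keyword = realtime[key_field]
    offline = offline_db.get(keyword, {})  # no match: body holds real-time data only
    # Real-time fields are applied last, so they win if a field name collides.
    return {**offline, **realtime}


body = build_data_body({"policy_no": "P001", "premium": 120.0}, OFFLINE_DB)
unmatched = build_data_body({"policy_no": "P999", "premium": 80.0}, OFFLINE_DB)
```

Letting real-time fields take precedence is a design choice of this sketch; the source does not state how field conflicts are resolved.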
In practical applications, if the offline service data is stored in a plurality of offline databases, an operator can view the real-time service data on a data management platform and manually configure how to acquire the corresponding offline service data, for example which offline database and which data table hold the offline service data, and which fields are used to retrieve it. The operator does not need to write SQL or HQL access statements; the data management platform automatically generates the access statements from the configured information to acquire the data, which saves manual work and improves data acquisition efficiency. It can be understood that the operator may also write an access statement directly to obtain the offline service data.
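Generating an access statement from configured information might look like the sketch below. The configuration keys (`database`, `table`, `fields`, `key_field`) and the example names are hypothetical, and the `?` is a placeholder that the database client is assumed to bind to the keyword value.

```python
import re

# Only plain identifiers are allowed, since they are placed into the statement.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_access_statement(config):
    """Build a SELECT statement from an operator's configuration."""
    names = [config["database"], config["table"], config["key_field"], *config["fields"]]
    for name in names:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    columns = ", ".join(config["fields"])
    return (
        f"SELECT {columns} FROM {config['database']}.{config['table']} "
        f"WHERE {config['key_field']} = ?"
    )


config = {
    "database": "offline_dw",
    "table": "policy_holder",
    "fields": ["user_name", "org_name"],
    "key_field": "policy_no",
}
statement = build_access_statement(config)
```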
In the embodiment, offline service data related to the real-time service data is acquired by the flink instead of acquiring the offline service data by the service system, so that the data processing pressure of the service system is reduced, and the service data processing efficiency is improved; the off-line service data and the real-time service data are spliced into a service data body to obtain complete service data, so that the normal processing of the service data is ensured.
Further, after the step of pushing the real-time service data to the flink by the kafka, the method further includes: acquiring a data identifier of real-time service data; determining repeated data in the real-time service data according to the acquired data identifier; and carrying out duplicate removal processing on the repeated data in the real-time service data to obtain the duplicate-removed real-time service data.
The data identifier may be an identifier of each piece of service data.
Specifically, because of network problems or cluster failures, Kafka may not receive the acknowledgement that processing is complete after a consumer has acquired and processed a message, and may then push the message to the consumer again. Therefore, there may be duplicate data in the real-time service data. In the big data field, a consumer refers to a subject that acquires and processes data; for example, the consumer may be the data processing component Flink.
The Flink can compare the data identifier of the real-time service data to determine the repeated real-time service data, and the repeated real-time service data is marked as repeated data. The data identifier may be a data number of each piece of real-time service data, for example, in an insurance service scenario, the data identifier may be a policy number. In an embodiment, the data identifier may also be a hash value obtained by performing hash operation after splicing each piece of data in the real-time service data according to a preset sequence.
Flink can perform deduplication on the identified repeated data to obtain the deduplicated real-time service data. It can be understood that Kafka may also push data to Druid repeatedly, and Druid may likewise perform deduplication on the received service data bodies.
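The hash-based data identifier and the deduplication step can be sketched as follows. The choice of SHA-256, the `|` separator and the field names are assumptions of this example; the source only states that the fields are spliced in a preset order and hashed.

```python
import hashlib


def data_identifier(record, field_order):
    """Join the fields in a preset order and hash the result."""
    joined = "|".join(str(record[field]) for field in field_order)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def deduplicate(records, field_order):
    """Keep the first occurrence of each identifier, preserving arrival order."""
    seen = set()
    unique = []
    for record in records:
        ident = data_identifier(record, field_order)
        if ident in seen:
            continue  # repeated delivery of a record already processed
        seen.add(ident)
        unique.append(record)
    return unique


FIELD_ORDER = ["policy_no", "paid_at", "premium"]
incoming = [
    {"policy_no": "P001", "paid_at": "2021-03-01", "premium": 120},
    {"policy_no": "P002", "paid_at": "2021-03-01", "premium": 80},
    {"policy_no": "P001", "paid_at": "2021-03-01", "premium": 120},  # redelivered
]
deduplicated = deduplicate(incoming, FIELD_ORDER)
```

In a long-running stream the set of seen identifiers grows without bound, so a real job would need to expire old identifiers; that is outside the scope of this sketch.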
In the embodiment, the repeated data in the real-time service data is determined and the duplicate removal processing is performed, so that the repeated processing of the real-time service data can be avoided, and the data error possibly caused by the repeated data can be avoided.
Further, the step S204 may include: pushing the service data body to Druid through Kafka; acquiring a processing strategy corresponding to the service data body; instructing Druid to reorganize the data in the service data body according to the processing strategy to obtain a data wide table; calculating the data wide table according to the processing strategy to obtain a calculation result; and obtaining inventory service data according to the calculation result and the data wide table.
The processing strategy is information indicating how to process the service data body.
Specifically, the processing strategy for the service data body may be configured in advance. Druid can query the processing strategy corresponding to the service data body according to the keywords in the service data body.
The data in the service system or hive are messy and exist one by one, and the sequencing of the data is not regular, so that the sequencing of the service data does not necessarily meet the requirement of data processing. The drive can perform data reorganization on the service data in the service data body according to a pre-configured processing strategy, the data reorganization includes field selection and data sorting, data of preset fields can be reserved, and the service data can be rearranged according to the fields to obtain a data wide table. For example, the service data may be rearranged according to the users, so that the service data generated by the same user are arranged together.
The mapping relationship between the fields in the service data body and the data width table can be configured in the JSON file in advance, and the data width table is established in the drive, but not in the flush. Data may flow into the flink at any time, if the data wide table is established in the flink, the server overhead is high, and the establishment of the data wide table by the pipeline can reduce the table establishment overhead and improve the service data processing efficiency.
The Druid can also perform statistical calculation on the data width table according to the processing strategy to obtain a calculation result. The calculation result can be written into a data wide table to obtain inventory business data. Inventory business data may be stored in the drive.
In this embodiment, the processing policy of the service data volume is obtained, and the drive performs the broad-table reconstruction and calculation on the service data volume according to the processing policy, so that the service data volume is processed, and the service data processing efficiency is improved.
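To make the reorganization and calculation steps concrete, the following plain-Python sketch performs field selection, sorting by user, a per-user sum, and write-back of the result. It does not use Druid; the `STRATEGY` dictionary, its keys, and the `group_total` column are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical processing strategy: fields to keep, grouping field, field to sum.
STRATEGY = {
    "keep_fields": ["user_id", "policy_no", "premium"],
    "group_by": "user_id",
    "sum_field": "premium",
}


def build_wide_table(records, strategy):
    """Field selection plus sorting so rows of the same group sit together."""
    rows = [{f: r.get(f) for f in strategy["keep_fields"]} for r in records]
    return sorted(rows, key=lambda row: str(row[strategy["group_by"]]))


def summarize(wide_table, strategy):
    """Statistical calculation: sum one field per group."""
    totals = defaultdict(float)
    for row in wide_table:
        totals[row[strategy["group_by"]]] += row[strategy["sum_field"]]
    return dict(totals)


def to_inventory(wide_table, totals, strategy):
    """Write the calculation result back into each wide-table row."""
    return [
        dict(row, group_total=totals[row[strategy["group_by"]]])
        for row in wide_table
    ]
```

A typical call chain would be `wide = build_wide_table(records, STRATEGY)`, then `totals = summarize(wide, STRATEGY)`, then `to_inventory(wide, totals, STRATEGY)`.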
Further, after step S204, the method may further include: receiving a service data query instruction sent by a terminal; querying inventory service data in Druid according to the service data query instruction; and displaying the queried inventory service data through the terminal.
The service data query instruction may be an instruction instructing Druid to perform a data query.
Specifically, the present application further provides a data management platform, which can control and manage service data processing, for example by configuring the link transmission mode, the processing strategy, and the like. The data management platform can also access the inventory service data in Druid, and it provides a data query interface so that the inventory service data can be queried even when no service data application layer exists.
The data management platform provides a user page, through which a user at a terminal can query inventory service data and thereby trigger a service data query instruction. The user can query by any field dimension. Druid queries the inventory service data according to the service data query instruction, and the queried inventory service data is displayed through the terminal.
In this embodiment, Druid provides the query function so that the inventory service data can be viewed, and the result of Druid's processing of the service data is thereby obtained.
Further, the step of querying inventory service data in Druid according to the service data query instruction includes: extracting the query statement from the service data query instruction; converting the query statement into a Druid query statement conforming to Druid syntax; and running the Druid query statement to query inventory service data from Druid.
Specifically, a user may write a query statement on the data management platform; the query statement may be an SQL statement or an HQL statement. After receiving the service data query instruction, Druid parses it to extract the query statement and converts the query statement into a Druid query statement conforming to Druid syntax. Druid then runs the Druid query statement to query the inventory service data. Druid may send the queried inventory service data to the terminal for display over a link between the server and the terminal, such as HTTP (Hypertext Transfer Protocol).
In this embodiment, after the query statement is extracted from the service data query instruction, it is converted into a Druid query statement conforming to Druid syntax, so that Druid can query the inventory service data.
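One possible reading of the extract-and-convert steps is sketched below. It assumes the query instruction is a dictionary with a `statement` key (a hypothetical shape) and that the converted statement is submitted as the JSON body of Druid's SQL endpoint (`POST /druid/v2/sql`), which takes the SQL text in a `query` field. The application does not say how the conversion is performed, and translating HQL-specific syntax is not attempted here.

```python
import json


def extract_query_statement(instruction):
    """Pull the query statement out of a (hypothetical) query instruction dict."""
    return instruction["statement"].strip()


def to_druid_query(statement):
    """Wrap a SQL statement in a JSON body for Druid's SQL endpoint."""
    # Drop a trailing semicolon; it is assumed to be unnecessary in the payload.
    sql = statement.rstrip(";").strip()
    return json.dumps({"query": sql, "resultFormat": "object"})
```

The returned string would then be sent to the Druid broker or router over HTTP by whatever client the data management platform uses.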
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium; when the computer program is executed, it can carry out the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or it may be a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a service data processing apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 3, the service data processing apparatus 300 according to this embodiment includes: a data acquisition module 301, a data push module 302, an offline addition module 303, and a data calculation module 304. Wherein:
a data acquisition module 301, configured to acquire real-time service data;
a data push module 302, configured to push the real-time service data to Kafka;
an offline addition module 303, configured to acquire the real-time service data in Kafka through Flink, and add corresponding offline service data to the real-time service data to obtain a service data body;
and a data calculation module 304, configured to push the service data body to Druid through Kafka, so as to perform summary calculation on the service data body through Druid, thereby obtaining inventory service data.
In this embodiment, the service data is split into real-time service data and offline service data. After the real-time service data is obtained from the service system, it is pushed to Flink, and Flink adds the corresponding offline service data to the real-time service data to obtain a service data body, which reduces the processing the service system must perform on the service data. Flink can push the service data body to Druid through Kafka; Kafka is a high-throughput data transmission pipeline, which improves the speed of data flow. Druid has strong data calculation capability and can perform summary calculation on the service data body according to the configured processing logic to obtain inventory service data, which further improves the processing efficiency of the service data.
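The division of work among modules 301 to 304 can be sketched end to end as follows. The sketch replaces Kafka with an in-memory queue and plays the Flink and Druid roles with ordinary functions, so it only illustrates the data flow; `InMemoryTopic`, `run_pipeline`, and the `user_id` and `amount` fields are hypothetical names.

```python
from collections import deque


class InMemoryTopic:
    """Stand-in for a Kafka topic; not a real Kafka client."""

    def __init__(self):
        self._messages = deque()

    def push(self, message):
        self._messages.append(message)

    def poll_all(self):
        drained = list(self._messages)
        self._messages.clear()
        return drained


def run_pipeline(realtime_records, offline_by_key):
    realtime_topic, body_topic = InMemoryTopic(), InMemoryTopic()

    # Modules 301/302: acquire real-time data and push it to the first topic.
    for record in realtime_records:
        realtime_topic.push(record)

    # Module 303: the Flink role reads the topic and adds offline data.
    for record in realtime_topic.poll_all():
        body = {**offline_by_key.get(record["user_id"], {}), **record}
        body_topic.push(body)

    # Module 304: the Druid role summarizes the service data bodies.
    bodies = body_topic.poll_all()
    return {
        "count": len(bodies),
        "total_amount": sum(b["amount"] for b in bodies),
        "rows": bodies,
    }
```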
In some optional implementations of this embodiment, the service data processing apparatus 300 may further include a mode query module, where:
a mode query module, configured to query the link transmission mode of the real-time service data.
The offline addition module 303 is further configured to, when the link transmission mode is the first transmission mode, acquire the real-time service data in Kafka through Flink, and add corresponding offline service data to the real-time service data to obtain a service data body.
The data calculation module 304 is further configured to, when the link transmission mode is the second transmission mode, determine the real-time service data as the service data body and push the service data body to Druid through Kafka, so as to perform summary calculation on the service data body through Druid, thereby obtaining the inventory service data.
In the embodiment, different link transmission modes can be selected for the transmission of the real-time service data, so that various processing modes of the real-time service data are met.
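The choice between the two link transmission modes amounts to a small routing decision, sketched below. The `LinkMode` enum and the `enrich` callback are illustrative names; the application only distinguishes a first mode, in which Flink adds offline data, from a second mode, in which the real-time data is used directly as the service data body.

```python
from enum import Enum


class LinkMode(Enum):
    FIRST = "first"    # route through Flink for offline enrichment
    SECOND = "second"  # real-time data already is the service data body


def build_service_data_body(record, mode, enrich):
    """Return the service data body for a record under the given link mode."""
    if mode is LinkMode.FIRST:
        return enrich(record)
    if mode is LinkMode.SECOND:
        return record
    raise ValueError(f"unsupported link transmission mode: {mode!r}")
```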
In some optional implementation manners of this embodiment, the offline adding module 303 may include a data pushing sub-module, a keyword obtaining sub-module, an offline obtaining sub-module, and a data merging sub-module, where:
a data pushing submodule, configured to push the real-time service data to Flink through Kafka;
a keyword acquisition submodule, configured to acquire keywords in the real-time service data;
an offline acquisition submodule, configured to acquire, from the offline database according to the keywords, offline service data corresponding to the real-time service data;
and a data merging submodule, configured to merge the offline service data and the real-time service data through Flink to obtain a service data body.
In the embodiment, offline service data related to the real-time service data is acquired by the flink instead of acquiring the offline service data by the service system, so that the data processing pressure of the service system is reduced, and the service data processing efficiency is improved; the off-line service data and the real-time service data are spliced into a service data body to obtain complete service data, so that the normal processing of the service data is ensured.
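The keyword lookup and merge performed by these submodules might look like the sketch below. The offline database is modeled as a dictionary keyed by field name and value, and `KEYWORD_FIELDS` lists the fields assumed to act as keywords; both are placeholders, since the application does not describe the offline store's schema. Letting real-time fields override offline fields on conflict is also an assumption.

```python
# Hypothetical offline store keyed by (field name, value).
OFFLINE_DB = {
    ("policy_no", "P1"): {"holder_name": "Zhang", "region": "South"},
}

# Assumed fields of a real-time record that act as keywords.
KEYWORD_FIELDS = ["policy_no"]


def extract_keywords(record):
    """Collect the (field, value) pairs used to look up offline data."""
    return [(f, record[f]) for f in KEYWORD_FIELDS if f in record]


def fetch_offline(keywords, offline_db):
    """Gather the offline data matching any of the keywords."""
    merged = {}
    for keyword in keywords:
        merged.update(offline_db.get(keyword, {}))
    return merged


def merge_into_body(record, offline_db=OFFLINE_DB):
    """Merge offline and real-time data into one service data body."""
    offline = fetch_offline(extract_keywords(record), offline_db)
    # Real-time fields win on conflict (an assumption of this sketch).
    return {**offline, **record}
```

A record with no matching offline entry passes through unchanged, which keeps the merge step safe for data that has no offline counterpart.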
In some optional implementation manners of this embodiment, the offline adding module 303 may further include an identification obtaining sub-module, a data determining sub-module, and a data deduplication sub-module, where:
an identifier acquisition submodule, configured to acquire the data identifier of the real-time service data;
a data determining submodule, configured to determine duplicate data in the real-time service data according to the acquired data identifier;
and a data deduplication submodule, configured to perform deduplication on the duplicate data in the real-time service data to obtain deduplicated real-time service data.
In the embodiment, the repeated data in the real-time service data is determined and the duplicate removal processing is performed, so that the repeated processing of the real-time service data can be avoided, and the data error possibly caused by the repeated data can be avoided.
In some optional implementations of this embodiment, the data calculation module 304 may include: the system comprises a data body pushing submodule, a strategy obtaining submodule, a data recombination submodule, a wide table calculation submodule and an inventory generating submodule, wherein:
a data body pushing submodule, configured to push the service data body to Druid through Kafka;
a strategy acquisition submodule, configured to acquire the processing strategy corresponding to the service data body;
a data reorganization submodule, configured to instruct Druid to perform data reorganization on the service data body according to the processing strategy to obtain a data wide table;
a wide table calculation submodule, configured to perform calculation on the data wide table according to the processing strategy to obtain a calculation result;
and an inventory generation submodule, configured to obtain inventory service data according to the calculation result and the data wide table.
In this embodiment, the processing strategy of the service data body is obtained, and Druid performs wide-table reorganization and calculation on the service data body according to the processing strategy, so that the service data body is processed and the service data processing efficiency is improved.
In some optional implementations of this embodiment, the service data processing apparatus 300 may further include an instruction receiving module, an inventory query module, and a data display module, where:
an instruction receiving module, configured to receive the service data query instruction sent by the terminal;
an inventory query module, configured to query inventory service data in Druid according to the service data query instruction;
and a data display module, configured to display the queried inventory service data through the terminal.
In this embodiment, the druid is used to realize the query function, and the inventory service data can be checked, so that the processing result of the druid on the service data is obtained.
In some optional implementations of this embodiment, the inventory querying module may include: statement extraction submodule, statement conversion submodule and statement operation submodule, wherein:
a statement extraction submodule, configured to extract the query statement from the service data query instruction;
a statement conversion submodule, configured to convert the query statement into a Druid query statement conforming to Druid syntax;
and a statement running submodule, configured to run the Druid query statement so as to query the inventory service data from Druid.
In this embodiment, after the query statement is extracted from the service data query instruction, it is converted into a Druid query statement conforming to Druid syntax, so that Druid can query the inventory service data.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42, and a network interface 43 communicatively connected to each other via a system bus. It is noted that only a computer device 4 having components 41-43 is shown, but it is understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 4. Of course, the memory 41 may also include both internal and external storage devices of the computer device 4. In this embodiment, the memory 41 is generally used for storing an operating system installed in the computer device 4 and various types of application software, such as computer readable instructions of a business data processing method. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or process data, for example, execute computer readable instructions of the service data processing method.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
The computer device provided in this embodiment may execute the service data processing method. The service data processing method here may be the service data processing method of the above-described embodiments.
In this embodiment, the service data is split into real-time service data and offline service data. After the real-time service data is obtained from the service system, it is pushed to Flink, and Flink adds the corresponding offline service data to the real-time service data to obtain a service data body, which reduces the processing the service system must perform on the service data. Flink can push the service data body to Druid through Kafka; Kafka is a high-throughput data transmission pipeline, which improves the speed of data flow. Druid has strong data calculation capability and can perform summary calculation on the service data body according to the configured processing logic to obtain inventory service data, which further improves the processing efficiency of the service data.
The present application further provides another embodiment, which is to provide a computer-readable storage medium, wherein the computer-readable storage medium stores computer-readable instructions, which can be executed by at least one processor, so as to cause the at least one processor to execute the steps of the service data processing method as described above.
In this embodiment, the service data is split into real-time service data and offline service data. After the real-time service data is obtained from the service system, it is pushed to Flink, and Flink adds the corresponding offline service data to the real-time service data to obtain a service data body, which reduces the processing the service system must perform on the service data. Flink can push the service data body to Druid through Kafka; Kafka is a high-throughput data transmission pipeline, which improves the speed of data flow. Druid has strong data calculation capability and can perform summary calculation on the service data body according to the configured processing logic to obtain inventory service data, which further improves the processing efficiency of the service data.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are merely some, not all, of the embodiments of the present application, and that the accompanying drawings show preferred embodiments of the application without limiting its scope. The application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be understood thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or replace some of their technical features with equivalents. All equivalent structures made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.
Claims (10)
1. A service data processing method is characterized by comprising the following steps:
acquiring real-time service data;
pushing the real-time service data to Kafka;
acquiring the real-time service data in the Kafka through Flink, and adding corresponding offline service data to the real-time service data to obtain a service data body;
and pushing the service data body to Druid through the Kafka, and performing summary calculation on the service data body through Druid to obtain inventory service data.
2. The method for processing service data according to claim 1, wherein before the step of obtaining the real-time service data in the Kafka through a flink, and adding corresponding offline service data to the real-time service data to obtain a service data body, the method further comprises:
inquiring a link transmission mode of the real-time service data;
when the link transmission mode is a first transmission mode, executing the step of obtaining the real-time service data in the Kafka through the flink, and adding corresponding offline service data to the real-time service data to obtain a service data body;
and when the link transmission mode is a second transmission mode, determining the real-time service data as a service data body, and executing the step of pushing the service data body to Druid through the Kafka so as to perform summary calculation on the service data body through Druid to obtain inventory service data.
3. The method for processing service data according to claim 1, wherein the step of obtaining the real-time service data in Kafka through flink and adding corresponding offline service data to the real-time service data to obtain a service data body comprises:
pushing the real-time service data to Flink through the Kafka;
acquiring keywords in the real-time service data;
acquiring offline service data corresponding to the real-time service data from an offline database according to the keywords;
and combining the offline service data and the real-time service data through the flink to obtain a service data body.
4. The service data processing method according to claim 1, wherein after the step of pushing the real-time service data to Flink through the Kafka, the method further comprises:
acquiring a data identifier of the real-time service data;
determining repeated data in the real-time service data according to the acquired data identifier;
and performing duplicate removal processing on the repeated data in the real-time service data to obtain the duplicate-removed real-time service data.
5. The service data processing method according to claim 1, wherein the step of pushing the service data body to Druid through the Kafka to perform summary calculation on the service data body through Druid to obtain inventory service data comprises:
pushing the service data body to Druid through the Kafka;
acquiring a processing strategy corresponding to the service data body;
instructing Druid to perform data reorganization on the service data body according to the processing strategy to obtain a data wide table;
performing calculation on the data wide table according to the processing strategy to obtain a calculation result;
and obtaining inventory service data according to the calculation result and the data wide table.
6. The service data processing method according to claim 1, wherein after the step of pushing the service data body to Druid through the Kafka to perform summary calculation on the service data body through Druid to obtain inventory service data, the method further comprises:
receiving a service data query instruction sent by a terminal;
querying inventory service data in Druid according to the service data query instruction;
and displaying the inquired inventory business data through the terminal.
7. The service data processing method according to claim 6, wherein the step of querying inventory service data in Druid according to the service data query instruction comprises:
extracting the query statement from the service data query instruction;
converting the query statement into a Druid query statement conforming to Druid syntax;
and running the Druid query statement to query inventory service data from Druid.
8. A service data processing apparatus, comprising:
the data acquisition module is used for acquiring real-time service data;
the data pushing module is used for pushing the real-time service data to Kafka;
the offline adding module is used for acquiring the real-time service data in the Kafka through the flink, and adding corresponding offline service data to the real-time service data to obtain a service data body;
and the data calculation module is used for pushing the service data body to Druid through the Kafka so as to perform summary calculation on the service data body through Druid to obtain inventory service data.
9. A computer device comprising a memory and a processor, the memory having computer readable instructions stored therein, wherein the processor, when executing the computer readable instructions, implements the steps of the service data processing method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon computer-readable instructions, which, when executed by a processor, implement the steps of the business data processing method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110271789.4A CN113010542B (en) | 2021-03-12 | 2021-03-12 | Service data processing method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110271789.4A CN113010542B (en) | 2021-03-12 | 2021-03-12 | Service data processing method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113010542A (en) | 2021-06-22 |
CN113010542B (en) | 2023-09-19 |
Family
ID=76406430
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110271789.4A (Active), CN113010542B (en) | 2021-03-12 | 2021-03-12 | Service data processing method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113010542B (en) |
CN117056309A (en) | Service aging monitoring method and device, computer equipment and storage medium | |
CN115757373A (en) | Data warehouse cleaning method and device, computer equipment and storage medium | |
CN116932486A (en) | File generation method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |