CN113190558A - Data processing method and system - Google Patents
Data processing method and system Download PDFInfo
- Publication number
- CN113190558A CN113190558A CN202110507204.4A CN202110507204A CN113190558A CN 113190558 A CN113190558 A CN 113190558A CN 202110507204 A CN202110507204 A CN 202110507204A CN 113190558 A CN113190558 A CN 113190558A
- Authority
- CN
- China
- Prior art keywords
- data
- processing
- wide
- real time
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000003672 processing method Methods 0.000 title claims abstract description 19
- 238000012545 processing Methods 0.000 claims abstract description 300
- 238000013499 data model Methods 0.000 claims abstract description 93
- 238000000034 method Methods 0.000 claims abstract description 57
- 238000004140 cleaning Methods 0.000 claims description 14
- 238000004590 computer program Methods 0.000 claims description 9
- 238000012423 maintenance Methods 0.000 abstract description 10
- 230000008569 process Effects 0.000 description 34
- 238000010586 diagram Methods 0.000 description 11
- 238000004891 communication Methods 0.000 description 7
- 230000006870 function Effects 0.000 description 6
- 230000003287 optical effect Effects 0.000 description 4
- 238000012986 modification Methods 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 230000008901 benefit Effects 0.000 description 2
- 239000000835 fiber Substances 0.000 description 2
- 238000007726 management method Methods 0.000 description 2
- 230000000644 propagated effect Effects 0.000 description 2
- 230000000717 retained effect Effects 0.000 description 2
- 239000004065 semiconductor Substances 0.000 description 2
- 238000010923 batch production Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 239000013307 optical fiber Substances 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data processing method and a data processing system, and relates to the technical field of big data. One embodiment of the method comprises: receiving service data in real time through a first processing module of a stream processing frame, and processing the service data in real time to output a data model and wide specification detailed data; receiving the service data in real time through a second processing module of the stream processing frame, and processing the service data in a first preset time window to output a data model and wide specification detailed data; and receiving the wide specification data sent by the first processing module and/or the second processing module through a batch processing frame, and processing the wide specification data in a second preset time window to output a data model and the wide specification data. The implementation method can solve the technical problems of low resource utilization rate, low output timeliness, difficult code maintenance, poor data consistency and the like.
Description
Technical Field
The invention relates to the technical field of big data, in particular to a data processing method and a data processing system.
Background
In the existing data processing process, according to different service scenes, data processing can be divided into real-time data (incremental data processed by a stream system) and offline data (full data processed by a batch system), and the real-time and T + N data viewing requirements are met respectively. As shown in fig. 1, in the two modes, the used technologies and languages are different, and the environment is often independent, and the intermediate data and the data model are also independent.
In the process of implementing the invention, the inventor finds that the two modes of real-time data processing and offline data processing have the following problems:
the bottom data models are inconsistent, so that a large amount of splicing logic needs to be performed on an application layer, the output timeliness is low, and the error probability is high; the two systems are respectively provided with a data model and a storage layer, and all calculate and store the full amount of data, so that the cost is high, and the resource utilization rate is low; one service logic, two sets of codes, the logic can not be reused, and the consistency and quality of data are difficult to ensure; in the task execution, the cluster cannot realize peak staggering, and the resource utilization rate is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data processing method and system, so as to solve the technical problems of low resource utilization rate, low output timeliness, difficult code maintenance, poor data consistency, and the like.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a data processing method including:
receiving service data in real time through a first processing module of a stream processing frame, and processing the service data in real time to output a data model and wide specification detailed data;
receiving the service data in real time through a second processing module of the stream processing frame, and processing the service data in a first preset time window to output a data model and wide specification detailed data;
and receiving the wide specification data sent by the first processing module and/or the second processing module through a batch processing frame, and processing the wide specification data in a second preset time window to output a data model and the wide specification data.
Optionally, the first preset time window is smaller than the second preset time window.
Optionally, receiving service data in real time, and processing the service data in real time to output a data model and broad band refinement data, includes:
receiving business data pushed by a data source in real time, and cleaning the business data in real time to output a data model and wide specification detailed data; or,
the method comprises the steps of receiving business data pushed by a data source in real time, cleaning the business data in real time, obtaining dimension data from a dimension table, and processing the cleaned business data by combining the dimension data so as to output a data model and wide list detail data.
Optionally, after outputting the data model and the wide surface detailed data, the method further includes:
sending the wide manifest detail data to a second processing module of the stream processing framework and/or the batch processing framework.
Optionally, receiving the service data in real time, and processing the service data within the first preset time window to output a data model and wide-band detailed data, including:
receiving service data pushed by a data source in real time, and processing the service data in a first preset time window to output a data model and wide detailed data; and/or the presence of a gas in the gas,
and receiving the wide specification data sent by the first processing module, and processing the wide specification data in a first preset time window to output a data model and the wide specification data.
Optionally, after outputting the data model and the wide surface detailed data, the method further includes:
sending the wide manifest detail data to the batch framework.
Optionally, the stream processing framework is an Apache Flink framework, and the batch processing framework is a Hive framework.
In addition, according to another aspect of an embodiment of the present invention, there is provided a data processing system including a stream processing framework and a batch processing framework, wherein the stream processing framework includes a first processing module and a second processing module;
the first processing module is used for receiving service data in real time and processing the service data in real time so as to output a data model and wide specification detailed data;
the second processing module is used for receiving the service data in real time and processing the service data in the first preset time window so as to output a data model and wide list detail data;
the batch processing framework is used for receiving the wide specification data sent by the first processing module and/or the second processing module and processing the wide specification data in a second preset time window so as to output a data model and the wide specification data.
Optionally, the first preset time window is smaller than the second preset time window.
Optionally, the first processing module is further configured to:
receiving business data pushed by a data source in real time, and cleaning the business data in real time to output a data model and wide specification detailed data; or,
the method comprises the steps of receiving business data pushed by a data source in real time, cleaning the business data in real time, obtaining dimension data from a dimension table, and processing the cleaned business data by combining the dimension data so as to output a data model and wide list detail data.
Optionally, the first processing module is further configured to:
after outputting the data model and the wide manifest detail data, sending the wide manifest detail data to a second processing module of the stream processing framework and/or the batch processing framework.
Optionally, the second processing module is further configured to:
receiving service data pushed by a data source in real time, and processing the service data in a first preset time window to output a data model and wide detailed data; and/or the presence of a gas in the gas,
and receiving the wide specification data sent by the first processing module, and processing the wide specification data in a first preset time window to output a data model and the wide specification data.
Optionally, the second processing module is further configured to:
after outputting the data model and the wide manifest detail data, sending the wide manifest detail data to the batch framework.
Optionally, the stream processing framework is an Apache Flink framework, and the batch processing framework is a Hive framework.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the method of any of the embodiments described above.
According to another aspect of the embodiments of the present invention, there is also provided a computer readable medium, on which a computer program is stored, which when executed by a processor implements the method of any of the above embodiments.
One embodiment of the above invention has the following advantages or benefits: because the technical means that the business data are processed by the stream processing frame and the batch processing frame together so as to output the data model and the wide-surface thin data is adopted, the technical problems of low resource utilization rate, low output timeliness, difficult code maintenance, poor data consistency and the like in the prior art are solved. The embodiment of the invention carries out stage processing on the data, only one full amount of data is needed, and only one time of processing is needed without overlapping, thereby improving the resource utilization rate and the output timeliness; codes of all links are unified, so that the codes are unified integrally, the data consistency is ensured, and the code maintenance difficulty can be reduced; the data aperture is unified, and no matter the requirements are modified and iterated in the later period, or the application is landed, a plurality of sets of templates are not used. Therefore, the embodiment of the invention can solve the problems of low landing efficiency of the application layer, high error possibility and the like caused by inconsistent data models. It should be noted that, in the embodiment of the present invention, the data is processed in stages, so as to improve the resource utilization rate and the yield time.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of a main flow of a data processing method in the prior art;
FIG. 2 is a schematic diagram of a main flow of a data processing method according to an embodiment of the present invention;
FIG. 3 is a schematic view of a main flow of a data processing method according to a referential embodiment of the present invention;
FIG. 4 is a schematic view of a main flow of a data processing method according to another referential embodiment of the present invention;
FIG. 5 is a schematic diagram of the major modules of a data processing system according to an embodiment of the present invention;
FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
fig. 7 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 2 is a schematic view of a main flow of a data processing method according to an embodiment of the present invention. As an embodiment of the present invention, as shown in fig. 2, the data processing method may include:
step 201, receiving service data in real time through a first processing module of a stream processing framework, and processing the service data in real time to output a data model and wide specification detail data.
The first processing module of the stream processing framework receives service data pushed by each data source in real time, the service data are incremental data, and the first processing module of the stream processing framework performs data processing on the service data so as to obtain and output a data model and wide-specification detailed data. Alternatively, the data source may be a business system that constantly generates business data and pushes the generated business data to the first processing module of the stream processing framework in real time. Alternatively, the data source may also be a data warehouse that continually pushes new business data to the first processing module of the stream processing framework. Optionally, the data source may further include a topic domain, and the topic domain pushes the relevant incremental business data to the first processing module of the stream processing framework. And a first processing module of the stream processing frame performs real-time stream processing on the service data, wherein the processing time of a single piece of data is in the second level.
Optionally, receiving service data in real time, and processing the service data in real time to output a data model and broad band refinement data, includes: receiving business data pushed by a data source in real time, and cleaning the business data in real time to output a data model and wide specification detailed data; or receiving service data pushed by a data source in real time, cleaning the service data in real time, acquiring dimension data from a dimension table, and processing the cleaned service data by combining the dimension data to output a data model and wide list detail data.
In an embodiment of the present invention, a first processing module of a stream processing framework receives incremental business data pushed by a data source in real time, and performs data cleaning on the business data, thereby obtaining and outputting a data model and wide-specification detailed data. As shown in fig. 3, assuming that there is a need for daily operation supervision on a certain brand of goods, when incremental business data is streamed, other brands of business data are washed away, and only the brand of business data is retained, in this embodiment, the data flow process is as follows: - > stream.
In another embodiment of the present invention, a first processing module of a stream processing framework receives incremental service data pushed by a data source in real time, performs data cleaning on the service data, then obtains dimension data from a dimension table, and processes the cleaned service data in combination with the dimension data, thereby obtaining and outputting a data model and wide-specification detailed data. As shown in fig. 3, assuming that there is a daily operation supervision requirement for a certain brand of goods, when incremental business data is streamed, business data of other brands are cleaned, only business data of the brand is retained as order inflow details, then dimension data such as quantity of goods, amount of money of goods, delivery location, receiving location, etc. are obtained from a dimension table, and the order detail data can be obtained by combining these dimension data, as daily monitoring, in this embodiment, the data circulation process is: - > Liu- > Ming.
Optionally, after outputting the data model and the wide surface detailed data, the method further includes: and sending the wide manifest detail data to a second processing module of the stream processing framework and/or the batch processing framework. In an embodiment of the present invention, after the first processing module of the stream processing framework outputs the data model and the width-indicating fine data, the width-indicating fine data may be further sent to the second processing module of the stream processing framework, and the second processing module of the stream processing framework continues to process the width-indicating fine data. As shown in fig. 3, the detailed data flow is transferred to a small batch, so that daily reporting can be realized, and in this embodiment, the data flow process is as follows: - > Liu- > Ming Xiao- > Small batch. In another embodiment of the present invention, after the first processing module of the stream processing framework outputs the data model and the wide thin data, the wide thin data can be further sent to the batch processing framework, and the batch processing framework processes the wide thin data. As shown in fig. 3, the cycle of month, season, year, etc. is too large, the processing efficiency ratio of the flow or the small batch is too low, and the processing efficiency is improved by transferring to batch processing, in this embodiment, the data circulation process is as follows: - > Liu- > Ming Xiao- > batched.
For example, the incoming ranking can be analyzed by the dimensional data of the delivery location, the receiving location, etc. obtained from the dimensional table, so as to provide a reference for selecting which regions, and the incoming ranking can be sent to the second processing module (i.e. small lot) of the stream processing framework if the data processing is day-week level, and sent to the batch processing framework (i.e. lot) if the data processing is other level.
Step 202, receiving the service data in real time through a second processing module of the stream processing framework, and processing the service data in the first preset time window to output a data model and wide specification detail data.
And a second processing module of the stream processing frame receives the incremental business data in real time and processes the data of each business data in the window size according to a first preset time window, so as to obtain and output a data model and wide-list detailed data. The second processing module of the stream processing framework performs lightweight summary data such as 10 minute achievement, hourly inventory, via "small batch" processing in fig. 3, for an age of M (minutes) + N or H (hours) + N.
Optionally, receiving the service data in real time, and processing the service data within the first preset time window to output a data model and wide-band detailed data, including: receiving service data pushed by a data source in real time, and processing the service data in a first preset time window to output a data model and wide detailed data; and/or receiving the wide specification data sent by the first processing module, and processing the wide specification data in a first preset time window to output a data model and the wide specification data.
In an embodiment of the present invention, the second processing module of the stream processing framework receives, in real time, service data pushed by each data source, where the service data are incremental data, and the second processing module of the stream processing framework performs data processing on the service data within a first preset time window, so as to obtain and output a data model and wide specification detailed data. Alternatively, the data source may be a business system that constantly generates business data and pushes the generated business data to the second processing module of the stream processing framework in real time. Alternatively, the data source may also be a data warehouse that continually pushes new business data to the second processing module of the stream processing framework. Optionally, the data source may further include a topic domain, and the topic domain pushes the relevant incremental business data to the second processing module of the stream processing framework.
In another embodiment of the present invention, the second processing module of the stream processing framework receives the wide manifest data sent by the first processing module, and then performs data processing on the wide manifest data within the window size according to the first preset time window, so as to obtain and output the data model and the wide manifest data. In this embodiment, the data flow process is as follows: - > Liu- > Ming Xiao- > Small batch. For example, assuming the demand is sales per hour in a day, the data flow process is: - > Liu- > Ming Xiao- > Small batch.
Alternatively, the stream processing framework may be one of Apache Storm, Trident, Spark Streaming, Samza, and Apache Flink. Preferably, the stream processing framework is an Apache Flink framework, and the framework can process the service data in real time and can process the service data in batch.
Optionally, after outputting the data model and the wide surface detailed data, the method further includes: sending the wide manifest detail data to the batch framework. After the second processing module of the stream processing framework outputs the data model and the wide manifest detail data, the wide manifest detail data can be further sent to the batch processing framework. As shown in fig. 3, the cycle of month, season, year, etc. is too large, the processing efficiency ratio of the flow or the small batch is too low, and the processing efficiency is improved by transferring to batch processing, in this embodiment, the data circulation process is as follows: - > stream- > Ming Xiao- > Small batch- > batch.
Step 203, receiving the wide specification data sent by the first processing module and/or the second processing module through a batch processing framework, and processing the wide specification data in a second preset time window to output a data model and the wide specification data.
In the embodiment of the present invention, the batch processing framework does not receive the full amount of data pushed by the data source any more, but receives the wide manifest detail data sent by the first processing module and/or the wide manifest detail data sent by the second processing module of the stream processing framework, and the batch processing framework performs data processing on the wide manifest detail data in the second preset time window, so as to obtain and output the data model and the wide manifest detail data.
Optionally, the first preset time window is smaller than the second preset time window. For ease of understanding, embodiments of the present invention refer to data processed by the second processing module of the stream processing framework as a small batch, and refer to data processed by the batch processing framework as a batch, where a time window of the small batch is smaller than a time window of the batch.
The batch processing framework directly summarizes the business settlement data, such as week/month/season/year summary reports and indicator cards, the time efficiency is T (day) + N, zipper data (chain) and dimension table data are not processed, and the integration processing work of a large amount of data is not processed.
Optionally, the stream processing framework may be one of spring-batch and Hive, and preferably, the batch processing framework is a Hive framework, and the framework may be combined with the stream processing framework to receive the wide-specification detail data sent by the stream processing framework and batch process the wide-specification detail data.
As shown in fig. 3, if the requirement is warehouse basic information, such as region provincial and urban attribution information, which is not changed very frequently, and a dimension table can be generated directly, the data circulation process is as follows: - > Liu- > Ming Xiao- > Wei Tao or- > Liu- > Ming Wei Tao.
The data model and the wide band indication detail data generated in step 201 and 203 can be stored in the database, and after all the links are completed, interfaces are provided for the outside in a unified manner.
According to the various embodiments described above, it can be seen that the technical means of the embodiments of the present invention that the business data is processed by the stream processing framework and the batch processing framework together, so as to output the data model and the broad band thin data solves the technical problems of low resource utilization rate, low output timeliness, difficult code maintenance, poor data consistency, and the like in the prior art. According to the flow, the data are processed in stages, only one full amount of data is needed, and only one time of processing is needed, so that overlapping does not exist, and the resource utilization rate and the output timeliness are improved; codes of all links are unified, so that the codes are unified integrally, the data consistency is ensured, and the code maintenance difficulty can be reduced; the data aperture is unified, and no matter the requirements are modified and iterated in the later period, or the application is landed, a plurality of sets of templates are not used. Therefore, the embodiment of the invention can solve the problems of low landing efficiency of the application layer, high error possibility and the like caused by inconsistent data models. It should be noted that, in the embodiment of the present invention, the data is processed in stages, so as to improve the resource utilization rate and the yield time.
Fig. 4 is a schematic view of a main flow of a data processing method according to another referential embodiment of the present invention. As another embodiment of the present invention, as shown in fig. 4, the data processing method may include:
the first processing module of the stream processing framework receives service data pushed by each data source in real time, the service data are incremental data, and the first processing module of the stream processing framework performs data processing on the service data so as to obtain and output a data model and wide-specification detailed data. For example, a first processing module of a stream processing framework receives incremental business data pushed by a data source in real time, and performs data cleaning on the business data, so as to obtain and output a data model and wide-specification detailed data, where a data streaming process is as follows: - > stream.
A first processing module of a stream processing framework receives incremental business data pushed by a data source in real time, firstly, data cleaning is carried out on the business data, then, dimension data are obtained from a dimension table, the cleaned business data are processed by combining the dimension data, and therefore, a data model and wide list detail data are obtained and output, and the data circulation process is as follows: - > Liu- > Ming.
Further, after the first processing module of the stream processing framework outputs the data model and the width indication detailed data, the width indication detailed data is sent to the second processing module of the stream processing framework, the second processing module of the stream processing framework continues to process the width indication detailed data, and the data circulation process is as follows: - > Liu- > Ming Xiao- > Small batch.
Further, after the first processing module of the stream processing framework outputs the data model and the width indication detail data, the width indication detail data is sent to the batch processing framework, and the batch processing framework continues to process the width indication detail data, wherein the stream process of the data comprises the following steps: - > Liu- > Ming Xiao- > batched.
The second processing module of the stream processing frame receives service data pushed by each data source in real time, the service data are incremental data, the second processing module of the stream processing frame performs data processing on the service data in a first preset time window, so as to obtain and output a data model and wide specification detailed data, and the data circulation process is as follows: - > minor batches.
A second processing module of the stream processing frame receives the wide specification data sent by the first processing module, and then performs data processing on the wide specification data in the window size according to a first preset time window, so as to obtain and output a data model and the wide specification data, wherein the stream process of the data is as follows: - > Liu- > Ming Xiao- > Small batch.
Further, after the second processing module of the stream processing framework outputs the data model and the width indicating detail data, the width indicating detail data can be further sent to the batch processing framework, and the data circulation process is as follows: - > stream- > Ming Xiao- > Small batch- > batch.
In the embodiment of the present invention, the batch processing framework does not receive the full amount of data pushed by the data source any more, but receives the wide manifest detail data sent by the first processing module and/or the wide manifest detail data sent by the second processing module of the stream processing framework, and the batch processing framework performs data processing on the wide manifest detail data in the second preset time window, so as to obtain and output the data model and the wide manifest detail data.
Optionally, the first preset time window is smaller than the second preset time window. For ease of understanding, embodiments of the present invention refer to data processed by the second processing module of the stream processing framework as a small batch, and refer to data processed by the batch processing framework as a batch, where a time window of the small batch is smaller than a time window of the batch.
It should be noted that, in the embodiment of the present invention, the data flow process may only execute any one of the data flow processes, may execute any multiple data flow processes, and may also execute all the data flow processes, which is determined according to service requirements. Complex requirements perform the entire data flow process, while simple requirements may only require the performance of one data flow process.
The generated data model and the wide-band thin data can be stored in a database, after all links are finished, interfaces are uniformly provided to the outside, and the corresponding data model can be obtained by applying and calling the interfaces.
In addition, in another embodiment of the present invention, the detailed implementation of the data processing method is described in detail above, so that the repeated description is not repeated here.
FIG. 5 is a schematic diagram of the main modules of a data processing system according to an embodiment of the present invention, and as shown in FIG. 5, the data processing system 500 includes a stream processing framework 501 and a batch processing framework 502, wherein the stream processing framework 501 includes a first processing module and a second processing module;
the first processing module is used for receiving service data in real time and processing the service data in real time so as to output a data model and wide specification detailed data;
the second processing module is used for receiving the service data in real time and processing the service data in the first preset time window so as to output a data model and wide list detail data;
the batch processing framework is used for receiving the wide specification data sent by the first processing module and/or the second processing module and processing the wide specification data in a second preset time window so as to output a data model and the wide specification data.
Optionally, the first preset time window is smaller than the second preset time window.
Optionally, the first processing module is further configured to:
receiving business data pushed by a data source in real time, and cleaning the business data in real time to output a data model and wide specification detailed data; or,
the method comprises the steps of receiving business data pushed by a data source in real time, cleaning the business data in real time, obtaining dimension data from a dimension table, and processing the cleaned business data by combining the dimension data so as to output a data model and wide list detail data.
Optionally, the first processing module is further configured to:
after outputting the data model and the wide manifest detail data, sending the wide manifest detail data to a second processing module of the stream processing framework and/or the batch processing framework.
Optionally, the second processing module is further configured to:
receiving service data pushed by a data source in real time, and processing the service data in a first preset time window to output a data model and wide detailed data; and/or the presence of a gas in the gas,
and receiving the wide specification data sent by the first processing module, and processing the wide specification data in a first preset time window to output a data model and the wide specification data.
Optionally, the second processing module is further configured to:
after outputting the data model and the wide manifest detail data, sending the wide manifest detail data to the batch framework.
Optionally, the stream processing framework is an Apache Flink framework, and the batch processing framework is a Hive framework.
According to the various embodiments described above, it can be seen that the technical means of the embodiments of the present invention that the business data is processed by the stream processing framework and the batch processing framework together, so as to output the data model and the broad band thin data solves the technical problems of low resource utilization rate, low output timeliness, difficult code maintenance, poor data consistency, and the like in the prior art. According to the flow, the data are processed in stages, only one full amount of data is needed, and only one time of processing is needed, so that overlapping does not exist, and the resource utilization rate and the output timeliness are improved; codes of all links are unified, so that the codes are unified integrally, the data consistency is ensured, and the code maintenance difficulty can be reduced; the data aperture is unified, and no matter the requirements are modified and iterated in the later period, or the application is landed, a plurality of sets of templates are not used. Therefore, the embodiment of the invention can solve the problems of low landing efficiency of the application layer, high error possibility and the like caused by inconsistent data models. It should be noted that, in the embodiment of the present invention, the data is processed in stages, so as to improve the resource utilization rate and the yield time.
It should be noted that, in the data processing system according to the present invention, the implementation content has been described in detail in the above data processing method, and therefore, the repeated content is not described again.
FIG. 6 illustrates an exemplary system architecture 600 of a data processing method or data processing system to which embodiments of the present invention may be applied.
As shown in fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 serves to provide a medium for communication links between the terminal devices 601, 602, 603 and the server 605. Network 604 may include various types of connections, such as wire, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 601, 602, 603 to interact with the server 605 via the network 604 to receive or send messages or the like. The terminal devices 601, 602, 603 may have installed thereon various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 605 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 601, 602, 603. The background management server may analyze and otherwise process the received data such as the item information query request, and feed back a processing result (for example, target push information, item information — just an example) to the terminal device.
It should be noted that the data processing method provided by the embodiment of the present invention is generally executed by the server 605, and accordingly, the data processing system is generally disposed in the server 605.
It should be understood that the number of terminal devices, networks, and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU)701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM703, various programs and data necessary for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage section 708 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 701.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer programs according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a first processing module and a second processing module, where the names of the modules do not in some cases constitute a limitation on the modules themselves.
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, implement the method of: receiving service data in real time through a first processing module of a stream processing frame, and processing the service data in real time to output a data model and wide specification detailed data; receiving the service data in real time through a second processing module of the stream processing frame, and processing the service data in a first preset time window to output a data model and wide specification detailed data; and receiving the wide specification data sent by the first processing module and/or the second processing module through a batch processing frame, and processing the wide specification data in a second preset time window to output a data model and the wide specification data.
According to the technical scheme of the embodiment of the invention, because the technical means that the business data are processed by the stream processing frame and the batch processing frame together so as to output the data model and the wide-surface detailed data is adopted, the technical problems of low resource utilization rate, low output timeliness, difficult code maintenance, poor data consistency and the like in the prior art are solved. The embodiment of the invention carries out stage processing on the data, only one full amount of data is needed, and only one time of processing is needed without overlapping, thereby improving the resource utilization rate and the output timeliness; codes of all links are unified, so that the codes are unified integrally, the data consistency is ensured, and the code maintenance difficulty can be reduced; the data aperture is unified, and no matter the requirements are modified and iterated in the later period, or the application is landed, a plurality of sets of templates are not used. Therefore, the embodiment of the invention can solve the problems of low landing efficiency of the application layer, high error possibility and the like caused by inconsistent data models. It should be noted that, in the embodiment of the present invention, the data is processed in stages, so as to improve the resource utilization rate and the yield time.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A data processing method, comprising:
receiving service data in real time through a first processing module of a stream processing frame, and processing the service data in real time to output a data model and wide specification detailed data;
receiving the service data in real time through a second processing module of the stream processing frame, and processing the service data in a first preset time window to output a data model and wide specification detailed data;
and receiving the wide specification data sent by the first processing module and/or the second processing module through a batch processing frame, and processing the wide specification data in a second preset time window to output a data model and the wide specification data.
2. The method of claim 1, wherein the first predetermined time window is smaller than the second predetermined time window.
3. The method of claim 1, wherein receiving the service data in real time and processing the service data in real time to output a data model and broad cast detailed data comprises:
receiving business data pushed by a data source in real time, and cleaning the business data in real time to output a data model and wide specification detailed data; or,
the method comprises the steps of receiving business data pushed by a data source in real time, cleaning the business data in real time, obtaining dimension data from a dimension table, and processing the cleaned business data by combining the dimension data so as to output a data model and wide list detail data.
4. The method of claim 3, wherein after outputting the data model and the broad manifest detail data, further comprising:
sending the wide manifest detail data to a second processing module of the stream processing framework and/or the batch processing framework.
5. The method of claim 4, wherein receiving the service data in real time, and processing the service data within the first predetermined time window to output the data model and the broad band refinement data comprises:
receiving service data pushed by a data source in real time, and processing the service data in a first preset time window to output a data model and wide detailed data; and/or the presence of a gas in the gas,
and receiving the wide specification data sent by the first processing module, and processing the wide specification data in a first preset time window to output a data model and the wide specification data.
6. The method of claim 5, wherein after outputting the data model and the broad manifest detail data, further comprising:
sending the wide manifest detail data to the batch framework.
7. The method of claim 1, wherein the stream processing framework is an Apache Flink framework and the batch processing framework is a Hive framework.
8. A data processing system comprising a stream processing framework and a batch processing framework, wherein the stream processing framework comprises a first processing module and a second processing module;
the first processing module is used for receiving service data in real time and processing the service data in real time so as to output a data model and wide specification detailed data;
the second processing module is used for receiving the service data in real time and processing the service data in the first preset time window so as to output a data model and wide list detail data;
the batch processing framework is used for receiving the wide specification data sent by the first processing module and/or the second processing module and processing the wide specification data in a second preset time window so as to output a data model and the wide specification data.
9. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
the one or more programs, when executed by the one or more processors, implement the method of any of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110507204.4A CN113190558A (en) | 2021-05-10 | 2021-05-10 | Data processing method and system |
PCT/CN2022/091922 WO2022237764A1 (en) | 2021-05-10 | 2022-05-10 | Data processing method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110507204.4A CN113190558A (en) | 2021-05-10 | 2021-05-10 | Data processing method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113190558A true CN113190558A (en) | 2021-07-30 |
Family
ID=76980991
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110507204.4A Pending CN113190558A (en) | 2021-05-10 | 2021-05-10 | Data processing method and system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113190558A (en) |
WO (1) | WO2022237764A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114116162A (en) * | 2021-11-18 | 2022-03-01 | 支付宝(杭州)信息技术有限公司 | Data processing method, system and non-transitory storage medium |
WO2022237764A1 (en) * | 2021-05-10 | 2022-11-17 | 北京京东振世信息技术有限公司 | Data processing method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140280142A1 (en) * | 2013-03-14 | 2014-09-18 | Science Applications International Corporation | Data analytics system |
CN110362622A (en) * | 2019-07-22 | 2019-10-22 | 江苏满运软件科技有限公司 | Real-time stream processing system, method, equipment and storage medium based on real-time number storehouse |
CN110502566A (en) * | 2019-08-29 | 2019-11-26 | 江苏满运软件科技有限公司 | Near real-time data acquisition method, device, electronic equipment, storage medium |
CN112667368A (en) * | 2019-10-16 | 2021-04-16 | 北京京东乾石科技有限公司 | Task data processing method and device |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019397B (en) * | 2017-12-06 | 2021-06-29 | 北京京东尚科信息技术有限公司 | Method and device for data processing |
CN109684352B (en) * | 2018-12-29 | 2020-12-01 | 江苏满运软件科技有限公司 | Data analysis system, data analysis method, storage medium, and electronic device |
CN113190558A (en) * | 2021-05-10 | 2021-07-30 | 北京京东振世信息技术有限公司 | Data processing method and system |
-
2021
- 2021-05-10 CN CN202110507204.4A patent/CN113190558A/en active Pending
-
2022
- 2022-05-10 WO PCT/CN2022/091922 patent/WO2022237764A1/en unknown
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140280142A1 (en) * | 2013-03-14 | 2014-09-18 | Science Applications International Corporation | Data analytics system |
CN110362622A (en) * | 2019-07-22 | 2019-10-22 | 江苏满运软件科技有限公司 | Real-time stream processing system, method, equipment and storage medium based on real-time number storehouse |
CN110502566A (en) * | 2019-08-29 | 2019-11-26 | 江苏满运软件科技有限公司 | Near real-time data acquisition method, device, electronic equipment, storage medium |
CN112667368A (en) * | 2019-10-16 | 2021-04-16 | 北京京东乾石科技有限公司 | Task data processing method and device |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022237764A1 (en) * | 2021-05-10 | 2022-11-17 | 北京京东振世信息技术有限公司 | Data processing method and system |
CN114116162A (en) * | 2021-11-18 | 2022-03-01 | 支付宝(杭州)信息技术有限公司 | Data processing method, system and non-transitory storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2022237764A1 (en) | 2022-11-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111429241A (en) | Accounting processing method and device | |
CN110866040A (en) | User portrait generation method, device and system | |
CN113127225A (en) | Method, device and system for scheduling data processing tasks | |
CN112784152A (en) | Method and device for marking user | |
WO2022237764A1 (en) | Data processing method and system | |
CN111984234A (en) | Method and device for processing work order | |
CN107729394A (en) | Data Mart management system and its application method based on Hadoop clusters | |
CN111949678A (en) | Method and device for processing non-accumulation indexes across time windows | |
CN108985805B (en) | Method and device for selectively executing push task | |
CN116450622B (en) | Method, apparatus, device and computer readable medium for data warehouse entry | |
CN113434754A (en) | Method and device for determining recommended API (application program interface) service, electronic equipment and storage medium | |
CN113378346A (en) | Method and device for model simulation | |
CN111415262A (en) | Service processing method and device | |
CN111311305A (en) | Method and system for analyzing user public traffic band based on user track | |
CN114817297A (en) | Method and device for processing data | |
CN113362097B (en) | User determination method and device | |
CN112988857B (en) | Service data processing method and device | |
CN112559646A (en) | Report downloading method and device | |
CN111127077A (en) | Recommendation method and device based on stream computing | |
CN111786801A (en) | Method and device for charging based on data flow | |
CN112749204A (en) | Method and device for reading data | |
CN112799863A (en) | Method and apparatus for outputting information | |
CN110874386A (en) | Method and device for establishing category mapping relation | |
CN113157828B (en) | Method and device for pushing data | |
CN113760900B (en) | Method and device for real-time summarizing of data and interval summarizing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |