
CN113806434B - Big data processing method, device, equipment and medium

Info

Publication number: CN113806434B
Application number: CN202111107162.1A
Authority: CN
Inventors: 潘康; 廖仁巍; 王威凌
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-09-22
Filing date: 2021-09-22
Publication date: 2023-09-05
Anticipated expiration: 2041-09-22
Also published as: CN113806434A

Abstract

The invention relates to the field of big data and provides a big data processing method, device, equipment, and medium. The method loads dictionary information in a preset language to generate a data extraction statement, and extracts data from a source system as data to be processed according to the data type of the source system and the data extraction statement; because extraction is targeted to the data type, the obtained data is more accurate. The data to be processed is preprocessed based on basic cleaning conversion logic to obtain intermediate data, which realizes a first, basic cleaning of the data. The intermediate data is then processed based on target cleaning conversion logic to obtain target data, and the target data is transmitted to a target system. This secondary cleaning and conversion addresses the personalized data requirements of different application scenarios, ensuring that the data matches its usage scenario and giving the method higher applicability. In addition, the invention also relates to blockchain technology: the target data can be stored in a blockchain node.

Description

Big data processing method, device, equipment and medium

Technical Field

The present invention relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a medium for processing big data.

Background

Data acquisition, cleaning, and conversion, known as ETL (Extract-Transform-Load), is a general solution for data operations in the current big data industry. Open-source communities and commercial companies provide many tool libraries for individual functions such as acquisition or cleaning conversion, but none offers a framework that is sufficiently robust, integrates the complete ETL function, and automatically converts the mapping logic of the requirement side into code on the development side. As a result, existing tools struggle to keep up with increasingly complex business data scenarios and with the ever faster iteration of technology tools.

Because framework integration and automatic conversion from logic to code have not been realized, each ETL task must be custom-developed by hand, and the ETL stages must be chained together to perform point-to-point data acquisition, cleaning, and conversion. This is inefficient and lacks generality.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a big data processing method, apparatus, device, and medium, which aim to solve the problem of big data cleansing.

A big data processing method, the big data processing method comprising:

in response to a data processing instruction, acquiring data from the data processing instruction to generate dictionary information;

defining a preset language, loading the dictionary information in the preset language, and generating a data extraction statement;

determining a source system and a target system according to the data processing instruction;

identifying the data type of the source system, and extracting data from the source system as data to be processed according to the data type of the source system and the data extraction statement;

extracting basic cleaning conversion logic from the data to be processed, and preprocessing the data to be processed based on the basic cleaning conversion logic to obtain intermediate data;

and acquiring target cleaning conversion logic of the target system, processing the intermediate data based on the target cleaning conversion logic to obtain target data, and transmitting the target data to the target system.

According to a preferred embodiment of the present invention, the acquiring data from the data processing instruction to generate dictionary information includes:

acquiring position information and a data structure of a field to be acquired from the data processing instruction;

constructing a data acquisition function according to the position information of the field to be acquired;

generating a table falling model according to the data structure;

and packaging the data acquisition function and the table falling model to obtain the dictionary information.

According to a preferred embodiment of the present invention, the defining the preset language includes:

performing rule definition based on a scale analyzer, wherein the rule definition is used for calculating an initial value and obtaining a target value;

defining a mapping rule based on a scale analyzer, wherein the mapping rule is used for representing the mapping relation between source data and target data;

and loading the grammar writing rules of the Groovy language, and generating a statement converter according to the grammar writing rules, the rule definition, and the mapping rule, wherein the statement converter is used for converting any statement into a Groovy statement.

According to a preferred embodiment of the present invention, the extracting data from the source system according to the data type of the source system and the data extraction statement includes:

when the data type of the source system is a database type, acquiring a database connection string and a login credential from the data processing instruction, connecting to the source system according to the database connection string and the login credential, and extracting data from the source system by utilizing the data extraction statement to obtain the data to be processed; or

when the data type of the source system is a file type, extracting metadata from the source system by using the data extraction statement to obtain the data to be processed.

According to a preferred embodiment of the present invention, the preprocessing the data to be processed based on the basic cleaning conversion logic, to obtain intermediate data includes:

performing de-duplication processing on the data to be processed to obtain first data;

clustering the first data by adopting a clustering algorithm to obtain a plurality of subareas;

calculating the upper limit distance and the lower limit distance of all points in each sub-area relative to other points;

acquiring a configuration threshold value, and determining a subarea with the upper limit distance being greater than or equal to the configuration threshold value as a subarea to be screened;

obtaining isolated points from the subarea to be screened;

deleting the isolated points from the first data to obtain second data;

acquiring a missing value in the second data;

and filling the missing values based on a configuration filling mechanism to obtain the intermediate data.

According to a preferred embodiment of the present invention, the filling the missing values based on the configuration filling mechanism to obtain the intermediate data includes:

calculating the conditional probability of each sub-data in the second data;

sequencing the conditional probability of each sub data according to the sequence from big to small to obtain a target sequence;

acquiring a first element from the target sequence as a filling value;

and replacing the missing value by the filling value to obtain the intermediate data.

According to a preferred embodiment of the present invention, before the preprocessing of the data to be processed based on the basic cleaning conversion logic, the method further comprises:

packaging the basic cleaning conversion logic into a class library, wherein a plurality of cleaning conversion logics are stored in the class library, and each cleaning conversion logic corresponds to one adapter;

configuring a target adapter corresponding to the basic cleaning conversion logic for the class library;

after extracting the basic cleaning conversion logic from the data to be processed, connecting to the class library by using the target adapter;

and based on the basic cleaning conversion logic, acquiring processing logic from the class library to preprocess the data to be processed.

A big data processing apparatus, the big data processing apparatus comprising:

a generation unit for acquiring data from a data processing instruction in response to the data processing instruction to generate dictionary information;

the generating unit is further used for defining a preset language, loading the dictionary information in the preset language, and generating a data extraction statement;

The determining unit is used for determining a source system and a target system according to the data processing instruction;

the extraction unit is used for identifying the data type of the source system and extracting data from the source system as data to be processed according to the data type of the source system and the data extraction statement;

the preprocessing unit is used for extracting basic cleaning conversion logic from the data to be processed, and preprocessing the data to be processed based on the basic cleaning conversion logic to obtain intermediate data;

the processing unit is used for acquiring target cleaning conversion logic of the target system, processing the intermediate data based on the target cleaning conversion logic to obtain target data, and transmitting the target data to the target system.

A computer device, the computer device comprising:

a memory storing at least one instruction; and

and a processor that executes the instructions stored in the memory to implement the big data processing method.

A computer readable storage medium having stored therein at least one instruction for execution by a processor in a computer device to implement the big data processing method.

According to the above technical scheme, the invention can respond to a data processing instruction, acquire data from the data processing instruction to generate dictionary information, define a preset language, and load the dictionary information in the preset language to generate a data extraction statement. It determines a source system and a target system according to the data processing instruction, identifies the data type of the source system, and extracts data from the source system as data to be processed according to the data type of the source system and the data extraction statement; because extraction is targeted to the data type, the acquired data is more accurate. It then extracts basic cleaning conversion logic from the data to be processed and preprocesses the data to be processed based on that logic to obtain intermediate data, which realizes a first, basic cleaning of the data. Finally, it acquires the target cleaning conversion logic of the target system, processes the intermediate data based on the target cleaning conversion logic to obtain target data, and transmits the target data to the target system. This secondary cleaning and conversion addresses the personalized data requirements of different application scenarios, ensuring that the data matches its usage scenario and giving the method higher applicability.

Drawings

FIG. 1 is a flow chart of a preferred embodiment of the big data processing method of the present invention.

FIG. 2 is a functional block diagram of a preferred embodiment of the big data processing apparatus of the present invention.

FIG. 3 is a schematic diagram of a computer device implementing a big data processing method according to a preferred embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.

FIG. 1 is a flow chart of a big data processing method according to a preferred embodiment of the present invention. The order of the steps in the flowchart may be changed and some steps may be omitted according to various needs.

The big data processing method is applied to one or more computer devices. A computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.

The computer device may be any electronic product that can interact with a user in a human-computer manner, such as a personal computer, tablet computer, smart phone, personal digital assistant (Personal Digital Assistant, PDA), game console, interactive internet protocol television (Internet Protocol Television, IPTV), smart wearable device, etc.

The computer device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group composed of a plurality of network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing.

The server may be an independent server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms.

Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.

Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.

The network in which the computer device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (Virtual Private Network, VPN), and the like.

S10, responding to a data processing instruction, and acquiring data from the data processing instruction to generate dictionary information.

In at least one embodiment of the present invention, the data processing instructions may be triggered by an associated worker, such as a developer, tester, salesman, etc.

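In at least one embodiment of the present invention, the acquiring data from the data processing instruction to generate dictionary information includes:

acquiring position information and a data structure of a field to be acquired from the data processing instruction;

constructing a data acquisition function according to the position information of the field to be acquired;

generating a table falling model according to the data structure;

and packaging the data acquisition function and the table falling model to obtain the dictionary information.

For example: the data acquisition function may be a select() function or a regular expression, which is not limited by the present invention.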

Through the data collection function, fields at specified positions can be collected, such as: data of 1-20 field positions are collected.

The concrete meaning of each field in the finally generated data can be defined through the table falling model, for example: the fields 0-11 represent the user's cell phone number and the fields 12-18 represent the user information.
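The packaging of a data acquisition function together with a table falling model can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that "field positions" are character offsets in a fixed-width record, and the names `DictionaryInfo`, `build_dictionary_info`, and `land` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class DictionaryInfo:
    """Bundle of a data acquisition function and a table falling model."""
    acquire: Callable[[str], str]
    table_model: Dict[str, Tuple[int, int]]


def build_dictionary_info(start: int, end: int,
                          structure: Dict[str, Tuple[int, int]]) -> DictionaryInfo:
    """Build dictionary information from position info and a data structure.

    start and end are the inclusive positions of the span to collect
    (assumed here to be character offsets). structure maps a column name
    to its inclusive (first, last) positions inside the collected span.
    """
    def acquire(record: str) -> str:
        # Collect only the configured span of the raw record.
        return record[start:end + 1]

    return DictionaryInfo(acquire=acquire, table_model=dict(structure))


def land(info: DictionaryInfo, record: str) -> Dict[str, str]:
    """Collect the span and name its parts according to the table model."""
    span = info.acquire(record)
    return {name: span[first:last + 1]
            for name, (first, last) in info.table_model.items()}
```

With the positions from the example above, `build_dictionary_info(0, 18, {"phone": (0, 11), "user_info": (12, 18)})` yields an object whose `land` call splits a record into a phone number and a user information part.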

S11, defining a preset language, loading the dictionary information in the preset language, and generating a data extraction statement.

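In at least one embodiment of the present invention, the defining the preset language includes:

performing rule definition based on a scale analyzer, wherein the rule definition is used for calculating an initial value and obtaining a target value;

defining a mapping rule based on a scale analyzer, wherein the mapping rule is used for representing the mapping relation between source data and target data;

and loading the grammar writing rules of the Groovy language, and generating a statement converter according to the grammar writing rules, the rule definition, and the mapping rule, wherein the statement converter is used for converting any statement into a Groovy statement.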

For example: the rule definition may be:

    if (value == 'RR')
        'RR_MAPPER_123'
    else if (value == 'ATM')
        'ATM_MAPPER_123'
    else
        "EEROR_MAPPER"

The mapping rule may be:

    tagContext["tagModeName1"] = srcContext["srcModeName1"]
    tagContext["tagModeName8"] = hardCode("GALAXY-AQUILA")

Through the above implementation, the Groovy language is improved upon, so that the resulting language has a lower learning cost and is easy to use.
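To make the two kinds of rules concrete, the sketch below restates them in Python. It only illustrates the semantics: a rule definition computes a target value from an initial value, and a mapping rule fills a target context from a source context. The helper names `rule`, `from_source`, `hard_code`, and `apply_mapping` are hypothetical, and the string literals are kept exactly as they appear in the text.

```python
def rule(value):
    """Rule definition: derive a target value from an initial value."""
    if value == "RR":
        return "RR_MAPPER_123"
    elif value == "ATM":
        return "ATM_MAPPER_123"
    return "EEROR_MAPPER"  # literal kept as spelled in the source text


def from_source(name):
    """Mapping step that copies one field from the source context."""
    return lambda src_context: src_context[name]


def hard_code(constant):
    """Mapping step that ignores the source and yields a constant."""
    return lambda src_context: constant


# Mapping rules: target field name -> step that produces its value.
MAPPING_RULES = {
    "tagModeName1": from_source("srcModeName1"),
    "tagModeName8": hard_code("GALAXY-AQUILA"),
}


def apply_mapping(src_context, rules=MAPPING_RULES):
    """Build the target context by running every mapping step."""
    return {target: step(src_context) for target, step in rules.items()}
```

For instance, `apply_mapping({"srcModeName1": "abc"})` copies `"abc"` into `tagModeName1` and sets `tagModeName8` to the hard-coded constant.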

In at least one embodiment of the present invention, the loading the dictionary information in the preset language and generating the data extraction statement includes:

generating Java class bytecode according to the dictionary information, based on the dynamic compilation technology of the Groovy language, to obtain the data extraction statement.

S12, determining a source system and a target system according to the data processing instruction.

In at least one embodiment of the present invention, the determining a source system and a target system according to the data processing instructions includes:

analyzing the data processing instruction to obtain a source system identification code and a target system identification code;

the source system is determined according to the source system identification code, and the target system is determined according to the target system identification code.

The source system identification code and the target system identification code have uniqueness, so that one source system can be uniquely determined by the source system identification code, and one target system can be uniquely determined by the target system identification code.
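A minimal sketch of this lookup is shown below. The registries, the instruction keys `source_system_id` and `target_system_id`, and the sample codes are all hypothetical; the text only states that each identification code uniquely determines one system.

```python
from typing import Tuple

# Hypothetical registries keyed by unique identification codes.
SOURCE_SYSTEMS = {"SRC-001": {"name": "business-system", "type": "database"}}
TARGET_SYSTEMS = {"TGT-001": {"name": "downstream-app"}}


def resolve_systems(instruction: dict) -> Tuple[dict, dict]:
    """Parse the instruction and return (source system, target system)."""
    source_id = instruction["source_system_id"]
    target_id = instruction["target_system_id"]
    try:
        # Each code is unique, so a plain dictionary lookup is sufficient.
        return SOURCE_SYSTEMS[source_id], TARGET_SYSTEMS[target_id]
    except KeyError as exc:
        raise LookupError(
            f"unknown system identification code: {exc.args[0]}") from exc
```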

In this embodiment, the source system may include a service system and the like, and the target system may include a downstream system of an application and the like.

S13, identifying the data type of the source system, and extracting data from the source system as data to be processed according to the data type of the source system and the data extraction statement.

In at least one embodiment of the present invention, the data types of the source system may include, but are not limited to: database type, file type.

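In at least one embodiment of the present invention, the extracting data from the source system according to the data type of the source system and the data extraction statement includes:

when the data type of the source system is a database type, acquiring a database connection string and a login credential from the data processing instruction, connecting to the source system according to the database connection string and the login credential, and extracting data from the source system by utilizing the data extraction statement to obtain the data to be processed; or

when the data type of the source system is a file type, extracting metadata from the source system by using the data extraction statement to obtain the data to be processed.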

In this embodiment, the login credentials may include, but are not limited to: a user name and a password.

According to the embodiment, the data can be extracted from the source system in a targeted manner according to different data types, so that the acquired data is more accurate.
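The type-dependent extraction can be sketched as a simple dispatch. The code below is an illustration under several assumptions: `sqlite3` stands in for a real database (it takes a file path rather than a connection string with a user name and password), a CSV file stands in for the file type, and for files the extraction statement is treated as a comma-separated column list. The `source` dictionary layout is hypothetical.

```python
import csv
import sqlite3


def extract(source: dict, statement: str) -> list:
    """Extract rows from a source system according to its data type."""
    if source["type"] == "database":
        # Assumption: sqlite3 replaces a real database, so no login
        # credentials are passed here.
        conn = sqlite3.connect(source["connection_string"])
        try:
            return conn.execute(statement).fetchall()
        finally:
            conn.close()
    if source["type"] == "file":
        # Assumption: for files the statement lists the wanted columns.
        wanted = [column.strip() for column in statement.split(",")]
        with open(source["path"], newline="", encoding="utf-8") as handle:
            return [tuple(row[column] for column in wanted)
                    for row in csv.DictReader(handle)]
    raise ValueError(f"unsupported data type: {source['type']!r}")
```

Keeping the dispatch in one function means a new data type only needs one additional branch, while callers continue to pass the same extraction statement.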

S14, extracting basic cleaning conversion logic from the data to be processed, and preprocessing the data to be processed based on the basic cleaning conversion logic to obtain intermediate data.

In at least one embodiment of the present invention, the basic cleaning conversion logic may be configured according to historical processing logic to implement a first, basic cleaning of the data.

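In at least one embodiment of the present invention, the preprocessing the data to be processed based on the basic cleaning conversion logic to obtain intermediate data includes:

performing de-duplication processing on the data to be processed to obtain first data;

clustering the first data by adopting a clustering algorithm to obtain a plurality of subareas;

calculating the upper limit distance and the lower limit distance of all points in each subarea relative to other points;

acquiring a configuration threshold value, and determining a subarea with the upper limit distance being greater than or equal to the configuration threshold value as a subarea to be screened;

obtaining isolated points from the subarea to be screened;

deleting the isolated points from the first data to obtain second data;

acquiring a missing value in the second data;

and filling the missing values based on a configuration filling mechanism to obtain the intermediate data.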

Through the implementation mode, the basic cleaning and conversion of the data to be processed can be realized, and the subsequent calculation can be simplified through screening the isolated points, so that the data processing efficiency is improved.
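The de-duplication and isolated-point screening can be sketched for one-dimensional numeric data. The text does not fix the clustering algorithm or the exact distance definitions, so the following are assumptions: equal-width partitioning stands in for clustering, a point's distance is measured to its nearest neighbour in the same subarea, the upper limit distance of a subarea is the largest such distance, and a point is isolated when its distance reaches the threshold. The lower limit distance is omitted because the text does not say how it is used. All function names are hypothetical.

```python
from collections import defaultdict
from math import inf


def dedupe(values):
    """Drop repeated values while keeping first-seen order."""
    return list(dict.fromkeys(values))


def partition(values, width):
    """Stand-in for clustering: group values into equal-width subareas."""
    areas = defaultdict(list)
    for value in values:
        areas[int(value // width)].append(value)
    return list(areas.values())


def nearest_distances(area):
    """Distance from each point to its closest neighbour in the subarea."""
    return {p: min((abs(p - q) for q in area if q != p), default=inf)
            for p in area}


def remove_isolated(values, width, threshold):
    """Return the second data: de-duplicated values without isolated points."""
    first = dedupe(values)
    isolated = set()
    for area in partition(first, width):
        distances = nearest_distances(area)
        upper = max(distances.values())
        if upper >= threshold:  # this subarea needs screening
            isolated.update(p for p, d in distances.items() if d >= threshold)
    return [value for value in first if value not in isolated]
```

Only subareas whose upper limit distance reaches the threshold are inspected point by point, which is where the efficiency gain mentioned above comes from in this sketch.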

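Specifically, the filling the missing values based on the configuration filling mechanism to obtain the intermediate data includes:

calculating the conditional probability of each sub-data in the second data;

sorting the conditional probabilities of the sub-data in descending order to obtain a target sequence;

acquiring a first element from the target sequence as a filling value;

and replacing the missing value with the filling value to obtain the intermediate data.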

In the above embodiment, discrete missing values are filled with the value that has the largest conditional probability. Compared with conventional filling with a fixed value (e.g., 0), the filled value is correlated with the other values in the data.
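One possible reading of this filling mechanism is sketched below. It assumes tabular rows stored as dictionaries, a missing value represented by `None`, and a single conditioning column; the text does not specify what the probability is conditioned on, so the `given` parameter and the function name `fill_missing` are hypothetical.

```python
from collections import Counter


def fill_missing(rows, target, given):
    """Fill missing rows[i][target] with the most probable value
    conditioned on rows[i][given]."""
    # Count observed target values for each value of the conditioning column.
    counts = {}
    for row in rows:
        if row[target] is not None:
            counts.setdefault(row[given], Counter())[row[target]] += 1

    filled = []
    for row in rows:
        if row[target] is None and row[given] in counts:
            seen = counts[row[given]]
            total = sum(seen.values())
            # Conditional probabilities sorted from largest to smallest.
            ranked = sorted(((n / total, value) for value, n in seen.items()),
                            key=lambda pair: pair[0], reverse=True)
            # The first element of the sequence supplies the filling value.
            row = {**row, target: ranked[0][1]}
        filled.append(row)
    return filled
```

Rows whose conditioning value has never been observed with a known target are left unfilled in this sketch rather than guessed.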

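In at least one embodiment of the present invention, before the preprocessing of the data to be processed based on the basic cleaning conversion logic, the method further comprises:

packaging the basic cleaning conversion logic into a class library, wherein a plurality of cleaning conversion logics are stored in the class library, and each cleaning conversion logic corresponds to one adapter;

configuring a target adapter corresponding to the basic cleaning conversion logic for the class library;

after extracting the basic cleaning conversion logic from the data to be processed, connecting to the class library by using the target adapter;

and based on the basic cleaning conversion logic, acquiring processing logic from the class library to preprocess the data to be processed.

Through the above implementation, the cleaning conversion logic is encapsulated. When new processing logic is required, it can be changed directly in the class library and a new adapter configured, which makes the cleaning conversion logic convenient to call.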
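The class library and its adapters can be sketched with a small registry. This is an assumed design, not the patent's code: `CleaningLibrary` and `Adapter` are hypothetical names, and each logic is modelled as a plain function over a list of records.

```python
class CleaningLibrary:
    """Class library that stores named cleaning conversion logics."""

    def __init__(self):
        self._logics = {}

    def register(self, name, func):
        """Store (or replace) a logic and return its adapter."""
        self._logics[name] = func
        return Adapter(self, name)

    def get(self, name):
        return self._logics[name]


class Adapter:
    """Connects a caller to exactly one logic in the library."""

    def __init__(self, library, name):
        self._library = library
        self._name = name

    def run(self, data):
        # The logic is looked up at call time, so changing it in the
        # library takes effect without touching the caller.
        return self._library.get(self._name)(data)
```

Because the adapter resolves its logic by name on every call, registering a new function under the same name changes the behaviour of existing adapters, which mirrors changing the logic directly in the class library.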

S15, acquiring target cleaning conversion logic of the target system, processing the intermediate data based on the target cleaning conversion logic to obtain target data, and transmitting the target data to the target system.

In this embodiment, the target cleaning conversion logic refers to a data processing manner adapted to an actual application scenario corresponding to the target system.

It can be understood that each application scenario may have different requirements on data formats and other aspects of data structure. Therefore, after the basic cleaning and conversion, the data can undergo secondary cleaning and conversion according to the personalized requirements of each application scenario, which ensures that the data matches the usage scenario and gives the method higher applicability.

For example: when the target cleaning conversion logic of the actual use scene requires data to have consistency, a consistency detection requirement can be obtained from the target cleaning conversion logic, and the intermediate data is processed according to the consistency detection requirement to obtain the target data.

Further, the target data is transmitted to the target system for use by the target system.

It should be noted that, in order to further improve the security of the data and avoid the data from being tampered maliciously, the target data may be stored in a blockchain node.

FIG. 2 is a functional block diagram of a preferred embodiment of the big data processing device of the present invention. The big data processing apparatus 11 includes a generating unit 110, a determining unit 111, an extracting unit 112, a preprocessing unit 113, and a processing unit 114. The module/unit referred to in the present invention refers to a series of computer program segments capable of being executed by the processor 13 and of performing a fixed function, which are stored in the memory 12. In the present embodiment, the functions of the respective modules/units will be described in detail in the following embodiments.

In response to the data processing instruction, the generation unit 110 acquires data from the data processing instruction to generate dictionary information.

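In at least one embodiment of the present invention, the generating unit 110 acquiring data from the data processing instruction to generate dictionary information includes:

acquiring position information and a data structure of a field to be acquired from the data processing instruction;

constructing a data acquisition function according to the position information of the field to be acquired;

generating a table falling model according to the data structure;

and packaging the data acquisition function and the table falling model to obtain the dictionary information.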

The generating unit 110 defines a preset language, and loads the dictionary information in the preset language to generate a data extraction statement.

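In at least one embodiment of the present invention, the generating unit 110 defining a preset language includes:

performing rule definition based on a scale analyzer, wherein the rule definition is used for calculating an initial value and obtaining a target value;

defining a mapping rule based on a scale analyzer, wherein the mapping rule is used for representing the mapping relation between source data and target data;

and loading the grammar writing rules of the Groovy language, and generating a statement converter according to the grammar writing rules, the rule definition, and the mapping rule, wherein the statement converter is used for converting any statement into a Groovy statement.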

For example: the rule definition may be:

    if (value == 'RR')
        'RR_MAPPER_123'
    else if (value == 'ATM')
        'ATM_MAPPER_123'
    else
        "EEROR_MAPPER"

The mapping rule may be:

    tagContext["tagModeName1"] = srcContext["srcModeName1"]
    tagContext["tagModeName8"] = hardCode("GALAXY-AQUILA")

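In at least one embodiment of the present invention, the generating unit 110 loading the dictionary information in the preset language and generating the data extraction statement includes:

generating Java class bytecode according to the dictionary information, based on the dynamic compilation technology of the Groovy language, to obtain the data extraction statement.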

The determination unit 111 determines a source system and a target system according to the data processing instruction.

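In at least one embodiment of the present invention, the determining unit 111 determining a source system and a target system according to the data processing instruction includes:

analyzing the data processing instruction to obtain a source system identification code and a target system identification code;

determining the source system according to the source system identification code, and determining the target system according to the target system identification code.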

The extraction unit 112 identifies the data type of the source system, and extracts data from the source system as data to be processed according to the data type of the source system and the data extraction statement.

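In at least one embodiment of the present invention, the extracting unit 112 extracting data from the source system as data to be processed according to the data type of the source system and the data extraction statement includes:

when the data type of the source system is a database type, acquiring a database connection string and a login credential from the data processing instruction, connecting to the source system according to the database connection string and the login credential, and extracting data from the source system by utilizing the data extraction statement to obtain the data to be processed; or

when the data type of the source system is a file type, extracting metadata from the source system by using the data extraction statement to obtain the data to be processed.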

The preprocessing unit 113 extracts basic cleaning conversion logic from the data to be processed, and performs preprocessing on the data to be processed based on the basic cleaning conversion logic to obtain intermediate data.

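In at least one embodiment of the present invention, the preprocessing unit 113 preprocessing the data to be processed based on the basic cleaning conversion logic to obtain intermediate data includes:

performing de-duplication processing on the data to be processed to obtain first data;

clustering the first data by adopting a clustering algorithm to obtain a plurality of subareas;

calculating the upper limit distance and the lower limit distance of all points in each subarea relative to other points;

acquiring a configuration threshold value, and determining a subarea with the upper limit distance being greater than or equal to the configuration threshold value as a subarea to be screened;

obtaining isolated points from the subarea to be screened;

deleting the isolated points from the first data to obtain second data;

acquiring a missing value in the second data;

and filling the missing values based on a configuration filling mechanism to obtain the intermediate data.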

Specifically, the preprocessing unit 113 filling the missing values based on the configuration filling mechanism to obtain the intermediate data includes:

calculating the conditional probability of each sub-data in the second data;

sorting the conditional probabilities of the sub-data in descending order to obtain a target sequence;

acquiring a first element from the target sequence as a filling value;

and replacing the missing value with the filling value to obtain the intermediate data.

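In at least one embodiment of the present invention, before the preprocessing of the data to be processed based on the basic cleaning conversion logic, the basic cleaning conversion logic is packaged into a class library, wherein a plurality of cleaning conversion logics are stored in the class library, and each cleaning conversion logic corresponds to one adapter;

a target adapter corresponding to the basic cleaning conversion logic is configured for the class library;

after the basic cleaning conversion logic is extracted from the data to be processed, the target adapter is used to connect to the class library;

and based on the basic cleaning conversion logic, processing logic is acquired from the class library to preprocess the data to be processed.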

The processing unit 114 obtains the target cleaning conversion logic of the target system, processes the intermediate data based on the target cleaning conversion logic to obtain target data, and transmits the target data to the target system.

FIG. 3 is a schematic diagram of a computer device for implementing a big data processing method according to a preferred embodiment of the present invention.

The computer device 1 may comprise a memory 12, a processor 13 and a bus, and may further comprise a computer program, such as a big data processing program, stored in the memory 12 and executable on the processor 13.

It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the computer device 1 and does not limit it. The computer device 1 may have a bus-type or a star-type structure, may comprise more or fewer hardware or software components than illustrated, or may arrange the components differently; for example, the computer device 1 may further comprise an input-output device, a network access device, and the like.

It should be noted that the computer device 1 is only an example; other existing or future electronic products that can be adapted to the present invention are also included within its scope of protection and are incorporated herein by reference.

The memory 12 includes at least one type of readable storage medium including flash memory, a removable hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 12 may in some embodiments be an internal storage unit of the computer device 1, such as a removable hard disk of the computer device 1. The memory 12 may in other embodiments also be an external storage device of the computer device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the computer device 1. Further, the memory 12 may also include both an internal storage unit and an external storage device of the computer device 1. The memory 12 may be used not only for storing application software installed in the computer device 1 and various types of data, such as codes of large data processing programs, but also for temporarily storing data that has been output or is to be output.

The processor 13 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing unit, CPU), microprocessors, digital processing chips, graphics processors, a combination of various control chips, and the like. The processor 13 is a Control Unit (Control Unit) of the computer device 1, connects the respective components of the entire computer device 1 using various interfaces and lines, executes programs or modules stored in the memory 12 (for example, executes a big data processing program or the like), and invokes data stored in the memory 12 to perform various functions of the computer device 1 and process data.

The processor 13 executes the operating system of the computer device 1 and the various installed applications. The processor 13 executes the application program to implement the steps of the big data processing method embodiments described above, such as the steps shown in FIG. 1.

Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory 12 and executed by the processor 13 to complete the present invention. The one or more modules/units may be a series of computer readable instruction segments capable of performing the specified functions, which instruction segments describe the execution of the computer program in the computer device 1. For example, the computer program may be divided into a generating unit 110, a determining unit 111, an extracting unit 112, a preprocessing unit 113, a processing unit 114.

The integrated units implemented in the form of software functional modules described above may be stored in a computer readable storage medium. The software functional modules are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a network device, or the like) or a processor to perform portions of the big data processing method according to the embodiments of the present invention.

The modules/units integrated in the computer device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on this understanding, the present invention may also be implemented by a computer program for instructing a relevant hardware device to implement all or part of the procedures of the above-mentioned embodiment method, where the computer program may be stored in a computer readable storage medium and the computer program may be executed by a processor to implement the steps of each of the above-mentioned method embodiments.

Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory, or the like.

Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.

A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic means, each data block containing a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
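The idea of data blocks linked by cryptographic means can be illustrated with a small hash chain in Python. This is a simplified sketch under stated assumptions: it uses SHA-256 from the standard library, omits consensus and networking entirely, and the helper names `block_hash`, `append_block`, and `is_valid` as well as the sample payloads are hypothetical, not taken from the invention.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_PREV = "0" * 64  # assumed placeholder "previous hash" for the first block


def block_hash(index: int, prev_hash: str, payload: str) -> str:
    """Hash the block contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict], payload: str) -> None:
    """Generate the next block from the hash of the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    })


def is_valid(chain: List[Dict]) -> bool:
    """Recompute every hash and check each link back to its predecessor."""
    prev_hash = GENESIS_PREV
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True


chain: List[Dict] = []
append_block(chain, "target data batch 1")
append_block(chain, "target data batch 2")
```

Because each block's hash covers the previous block's hash, altering the payload of an earlier block makes `is_valid` fail for the whole chain, which is the anti-counterfeiting property the paragraph above refers to.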

The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, etc. For ease of illustration, the bus is shown as a single straight line in fig. 3, but this does not mean that there is only one bus or one type of bus. The bus is arranged to enable connection and communication between the memory 12 and the at least one processor 13, among other components.

Although not shown, the computer device 1 may further comprise a power source (such as a battery) for powering the various components. Preferably, the power source may be logically connected to the at least one processor 13 via a power management device, so that functions such as charge management, discharge management, and power consumption management are achieved through the power management device. The power source may also include any one or more of a direct current or alternating current power supply, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like. The computer device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be described in detail herein.

Further, the computer device 1 may also comprise a network interface, which optionally comprises a wired interface and/or a wireless interface (e.g. a Wi-Fi interface, a Bluetooth interface, etc.) and is typically used for establishing a communication connection between the computer device 1 and other computer devices.

The computer device 1 may optionally further comprise a user interface, which may be a display or an input unit such as a keyboard, and may optionally also be a standard wired interface or a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, as appropriate, and is used for displaying information processed in the computer device 1 and for displaying a visual user interface.

It should be understood that the embodiments described are for illustrative purposes only, and that the scope of the patent application is not limited to this configuration.

Fig. 3 shows only a computer device 1 with components 12-13; it will be understood by those skilled in the art that the structure shown in fig. 3 does not limit the computer device 1, which may include fewer or more components than shown, combine certain components, or use a different arrangement of components.

In connection with fig. 1, the memory 12 in the computer device 1 stores a plurality of instructions to implement a big data processing method, and the processor 13 can execute the plurality of instructions to implement:

Specifically, the specific implementation method of the above instructions by the processor 13 may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.

In the several embodiments provided in the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative: the division of the modules is merely a division by logical function, and other manners of division are possible in an actual implementation.

The invention is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in the form of hardware, or in the form of hardware combined with software functional modules.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.

The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.

Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means stated in the invention may also be implemented by a single unit or means, through either software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.

Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims

1. A big data processing method, characterized in that the big data processing method comprises:

acquiring target cleaning conversion logic of the target system, processing the intermediate data based on the target cleaning conversion logic to obtain target data, and transmitting the target data to the target system;

the preprocessing the data to be processed based on the basic cleaning conversion logic to obtain intermediate data comprises the following steps:

obtaining isolated points from the subarea to be screened;

deleting the isolated points from the first data to obtain second data;

acquiring a missing value in the second data;

2. The big data processing method of claim 1, wherein the acquiring data from the data processing instruction to generate dictionary information includes:

generating a table falling model according to the data structure;

3. The big data processing method of claim 1, wherein defining the preset language includes:

4. The big data processing method of claim 1, wherein the extracting data from the source system as data to be processed according to the data type of the source system and the data extraction statement comprises:

5. The big data processing method of claim 1, wherein the filling the missing values based on a configuration filling mechanism to obtain the intermediate data comprises:

calculating the conditional probability of each sub-data in the second data;

acquiring a first element from the target sequence as a filling value;

6. The big data processing method of claim 1, wherein prior to the preprocessing of the data to be processed based on the basic cleaning conversion logic, the method further comprises:

7. A big data processing apparatus, characterized in that the big data processing apparatus comprises:

the processing unit is used for acquiring target cleaning conversion logic of the target system, processing the intermediate data based on the target cleaning conversion logic to obtain target data, and transmitting the target data to the target system;

obtaining isolated points from the subarea to be screened;

deleting the isolated points from the first data to obtain second data;

acquiring a missing value in the second data;

8. A computer device, the computer device comprising:

a memory storing at least one instruction; and

a processor executing instructions stored in the memory to implement the big data processing method according to any of claims 1 to 6.

9. A computer-readable storage medium, characterized by: the computer readable storage medium has stored therein at least one instruction that is executed by a processor in a computer device to implement the big data processing method of any of claims 1 to 6.