CN113806434B - Big data processing method, device, equipment and medium
- Publication number
- CN113806434B (application number CN202111107162.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- target
- source system
- processed
- conversion logic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
- G06F8/427—Parsing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to the field of big data and provides a big data processing method, apparatus, device and medium. Dictionary information is loaded in a preset language to generate a data extraction statement, and data is extracted from a source system as data to be processed according to the data type of the source system and the data extraction statement; extracting in a manner targeted to each data type makes the obtained data more accurate. The data to be processed is preprocessed based on basic cleaning conversion logic to obtain intermediate data, which realizes a first, basic cleaning of the data. The intermediate data is then processed based on target cleaning conversion logic to obtain target data, and the target data is transmitted to a target system; this secondary cleaning and conversion addresses the individualized requirements of different application scenarios, so that the data matches its usage scenario and the method has higher applicability. In addition, the invention also relates to blockchain technology, and the target data can be stored in a blockchain node.
Description
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a medium for processing big data.
Background
ETL (Extract-Transform-Load), that is, data acquisition, cleaning and conversion, is a general solution for data operations in the current big data industry. At present, the major open-source communities and commercial companies provide many tool libraries aimed at partial functions such as acquisition or cleaning conversion, but there is no framework that is sufficiently robust, integrates the complete ETL function, and automatically converts the mapping logic of the requirement side into the code of the development side. As a result, it is difficult to adapt to increasingly complex business data scenarios and to the ever faster pace at which technical tools are updated and iterated.
Because framework integration and automatic conversion from logic to code have not been realized, each ETL task must be custom developed by hand, and the ETL steps are chained together to perform point-to-point data acquisition, cleaning and conversion; this is inefficient and lacks generality.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a big data processing method, apparatus, device, and medium, which aim to solve the problem of big data cleansing.
A big data processing method, the big data processing method comprising:
in response to a data processing instruction, acquiring data from the data processing instruction to generate dictionary information;
defining a preset language, loading the dictionary information in the preset language, and generating a data extraction statement;
determining a source system and a target system according to the data processing instruction;
identifying the data type of the source system, and extracting data from the source system as data to be processed according to the data type of the source system and the data extraction statement;
extracting basic cleaning conversion logic from the data to be processed, and preprocessing the data to be processed based on the basic cleaning conversion logic to obtain intermediate data;
and acquiring target cleaning conversion logic of the target system, processing the intermediate data based on the target cleaning conversion logic to obtain target data, and transmitting the target data to the target system.
According to a preferred embodiment of the present invention, the acquiring data from the data processing instruction to generate dictionary information includes:
acquiring position information and a data structure of a field to be acquired from the data processing instruction;
constructing a data acquisition function according to the position information of the field to be acquired;
generating a table falling model according to the data structure;
and packaging the data acquisition function and the table falling model to obtain the dictionary information.
According to a preferred embodiment of the present invention, the defining the preset language includes:
performing rule definition based on a scale analyzer, wherein the rule definition is used for calculating an initial value and obtaining a target value;
defining a mapping rule based on a scale analyzer, wherein the mapping rule is used for representing the mapping relation between source data and target data;
and loading the grammar writing rules of the Groovy language, and generating a statement converter according to the grammar writing rules, the rule definition and the mapping rule, wherein the statement converter is used for converting any statement into a Groovy statement.
According to a preferred embodiment of the present invention, the extracting data from the source system according to the data type of the source system and the data extraction statement includes:
when the data type of the source system is a database type, acquiring a database connection string and a login credential from the data processing instruction, connecting to the source system according to the database connection string and the login credential, and extracting data from the source system by using the data extraction statement to obtain the data to be processed; or
when the data type of the source system is a file type, extracting metadata from the source system by using the data extraction statement to obtain the data to be processed.
According to a preferred embodiment of the present invention, the preprocessing the data to be processed based on the basic cleaning conversion logic, to obtain intermediate data includes:
performing de-duplication processing on the data to be processed to obtain first data;
clustering the first data by adopting a clustering algorithm to obtain a plurality of subareas;
calculating the upper limit distance and the lower limit distance of all points in each sub-area relative to other points;
acquiring a configuration threshold value, and determining a subarea with the upper limit distance being greater than or equal to the configuration threshold value as a subarea to be screened;
obtaining isolated points from the subarea to be screened;
deleting the isolated points from the first data to obtain second data;
acquiring a missing value in the second data;
and filling the missing values based on a configuration filling mechanism to obtain the intermediate data.
According to a preferred embodiment of the present invention, the filling of the missing values based on the configuration filling mechanism to obtain the intermediate data includes:
calculating the conditional probability of each sub-data in the second data;
sorting the conditional probabilities of the sub-data in descending order to obtain a target sequence;
acquiring the first element of the target sequence as a filling value;
and replacing the missing value by the filling value to obtain the intermediate data.
According to a preferred embodiment of the present invention, before the preprocessing of the data to be processed based on the basic cleaning conversion logic, the method further comprises:
packaging the basic cleaning conversion logic into a class library, wherein a plurality of cleaning conversion logics are stored in the class library, and each cleaning conversion logic corresponds to one adapter;
configuring a target adapter corresponding to the basic cleaning conversion logic for the class library;
after extracting the basic cleaning conversion logic from the data to be processed, connecting to the class library by using the target adapter;
and based on the basic cleaning conversion logic, acquiring processing logic from the class library to preprocess the data to be processed.
A big data processing apparatus, the big data processing apparatus comprising:
a generating unit for acquiring data from a data processing instruction in response to the data processing instruction to generate dictionary information;
the generating unit is further used for defining a preset language, loading the dictionary information in the preset language and generating a data extraction statement;
the determining unit is used for determining a source system and a target system according to the data processing instruction;
the extraction unit is used for identifying the data type of the source system and extracting data from the source system as data to be processed according to the data type of the source system and the data extraction statement;
the preprocessing unit is used for extracting basic cleaning conversion logic from the data to be processed, and preprocessing the data to be processed based on the basic cleaning conversion logic to obtain intermediate data;
the processing unit is used for acquiring target cleaning conversion logic of the target system, processing the intermediate data based on the target cleaning conversion logic to obtain target data, and transmitting the target data to the target system.
A computer device, the computer device comprising:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the big data processing method.
A computer readable storage medium having stored therein at least one instruction for execution by a processor in a computer device to implement the big data processing method.
According to the above technical solution, the invention can, in response to a data processing instruction, acquire data from the data processing instruction to generate dictionary information, define a preset language, and load the dictionary information in the preset language to generate a data extraction statement. A source system and a target system are determined according to the data processing instruction, the data type of the source system is identified, and data is extracted from the source system as data to be processed according to that data type and the data extraction statement; because extraction is targeted to each data type, the acquired data is more accurate. Basic cleaning conversion logic is extracted from the data to be processed, and the data to be processed is preprocessed based on it to obtain intermediate data, which realizes a first, basic cleaning of the data. Target cleaning conversion logic of the target system is then acquired, the intermediate data is processed based on it to obtain target data, and the target data is transmitted to the target system. This secondary cleaning and conversion addresses the individualized data requirements of different application scenarios, ensures that the data matches the usage scenario, and gives the method higher applicability.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the big data processing method of the present invention.
FIG. 2 is a functional block diagram of a preferred embodiment of the big data processing apparatus of the present invention.
FIG. 3 is a schematic diagram of a computer device implementing a big data processing method according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a flow chart of a big data processing method according to a preferred embodiment of the present invention. The order of the steps in the flowchart may be changed and some steps may be omitted according to various needs.
The big data processing method is applied to one or more computer devices. A computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device and the like.
The computer device may be any electronic product that can interact with a user in a human-computer manner, such as a personal computer, tablet computer, smart phone, personal digital assistant (Personal Digital Assistant, PDA), game console, interactive internet protocol television (Internet Protocol Television, IPTV), smart wearable device, etc.
The computer device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group composed of a plurality of network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing.
The server may be an independent server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms.
Artificial intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics and the like. Artificial intelligence software technologies mainly include computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
The network in which the computer device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (Virtual Private Network, VPN), and the like.
S10, responding to a data processing instruction, and acquiring data from the data processing instruction to generate dictionary information.
In at least one embodiment of the present invention, the data processing instructions may be triggered by an associated worker, such as a developer, tester, salesman, etc.
In at least one embodiment of the present invention, the acquiring data from the data processing instruction to generate dictionary information includes:
acquiring position information and a data structure of a field to be acquired from the data processing instruction;
constructing a data acquisition function according to the position information of the field to be acquired;
generating a table falling model according to the data structure;
and packaging the data acquisition function and the table falling model to obtain the dictionary information.
For example, the data acquisition function may be a select() function or a regular expression, which is not limited by the present invention.
Through the data acquisition function, fields at specified positions can be collected, for example, the data at field positions 1-20.
The table falling model defines the specific meaning of each field in the finally generated data. For example, fields 0-11 represent the user's mobile phone number and fields 12-18 represent the user information.
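The packaging of an acquisition function and a table falling model into dictionary information can be illustrated with a minimal Python sketch. The names `DictionaryInfo`, `build_acquire` and `apply_dictionary` are hypothetical, and the sketch assumes that positions are half-open character offsets within a fixed-width record, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class DictionaryInfo:
    """Hypothetical container: an acquisition function plus a table model."""
    acquire: Callable[[str], str]
    table_model: Dict[str, Tuple[int, int]]


def build_acquire(start: int, end: int) -> Callable[[str], str]:
    # Acquisition function built from position information:
    # it collects the characters at offsets start..end-1 of a record.
    return lambda record: record[start:end]


def build_dictionary_info(position, structure) -> DictionaryInfo:
    # Package the acquisition function and the table model together.
    return DictionaryInfo(build_acquire(*position), dict(structure))


def apply_dictionary(info: DictionaryInfo, record: str) -> Dict[str, str]:
    # Collect the raw span, then name each field using the table model.
    raw = info.acquire(record)
    return {name: raw[a:b] for name, (a, b) in info.table_model.items()}


info = build_dictionary_info(
    position=(0, 18),
    structure={"phone": (0, 11), "user_info": (11, 18)},
)
row = apply_dictionary(info, "13800000000ABCDEFGextra")
print(row)
```

In this sketch the trailing characters of the record fall outside the configured positions and are simply not collected.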
S11, defining a preset language, loading the dictionary information in the preset language, and generating a data extraction statement.
In at least one embodiment of the present invention, the defining the preset language includes:
performing rule definition based on a scale analyzer, wherein the rule definition is used for calculating an initial value and obtaining a target value;
defining a mapping rule based on a scale analyzer, wherein the mapping rule is used for representing the mapping relation between source data and target data;
and loading the grammar writing rules of the Groovy language, and generating a statement converter according to the grammar writing rules, the rule definition and the mapping rule, wherein the statement converter is used for converting any statement into a Groovy statement.
For example, the rule definition may be:
    if (value == 'RR')
        'RR_MAPPER_123'
    else if (value == 'ATM')
        'ATM_MAPPER_123'
    else
        "EEROR_MAPPER"
The mapping rule may be:
    tagContext["tagModeName1"] = srcContext["srcModeName1"]
    tagContext["tagModeName8"] = hardCode("GALAXY-AQUILA")
Through this implementation, the Groovy language is extended, so that the resulting language has a lower learning cost and is easy to use.
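To make the behaviour of the two rule kinds concrete, the following Python sketch mirrors the rule definition and the mapping rule shown above. It is an illustration only, not the preset language itself: `hard_code`, `from_source`, `MAPPING_RULES` and `apply_mapping` are hypothetical names, and the string "EEROR_MAPPER" is kept exactly as it appears in the text.

```python
from typing import Any, Callable, Dict


def rule_definition(value: str) -> str:
    # Computes a target value from an initial value (same branches as the text).
    if value == "RR":
        return "RR_MAPPER_123"
    elif value == "ATM":
        return "ATM_MAPPER_123"
    return "EEROR_MAPPER"  # spelling kept as in the text


def hard_code(constant: Any) -> Callable[[Dict[str, Any]], Any]:
    # Mapping that ignores the source context and yields a fixed value.
    return lambda src_context: constant


def from_source(key: str) -> Callable[[Dict[str, Any]], Any]:
    # Mapping that copies one field from the source context.
    return lambda src_context: src_context[key]


# Mapping rules: target field name -> how to compute it from the source.
MAPPING_RULES = {
    "tagModeName1": from_source("srcModeName1"),
    "tagModeName8": hard_code("GALAXY-AQUILA"),
}


def apply_mapping(src_context: Dict[str, Any]) -> Dict[str, Any]:
    return {name: rule(src_context) for name, rule in MAPPING_RULES.items()}


tag_context = apply_mapping({"srcModeName1": "ATM"})
print(rule_definition("ATM"), tag_context)
```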
In at least one embodiment of the present invention, the loading the dictionary information in the preset language and generating the data extraction statement includes:
generating Java class bytecode according to the dictionary information based on the dynamic compilation technology of the Groovy language to obtain the data extraction statement.
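The idea of turning dictionary information into executable code at run time can be sketched in Python, whose built-in `compile()` and `exec()` serve here as a loose analogue of Groovy's dynamic compilation to Java bytecode. This is an assumption-laden illustration: `compile_extractor` is a hypothetical helper, the field layout format is invented for the example, and the generated function stands in for the data extraction statement.

```python
def compile_extractor(table_model):
    """Generate source code from a field layout and compile it at run time."""
    lines = ["def extract(record):", "    return {"]
    for name, (start, end) in table_model.items():
        # repr() and int() keep the generated source well-formed.
        lines.append(f"        {name!r}: record[{int(start)}:{int(end)}],")
    lines.append("    }")
    source = "\n".join(lines)

    # Compile the generated text to a code object, then execute it to
    # define the function in a private namespace.
    code = compile(source, "<generated-extractor>", "exec")
    namespace = {}
    exec(code, namespace)
    return namespace["extract"], source


extract, source = compile_extractor({"phone": (0, 11), "user_info": (11, 18)})
result = extract("13800000000ABCDEFG")
print(source)
print(result)
```

Generating code this way is only appropriate when the dictionary information comes from a trusted configuration, since the generated text is executed directly.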
S12, determining a source system and a target system according to the data processing instruction.
In at least one embodiment of the present invention, the determining a source system and a target system according to the data processing instructions includes:
analyzing the data processing instruction to obtain a source system identification code and a target system identification code;
and determining the source system according to the source system identification code, and determining the target system according to the target system identification code.
The source system identification code and the target system identification code have uniqueness, so that one source system can be uniquely determined by the source system identification code, and one target system can be uniquely determined by the target system identification code.
In this embodiment, the source system may include a service system and the like, and the target system may include a downstream system of an application and the like.
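A minimal Python sketch of this lookup follows. The registry contents, the identification codes `SRC-001` and `TGT-001`, and the instruction keys are all hypothetical; the only property taken from the text is that each identification code resolves to exactly one system.

```python
# Hypothetical registry: each identification code maps to exactly one system.
SYSTEM_REGISTRY = {
    "SRC-001": {"name": "business-system", "data_type": "database"},
    "TGT-001": {"name": "downstream-application", "data_type": "file"},
}


def resolve_systems(instruction, registry=SYSTEM_REGISTRY):
    # Parse the instruction to obtain the two identification codes.
    source_id = instruction["source_system_id"]
    target_id = instruction["target_system_id"]
    missing = [code for code in (source_id, target_id) if code not in registry]
    if missing:
        raise KeyError(f"unknown system identification code(s): {missing}")
    return registry[source_id], registry[target_id]


source_system, target_system = resolve_systems(
    {"source_system_id": "SRC-001", "target_system_id": "TGT-001"}
)
print(source_system["name"], "->", target_system["name"])
```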
S13, identifying the data type of the source system, and extracting data from the source system as data to be processed according to the data type of the source system and the data extraction statement.
In at least one embodiment of the present invention, the data types of the source system may include, but are not limited to: database type, file type.
In at least one embodiment of the present invention, the extracting data from the source system according to the data type of the source system and the data extraction statement includes:
when the data type of the source system is a database type, acquiring a database connection string and a login credential from the data processing instruction, connecting to the source system according to the database connection string and the login credential, and extracting data from the source system by using the data extraction statement to obtain the data to be processed; or
when the data type of the source system is a file type, extracting metadata from the source system by using the data extraction statement to obtain the data to be processed.
In this embodiment, the login credentials may include, but are not limited to: a user name and a password.
According to the embodiment, the data can be extracted from the source system in a targeted manner according to different data types, so that the acquired data is more accurate.
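The type-dependent extraction can be sketched in Python as a simple dispatch. Several things here are assumptions made for the example: SQLite (via the standard `sqlite3` module) stands in for the source database and does not use login credentials, the file source is a CSV file, and the extraction statement is a SQL string for the database branch and a row-level callable for the file branch.

```python
import csv
import os
import sqlite3
import tempfile


def extract(source, statement):
    """Dispatch on the data type of the source system."""
    if source["data_type"] == "database":
        # A real database would also need the login credential here;
        # sqlite3 only takes the connection string (a file path).
        conn = sqlite3.connect(source["connection_string"])
        try:
            return [tuple(row) for row in conn.execute(statement)]
        finally:
            conn.close()
    if source["data_type"] == "file":
        with open(source["path"], newline="") as handle:
            return [statement(row) for row in csv.reader(handle)]
    raise ValueError(f"unsupported data type: {source['data_type']}")


# Build two small demo sources in a temporary directory.
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "source.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE users (phone TEXT, info TEXT)")
conn.execute("INSERT INTO users VALUES ('13800000000', 'ABCDEFG')")
conn.commit()
conn.close()

csv_path = os.path.join(workdir, "source.csv")
with open(csv_path, "w", newline="") as handle:
    csv.writer(handle).writerows([["13800000000", "ABCDEFG"]])

db_rows = extract(
    {"data_type": "database", "connection_string": db_path},
    "SELECT phone, info FROM users",
)
file_rows = extract(
    {"data_type": "file", "path": csv_path},
    lambda row: (row[0], row[1]),
)
print(db_rows, file_rows)
```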
S14, extracting basic cleaning conversion logic from the data to be processed, and preprocessing the data to be processed based on the basic cleaning conversion logic to obtain intermediate data.
In at least one embodiment of the present invention, the basic cleaning conversion logic may be configured according to historical processing logic to implement a first, basic cleaning step on the data.
In at least one embodiment of the present invention, the preprocessing the data to be processed based on the basic cleaning conversion logic, to obtain intermediate data includes:
performing de-duplication processing on the data to be processed to obtain first data;
clustering the first data by adopting a clustering algorithm to obtain a plurality of subareas;
calculating the upper limit distance and the lower limit distance of all points in each sub-area relative to other points;
acquiring a configuration threshold value, and determining a subarea with the upper limit distance being greater than or equal to the configuration threshold value as a subarea to be screened;
obtaining isolated points from the subarea to be screened;
deleting the isolated points from the first data to obtain second data;
acquiring a missing value in the second data;
and filling the missing values based on a configuration filling mechanism to obtain the intermediate data.
Through the implementation mode, the basic cleaning and conversion of the data to be processed can be realized, and the subsequent calculation can be simplified through screening the isolated points, so that the data processing efficiency is improved.
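The de-duplication and isolated-point steps above can be sketched in Python under explicit assumptions, since the text names neither the clustering algorithm nor the exact distance definitions. The sketch works on one-dimensional values, swaps in a fixed-width grid as the clustering step, and interprets the upper and lower limit distances of a sub-area as the largest and smallest nearest-neighbour distance of its points. The function name and parameters are hypothetical, and missing-value filling is not covered here.

```python
def remove_isolated(values, cell_width, threshold):
    # Step 1: de-duplicate while keeping the original order (first data).
    first = list(dict.fromkeys(values))

    # Step 2: grid "clustering": points in the same cell form one sub-area.
    cells = {}
    for v in first:
        cells.setdefault(int(v // cell_width), []).append(v)

    def nearest_distance(p):
        # Distance from p to its nearest other point in the first data.
        others = [abs(p - q) for q in first if q != p]
        return min(others) if others else 0.0

    isolated = set()
    for points in cells.values():
        dists = {p: nearest_distance(p) for p in points}
        upper, lower = max(dists.values()), min(dists.values())
        if upper < threshold:
            continue  # sub-area cannot contain an isolated point
        if lower >= threshold:
            isolated.update(points)  # every point in the sub-area is isolated
        else:
            isolated.update(p for p, d in dists.items() if d >= threshold)

    # Step 3: delete isolated points (second data).
    return [v for v in first if v not in isolated]


second = remove_isolated([1.0, 1.2, 1.2, 1.1, 9.0, 1.3],
                         cell_width=1.0, threshold=2.0)
print(second)
```

Checking the upper limit distance first means that sub-areas whose points all have a close neighbour are skipped without screening individual points, which is one way the screening can simplify the subsequent calculation.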
Specifically, the filling the missing values based on the configuration filling mechanism to obtain the intermediate data includes:
calculating the conditional probability of each sub-data in the second data;
sorting the conditional probabilities of the sub-data in descending order to obtain a target sequence;
acquiring a first element from the target sequence as a filling value;
and replacing the missing value by the filling value to obtain the intermediate data.
In the above embodiment, discrete missing values are filled with the value that has the largest conditional probability. Compared with conventional filling with a fixed value (e.g., 0), the filled values are correlated with the other values.
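One way to read the filling steps is shown in the Python sketch below. It assumes that each record has a known field to condition on and a field that may be missing, and it estimates the conditional probability of each candidate value from the complete records; the function name `fill_missing` and the field names are hypothetical.

```python
from collections import Counter


def fill_missing(rows, given, target):
    """Fill missing `target` values with the most probable value given `given`."""
    filled = []
    for row in rows:
        if row[target] is not None:
            filled.append(dict(row))
            continue
        # Candidate values observed together with the same conditioning value.
        matches = [r[target] for r in rows
                   if r[given] == row[given] and r[target] is not None]
        counts = Counter(matches)
        total = sum(counts.values())
        # Conditional probabilities, sorted from largest to smallest.
        ranked = sorted(((count / total, value) for value, count in counts.items()),
                        key=lambda pair: pair[0], reverse=True)
        # The first element of the sorted sequence is the filling value.
        fill_value = ranked[0][1] if ranked else None
        filled.append({**row, target: fill_value})
    return filled


rows = [
    {"channel": "ATM", "city": "Shenzhen"},
    {"channel": "ATM", "city": "Shenzhen"},
    {"channel": "ATM", "city": "Beijing"},
    {"channel": "RR", "city": "Beijing"},
    {"channel": "ATM", "city": None},
]
intermediate = fill_missing(rows, given="channel", target="city")
print(intermediate[-1])
```

When no complete record shares the conditioning value, this sketch leaves the value missing rather than guessing.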
In at least one embodiment of the present invention, before the preprocessing of the data to be processed based on the basic cleaning conversion logic, the method further comprises:
packaging the basic cleaning conversion logic into a class library, wherein a plurality of cleaning conversion logics are stored in the class library, and each cleaning conversion logic corresponds to one adapter;
configuring a target adapter corresponding to the basic cleaning conversion logic for the class library;
after extracting the basic cleaning conversion logic from the data to be processed, connecting to the class library by using the target adapter;
and based on the basic cleaning conversion logic, acquiring processing logic from the class library to preprocess the data to be processed.
Through this implementation, the cleaning conversion logic is encapsulated. When new processing logic appears, it can be added or changed directly in the class library and a new adapter configured, which makes the cleaning conversion logic convenient to call.
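The class library and adapter arrangement can be sketched in Python as a registry of named cleaning functions with one adapter per function. `CleaningLibrary`, `Adapter` and the two sample logics are hypothetical and only illustrate the one-adapter-per-logic relationship described above.

```python
class CleaningLibrary:
    """Hypothetical class library holding several cleaning conversion logics."""

    def __init__(self):
        self._logics = {}

    def register(self, name, logic):
        # Adding or changing logic only touches the library.
        self._logics[name] = logic

    def get(self, name):
        return self._logics[name]


class Adapter:
    """One adapter per cleaning conversion logic; callers go through it."""

    def __init__(self, library, logic_name):
        self._library = library
        self._logic_name = logic_name

    def run(self, data):
        return self._library.get(self._logic_name)(data)


library = CleaningLibrary()
library.register("strip", lambda data: [item.strip() for item in data])
library.register("dedupe", lambda data: list(dict.fromkeys(data)))

adapters = {name: Adapter(library, name) for name in ("strip", "dedupe")}
cleaned = adapters["dedupe"].run(adapters["strip"].run([" a", "a ", "b"]))
print(cleaned)
```

Because each adapter looks the logic up at call time, re-registering a logic under the same name changes behaviour without touching the callers.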
S15, acquiring target cleaning conversion logic of the target system, processing the intermediate data based on the target cleaning conversion logic to obtain target data, and transmitting the target data to the target system.
In this embodiment, the target cleaning conversion logic refers to a data processing manner adapted to an actual application scenario corresponding to the target system.
It can be understood that each application scenario may have different requirements on the data, such as its format and structure. Therefore, after the basic cleaning and conversion, a secondary cleaning and conversion can be performed according to the individualized requirements of each application scenario, which ensures that the data matches the application scenario and gives the method higher applicability.
For example: when the target cleaning conversion logic of the actual use scene requires data to have consistency, a consistency detection requirement can be obtained from the target cleaning conversion logic, and the intermediate data is processed according to the consistency detection requirement to obtain the target data.
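As an illustration of such a consistency requirement, the Python sketch below treats it as a set of expected field types and separates the intermediate data into rows that satisfy it and rows that do not. The text does not define what consistency means, so this interpretation, the `"consistency"` key and the function name `apply_target_logic` are all assumptions.

```python
def apply_target_logic(rows, target_logic):
    """Keep rows whose fields have the types the target system requires."""
    # Consistency detection requirement obtained from the target logic.
    required = target_logic.get("consistency", {})
    accepted, rejected = [], []
    for row in rows:
        consistent = all(isinstance(row.get(field), expected)
                         for field, expected in required.items())
        (accepted if consistent else rejected).append(row)
    return accepted, rejected


intermediate_data = [
    {"phone": "13800000000", "amount": 12.5},
    {"phone": 13800000001, "amount": 3.0},   # phone has the wrong type
    {"phone": "13800000002"},                # amount is absent
]
target_logic = {"consistency": {"phone": str, "amount": float}}
target_data, rejected = apply_target_logic(intermediate_data, target_logic)
print(target_data)
```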
Further, the target data is transmitted to the target system for use by the target system.
It should be noted that, in order to further improve the security of the data and avoid the data from being tampered maliciously, the target data may be stored in a blockchain node.
According to the above technical solution, the invention can, in response to a data processing instruction, acquire data from the data processing instruction to generate dictionary information, define a preset language, and load the dictionary information in the preset language to generate a data extraction statement. A source system and a target system are determined according to the data processing instruction, the data type of the source system is identified, and data is extracted from the source system as data to be processed according to that data type and the data extraction statement; because extraction is targeted to each data type, the acquired data is more accurate. Basic cleaning conversion logic is extracted from the data to be processed, and the data to be processed is preprocessed based on it to obtain intermediate data, which realizes a first, basic cleaning of the data. Target cleaning conversion logic of the target system is then acquired, the intermediate data is processed based on it to obtain target data, and the target data is transmitted to the target system. This secondary cleaning and conversion addresses the individualized data requirements of different application scenarios, ensures that the data matches the usage scenario, and gives the method higher applicability.
FIG. 2 is a functional block diagram of a preferred embodiment of the big data processing device of the present invention. The big data processing apparatus 11 includes a generating unit 110, a determining unit 111, an extracting unit 112, a preprocessing unit 113, and a processing unit 114. The module/unit referred to in the present invention refers to a series of computer program segments capable of being executed by the processor 13 and of performing a fixed function, which are stored in the memory 12. In the present embodiment, the functions of the respective modules/units will be described in detail in the following embodiments.
In response to the data processing instruction, the generation unit 110 acquires data from the data processing instruction to generate dictionary information.
In at least one embodiment of the present invention, the data processing instructions may be triggered by an associated worker, such as a developer, tester, salesman, etc.
In at least one embodiment of the present invention, the generating unit 110 acquiring data from the data processing instruction to generate dictionary information includes:
acquiring position information and a data structure of a field to be acquired from the data processing instruction;
constructing a data acquisition function according to the position information of the field to be acquired;
generating a table falling model according to the data structure;
and packaging the data acquisition function and the table falling model to obtain the dictionary information.
For example: the data acquisition function may be a select() function or a regular expression, which is not limited by the present invention.
Through the data acquisition function, fields at specified positions can be collected, for example the data at field positions 1-20.
The specific meaning of each field in the finally generated data can be defined through the table falling model, for example: fields 0-11 represent the user's cell phone number and fields 12-18 represent the user information.
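The packaging of a data acquisition function together with a table model can be sketched as follows. This is a minimal illustration only: the names `DictionaryInfo`, `build_dictionary_info` and `apply_dictionary` are hypothetical, and field positions are read here as character offsets into a fixed-width record, which the text does not state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class DictionaryInfo:
    """Bundles a data acquisition function with a table model (hypothetical layout)."""
    collect: Callable[[str], str]
    table_model: Dict[str, Tuple[int, int]]  # field name -> (start, end), end exclusive


def build_dictionary_info(start: int, end: int,
                          table_model: Dict[str, Tuple[int, int]]) -> DictionaryInfo:
    # Data acquisition function: keep only the characters in [start, end).
    def collect(record: str) -> str:
        return record[start:end]
    return DictionaryInfo(collect=collect, table_model=table_model)


def apply_dictionary(info: DictionaryInfo, record: str) -> Dict[str, str]:
    collected = info.collect(record)
    # The table model assigns a meaning to each slice of the collected data.
    return {name: collected[lo:hi] for name, (lo, hi) in info.table_model.items()}


# Assumed layout: 11 characters of phone number followed by 7 characters of user info.
info = build_dictionary_info(0, 18, {"phone": (0, 11), "user_info": (11, 18)})
row = apply_dictionary(info, "13800000000ABCDEFGtrailing")
```

Keeping the collector and the table model in one object means a single piece of dictionary information is enough both to cut the raw record and to name the resulting fields.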
The generating unit 110 defines a preset language, and loads the dictionary information in the preset language to generate a data extraction statement.
In at least one embodiment of the present invention, the generating unit 110 defines a preset language including:
performing rule definition based on a scale analyzer, wherein the rule definition is used for calculating an initial value and obtaining a target value;
defining a mapping rule based on a scale analyzer, wherein the mapping rule is used for representing the mapping relation between source data and target data;
and loading the syntax rules of the Groovy language, and generating a statement converter according to the syntax rules, the rule definition and the mapping rule, wherein the statement converter is used for converting any statement into a Groovy statement.
For example, the rule definition may be:
    if (value == 'RR')
        'RR_MAPPER_123'
    else if (value == 'ATM')
        'ATM_MAPPER_123'
    else
        "EEROR_MAPPER"
The mapping rule may be:
    tagContext["tagModeName1"] = srcContext["srcModeName1"]
    tagContext["tagModeName8"] = hardCode("GALAXY-AQUILA")
Through the above implementation, the Groovy language is adapted so that the resulting language has a lower learning cost and is easier to use.
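To make the two kinds of rule concrete, the following sketch restates the example rule definition and mapping rule in Python. It is an illustration under assumptions, not the preset language itself: `hard_code`, `from_source` and `apply_mapping` are hypothetical helpers standing in for `hardCode(...)` and the `srcContext`/`tagContext` lookups.

```python
from typing import Any, Callable, Dict


def rule(value: str) -> str:
    # Mirrors the example rule definition: initial value in, target value out.
    if value == "RR":
        return "RR_MAPPER_123"
    elif value == "ATM":
        return "ATM_MAPPER_123"
    return "EEROR_MAPPER"  # fallback literal kept exactly as written in the example


def hard_code(constant: str) -> Callable[[Dict[str, Any]], str]:
    # Hypothetical counterpart of hardCode(...): ignores the source, returns a constant.
    return lambda src_context: constant


def from_source(key: str) -> Callable[[Dict[str, Any]], Any]:
    # Copies one field from the source context.
    return lambda src_context: src_context[key]


# Mapping rule: each target field is produced by one getter over the source context.
MAPPING: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "tagModeName1": from_source("srcModeName1"),
    "tagModeName8": hard_code("GALAXY-AQUILA"),
}


def apply_mapping(src_context: Dict[str, Any]) -> Dict[str, Any]:
    return {tag: getter(src_context) for tag, getter in MAPPING.items()}


tag_context = apply_mapping({"srcModeName1": rule("ATM")})
```

Here the rule definition computes a target value from an initial value, while the mapping rule only states which source field or constant feeds each target field.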
In at least one embodiment of the present invention, the generating unit 110 loading the dictionary information in the preset language to generate the data extraction statement includes:
generating Java class bytecode according to the dictionary information, based on the dynamic compilation technology of the Groovy language, to obtain the data extraction statement.
The determination unit 111 determines a source system and a target system according to the data processing instruction.
In at least one embodiment of the present invention, the determining unit 111 determines a source system and a target system according to the data processing instruction, including:
analyzing the data processing instruction to obtain a source system identification code and a target system identification code;
the source system is determined according to the source system identification code, and the target system is determined according to the target system identification code.
The source system identification code and the target system identification code have uniqueness, so that one source system can be uniquely determined by the source system identification code, and one target system can be uniquely determined by the target system identification code.
In this embodiment, the source system may include a service system and the like, and the target system may include a downstream system of an application and the like.
The extraction unit 112 identifies the data type of the source system, and extracts data from the source system as data to be processed according to the data type of the source system and the data extraction statement.
In at least one embodiment of the present invention, the data types of the source system may include, but are not limited to: database type, file type.
In at least one embodiment of the present invention, the extracting unit 112 extracts data from the source system as data to be processed according to the data type of the source system and the data extraction statement includes:
when the data type of the source system is a database type, acquiring a database connection string and a login credential from the data processing instruction, connecting to the source system according to the database connection string and the login credential, and extracting data from the source system by utilizing the data extraction statement to obtain the data to be processed; or
when the data type of the source system is the file type, extracting metadata from the source system by using the data extraction statement to obtain the data to be processed.
In this embodiment, the login credentials may include, but are not limited to: a user name and a password.
According to the embodiment, the data can be extracted from the source system in a targeted manner according to different data types, so that the acquired data is more accurate.
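The dispatch on data type can be sketched as follows. This is a rough illustration under several assumptions: `extract` and the `type`, `connection_string` and `path` keys are hypothetical; `sqlite3` stands in for a real database and takes no user name or password, so the login credential step is omitted; and for the file type the extraction statement is assumed to be a regular expression applied line by line.

```python
import re
import sqlite3
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def extract(source: Dict[str, Any], statement: str) -> List[tuple]:
    """Extract data to be processed, branching on the source system's data type."""
    if source["type"] == "database":
        # Database type: connect, then run the extraction statement as SQL.
        conn = sqlite3.connect(source["connection_string"])
        try:
            return conn.execute(statement).fetchall()
        finally:
            conn.close()
    if source["type"] == "file":
        # File type: treat the extraction statement as a regular expression with groups.
        pattern = re.compile(statement)
        rows = []
        for line in Path(source["path"]).read_text(encoding="utf-8").splitlines():
            match = pattern.match(line)
            if match:
                rows.append(match.groups())
        return rows
    raise ValueError(f"unsupported data type: {source['type']}")


with tempfile.TemporaryDirectory() as tmp:
    # Prepare a small database source.
    db_path = str(Path(tmp) / "source.db")
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE users (phone TEXT, name TEXT)")
    conn.execute("INSERT INTO users VALUES ('13800000000', 'alice')")
    conn.commit()
    conn.close()

    # Prepare a small file source with the same content.
    file_path = Path(tmp) / "source.txt"
    file_path.write_text("13800000000,alice\n", encoding="utf-8")

    db_rows = extract({"type": "database", "connection_string": db_path},
                      "SELECT phone, name FROM users")
    file_rows = extract({"type": "file", "path": str(file_path)}, r"(\d{11}),(\w+)")
```

Both branches return rows in the same shape, so the later cleaning steps do not need to know which kind of source system the data came from.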
The preprocessing unit 113 extracts basic cleaning conversion logic from the data to be processed, and performs preprocessing on the data to be processed based on the basic cleaning conversion logic to obtain intermediate data.
In at least one embodiment of the present invention, the basic cleaning conversion logic may be configured according to historical processing logic, so as to implement a first, basic cleaning of the data.
In at least one embodiment of the present invention, the preprocessing unit 113 performs preprocessing on the data to be processed based on the basic cleaning conversion logic, and obtaining intermediate data includes:
performing de-duplication processing on the data to be processed to obtain first data;
clustering the first data by adopting a clustering algorithm to obtain a plurality of subareas;
calculating the upper limit distance and the lower limit distance of all points in each sub-area relative to other points;
acquiring a configuration threshold value, and determining a subarea with the upper limit distance being greater than or equal to the configuration threshold value as a subarea to be screened;
obtaining isolated points from the subarea to be screened;
deleting the isolated points from the first data to obtain second data;
acquiring a missing value in the second data;
and filling the missing values based on a configuration filling mechanism to obtain the intermediate data.
Through the implementation mode, the basic cleaning and conversion of the data to be processed can be realized, and the subsequent calculation can be simplified through screening the isolated points, so that the data processing efficiency is improved.
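The steps from de-duplication through deletion of isolated points can be sketched on one-dimensional numeric data as follows. The text does not fix the clustering algorithm or the exact meaning of the distance limits, so every choice here is an assumption: fixed-width binning stands in for the clustering algorithm, the upper and lower limit distances are read as the largest and smallest pairwise distances within a sub-area, and a point is treated as isolated when its nearest neighbour in the sub-area is at least the threshold away. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def deduplicate(values: List[float]) -> List[float]:
    # dict.fromkeys keeps the first occurrence of each value and preserves order.
    return list(dict.fromkeys(values))


def cluster_by_bin(values: List[float], width: float) -> Dict[int, List[float]]:
    # Stand-in for a real clustering algorithm: one sub-area per fixed-width bin.
    areas: Dict[int, List[float]] = defaultdict(list)
    for v in values:
        areas[int(v // width)].append(v)
    return dict(areas)


def distance_limits(points: List[float]) -> Tuple[float, float]:
    # Upper / lower limit read here as the largest / smallest pairwise distance.
    dists = [abs(a - b) for i, a in enumerate(points) for b in points[i + 1:]]
    return (max(dists), min(dists)) if dists else (0.0, 0.0)


def isolated_points(points: List[float], threshold: float) -> List[float]:
    # A point counts as isolated when its nearest neighbour is at least `threshold` away.
    result = []
    for i, p in enumerate(points):
        others = points[:i] + points[i + 1:]
        if others and min(abs(p - q) for q in others) >= threshold:
            result.append(p)
    return result


def remove_isolated(values: List[float], width: float, threshold: float) -> List[float]:
    first_data = deduplicate(values)
    to_delete = set()
    for points in cluster_by_bin(first_data, width).values():
        upper, _lower = distance_limits(points)
        if upper >= threshold:  # this sub-area is to be screened
            to_delete.update(isolated_points(points, threshold))
    return [v for v in first_data if v not in to_delete]


second_data = remove_isolated([1, 2, 3, 2, 50, 101, 102, 103], width=100, threshold=10)
```

In this run only the sub-area holding 1, 2, 3 and 50 exceeds the threshold, so only that sub-area is searched for isolated points; compact sub-areas are skipped, which is where the saving in later computation comes from.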
Specifically, the preprocessing unit 113 filling the missing values based on the configuration filling mechanism to obtain the intermediate data includes:
calculating the conditional probability of each sub-data in the second data;
sorting the conditional probabilities of the sub-data in descending order to obtain a target sequence;
acquiring a first element from the target sequence as a filling value;
and replacing the missing value by the filling value to obtain the intermediate data.
In the above embodiment, discrete missing values are filled with the value that has the largest conditional probability. Compared with the conventional approach of filling with a fixed value (e.g., 0), the filled value is correlated with the other values.
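The filling steps above can be sketched as follows. The text does not say what the probability is conditioned on, so this sketch assumes each record carries a companion key and estimates P(value | key) from the observed records; `fill_missing` and the record layout are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

Record = Tuple[str, Optional[str]]  # (conditioning key, value or None if missing)


def fill_missing(records: List[Record]) -> List[Record]:
    filled: List[Record] = []
    for key, value in records:
        if value is None:
            # Observed values that share the same conditioning key.
            observed = [v for k, v in records if k == key and v is not None]
            counts = Counter(observed)
            total = sum(counts.values())
            # Conditional probabilities P(value | key), sorted in descending order.
            ranked = sorted(((c / total, v) for v, c in counts.items()), reverse=True)
            if ranked:
                value = ranked[0][1]  # first element of the target sequence
        filled.append((key, value))
    return filled


records = [("bank", "ATM"), ("bank", "ATM"), ("bank", "RR"), ("bank", None),
           ("shop", "RR"), ("shop", None)]
intermediate_data = fill_missing(records)
```

A missing value whose key has no observed values is left as missing in this sketch, since no conditional probability can be estimated for it.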
In at least one embodiment of the present invention, before the data to be processed is preprocessed based on the basic cleaning conversion logic, the basic cleaning conversion logic is packaged into a class library, wherein a plurality of cleaning conversion logics are stored in the class library, and each cleaning conversion logic corresponds to one adapter;
configuring a target adapter corresponding to the basic cleaning conversion logic for the class library;
after extracting the basic cleaning conversion logic from the data to be processed, connecting to the class library by using the target adapter;
and based on the basic cleaning conversion logic, acquiring processing logic from the class library to preprocess the data to be processed.
Through the above implementation, the cleaning conversion logic is encapsulated. When new processing logic is needed, it can be added or changed directly in the class library and a new adapter configured for it, which makes the cleaning conversion logic convenient to call.
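The class library and adapter arrangement can be sketched as a small registry. This is an illustrative structure only: `ClassLibrary`, `Adapter` and the two registered cleaning functions are hypothetical and are not taken from the text.

```python
from typing import Callable, Dict, List

CleaningLogic = Callable[[List[str]], List[str]]


class ClassLibrary:
    """Stores cleaning conversion logic by name (hypothetical structure)."""

    def __init__(self) -> None:
        self._logic: Dict[str, CleaningLogic] = {}

    def register(self, name: str, logic: CleaningLogic) -> None:
        self._logic[name] = logic

    def get(self, name: str) -> CleaningLogic:
        return self._logic[name]


class Adapter:
    """Connects a caller to one named piece of logic in the class library."""

    def __init__(self, library: ClassLibrary, logic_name: str) -> None:
        self._library = library
        self._logic_name = logic_name

    def run(self, data: List[str]) -> List[str]:
        # Look the logic up at call time so that changes in the library take effect.
        return self._library.get(self._logic_name)(data)


library = ClassLibrary()
library.register("strip_whitespace", lambda rows: [r.strip() for r in rows])
library.register("upper_case", lambda rows: [r.upper() for r in rows])

# The target adapter is the one configured for the basic cleaning conversion logic.
target_adapter = Adapter(library, "strip_whitespace")
intermediate = target_adapter.run(["  a ", "b  "])
```

Because the adapter resolves the logic by name on each call, replacing an entry in the library changes the behaviour without touching the calling code.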
The processing unit 114 obtains the target cleaning conversion logic of the target system, processes the intermediate data based on the target cleaning conversion logic to obtain target data, and transmits the target data to the target system.
In this embodiment, the target cleaning conversion logic refers to a data processing manner adapted to an actual application scenario corresponding to the target system.
It can be understood that each application scenario may have different requirements on the data, such as its format and structure. Therefore, after the basic cleaning and conversion, a secondary cleaning and conversion can be performed according to the personalized data requirements of each application scenario, which ensures that the data matches the application scenario and improves applicability.
For example: when the target cleaning conversion logic of the actual use scene requires data to have consistency, a consistency detection requirement can be obtained from the target cleaning conversion logic, and the intermediate data is processed according to the consistency detection requirement to obtain the target data.
Further, the target data is transmitted to the target system for use by the target system.
It should be noted that, in order to further improve the security of the data and avoid the data from being tampered maliciously, the target data may be stored in a blockchain node.
As can be seen from the above technical solution, the present invention can, in response to a data processing instruction, acquire data from the data processing instruction to generate dictionary information, define a preset language, and load the dictionary information in the preset language to generate a data extraction statement. It can determine a source system and a target system according to the data processing instruction, identify the data type of the source system, and extract data from the source system as data to be processed according to the data type of the source system and the data extraction statement; extracting data in a manner targeted to each data type makes the acquired data more accurate. It can extract basic cleaning conversion logic from the data to be processed and preprocess the data to be processed based on the basic cleaning conversion logic to obtain intermediate data, thereby performing a first, basic cleaning of the data. It can then acquire the target cleaning conversion logic of the target system, process the intermediate data based on the target cleaning conversion logic to obtain target data, and transmit the target data to the target system. The data thus undergoes a secondary cleaning and conversion tailored to the application scenario, which meets the target system's personalized data requirements and gives the solution better applicability.
FIG. 3 is a schematic diagram of a computer device for implementing a big data processing method according to a preferred embodiment of the present invention.
The computer device 1 may comprise a memory 12, a processor 13 and a bus, and may further comprise a computer program, such as a big data processing program, stored in the memory 12 and executable on the processor 13.
It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the computer device 1 and does not constitute a limitation of the computer device 1. The computer device 1 may have a bus type structure or a star type structure, and may further comprise more or fewer hardware or software components than illustrated, or a different arrangement of components; for example, the computer device 1 may further comprise an input-output device, a network access device, etc.
It should be noted that the computer device 1 is only an example; other existing or future electronic products that can be adapted to the present invention are also included in the scope of protection of the present invention and are incorporated herein by reference.
The memory 12 includes at least one type of readable storage medium, including flash memory, a removable hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments the memory 12 may be an internal storage unit of the computer device 1, such as a removable hard disk of the computer device 1. In other embodiments the memory 12 may be an external storage device of the computer device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the computer device 1. Further, the memory 12 may include both an internal storage unit and an external storage device of the computer device 1. The memory 12 may be used not only for storing application software installed in the computer device 1 and various types of data, such as the code of the big data processing program, but also for temporarily storing data that has been output or is to be output.
In some embodiments the processor 13 may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 13 is the control unit (Control Unit) of the computer device 1. It connects the components of the entire computer device 1 using various interfaces and lines, and performs the various functions of the computer device 1 and processes data by running or executing the programs or modules stored in the memory 12 (for example, executing the big data processing program) and invoking the data stored in the memory 12.
The processor 13 executes the operating system of the computer device 1 and various types of applications installed. The processor 13 executes the application program to implement the steps of the various big data processing method embodiments described above, such as the steps shown in fig. 1.
Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory 12 and executed by the processor 13 to complete the present invention. The one or more modules/units may be a series of computer readable instruction segments capable of performing the specified functions, which instruction segments describe the execution of the computer program in the computer device 1. For example, the computer program may be divided into a generating unit 110, a determining unit 111, an extracting unit 112, a preprocessing unit 113, a processing unit 114.
The integrated units implemented in the form of software functional modules described above may be stored in a computer readable storage medium. The software functional modules are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to perform parts of the big data processing method according to the embodiments of the present invention.
The modules/units integrated in the computer device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on this understanding, the present invention may also be implemented by a computer program for instructing a relevant hardware device to implement all or part of the procedures of the above-mentioned embodiment method, where the computer program may be stored in a computer readable storage medium and the computer program may be executed by a processor to implement the steps of each of the above-mentioned method embodiments.
Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory, or the like.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods, each data block containing information about a batch of network transactions, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
The bus may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. For ease of illustration, only one straight line is shown in FIG. 3, but this does not mean that there is only one bus or only one type of bus. The bus is arranged to enable connection and communication between the memory 12 and the at least one processor 13, etc.
Although not shown, the computer device 1 may further comprise a power source (such as a battery) for powering the various components. Preferably, the power source is logically connected to the at least one processor 13 via a power management device, so that charge management, discharge management, and power consumption management are implemented through the power management device. The power source may also include one or more direct current or alternating current power supplies, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and other such components. The computer device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be described in detail herein.
Further, the computer device 1 may also comprise a network interface, optionally comprising a wired interface and/or a wireless interface (e.g. a Wi-Fi interface, a Bluetooth interface, etc.), typically used for establishing a communication connection between the computer device 1 and other computer devices.
The computer device 1 may optionally further comprise a user interface, which may be a display (Display) or an input unit such as a keyboard (Keyboard), and may optionally also be a standard wired interface or a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, as appropriate, and is used for displaying information processed in the computer device 1 and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only, and the scope of the patent application is not limited by this configuration.
Fig. 3 shows only a computer device 1 with components 12-13, it being understood by those skilled in the art that the structure shown in fig. 3 is not limiting of the computer device 1 and may include fewer or more components than shown, or may combine certain components, or a different arrangement of components.
In connection with FIG. 1, the memory 12 in the computer device 1 stores a plurality of instructions for implementing a big data processing method, and the processor 13 can execute the plurality of instructions to implement:
in response to a data processing instruction, acquiring data from the data processing instruction to generate dictionary information;
defining a preset language, loading the dictionary information in the preset language, and generating a data extraction statement;
determining a source system and a target system according to the data processing instruction;
identifying the data type of the source system, and extracting data from the source system as data to be processed according to the data type of the source system and the data extraction statement;
extracting basic cleaning conversion logic from the data to be processed, and preprocessing the data to be processed based on the basic cleaning conversion logic to obtain intermediate data;
and acquiring target cleaning conversion logic of the target system, processing the intermediate data based on the target cleaning conversion logic to obtain target data, and transmitting the target data to the target system.
Specifically, the specific implementation method of the above instructions by the processor 13 may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.
In the several embodiments provided in the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The invention is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. The units or means stated in the invention may also be implemented by one unit or means, either by software or hardware. The terms first, second, etc. are used to denote a name, but not any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.
Claims (9)
1. A big data processing method, characterized in that the big data processing method comprises:
in response to a data processing instruction, acquiring data from the data processing instruction to generate dictionary information;
defining a preset language, loading the dictionary information in the preset language, and generating a data extraction sentence;
determining a source system and a target system according to the data processing instruction;
identifying the data type of the source system, and extracting data from the source system as data to be processed according to the data type of the source system and the data extraction statement;
extracting basic cleaning conversion logic from the data to be processed, and preprocessing the data to be processed based on the basic cleaning conversion logic to obtain intermediate data;
acquiring target cleaning conversion logic of the target system, processing the intermediate data based on the target cleaning conversion logic to obtain target data, and transmitting the target data to the target system;
the preprocessing the data to be processed based on the basic cleaning conversion logic to obtain intermediate data comprises the following steps:
performing de-duplication processing on the data to be processed to obtain first data;
clustering the first data by adopting a clustering algorithm to obtain a plurality of subareas;
calculating the upper limit distance and the lower limit distance of all points in each sub-area relative to other points;
acquiring a configuration threshold value, and determining a subarea with the upper limit distance being greater than or equal to the configuration threshold value as a subarea to be screened;
obtaining isolated points from the subarea to be screened;
deleting the isolated points from the first data to obtain second data;
acquiring a missing value in the second data;
and filling the missing values based on a configuration filling mechanism to obtain the intermediate data.
2. The big data processing method of claim 1, wherein the acquiring data from the data processing instruction to generate dictionary information includes:
acquiring position information and a data structure of a field to be acquired from the data processing instruction;
constructing a data acquisition function according to the position information of the field to be acquired;
generating a table falling model according to the data structure;
and packaging the data acquisition function and the table falling model to obtain the dictionary information.
3. The big data processing method of claim 1, wherein defining the preset language includes:
performing rule definition based on a scale analyzer, wherein the rule definition is used for calculating an initial value and obtaining a target value;
defining a mapping rule based on a scale analyzer, wherein the mapping rule is used for representing the mapping relation between source data and target data;
and loading grammar writing rules of the groovy language, and generating a sentence converter according to the grammar writing rules, the rule definitions and the mapping rules, wherein the sentence converter is used for converting any sentence into the groovy sentence.
4. The big data processing method of claim 1, wherein the extracting data from the source system as data to be processed according to the data type of the source system and the data extraction statement comprises:
when the data type of the source system is a database type, acquiring a database connection string and a login credential from the data processing instruction, connecting to the source system according to the database connection string and the login credential, and extracting data from the source system by utilizing the data extraction statement to obtain the data to be processed; or
when the data type of the source system is the file type, extracting metadata from the source system by using the data extraction statement to obtain the data to be processed.
5. The big data processing method of claim 1, wherein the filling the missing values based on a configuration filling mechanism to obtain the intermediate data comprises:
calculating the conditional probability of each sub-data in the second data;
sequencing the conditional probability of each sub data according to the sequence from big to small to obtain a target sequence;
acquiring a first element from the target sequence as a filling value;
and replacing the missing value by the filling value to obtain the intermediate data.
6. The big data processing method of claim 1, wherein prior to the preprocessing of the data to be processed based on the basic cleaning conversion logic, the method further comprises:
packaging the basic cleaning conversion logic into a class library, wherein a plurality of cleaning conversion logics are stored in the class library, and each cleaning conversion logic corresponds to one adapter;
configuring a target adapter corresponding to the basic cleaning conversion logic for the class library;
after extracting the basic cleaning conversion logic from the data to be processed, connecting to the class library by using the target adapter;
and based on the basic cleaning conversion logic, acquiring processing logic from the class library to preprocess the data to be processed.
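The class library and adapter arrangement of claim 6 might look like the following sketch. The class names `CleaningLibrary` and `Adapter`, the registry keyed by logic name, and the example `strip_whitespace` logic are all assumptions introduced for illustration; the claim only requires that each cleaning conversion logic in the library corresponds to one adapter.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Logic = Callable[[List[Record]], List[Record]]


class CleaningLibrary:
    """Hypothetical class library that stores several cleaning conversion logics."""

    def __init__(self) -> None:
        self._logics: Dict[str, Logic] = {}

    def register(self, name: str, logic: Logic) -> None:
        self._logics[name] = logic

    def get(self, name: str) -> Logic:
        return self._logics[name]


class Adapter:
    """Hypothetical adapter bound to exactly one logic in the library."""

    def __init__(self, library: CleaningLibrary, logic_name: str) -> None:
        self._library = library
        self._logic_name = logic_name

    def run(self, records: List[Record]) -> List[Record]:
        # Connect to the library, fetch the processing logic, and apply it.
        return self._library.get(self._logic_name)(records)


def strip_whitespace(records: List[Record]) -> List[Record]:
    """Example basic cleaning logic: trim string fields."""
    return [
        {key: value.strip() if isinstance(value, str) else value for key, value in record.items()}
        for record in records
    ]
```

Registering `strip_whitespace` under a name such as `"basic"` and calling `Adapter(library, "basic").run(records)` keeps the caller independent of how the logic is stored, which appears to be the purpose of the adapter in the claim.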
7. A big data processing apparatus, characterized in that the big data processing apparatus comprises:
a generating unit, used for, in response to a data processing instruction, acquiring data from the data processing instruction to generate dictionary information;
the generating unit is further used for defining a preset language, loading the dictionary information in the preset language and generating a data extraction statement;
the determining unit is used for determining a source system and a target system according to the data processing instruction;
the extraction unit is used for identifying the data type of the source system and extracting data from the source system as data to be processed according to the data type of the source system and the data extraction statement;
the preprocessing unit is used for extracting basic cleaning conversion logic from the data to be processed, and preprocessing the data to be processed based on the basic cleaning conversion logic to obtain intermediate data;
the processing unit is used for acquiring target cleaning conversion logic of the target system, processing the intermediate data based on the target cleaning conversion logic to obtain target data, and transmitting the target data to the target system;
wherein the preprocessing of the data to be processed based on the basic cleaning conversion logic to obtain intermediate data comprises the following steps:
performing de-duplication processing on the data to be processed to obtain first data;
clustering the first data by adopting a clustering algorithm to obtain a plurality of sub-areas;
calculating the upper limit distance and the lower limit distance of all points in each sub-area relative to other points;
acquiring a configuration threshold value, and determining a sub-area whose upper limit distance is greater than or equal to the configuration threshold value as a sub-area to be screened;
obtaining isolated points from the sub-area to be screened;
deleting the isolated points from the first data to obtain second data;
acquiring a missing value in the second data;
and filling the missing values based on a configuration filling mechanism to obtain the intermediate data.
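The de-duplication and isolated-point steps above can be sketched in Python under several stated assumptions, since the claim leaves the details open. This sketch uses two-dimensional points, substitutes a fixed grid for the unspecified clustering algorithm, takes the lower and upper limit distances of a point to be its nearest and farthest distance to any other point, and treats a point in a screened sub-area as isolated when its lower limit distance reaches the threshold. The function name and parameters are hypothetical.

```python
import math
from collections import defaultdict


def remove_isolated(points, cell, threshold):
    """Return the points with duplicates and isolated points removed."""
    # De-duplicate while preserving order (the "first data").
    first = list(dict.fromkeys(points))

    # Stand-in for clustering: group points into square grid cells (sub-areas).
    areas = defaultdict(list)
    for point in first:
        areas[(math.floor(point[0] / cell), math.floor(point[1] / cell))].append(point)

    # Lower/upper limit distance of a point relative to all other points.
    def limits(point):
        distances = [math.dist(point, other) for other in first if other != point]
        return (min(distances), max(distances)) if distances else (0.0, 0.0)

    bounds = {point: limits(point) for point in first}

    isolated = set()
    for members in areas.values():
        # A sub-area is screened when its upper limit distance reaches the threshold.
        if max(bounds[point][1] for point in members) >= threshold:
            # Assumption: isolated means the nearest neighbour is that far away too.
            isolated.update(point for point in members if bounds[point][0] >= threshold)

    # Delete the isolated points to obtain the "second data".
    return [point for point in first if point not in isolated]
```

For example, with `cell=2.0` and `threshold=5.0`, the input `[(0, 0), (0, 1), (1, 0), (0, 0), (10, 10)]` loses the repeated `(0, 0)` and the distant `(10, 10)`. The pairwise distance computation is quadratic in the number of points, so a spatial index would be needed at scale.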
8. A computer device, the computer device comprising:
a memory storing at least one instruction; and
a processor executing instructions stored in the memory to implement the big data processing method according to any of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one instruction that is executed by a processor in a computer device to implement the big data processing method of any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111107162.1A CN113806434B (en) | 2021-09-22 | 2021-09-22 | Big data processing method, device, equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111107162.1A CN113806434B (en) | 2021-09-22 | 2021-09-22 | Big data processing method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113806434A (en) | 2021-12-17 |
CN113806434B (en) | 2023-09-05 |
Family
ID=78939891
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111107162.1A Active CN113806434B (en) | 2021-09-22 | 2021-09-22 | Big data processing method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113806434B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114282895A (en) * | 2021-12-22 | 2022-04-05 | 中国农业银行股份有限公司 | Data processing method and device, electronic equipment and storage medium |
CN114860349B (en) * | 2022-07-06 | 2022-11-08 | 深圳华锐分布式技术股份有限公司 | Data loading method, device, equipment and medium |
CN114936208B (en) * | 2022-07-26 | 2022-09-23 | 广州天维信息技术股份有限公司 | Information analysis system based on data cleaning |
CN115862882B (en) * | 2022-12-02 | 2024-02-13 | 北京百度网讯科技有限公司 | Data extraction method, device, equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107463709A (en) * | 2017-08-21 | 2017-12-12 | 北京奇艺世纪科技有限公司 | A kind of ETL processing method and processing devices based on multi-data source |
CN109918349A (en) * | 2019-02-25 | 2019-06-21 | 网易(杭州)网络有限公司 | Log processing method, device, storage medium and electronic device |
CN112231417A (en) * | 2020-10-14 | 2021-01-15 | 平安国际智慧城市科技股份有限公司 | Data classification method and device, electronic equipment and storage medium |
CN112835879A (en) * | 2021-01-22 | 2021-05-25 | 深圳市汉云科技有限公司 | Data extraction method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150026114A1 (en) * | 2013-07-18 | 2015-01-22 | Dania M. Triff | System and method of automatically extracting data from plurality of data sources and loading the same to plurality of target databases |
Also Published As
Publication number | Publication date |
---|---|
CN113806434A (en) | 2021-12-17 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN113806434B (en) | Big data processing method, device, equipment and medium | |
CN114237829B (en) | Data acquisition and processing method for power equipment | |
CN114185776A (en) | Big data point burying method, device, equipment and medium for application program | |
CN113868528A (en) | Information recommendation method and device, electronic equipment and readable storage medium | |
CN113890712A (en) | Data transmission method and device, electronic equipment and readable storage medium | |
CN113887941A (en) | Business process generation method and device, electronic equipment and medium | |
CN114138243B (en) | Function calling method, device, equipment and storage medium based on development platform | |
CN111950707B (en) | Behavior prediction method, device, equipment and medium based on behavior co-occurrence network | |
CN112132037B (en) | Pavement detection method, device, equipment and medium based on artificial intelligence | |
CN116823437A (en) | Access method, device, equipment and medium based on configured wind control strategy | |
CN114816371B (en) | Message processing method, device, equipment and medium | |
CN113923218B (en) | Distributed deployment method, device, equipment and medium for coding and decoding plug-in | |
CN114942855A (en) | Interface calling method and device, electronic equipment and storage medium | |
CN115033605A (en) | Data query method and device, electronic equipment and storage medium | |
CN114547011A (en) | Data extraction method and device, electronic equipment and storage medium | |
CN113704616A (en) | Information pushing method and device, electronic equipment and readable storage medium | |
CN114268559A (en) | Directional network detection method, device, equipment and medium based on TF-IDF algorithm | |
CN116934263B (en) | Product batch admittance method, device, equipment and medium | |
CN116414366B (en) | Middleware interface generation method, device, equipment and medium | |
CN116976821B (en) | Enterprise problem feedback information processing method, device, equipment and medium | |
CN115221875B (en) | Word weight generation method, device, electronic equipment and storage medium | |
CN115934576B (en) | Test case generation method, device, equipment and medium in transaction scene | |
CN118503141B (en) | Test case generation method, device, equipment and medium | |
CN114416575A (en) | Method, device and equipment for generating Mock data and storage medium | |
CN117438032A (en) | Information acquisition method, device, equipment and medium for refined hemp medicines |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||