
CN110728118B - Cross-data-platform data processing method, device, equipment and storage medium - Google Patents


Info

Publication number: CN110728118B (application CN201910851205.3A)
Authority: CN (China)
Prior art keywords: data, model, service, platform, service data
Legal status: Active
Application number: CN201910851205.3A
Other languages: Chinese (zh)
Other versions: CN110728118A
Inventor: 蔡昀
Current assignee: Ping An Life Insurance Company of China Ltd
Original assignee: Ping An Life Insurance Company of China Ltd
Events: application filed by Ping An Life Insurance Company of China Ltd; priority to CN201910851205.3A; publication of CN110728118A; application granted; publication of CN110728118B; anticipated expiration


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the present application provide a cross-data-platform data processing method, a data processing apparatus, a computer device, and a computer-readable storage medium, belonging to the technical field of data processing. The method includes the following steps: receiving first service data, where the first service data is generated based on a first data model constructed on a first data platform; preprocessing the first service data through a model parser to obtain second service data in a preset format; inputting the second service data into a second data model constructed based on a second data platform; and processing the second service data through the second data model to obtain a data processing result. Because the first data platform and the second data platform are different platforms, the respective advantages of the two platforms can be exploited, improving data processing efficiency.

Description

Cross-data-platform data processing method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a cross-data-platform data processing method, apparatus, computer device, and computer-readable storage medium.
Background
In the traditional technology, model construction is generally based on a single tool or platform. Currently mainstream modeling platforms include the SAS platform, the Spark platform, the TensorFlow platform, and the like, but each platform has its own advantages and disadvantages, so the efficiency and progress of modeling with the traditional technology are limited, which reduces the efficiency of data processing.
Disclosure of Invention
The embodiment of the application provides a data processing method, a data processing device, computer equipment and a computer readable storage medium across data platforms, which can solve the problem of low data processing efficiency in the traditional technology.
In a first aspect, embodiments of the present application provide a data processing method across data platforms, the method including receiving first service data, the first service data being generated based on a first data model constructed by a first data platform; preprocessing the first service data through a model analyzer to obtain second service data in a preset format; inputting the second business data into a second data model constructed based on a second data platform; and processing the second service data through the second data model to obtain a data processing result.
In a second aspect, an embodiment of the present application provides a data processing apparatus across a data platform, where the apparatus includes: a receiving unit, configured to receive first service data, where the first service data is generated based on a first data model constructed by a first data platform; the preprocessing unit is used for preprocessing the first service data through the model analyzer to obtain second service data in a preset format; an input unit for inputting the second service data into a second data model constructed based on a second data platform; and the processing unit is used for processing the second service data through the second data model to obtain a data processing result.
In a third aspect, an embodiment of the present application further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the data processing method of the cross-data platform when executing the computer program.
In a fourth aspect, embodiments of the present application further provide a computer readable storage medium storing a computer program, where the computer program when executed by a processor causes the processor to perform the steps of the cross-data-platform data processing method.
The embodiments of the present application provide a cross-data-platform data processing method, a data processing apparatus, a computer device, and a computer-readable storage medium. When implementing data processing, the embodiments of the present application receive first service data, where the first service data is generated based on a first data model constructed on a first data platform; preprocess the first service data through a model parser to obtain second service data in a preset format; input the second service data into a second data model constructed based on a second data platform; and process the second service data through the second data model to obtain a data processing result. Because the first data platform and the second data platform are different platforms, their respective advantages can be exploited, improving data processing efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of an application scenario of a data processing method of a cross-data platform according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a data processing method across data platforms provided in an embodiment of the present application;
FIG. 3 is a schematic flow chart of a sub-flow of a data processing method across data platforms provided in an embodiment of the present application;
FIG. 4 is another schematic flow chart of a data processing method across data platforms provided in an embodiment of the present application;
fig. 5 is a schematic diagram of another application scenario architecture of a data processing method of a cross-data platform according to an embodiment of the present application;
FIG. 6 is a schematic block diagram of a cross-data platform data processing apparatus provided in an embodiment of the present application; and
fig. 7 is a schematic block diagram of a computer device provided in an embodiment of the present application.
Detailed Description
All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Referring to fig. 1, fig. 1 is a schematic diagram of an application scenario of a data processing method of a cross-data platform according to an embodiment of the present application. The application scene comprises:
(1) A terminal, on which an application program for transacting business is installed; a user transacts business through the application program, correspondingly generating business data, namely the first service data. For example, in an insurance service system, a user may transact an agent application service, a claims fraud detection service, a product recommendation service, an application pricing service, and the like through the application program on the terminal, correspondingly generating agent application service data, claims fraud detection service data, product recommendation service data, application pricing service data, and the like; the first service data refers to agent application service data, claims fraud detection service data, product recommendation service data, or application pricing service data. The terminal, which may be referred to as the front end, may be a computer device such as a notebook, a desktop computer, a smart phone, a tablet computer, or a smart watch.
(2) A server. The server receives the first service data sent by the terminal, preprocesses the first service data to obtain second service data, and processes the second service data to obtain a data processing result. The server may be a single server, a distributed server cluster, or a cloud server; it accepts access from external terminals and is connected to the terminal through a wired network or a wireless network.
The operation of the individual bodies in fig. 1 is as follows: a user handles business through an application program on the terminal, generating first service data, where the first service data is generated based on a first data model constructed on a first data platform; the terminal sends the first service data to the server; the server receives the first service data and preprocesses it through a model parser to convert its format into a preset format, obtaining second service data in the preset format; the server then inputs the second service data into a second data model constructed based on a second data platform, where the second data platform supports processing data in the preset format, and processes the second service data through the second data model to obtain a data processing result.
It should be noted that the application scenario architecture diagram of the cross-data-platform data processing method is only used to illustrate the technical solution of the present application and does not limit it; the connection relationship may also take other forms.
Referring to fig. 2, fig. 2 is a schematic flowchart of a data processing method of a cross-data platform according to an embodiment of the present application. The data processing method of the cross-data platform is applied to the server in fig. 1 to complete all or part of functions of the data processing method of the cross-data platform. As shown in fig. 2, the method includes the following steps S201 to S204:
s201, receiving first service data, wherein the first service data is generated based on a first data model constructed by a first data platform.
The data platform, which may also be called a modeling platform or a modeling tool, refers to a tool or platform used for building models, for example the Spark platform, the TensorFlow platform, the Sklearn platform, or the R platform. Spark, also referred to as Apache Spark, is a fast, general-purpose computing engine designed for large-scale data processing; it is an open-source cluster computing environment that can be used for a variety of operations, including SQL queries, text processing, machine learning, and the like. The TensorFlow platform is a symbolic mathematics system based on dataflow programming (in English, Dataflow Programming), widely used for implementing various machine learning algorithms. Sklearn is a well-known Python machine learning library that encapsulates a large number of machine learning algorithms, embeds a large number of public datasets, and has thorough documentation; it is a practical tool for machine learning. The R platform refers to the R language (The R Programming Language), a language and operating environment for statistical analysis and plotting; it is a tool for statistical computing and statistical graphics.
Models that can be built on a modeling platform include the LR model, the FM model, the GBDT model, the XGBT model, and the DNN model, and multiple kinds of models can be built on one modeling platform; for example, the LR model, FM model, GBDT model, XGBT model, and the like can all be built on the TensorFlow platform. The LR model (Logistic Regression, LR) refers to a logistic regression model. The FM model (Factorization Machine, FM) refers to a factorization machine model. The GBDT model (Gradient Boosting Decision Tree, GBDT) refers to a gradient boosting decision tree model. The XGBT model (eXtreme Gradient Boosting), which may also be referred to as the XGBoost model, refers to an extreme gradient boosting model. The DNN model (Deep Neural Networks, DNN) refers to a deep neural network.
Specifically, since a service system generally handles various services, various kinds of corresponding service data are generated while those services are processed. Especially in the big-data era, the volume of service data generated by each service system may be very large; for example, an insurance service system may include service content such as agent application services, claims fraud detection services, product recommendation services, and application pricing services, and corresponding service data are generated while each service is processed.
The terminal generates the first service data; a first data model for the first service data is constructed based on the first data platform, the first service data is generated based on the first data model, the terminal sends the first service data to the server, and the server receives the first service data. For example, the server receives a call from the service system and receives first service data in an initial format sent by the service system, where the data model of the first service data is constructed by the first data platform and the initial format is the format corresponding to the first data platform. The initial format may be the Spark format, the TensorFlow format, the Sklearn format, the R format, or the like, determined according to the requirements of each item of service data. For example, if the first service data is massive big data, the first data platform may be the Spark platform, and the first service data corresponds to the Spark format; if the first service data requires rich algorithms, the first data platform may be the TensorFlow platform, and the first service data corresponds to the TensorFlow format.
Further, in the embodiments of the present application, processing service data involves a first data platform and a second data platform, which are data platforms of different modeling types; the first data platform and second data platform used for a specific service application are determined according to the data-processing performance and algorithm requirements of that service application. For example, referring to Table 1, Table 1 shows an example of the first and second data platforms corresponding to each of three services in an insurance business. The data model constructed by the first data platform defines the initial format of the first service data, while the data model constructed by the second data platform processes the unified data in the preset format obtained by converting that initial format. For example, if the first data platform is the Spark platform, which is good at processing massive big data, that advantage can be fully used; if the second data platform is the TensorFlow platform, whose algorithms are abundant, the data can then be processed with those algorithms. In this way the respective advantages of the first and second data platforms are fully exploited, and data processing is realized with a composite model that crosses data platforms.
Table 1

Business application            | First data platform | Second data platform
Agent application service       | Spark platform      | TensorFlow platform
Claims fraud detection service  | Spark platform      | R platform
Product recommendation service  | R platform          | Sklearn platform
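The pairing in Table 1 can be thought of as configuration data. The following is only a minimal sketch of such a configuration; the dictionary name, service keys, and platform labels are illustrative assumptions rather than part of the patent.

```python
# Illustrative mapping of business applications to their first/second data
# platforms, mirroring Table 1. All names here are assumptions for the sketch.
PLATFORM_PAIRS = {
    "agent_application":      {"first": "spark", "second": "tensorflow"},
    "claims_fraud_detection": {"first": "spark", "second": "r"},
    "product_recommendation": {"first": "r",     "second": "sklearn"},
}

def lookup_platforms(business_application: str) -> dict:
    """Return the first/second data platform pair for a business application."""
    try:
        return PLATFORM_PAIRS[business_application]
    except KeyError:
        raise ValueError(f"no platform pairing configured for {business_application!r}")

if __name__ == "__main__":
    print(lookup_platforms("claims_fraud_detection"))  # {'first': 'spark', 'second': 'r'}
```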
S202, preprocessing the first service data through a model analyzer to obtain second service data in a preset format.
A parser (in English, Parser) is code that can read a document and analyze its structure, i.e., a module for reading and interpreting source code. The model parser is a module for reading and interpreting the source code corresponding to a model; it converts a model from one format to another through a predetermined parsing mode, thereby realizing the parsing of the model. The predetermined parsing mode may be parsing through a preset function, parsing through a preset formula, or parsing through an existing tool. For example, an HTML parser is a parser for parsing HTML; jsoup is a Java HTML parser that can directly parse a URL address, HTML text content, and the like. A PMML file can be exported from Spark, and the JPMML-SparkML installation package, i.e., jpmml-sparkml-package, can be installed and used as the model parser.
Preprocessing refers to the process, carried out before the first service data is processed to obtain a data processing result, of converting the initial format of the first service data into a unified preset format, where the preset format may be the PMML format, the Protobuf format, or a custom format. The PMML format (Predictive Model Markup Language, PMML) is a predictive model markup language that uses XML to describe and store data mining models. The Protobuf format, also called Protocol Buffers, is a language- and platform-independent data exchange format whose internal data is in pure binary form.
Specifically, since the first data platform and the second data platform are different platforms, they correspond to different data formats, so the second data platform cannot directly process first service data expressed in the data format of the first data platform. Therefore, the model parser needs to call the first data platform corresponding to the initial format of the first service data to preprocess the first service data and convert it into the unified preset format, obtaining the second service data, so that the second service data can be processed by the data model constructed on the second data platform that supports the preset format, yielding the data processing result.
Further, according to the first data platform to which the first service data in the initial format belongs, the model parser calls the formula corresponding to that first data platform and outputs the first service data in the initial format as second service data in the preset unified format. That is, the first service data in the initial format is received and preprocessed using the first data platform; here the first platform is taken to be the Spark platform as an example. To implement this, a cross-platform language for data models and a cross-platform specification for data need to be determined so as to support cross-platform parsing of models and data between different platforms; then the working links of each platform are defined and the data circulation flow is determined, specifically as follows:
1) Spark performs data preparation and data processing, and also supports partial model construction.
Specifically, Spark performs data preparation and data processing, i.e., it uses Spark's computing capability to carry out feature-engineering processing, including variable quantization, one-hot coding, GBDT coding, and the like, and outputs the lightweight model constructed by Spark in the PMML format. Spark is a memory-based big-data processing framework that also supports building a limited set of machine learning models based on MLlib/ML.
In Spark's data preparation and data processing, one part is the preprocessing of data, and the other is the creation of some models; the created models and data are output in the PMML format so that a second data platform such as TensorFlow can perform data parsing and model parsing, thereby processing the first service data.
2) And preprocessing the first service data through a model analyzer to obtain second service data in a preset format.
Specifically, the model parser needs to parse the first service data first. For the model parser to parse the first service data, a unified preset format must be defined as the standard format. For example, taking the PMML format as the unified preset format: whether the model behind the first service data was generated by Sklearn, R, or Spark MLlib, it can be converted into standard PMML XML for storage. When a PMML model needs to be deployed, a library in the target environment that parses PMML models can be used to load the model and make predictions; model files in the standard format are generated based on PMML for models built on different platforms, and TensorFlow then reads and parses those PMML model files. PMML (Predictive Model Markup Language) is a de facto standard language for representing predictive analytics models; it is a common specification for data mining that describes the generated machine learning model in a unified XML format.
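As one hedged illustration of steps 1) and 2) above, a Spark ML pipeline (one-hot encoding plus a lightweight LR model) could be exported to PMML with the pyspark2pmml wrapper around JPMML-SparkML. The input file, column names, and the exact PMMLBuilder call signature below are assumptions to verify against the installed library versions; this is a sketch, not the patent's concrete implementation.

```python
# Sketch only: assumes pyspark and pyspark2pmml (JPMML-SparkML wrapper) are installed.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark2pmml import PMMLBuilder

spark = SparkSession.builder.appName("first-platform-preprocessing").getOrCreate()
df = spark.read.csv("first_service_data.csv", header=True, inferSchema=True)  # hypothetical file

pipeline = Pipeline(stages=[
    StringIndexer(inputCol="channel", outputCol="channel_idx"),                # variable quantization
    OneHotEncoder(inputCols=["channel_idx"], outputCols=["channel_vec"]),      # one-hot coding
    VectorAssembler(inputCols=["channel_vec", "premium"], outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),              # lightweight LR model
])
model = pipeline.fit(df)

# Export the fitted pipeline (feature engineering + model) to the unified PMML format.
PMMLBuilder(spark.sparkContext, df, model).buildFile("first_service_model.pmml")
```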
S203, inputting the second business data into a second data model constructed based on a second data platform;
s204, processing the second service data through the second data model to obtain a data processing result.
Wherein the second data model includes an LR model, an FM model, a GBDT model, an XGBT model, a DNN model, or the like.
Specifically, the first service data is preprocessed through a model parser to obtain second service data in a preset format, the second data platform supports processing of the data in the preset format, the second service data is input into a second data model constructed based on the second data platform, and the second service data is processed through the second data model to obtain a data processing result, so that cross-platform processing of the data by adopting a composite model is achieved.
The embodiments of the present application provide a cross-data-platform data processing method. Data processing involves a first data platform and a second data platform: the first data model of the first service data is constructed based on the first data platform, the second data model used for processing data on the server is constructed based on the second data platform, and the first data platform and the second data platform are data platforms of different modeling types. The method includes: receiving first service data, where the first service data is generated based on a first data model constructed by a first data platform; preprocessing the first service data through a model parser to obtain second service data in a preset format; inputting the second service data into a second data model constructed based on a second data platform, where the second data platform supports processing data in the preset format; and processing the second service data through the second data model to obtain a data processing result.
Referring to fig. 3, fig. 3 is a schematic flow chart of a sub-flow of a data processing method of cross-data platform according to an embodiment of the present application. As shown in fig. 3, in this embodiment, the step of preprocessing the first service data by a model parser to obtain second service data in a preset format includes:
s301, performing data conversion on the first service data in a second preset mode to obtain third service data, wherein the data conversion refers to conversion of the first service data from one expression form to another expression form.
Data conversion, which may also be referred to as data transformation, is the process of changing data from one representation form to another.
Specifically, the preset data conversion modes include logarithmic conversion, square-root conversion, arcsine square-root conversion, square conversion, and reciprocal conversion; for example, data in the range 1 to 100 can be converted into data between 0 and 1 so that the data can be better filtered through indexes.
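For instance, a logarithmic or square-root transform followed by scaling into the 0-1 range could look like the following NumPy-based sketch; the function names and the example values are illustrative only.

```python
import numpy as np

def convert(values, mode="log"):
    """Data conversion: change values from one representation form to another."""
    x = np.asarray(values, dtype=float)
    if mode == "log":
        return np.log1p(x)      # logarithmic conversion
    if mode == "sqrt":
        return np.sqrt(x)       # square-root conversion
    if mode == "square":
        return np.square(x)     # square conversion
    if mode == "reciprocal":
        return 1.0 / x          # reciprocal conversion
    raise ValueError(f"unknown conversion mode: {mode}")

def min_max_scale(values):
    """Map values (e.g. 1-100) into the 0-1 range so index-based filtering works."""
    x = np.asarray(values, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

print(min_max_scale(np.arange(1, 101))[:3])  # [0.         0.01010101 0.02020202]
```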
S302, data screening is carried out on the third service data according to preset conditions to obtain fourth service data.
Data screening comprises three parts: data extraction, data cleaning, and data loading. The converted data is screened mainly according to the variable information value, the Gini index, and the relative deviation rate.
The variable information value, also called the information value (Information Value, IV for short), can be used to represent the predictive power of a variable.
The Gini index is an inequality measure, typically used to measure income imbalance, but it can be used to measure any uneven distribution. It is a number between 0 and 1, where 0 means perfect equality and 1 means perfect inequality; the more mixed the categories contained in a population, the larger the Gini index (similar in spirit to the concept of entropy).
The relative deviation refers to the percentage that the absolute deviation of a single measurement represents relative to the mean, and can only be used to measure how far a single measurement deviates from the mean; that is, the relative deviation is the ratio of the difference between one data point and the mean to the mean: relative deviation = [(A - mean)/mean] x 100%. For example, for the values 35 and 32, the mean is 33.5, so the relative deviation of 35 = [(35 - 33.5)/33.5] x 100%, approximately 4.5%. The relative deviation rate is the proportion of data with a given relative deviation among all the data.
Specifically, data screening is performed on the converted third service data according to preset conditions such as the variable information value, the Gini index, and the relative deviation rate, to obtain the fourth service data.
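A minimal screening sketch based on the relative-deviation rate described above is given below; the threshold values and function names are illustrative assumptions, not the patent's concrete criteria.

```python
import numpy as np

def relative_deviation(value, mean):
    """Relative deviation = |value - mean| / mean * 100%."""
    return abs(value - mean) / mean * 100.0

def relative_deviation_rate(values, threshold_pct=5.0):
    """Proportion of data points whose relative deviation exceeds the threshold."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    deviations = np.array([relative_deviation(v, mean) for v in x])
    return float((deviations > threshold_pct).mean())

def keep_variable(values, max_rate=0.1):
    """Screening rule: keep the variable only if its relative-deviation rate is acceptable."""
    return relative_deviation_rate(values) <= max_rate

print(relative_deviation(35, np.mean([35, 32])))  # approx. 4.48
```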
S303, inputting the fourth service data into a preset third data model;
s304, converting the fourth service data into second service data in a preset format through the third data model.
Specifically, a further data model needs to be constructed to convert the fourth service data obtained after screening into the unified preset format. Because PMML is a common specification for data mining that describes the generated model in a unified XML format, a model can be converted into standard XML for storage whether it was generated by Sklearn, R, or Spark MLlib; when a PMML model needs to be deployed, a library in the target environment that parses PMML models can be used to load the model and make predictions. For example, Spark MLlib supports exporting models to the predictive model markup language PMML, which can be done simply by calling the model's toPMML method.
Further, parameter determination needs to be performed for the constructed data model. Parameter determination refers to determining the optimal parameters through a loss function so that the data conversion achieves a good effect; it includes determining a Loss Function or Cost Function, which is a function that maps a random event or the value of its related random variable to a non-negative real number representing the "risk" or "loss" of that event. In applications, the loss function is usually associated with an optimization problem as a learning criterion, i.e., the model is solved and evaluated by minimizing the loss function, as in the parameter estimation (parametric estimation) used for models in machine learning.
Finally, the model parser calls the first data platform corresponding to the first service data in the initial format to preprocess the data, and the first service data is converted into the unified preset format.
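Parameter determination by minimizing a loss function can be illustrated with a tiny gradient-descent fit of a linear model under squared-error loss. This is purely a sketch under assumed data, not the patent's concrete procedure.

```python
import numpy as np

def fit_by_loss_minimization(x, y, lr=0.01, steps=2000):
    """Determine parameters (w, b) by minimizing the mean squared-error loss."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        pred = w * x + b
        # Gradients of the loss function drive the parameter updates.
        grad_w = (2.0 / n) * np.sum((pred - y) * x)
        grad_b = (2.0 / n) * np.sum(pred - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
print(fit_by_loss_minimization(x, y))  # approximately (2.03, 0.0)
```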
Referring to fig. 4 and fig. 5, fig. 4 is another schematic flowchart of a cross-data-platform data processing method provided in an embodiment of the present application, and fig. 5 is another application scenario architecture schematic diagram of the cross-data-platform data processing method provided in an embodiment of the present application. As shown in fig. 4 and 5, in this embodiment, the first data platform is one data platform in a first preset data platform set, where the first preset data platform set includes a plurality of different types of data platforms, and the second data platform is one data platform in a second preset data platform set, where the second preset data platform set also includes a plurality of different types of data platforms.
Specifically, since the service system has multiple services and the data-processing performance and algorithm requirements of each service application differ, each service corresponds to a group consisting of a first data platform and a second data platform for processing its service data, and the multiple services of the service system correspond to multiple such groups. Especially for big data, because the data volume is huge, distributed deployment of servers can be adopted so that the massive big data of the service system is processed by a server cluster. For example, referring again to Table 1, the three services of the insurance service system shown in Table 1 correspond to three different groups of first and second data platforms for processing the service data of the service system. In this embodiment of the present application, the first data platform is one data platform in a first preset data platform set that includes a plurality of different types of data platforms; referring to fig. 5, the first preset data platform set includes preset platforms such as the Spark platform (Spark format), the TensorFlow platform (TensorFlow format), the Sklearn platform (Sklearn format), and the R platform (R format). The second data platform is one data platform in a second preset data platform set that also includes a plurality of different types of data platforms; referring to fig. 5, the second preset data platform set includes the data platform corresponding to an LR model constructed based on the TensorFlow platform, the data platform corresponding to an FM model, the data platform corresponding to a GBDT model, the data platform corresponding to an XGBT model, the data platform corresponding to a DNN model, and so on, and each of these data platforms may also be based on a different tool, such as Sklearn.
With continued reference to fig. 4, as shown in fig. 4, in this embodiment, the step of receiving first service data, where a data model of the first service data is constructed by the first data platform includes:
s401, receiving first service data, wherein the first service data is generated based on a first data model constructed by a first data platform, and the first service data carries identification information for identifying the first service data.
The identification information refers to information identifying the first service data, such as the service data name, the service data code, or the data identifier of the first service data.
Specifically, the service system sends the first service data; according to the identification information of the first service data and a load distribution strategy, the load balancer distributes the first service data to a server contained in an execution unit for data processing; the server receives the first service data, where the first data model of the first service data is constructed by the first data platform, the first service data is generated based on the first data model, and the first service data carries identification information for identifying the first service data.
The step of preprocessing the first service data by a model parser to obtain second service data in a preset format includes:
s402, determining a mode of preprocessing the first service data according to the identification information, and taking the mode as a first preset mode;
s403, preprocessing the first service data by using the first preset mode through a model analyzer to obtain second service data in a preset format.
Specifically, since the service system includes multiple kinds of service data, each kind corresponds to a different first data platform; for example, the service data may be based on the Spark platform (Spark format), the TensorFlow platform (TensorFlow format), the Sklearn platform (Sklearn format), or the R platform (R format). After receiving the first service data, the server identifies, according to the identification information, the platform to which the initial format of the first service data belongs, determines the mode of preprocessing the first service data according to that first data platform, and takes it as the first preset mode, so that the model parser preprocesses the first service data in the first preset mode corresponding to the first data platform to obtain second service data in the preset format. For example, if the first service data is based on the Spark platform, a first preset mode A corresponding to the Spark platform is used to preprocess the first service data to obtain second service data in the preset format; if the first service data is based on the TensorFlow platform, a first preset mode B corresponding to the TensorFlow platform is used instead. That is, the preset preprocessing mode differs according to the data platform to which each data model belongs, for example the Spark platform corresponds to preset mode A and the TensorFlow platform corresponds to preset mode B.
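A minimal sketch of this dispatch step follows: the identification information selects which preprocessing routine (the "first preset mode") is applied. The platform keys, handler functions, and payload structure are illustrative assumptions.

```python
# Illustrative dispatch: the identification info carried by the first service
# data selects the preprocessing mode matching its first data platform.
def preprocess_spark(data):       # stands in for "first preset mode A" (Spark format)
    return {"format": "pmml", "payload": data}

def preprocess_tensorflow(data):  # stands in for "first preset mode B" (TensorFlow format)
    return {"format": "pmml", "payload": data}

PRESET_MODES = {
    "spark": preprocess_spark,
    "tensorflow": preprocess_tensorflow,
}

def model_parser(first_service_data: dict) -> dict:
    """Pick the first preset mode from the identification info, then apply it."""
    platform = first_service_data["identification"]["first_platform"]
    mode = PRESET_MODES.get(platform)
    if mode is None:
        raise ValueError(f"no preset preprocessing mode for platform {platform!r}")
    return mode(first_service_data["payload"])

second_service_data = model_parser(
    {"identification": {"first_platform": "spark"}, "payload": [1, 2, 3]}
)
```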
Further, in one embodiment, the step of preprocessing, by the model parser, the first service data in the first preset manner to obtain second service data in a preset format includes:
and calling a preset formula corresponding to the first data platform through a model analyzer to convert the first service data into a preset format so as to obtain second service data.
Specifically, the PMML standard supports some common data-conversion preprocessing operations and, on that basis, supports conversion using functional expressions; for example, the data conversion operations defined by the PMML standard listed below may be adopted:
1) Normalization: converts data values into numerical values, applicable to both continuous and discrete variables.
2) Discretization: converts continuous variables into discrete variables.
3) Value mapping: maps a current discrete variable to another discrete variable.
4) Functions: PMML has many common functions built in, and users can also define their own functions.
5) Aggregation: aggregation operations such as averaging, maximum, minimum, etc.
Therefore, the server receives the first service data, where the first data model of the first service data is constructed by the first data platform, the first service data is generated based on the first data model, and the first service data carries identification information for identifying it. According to the identification information, a preset formula for preprocessing the first service data is determined and taken as the first preset mode, and the model parser calls the preset formula corresponding to the first data platform to convert the first service data into the preset format, obtaining the second service data. For example, when converting the format of the first service data into the PMML format, the common functions built into PMML, or functions defined by the user, can be used to convert the first service data into the preset format to obtain the second service data, so that later, according to the identification information, the second data model pre-configured in the model service cluster and constructed by the corresponding second data platform can be called, the second service data can be input into that second data model, and the second service data can be processed by it to obtain the data processing result.
The step of inputting the second service data into a second data model constructed based on a second data platform includes:
s404, calling a second data model which is pre-configured in the model service cluster and is constructed based on a second data platform through a model router according to the identification information;
s405, inputting the second service data into the second data model.
Specifically, since the service system includes multiple kinds of service data, each kind corresponds to a different second data platform; for example, the service data may be based on the Spark platform (Spark format), the TensorFlow platform (TensorFlow format), the Sklearn platform (Sklearn format), or the R platform (R format). The server receives the second service data obtained by converting the first service data, identifies, according to the identification information, the second data platform to which the second service data belongs, and, according to a pre-set matching relationship, calls from the model service cluster the pre-configured second data model constructed by the corresponding second data platform. For example, if the second service data is based on the Spark platform, a second data model A corresponding to the Spark platform is used to process the second service data to obtain a data processing result; if the second service data is based on the TensorFlow platform, a second data model B corresponding to the TensorFlow platform is used instead. That is, the second data model differs according to the second data platform to which each kind of service data belongs, for example the Spark platform may correspond to second data model A and the TensorFlow platform to second data model B. According to the identification information, the model router calls the second data model (A, B, or another) pre-configured in the model service cluster and constructed by the corresponding second data platform, and inputs the second service data into that second data model to process it and obtain the data processing result. The first service data, the first data model, and the second data model have a preset matching relationship, which is determined according to the performance and algorithm requirements of the first service data: the first data model and the second data model to be constructed are chosen so as to establish this preset matching relationship, i.e., the first service data adopts a model built on one platform as the first data model defining its initial format, and the subsequent second data model is built based on another platform. The preset matching relationship may also be called the correspondence between the way service data is described and the way it is processed; for example, it may be specified in advance that service data A is described by first data model B as its initial format and processed by second data model C, which satisfies the preset matching relationship among A, B, and C.
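The routing step can be sketched as a lookup from identification information to a pre-configured second data model in the model service cluster. The in-memory registry and lambda stand-ins below are illustrative only; in practice this would be a call to a remote model-serving deployment.

```python
# Illustrative model router: identification info -> pre-configured second data
# model in the model service cluster (the registry stands in for real serving).
MODEL_SERVICE_CLUSTER = {
    "agent_application": lambda features: sum(features),  # stands in for a DNN model
    "claims_fraud_detection": lambda features: max(features),  # stands in for an R-built model
}

def model_router(identification: str, second_service_data):
    """Call the second data model matched to the service identification."""
    second_model = MODEL_SERVICE_CLUSTER.get(identification)
    if second_model is None:
        raise LookupError(f"no second data model configured for {identification!r}")
    return second_model(second_service_data)

result = model_router("agent_application", [0.2, 0.5, 0.1])
```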
S406, processing the second service data through the second data model to obtain a data processing result.
Specifically, step S406 in the embodiment shown in fig. 4 is the same as step S204 in the embodiment shown in fig. 2, and step S204 in the embodiment shown in fig. 2 is referred to herein by way of reference, and will not be described again here.
Further, according to the second service data, the second data model corresponding to the second data platform for that data is called, and the second service data is input into it so that the second service data is computed by the second data model to obtain the data computation result; that is, the data model created by the second data platform performs the data processing to obtain the data processing result, for example an LR model, FM model, GBDT model, XGBT model, or DNN model created on the second data platform processes the second service data to obtain the final data processing result. The second data platform includes the TensorFlow platform; TensorFlow can be used for model construction, model application, and model release, and the model parser can also be built based on TensorFlow. For example, the result processed by the Spark platform is output to TensorFlow for parsing; since the output of the Spark platform includes PMML data and models created by the Spark platform in the PMML format, parsing on the TensorFlow platform can include both data parsing and model parsing, while deep learning, transfer learning, and other models, as well as ensemble models, are built on TensorFlow, and the final result is output. For modeling on different platforms, model files in the standard format are generated based on PMML, and TensorFlow reads and parses the PMML model files, thereby integrating Spark and TensorFlow, exploiting their respective advantages, and performing cross-platform composite modeling. The overall architecture can be built based on the TensorFlow platform; because the whole architecture is mainly built on the TensorFlow platform, the Spark platform is one part of it. Meanwhile, the TensorFlow platform is suitable for interacting with other systems: application services are exposed externally through the TensorFlow Serving API, and load balancing is achieved through Java encapsulation.
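One hedged way for the second-platform side to load and score a PMML model from Python is the pypmml library; whether it fits a given TensorFlow deployment, as well as the file name and feature names below, are assumptions for this sketch.

```python
# Sketch only: assumes the pypmml package is installed and that
# "first_service_model.pmml" was produced by the first data platform.
from pypmml import Model

pmml_model = Model.load("first_service_model.pmml")

# Score one record of second service data; the feature names are illustrative
# and must match the fields declared in the PMML file.
record = {"channel": "agent", "premium": 1200.0}
print(pmml_model.predict(record))
```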
The server receives the call from the service system and the first service data in the initial format sent by the service system, calls the first data platform corresponding to the first service data through the model parser to preprocess the data, converts the first service data into the unified preset format to obtain the second service data, and then calls, through the model router, the corresponding model built by the second data platform in the model service cluster to process the second service data; this can be realized through the architecture shown in fig. 5. In the embodiment shown in fig. 5, the service system sends the first service data in the initial format; through the distribution of the load balancer, the first service data is sent to the server of the execution unit corresponding to it in a server cluster composed of multiple single servers, and the server of that execution unit then processes the first service data to obtain the data processing result. The servers of an execution unit form a unit, composed of several servers, that processes the data of a certain service type; the servers of an execution unit are a subset of the server cluster. For example, in the insurance service system, since the insurance business includes service content such as agent application services, claims fraud detection services, product recommendation services, and application pricing services, a server cluster C composed of seven servers such as Server1 through Server7 may be used to provide data services for the insurance business, where the service data generated by the agent application service may be processed by Server1 as an execution unit, the service data generated by the claims fraud detection service may be processed by an execution unit composed of the two servers Server1 and Server3, the service data generated by the product recommendation service may be processed by Server4 as an execution unit, and the service data generated by the application pricing service may be processed by an execution unit composed of the three servers Server5, Server6, and Server7. The data platform in the embodiments of the present application refers to a platform formed by the application software environment, such as the tools used for constructing the data model; the server and the server cluster refer to hardware devices; and the execution unit is an execution body formed by hardware devices together with the software system running on them.
With continued reference to fig. 5, as shown in fig. 5, the architecture diagram shown in fig. 5 includes:
1) The interface layer, which calls the corresponding business application service for the service data of different applications; for example, corresponding business application services are exposed to external callers through the TensorFlow Serving API, which can be realized through Java encapsulation and load balancing. For example, in the insurance business, if service data of an agent application is received, the service interface of the agent application is called to process that service data.
2) The model processing layer, which, by defining a model parser, identifies models built on different platforms and supports models built by Spark/TensorFlow/Sklearn/R. It specifically includes: (1) the model parser, which converts the initial format of the first service data into second service data in the unified preset format, where the preset format is a custom format, the PMML format, or the Protobuf format; and (2) the model service cluster.
3) The data calculation layer, which processes the second service data by calling a data model in the model service cluster to obtain the data processing result, for example through Kafka + Spark Streaming processing for real-time feature calculation, custom feature processing (+, -, /), feature correlation operators (correlation calculation), feature query (nested query), and the like. Kafka serves as a real-time distributed message queue, producing and consuming messages in real time, and the Spark Streaming real-time computing framework reads the data in Kafka in real time and then performs the computation; Spark Streaming belongs to Spark's core API and supports high-throughput, fault-tolerant real-time stream data processing. A hedged streaming sketch is given after the list below.
Further, the content of the data calculation includes:
(1) Real-time feature calculation, which refers to calculation performed on real-time features;
(2) Custom feature processing, which refers to processing performed on custom features;
(3) The feature correlation operator. An operator is a mapping O from one function space to another: X → X; in a broad sense, operators can be generalized to any space, such as inner product spaces. Here it refers to the correlation processing of features in the data;
(4) The feature query operator, which refers to the processing of feature queries from a database.
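As the sketch referenced above, the Kafka + Spark streaming feature computation could look like the following, using the Structured Streaming API rather than the older DStream API; the topic name, schema, window size, and broker address are illustrative assumptions, and the Kafka source additionally requires the spark-sql-kafka connector package.

```python
# Sketch only: real-time feature calculation by reading business events from
# Kafka and aggregating them with Spark Structured Streaming.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("realtime-features").getOrCreate()

schema = (StructType()
          .add("user_id", StringType())
          .add("premium", DoubleType())
          .add("event_time", TimestampType()))

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "kafka:9092")   # illustrative broker
          .option("subscribe", "first-service-data")         # illustrative topic
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Real-time feature: average premium per user over a sliding 10-minute window.
features = (events
            .withWatermark("event_time", "10 minutes")
            .groupBy(F.window("event_time", "10 minutes"), "user_id")
            .agg(F.avg("premium").alias("avg_premium")))

query = features.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```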
With continued reference to fig. 5, as shown in fig. 5, the process of data processing across data platforms is described as follows, taking the agent application service, the claims fraud detection service, the product recommendation service, and the application pricing service included in the insurance business as examples:
1) The service system sends the first service data to the load balancer through a service call; that is, the front-end computer device sends the first service data to the load balancer through a service call, where the data model of the first service data is constructed by the first data platform, for example by the Spark platform, the TensorFlow platform, the Sklearn platform, or the R platform.
2) After the load balancer receives the first service data, identifying the service type to which the first service data belongs, for example, the first service data belongs to the service types of agent application service, claim fraud detection service, product recommendation service, application pricing service and the like, and sending the first service data to a model analyzer in a model processing layer according to the service type of the first service data and combining a load distribution strategy.
3) The model parser calls the formula corresponding to the first data platform to which the first service data belongs and, through preprocessing, converts the first service data from the initial format of the first data platform into second service data in the unified preset format. For example, the model parser recognizes that the first service data is in the Spark format based on the Spark platform and calls the formula corresponding to the Spark format to preprocess the first service data, converting the Spark format of the first service data to obtain second service data in the unified, preset PMML format.
4) The data calculation layer, i.e. the model router, receives the second service data and, according to the service type of the second service data, calls the second data model in the model service cluster corresponding to that preset service type to process the second service data and obtain the data processing result, where the processing of the second service data by the data calculation layer includes real-time feature calculation, custom feature processing, feature correlation operators, and feature query operators. For example, continuing to refer to Table 1, if the preset first data platform of the agent application service in the insurance business is the Spark platform, i.e. the data model of the first service data of the agent application service is constructed by the Spark platform, and the second data platform is the TensorFlow platform, i.e. the second data model of the agent application service is constructed by the TensorFlow platform, then the second data model for processing the agent application service is a DNN model built on the TensorFlow platform; the TensorFlow platform supports processing PMML-format data, and after receiving the second service data in the PMML format, the model router calls the DNN model corresponding to the agent application service in the model service cluster to process the second service data, thereby obtaining the data processing result.
In one embodiment, before the step of inputting the second service data into the second data model constructed based on the second data platform, the method further includes:
receiving a first preset model created based on the first data platform;
outputting the first preset model in the preset format through a model analyzer to obtain a first preset model in the preset format:
the step of inputting the second business data into a second data model constructed based on a second data platform further comprises:
and inputting the second service data into the first preset model of the preset format to process the second service data through the first preset model of the preset format so as to obtain a data processing result.
Specifically, the first data platform may also build some models; for example, the Spark platform may build some models and output them in the PMML, Protobuf, or custom format for use by the second data platform, for example TensorFlow, where the model parser supports the PMML, Protobuf, and custom formats. A custom format refers to a format used only for a specific project, as opposed to formats with a universal standard such as the PMML format or the Protobuf format. The first data platform and the second data platform are each selected according to the actual service data's requirements on data-processing performance and algorithm richness. Ordering the platforms from the perspective of performance and algorithms (where R refers to the R language), the ordering is as follows:
In data processing performance, from weak to strong: R, Sklearn, TensorFlow, and Spark;
In algorithm richness, from weak to strong: Spark, R, Sklearn, and TensorFlow.
In one embodiment, the step of processing the second service data by the second data model to obtain a data processing result includes:
analyzing the second service data through the second data model to obtain analysis data;
and processing the analysis data through the second data model to obtain a data processing result.
Specifically, the second service data is analyzed through the second data model to obtain analysis data, and the analysis data is processed through the second data model to obtain the data processing result. The first data platform and the second data platform are each selected according to the actual service data's requirements on data-processing performance and algorithm richness; the ordering of the platforms is as described above.
In one embodiment, the first business data includes data and a second preset model created based on the first data platform;
The step of parsing the second service data through the second data model to obtain parsed data includes:
and carrying out data analysis and second preset model analysis on the second service data through the second data model to obtain analysis data.
Specifically, the first data platform may also build some models; for example, the Spark platform may build models and output them in PMML, Protobuf or custom formats, and the model parser supports the PMML, Protobuf and custom formats. That is, the first data platform may not only generate service data but also build a second preset model. When the first service data is preprocessed to obtain the second service data in the preset format, the corresponding second service data therefore contains not only data but also the second preset model in the preset format obtained through preprocessing. When the second service data is parsed by the second data model, not only data parsing but also model parsing is performed, so as to obtain analysis data that includes both data and a model; the analysis data is then processed by the second data model to obtain a data processing result. The ordering of the platforms is as described above.
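A minimal sketch of this combined parsing, assuming a hypothetical JSON payload layout and a placeholder load_pmml_model helper (a real implementation would use a PMML evaluator library):

```python
# Sketch of parsing second service data that carries both data records and a
# second preset model exported from the first data platform. The payload
# layout and the load_pmml_model helper are hypothetical, for illustration only.

import json


def load_pmml_model(pmml_text):
    """Hypothetical loader; in practice a PMML evaluator library would be used."""
    return {"pmml": pmml_text}  # placeholder model handle


def parse_second_service_data(payload_json):
    payload = json.loads(payload_json)
    records = payload["data"]                         # plain data part
    preset_model = load_pmml_model(payload["model"])  # embedded model part
    return records, preset_model


records, preset_model = parse_second_service_data(
    '{"data": [{"f1": 0.3, "f2": 1.1}], "model": "<PMML .../>"}'
)
```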
In one embodiment, the step of receiving the first service data includes:
and receiving the first service data sent by the load balancer according to a preset strategy.
The preset strategy includes load distribution according to the service type, or load distribution according to the proportion of performance each server is already using to process data. Load distribution according to the service type means, for example, that the first type of service data is processed by server A, the second type by server B, the third type by server C, and so on. Load distribution according to the load each server is carrying means, for example, that if server A is already using 90 percent of its data processing performance, the data is distributed to server B; and if server B is already using 95 percent of its performance, the data is distributed to server C.
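The two preset strategies can be sketched as follows; the server names, the 90 percent threshold and the data layout are illustrative only.

```python
# Sketch of the two preset strategies: distribution by service type, and
# distribution by each server's current utilisation.

SERVICE_TYPE_ROUTE = {"type_1": "server_a", "type_2": "server_b", "type_3": "server_c"}


def route_by_service_type(service_type):
    """Strategy 1: each service type is always handled by a fixed server."""
    return SERVICE_TYPE_ROUTE[service_type]


def route_by_utilisation(servers, threshold=0.9):
    """Strategy 2: pick the first server whose utilisation is below the threshold."""
    for name, load in servers:
        if load < threshold:
            return name
    raise RuntimeError("all servers are above the utilisation threshold")


# Example: server A is already at 90 percent, so the request falls through to B.
print(route_by_utilisation([("server_a", 0.90), ("server_b", 0.75), ("server_c", 0.40)]))
```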
Specifically, since the first service data may be one of a plurality of kinds of service data, and each kind of service data is constructed by a different data platform and therefore corresponds to a different format, data processing is required for the massive big data generated by the plurality of kinds of service data. When processing massive big data, because the processing capacity of a single server is limited, a server cluster is generally used for data processing; the received first service data needs to be distributed among the servers in the server cluster, and the received service data is generally allocated within the server cluster according to a preset strategy through a load balancer. Referring to fig. 5, in this embodiment the service system sends first service data in an initial format, where the initial format is the preset Spark format, Tensorflow format, Sklearn format or R format of the first data platform. The first service data is distributed by the load balancer and then sent to an operation execution unit of the server cluster; the server where the execution unit is located identifies, according to the identifier carried by the first service data, whether the format of the first service data is the Spark format, Tensorflow format, Sklearn format or R format; the model parser in the server calls the formula corresponding to the first platform according to the format of the first service data to parse the initial service data, and converts the first service data in the Spark, Tensorflow, Sklearn or R format into second service data in the PMML format, Protobuf format or a specific custom format; the model router then calls the model created by the second data platform corresponding to the first service data in the model service cluster to process the second service data, and finally a data processing result is obtained.
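A simplified sketch of the format-identification step performed by the model parser in this flow; the converter entries are stubs standing in for the preset formulas of each first platform, and the field names are assumptions for illustration.

```python
# Sketch: the model parser reads the identifier carried by the first service
# data and applies the conversion routine registered for that first-platform
# format, producing second service data in the preset (here PMML) format.

CONVERTERS = {
    # Each lambda stands in for the real conversion formula of that platform.
    "spark": lambda body: {"format": "PMML", "payload": body},
    "tensorflow": lambda body: {"format": "PMML", "payload": body},
    "sklearn": lambda body: {"format": "PMML", "payload": body},
    "r": lambda body: {"format": "PMML", "payload": body},
}


def preprocess(first_service_data):
    identifier = first_service_data["platform"]   # identifier carried with the data
    convert = CONVERTERS[identifier]              # preset formula for that platform
    return convert(first_service_data["body"])    # second service data, preset format


second_service_data = preprocess({"platform": "spark", "body": {"f1": 0.3, "f2": 1.1}})
```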
In one embodiment, the first data platform is a Spark platform and the second data platform is a Tensorflow platform.
Specifically, by receiving the call of the service system, the first service data in the initial format sent by the service system is received; the first data platform corresponding to the first service data is invoked through the model analyzer to preprocess the data and convert the first service data into a unified preset format to obtain second service data; the model router then invokes the corresponding model established by the second data platform in the model service cluster to process the second service data and obtain a data processing result. In this way the respective computing capability and rich algorithm capability of different platforms can be exerted, and the efficiency and accuracy of constructing a composite model in a complex scene are improved. Because each big data processing platform and modeling platform has its own advantages and disadvantages, data processing across data platforms is realized by combining the first data platform and the second data platform. For example, if the Spark platform is used as the first data platform and the Tensorflow platform is used as the second data platform, models such as deep learning and transfer learning models and Ensemble model construction are carried out on the Tensorflow platform and the final result is output, which can bring the following benefits (a minimal sketch follows the two points below):
1) The computing capability of the Spark platform is exerted, and meanwhile, the rich algorithm capability of the Tensorflow platform is exerted;
2) And the efficiency and the precision of constructing the composite model under the complex scene are improved.
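The sketch below illustrates this division of labour under the stated assumption that Spark has already produced the feature matrix (random placeholders stand in for it here) and Tensorflow builds the deep model; it is an illustration, not the claimed implementation.

```python
# Minimal sketch: Spark handles the heavy feature computation, Tensorflow
# builds the deep model (the "second data model") on the result.

import numpy as np
import tensorflow as tf

# In practice these arrays would be produced by a Spark job (e.g. groupBy /
# window aggregations over the raw business data) and exported for training.
features = np.random.rand(256, 8).astype("float32")
labels = np.random.randint(0, 2, size=(256, 1)).astype("float32")

# DNN built on the Tensorflow platform.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(features, labels, epochs=3, batch_size=32, verbose=0)
```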
Further, before the step of receiving the first service data, wherein the data model of the first service data is constructed by the first data platform, the method further includes:
the first data model and the second data model are tested in an offline manner and in an online manner.
Specifically, various methods may be employed to evaluate the effect of the data models included in the cross-data-platform data processing method, for example evaluating the model effect in an offline manner and in an online manner, with different evaluation methods used in each.
Still further, before the step of receiving the first service data, wherein the data model of the first service data is constructed by the first data platform, the method further includes:
testing the first data model through a first preset effect index;
and testing the second data model through a second preset effect index.
Specifically, the model is released online after passing the test, and the model accuracy is tracked by designing effect indexes. For example, different business indexes are designed to follow the model accuracy according to the different businesses the model is aimed at; for a sales business, for instance, the conversion rate is designed to track the model accuracy.
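For example, a conversion-rate index for a sales business could be computed as in the sketch below; the field names are illustrative assumptions.

```python
# Sketch of tracking model effect with a business index: the conversion rate
# of the users the model recommended, for a sales business.

def conversion_rate(predictions):
    """predictions: list of dicts with 'recommended' and 'converted' flags."""
    recommended = [p for p in predictions if p["recommended"]]
    if not recommended:
        return 0.0
    converted = sum(1 for p in recommended if p["converted"])
    return converted / len(recommended)


batch = [
    {"recommended": True, "converted": True},
    {"recommended": True, "converted": False},
    {"recommended": False, "converted": False},
]
print(f"conversion rate: {conversion_rate(batch):.2%}")  # 50.00%
```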
It should be noted that, in the data processing method across data platforms described in the foregoing embodiments, the technical features included in different embodiments may be recombined according to needs to obtain a combined embodiment, which is within the scope of protection claimed in the present application.
Referring to fig. 6, fig. 6 is a schematic block diagram of a data processing apparatus across data platforms according to an embodiment of the present application. Corresponding to the above data processing method across data platforms, the embodiment of the application also provides a data processing apparatus across data platforms. As shown in fig. 6, the data processing apparatus across data platforms includes units for executing the above data processing method across data platforms, and the apparatus may be configured in a computer device such as a server; the apparatus involves a first data platform and a second data platform, where the first data platform and the second data platform are data platforms of different modeling types. Specifically, referring to fig. 6, the cross-data-platform data processing apparatus 600 includes a receiving unit 601, a preprocessing unit 602, an input unit 603, and a processing unit 604.
The receiving unit 601 is configured to receive first service data, where the first service data is generated based on a first data model constructed by the first data platform;
A preprocessing unit 602, configured to preprocess the first service data through a model parser to obtain second service data in a preset format;
an input unit 603, configured to input the second service data into a second data model constructed based on a second data platform;
and the processing unit 604 is configured to process the second service data through the second data model to obtain a data processing result.
In one embodiment, the first service data carries identification information for identifying the first service data;
the preprocessing unit 602 includes:
the determining subunit is used for determining a mode of preprocessing the first service data according to the identification information, and taking the mode as a first preset mode;
the preprocessing subunit is used for preprocessing the first service data by adopting the first preset mode through the model analyzer to obtain second service data in a preset format;
the input unit 603 includes:
the calling subunit is used for calling a second data model which is pre-configured in the model service cluster and is based on a second data platform through the model router according to the identification information;
And the input subunit is used for inputting the second service data into the second data model.
In one embodiment, the preprocessing subunit is configured to call, through a model parser, a preset formula corresponding to the first data platform to convert the first service data into a preset format to obtain second service data.
In one embodiment, the preprocessing unit 602 includes:
the first conversion subunit is used for carrying out data conversion on the first service data in a second preset mode to obtain third service data, wherein the data conversion refers to conversion of the first service data from one expression form to another expression form;
the screening subunit is used for carrying out data screening on the third service data according to preset conditions to obtain fourth service data;
an input subunit, configured to input the fourth service data into a preset third data model;
and the second conversion subunit is used for converting the fourth service data into second service data with a preset format through the third data model.
In one embodiment, the processing unit 604 includes:
the analysis subunit is used for analyzing the second service data through the second data model to obtain analysis data;
And the processing subunit is used for processing the analysis data through the second data model to obtain a data processing result.
In one embodiment, the receiving unit 601 is configured to receive first service data sent by a load balancer according to a preset policy.
In one embodiment, the first data platform is a Spark platform and the second data platform is a Tensorflow platform.
It should be noted that, as those skilled in the art can clearly understand, the specific implementation process of the above data processing apparatus across data platforms and of each unit thereof may refer to the corresponding description in the foregoing method embodiment, and for convenience and brevity of description, details are not repeated here.
Meanwhile, the above division and connection of the units in the data processing apparatus across data platforms are only used for illustration; in other embodiments, the data processing apparatus across data platforms may be divided into different units as required, and the units may be connected in different orders and manners, so as to complete all or part of the functions of the data processing apparatus across data platforms.
The above-described data processing apparatus across data platforms may be implemented in the form of a computer program which is executable on a computer device as shown in fig. 7.
Referring to fig. 7, fig. 7 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 700 may be a computer device such as a desktop computer or a server, or may be a component or part of another device.
With reference to FIG. 7, the computer device 700 includes a processor 702, memory, and a network interface 705, which are connected by a system bus 701, wherein the memory may include a non-volatile storage medium 703 and an internal memory 704. The non-volatile storage medium 703 may store an operating system 7031 and a computer program 7032. The computer program 7032, when executed, can cause the processor 702 to perform one of the data processing methods described above across data platforms.
The processor 702 is used to provide computing and control capabilities to support the operation of the overall computer device 700.
The internal memory 704 provides an environment for the execution of a computer program 7032 in a non-volatile storage medium 703, which computer program 7032, when executed by the processor 702, causes the processor 702 to perform a data processing method as described above across data platforms.
The network interface 705 is used for network communication with other devices. Those skilled in the art will appreciate that the architecture shown in fig. 7 is merely a block diagram of a portion of the architecture in connection with the present application and is not intended to limit the computer device 700 to which the present application is applied, and that a particular computer device 700 may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components. For example, in some embodiments, the computer device may include only a memory and a processor, and in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in fig. 7, and will not be described again.
Taking the computer device as a server, the processor 702 is configured to execute a computer program 7032 stored in a memory, so as to implement the following steps: receiving first business data, wherein the first business data is generated based on a first data model constructed by a first data platform; preprocessing the first service data through a model analyzer to obtain second service data in a preset format; inputting the second business data into a second data model constructed based on a second data platform; and processing the second service data through the second data model to obtain a data processing result.
In an embodiment, when the step of receiving the first service data is implemented by the processor 702, the first service data carries identification information for identifying the first service data;
when implementing the step of preprocessing the first service data by the model parser to obtain second service data in a preset format, the processor 702 specifically implements the following steps:
determining a mode of preprocessing the first service data according to the identification information, and taking the mode as a first preset mode;
Preprocessing the first service data by a model analyzer in the first preset mode to obtain second service data in a preset format;
the processor 702 specifically implements the following steps when implementing the step of inputting the second service data into the second data model constructed based on the second data platform:
calling a second data model which is pre-configured in the model service cluster and is constructed based on a second data platform through a model router according to the identification information;
and inputting the second service data into the second data model.
In an embodiment, when the step of preprocessing the first service data by the model parser in the first preset manner to obtain second service data in a preset format is implemented by the processor 702, the following steps are specifically implemented:
and calling a preset formula corresponding to the first data platform through a model analyzer to convert the first service data into a preset format so as to obtain second service data.
In an embodiment, when implementing the step of preprocessing the first service data by the model parser to obtain second service data in a preset format, the processor 702 specifically implements the following steps:
Performing data conversion on the first service data in a second preset mode to obtain third service data, wherein the data conversion refers to conversion of the first service data from one expression form to another expression form;
data screening is carried out on the third service data according to preset conditions to obtain fourth service data;
inputting the fourth service data into a preset third data model;
and converting the fourth service data into second service data in a preset format through the third data model.
In an embodiment, when implementing the step of processing the second service data by the second data model to obtain a data processing result, the processor 702 specifically implements the following steps:
analyzing the second service data through the second data model to obtain analysis data;
and processing the analysis data through the second data model to obtain a data processing result.
In one embodiment, when implementing the step of receiving the first service data, the processor 702 specifically implements the following steps:
and receiving the first service data sent by the load balancer according to a preset strategy.
In an embodiment, when the processor 702 implements the steps of the data processing method of the cross-data platform, the first data platform is a Spark platform, and the second data platform is a Tensorflow platform.
It should be appreciated that in embodiments of the present application, the processor 702 may be a central processing unit (Central Processing Unit, CPU), and the processor 702 may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It will be appreciated by those skilled in the art that all or part of the flow of the method of the above embodiments may be implemented by a computer program, which may be stored on a computer readable storage medium. The computer program is executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present application also provides a computer-readable storage medium. The computer readable storage medium may be a non-volatile computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the cross-data-platform data processing method described in the above embodiments.
The present application also provides a computer program product which, when run on a computer, causes the computer to perform the steps of the cross-data-platform data processing method described in the above embodiments.
The computer readable storage medium may be an internal storage unit of the aforementioned device, such as a hard disk or a memory of the device. The computer readable storage medium may also be an external storage device of the device, such as a plug-in hard disk, smart Media Card (SMC), etc. provided on the device.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. A method of cross-data-platform data processing, the method comprising:
receiving first business data, wherein the first business data is generated based on a first data model constructed by a first data platform;
preprocessing the first service data through a model analyzer to obtain second service data in a preset format;
inputting the second business data into a second data model constructed based on a second data platform;
processing the second service data through the second data model to obtain a data processing result;
the first service data carries identification information for identifying the first service data;
the step of preprocessing the first service data by a model parser to obtain second service data in a preset format includes:
Determining a mode of preprocessing the first service data according to the identification information, and taking the mode as a first preset mode;
preprocessing the first service data by a model analyzer in the first preset mode to obtain second service data in a preset format;
the step of inputting the second service data into a second data model constructed based on a second data platform includes:
calling a second data model which is pre-configured in the model service cluster and is constructed based on a second data platform through a model router according to the identification information;
inputting the second business data into the second data model;
the step of preprocessing the first service data by a model parser to obtain second service data in a preset format includes:
performing data conversion on the first service data in a second preset mode to obtain third service data, wherein the data conversion refers to conversion of the first service data from one expression form to another expression form;
data screening is carried out on the third service data according to preset conditions to obtain fourth service data;
Inputting the fourth service data into a preset third data model;
and converting the fourth service data into second service data in a preset format through the third data model.
2. The method for processing data across data platforms according to claim 1, wherein the step of preprocessing the first service data by the model parser in the first preset manner to obtain second service data in a preset format includes:
and calling a preset formula corresponding to the first data platform through a model analyzer to convert the first service data into a preset format so as to obtain second service data.
3. The method for processing data across data platforms according to claim 1, wherein the step of processing the second service data by the second data model to obtain a data processing result comprises:
analyzing the second service data through the second data model to obtain analysis data;
and processing the analysis data through the second data model to obtain a data processing result.
4. The cross-data platform data processing method as claimed in claim 1, wherein the step of receiving the first service data comprises:
And receiving the first service data sent by the load balancer according to a preset strategy.
5. The method for processing data across data platforms according to claim 1, wherein the first data platform is a Spark platform and the second data platform is a Tensorflow platform.
6. A cross-data platform data processing apparatus, the apparatus comprising:
a receiving unit, configured to receive first service data, where the first service data is generated based on a first data model constructed by a first data platform;
the preprocessing unit is used for preprocessing the first service data through the model analyzer to obtain second service data in a preset format;
an input unit for inputting the second service data into a second data model constructed based on a second data platform;
the processing unit is used for processing the second service data through the second data model to obtain a data processing result;
the first service data carries identification information for identifying the first service data, and the preprocessing unit comprises:
the determining subunit is used for determining a mode of preprocessing the first service data according to the identification information, and taking the mode as a first preset mode;
The preprocessing subunit is used for preprocessing the first service data by adopting the first preset mode through the model analyzer to obtain second service data in a preset format;
the input unit includes:
the calling subunit is used for calling a second data model which is pre-configured in the model service cluster and is based on a second data platform through the model router according to the identification information;
an input subunit, configured to input the second service data into the second data model;
the preprocessing unit further includes:
the first conversion subunit is used for carrying out data conversion on the first service data in a second preset mode to obtain third service data, wherein the data conversion refers to conversion of the first service data from one expression form to another expression form;
the screening subunit is used for carrying out data screening on the third service data according to preset conditions to obtain fourth service data;
an input subunit, configured to input the fourth service data into a preset third data model;
and the second conversion subunit is used for converting the fourth service data into second service data with a preset format through the third data model.
7. A computer device comprising a memory and a processor coupled to the memory; the memory is used for storing a computer program; the processor is configured to execute a computer program stored in the memory to perform the steps of the cross-data-platform data processing method according to any of claims 1-5.
8. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the steps of the cross-data-platform data processing method according to any of claims 1-5.