WO2021228264A1 - Method and apparatus for applying machine learning, electronic device, and storage medium - Google Patents
- Publication number
- WO2021228264A1 (international application PCT/CN2021/094202)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- model
- data
- online
- database
- deployed
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/1734—Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Definitions
- The embodiments of the present disclosure relate to the technical field of machine learning, and in particular to a method, apparatus, electronic device, and storage medium for applying machine learning.
- The application of machine learning may include, but is not limited to, the following processes: problem definition, building a machine learning model (modeling), online model serving, feedback collection, and iterative model updating.
- Modeling typically explores a model on offline data and measures its effect with an offline evaluation method. Once the model's effect meets the standard (that is, satisfies preset requirements), IT staff deploy the model online to provide the online model service.
- However, a model whose offline effect meets the requirements may still fail to meet them online.
- The inventors of the present disclosure found that, because the data used for modeling is inconsistent with the online data, it is difficult to guarantee the consistency of the features computed during modeling. As a result, the model's online and offline effects differ greatly and fall short of expectations, which makes it difficult to bring the model online.
- At least one embodiment of the present disclosure provides a method, apparatus, electronic device, and storage medium for applying machine learning.
- In a first aspect, an embodiment of the present disclosure proposes a method for applying machine learning.
- The method includes: acquiring, based on a data service interface, the online related data streams of a specified business scenario; accumulating the data in the related data streams into a first database; when a first preset condition is met, exploring a model solution based on the data in the first database, where the model solution includes the following solution sub-items: a feature engineering solution, a model algorithm, and model hyperparameters; and deploying the explored model solution online to provide an online model prediction service, where the online model prediction service is performed based on the related data streams of the specified business scenario acquired online through the data service interface.
- In a second aspect, an embodiment of the present disclosure proposes an apparatus for applying machine learning.
- The apparatus includes: a data management module configured to acquire, based on a data service interface, the online related data streams of a specified business scenario and to accumulate the data in the related data streams into a first database; a model solution exploration module configured to explore a model solution based on the data in the first database when a first preset condition is met, where the model solution includes the following solution sub-items: a feature engineering solution, a model algorithm, and model hyperparameters; and an online model prediction service module configured to deploy the model solution obtained by the model solution exploration module online to provide an online model prediction service, where the online model prediction service is performed based on the related data streams of the specified business scenario acquired online through the data service interface.
- In a third aspect, an embodiment of the present disclosure proposes an electronic device, including a processor and a memory, where the processor is configured to perform the steps of the method for applying machine learning described in the first aspect by calling a program or instructions stored in the memory.
- In a fourth aspect, an embodiment of the present disclosure proposes a computer-readable storage medium configured to store a program or instructions that cause a computer to perform the steps of the method for applying machine learning described in the first aspect.
- An embodiment of the present disclosure further provides a computer program product including computer program instructions that, when run on a computer device, implement the steps of the method for applying machine learning described in the first aspect.
- In at least one embodiment of the present disclosure, the apparatus connects directly to the business scenario, accumulates data related to the business scenario, and then explores the model solution to obtain the model solution and an offline model. This ensures that the data used for offline model solution exploration and the data received by the online model prediction service come from the same source, achieving homology of offline and online data.
- Because only the model solution is deployed online, the problem that an offline model deployed online directly predicts poorly is avoided; in other embodiments, the offline model is also deployed online.
- After the model solution is deployed online, it can receive prediction requests (that is, the data of the request data stream) and produce sample data with features and feedback. This sample data is then used for model self-learning, and the self-learned model can be deployed online. This ensures that the data and feature engineering solution used in model self-learning are consistent with those used in the online model prediction service, so that the model's self-learning effect is consistent with its prediction effect.
- FIG. 1 is an exemplary architecture diagram of an apparatus for applying machine learning provided by an embodiment of the present disclosure;
- FIG. 2 is an exemplary architecture diagram of another apparatus for applying machine learning provided by an embodiment of the present disclosure;
- FIG. 3 is an exemplary flow logic block diagram of the apparatus for applying machine learning shown in FIG. 2;
- FIG. 4 is an exemplary data flow diagram of the apparatus for applying machine learning shown in FIG. 2;
- FIG. 5 is an exemplary architecture diagram of an electronic device provided by an embodiment of the present disclosure;
- FIG. 6 is an exemplary flowchart of a method for applying machine learning provided by an embodiment of the present disclosure.
- FIG. 1 is an exemplary architecture diagram of an apparatus for applying machine learning provided by an embodiment of the present disclosure. The apparatus is suitable for supervised-learning artificial intelligence modeling of various kinds of data, including but not limited to two-dimensional structured data, images, NLP (Natural Language Processing) data, and voice.
- The apparatus for applying machine learning can be applied to a specified business scenario in which information about the scenario's related data streams has been predefined. The related data streams may include, but are not limited to, a request data stream, a display data stream, a feedback data stream, and a business data stream, where the data of the display data stream is the data that the specified business scenario displays based on the request data stream.
- Taking a short video application as an example, the request data arises when the user swipes or taps on the user terminal to refresh short videos: the application backend screens out a set of candidate videos, which forms the request data to be modeled.
- The display data is the short videos that the application actually shows to the user.
- The feedback data is, for example, whether the user clicks or watches a short video displayed by the application.
- The business data is, for example, data related to business logic, such as comments and likes made while the user watches a short video.
- The predefined information about the related data streams of the business scenario can be understood as the fields included in the related data.
- For example, when the related data stream is the request data stream, the predefined information about it can be understood as the fields included in the request data, such as user ID, request content, request time, and candidate material ID.
- The online model prediction service can be provided through the apparatus for applying machine learning shown in FIG. 1.
- As shown in FIG. 1, the apparatus for applying machine learning may include, but is not limited to, a data management module 100, a model solution exploration module 200, an online model prediction service module 300, and other components required for applying machine learning, such as an offline database and an online database.
- The data management module 100 is configured to store and manage data originating from the specified business scenario and data generated by the online model prediction service module 300.
- The data originating from the specified business scenario is the related data streams that the data management module 100 obtains online by connecting directly to the specified business scenario through a data service interface.
- In some embodiments, the data service interface is an application programming interface (API).
- The data service interface is created by the data management module 100 based on the predefined information about the related data streams of the specified business scenario.
- In some embodiments, the data management module 100 may provide a user interface and receive, through it, the information about the related data streams of the specified business scenario entered by the user.
- The user may be an operations and maintenance engineer of the specified business scenario.
- The data management module 100 may create the data service interface based on the information about the related data streams of the specified business scenario entered by the user.
- In some embodiments, the data service interfaces correspond one-to-one with the related data streams; for example, the request data stream, the display data stream, the feedback data stream, and the business data stream each correspond to a different data service interface.
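The one-to-one mapping between predefined streams and data service interfaces can be sketched as follows. This is a minimal illustration under stated assumptions: `StreamSpec`, `DataServiceInterface`, `create_interfaces`, and the field names are hypothetical, and a plain Python list stands in for the first (offline) database.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class StreamSpec:
    """Predefined information about one related data stream: its name and fields."""
    name: str
    fields: List[str]


@dataclass
class DataServiceInterface:
    """Hypothetical interface bound to exactly one stream (not a real library API)."""
    spec: StreamSpec
    first_db: List[Dict[str, Any]]  # stand-in for the offline first database

    def ingest(self, record: Dict[str, Any]) -> None:
        # Reject records that do not carry every predefined field of this stream.
        missing = [f for f in self.spec.fields if f not in record]
        if missing:
            raise ValueError(f"{self.spec.name}: missing fields {missing}")
        # Tag the record with its stream so accumulated data stays distinguishable.
        self.first_db.append({"stream": self.spec.name, **record})


def create_interfaces(specs: List[StreamSpec], first_db: List[Dict[str, Any]]) -> Dict[str, DataServiceInterface]:
    """Create one data service interface per predefined stream."""
    return {spec.name: DataServiceInterface(spec, first_db) for spec in specs}


specs = [
    StreamSpec("request", ["user_id", "request_content", "request_time", "candidate_material_ids"]),
    StreamSpec("display", ["user_id", "video_id"]),
    StreamSpec("feedback", ["user_id", "video_id", "clicked"]),
    StreamSpec("business", ["user_id", "video_id", "liked"]),
]
first_db: List[Dict[str, Any]] = []
apis = create_interfaces(specs, first_db)
apis["request"].ingest({
    "user_id": "u1",
    "request_content": "refresh",
    "request_time": 1_700_000_000,
    "candidate_material_ids": ["v1", "v2"],
})
```

Validating fields at the interface is one way to make the predefined stream information enforceable at ingestion time; the source does not say whether the interface validates records.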
- In some embodiments, the data management module 100 may accumulate the data in the related data streams of the specified business scenario into a first database. The first database is an offline database, for example a distributed file storage system such as HDFS (Hadoop Distributed File System); it may also be another offline database.
- In some embodiments, the data management module 100 may process the data of the request data stream to obtain sample data, where the processing includes, but is not limited to, filtering and flattening.
- The data management module 100 may accumulate the data of the request data stream, the sample data, the data of the feedback data stream, and the data of the business data stream into the first database.
- In some embodiments, the data management module 100 may use a filter to filter the data of the request data stream against the data of the display data stream to obtain intersection data. For example, if the display data stream contains 10 records, the request data stream contains 12 records, and 5 records appear in both, the filter extracts those 5 common records as the intersection data.
- The data management module 100 can then flatten the intersection data (the 5 common records) to obtain sample data.
- The data management module 100 can accumulate the data of the display data stream and the sample data obtained by the filtering process into the first database.
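The filter-then-flatten step, including the 10/12/5 example, can be sketched like this. It is a minimal sketch that assumes records are matched by a shared `request_id` and that "flattening" means turning nested fields into dotted top-level columns; the source does not define either, so both are assumptions.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def filter_by_display(requests: List[Record], displays: List[Record], key: str = "request_id") -> List[Record]:
    """Keep only request records whose key also appears in the display stream (the intersection)."""
    shown = {d[key] for d in displays}
    return [r for r in requests if r[key] in shown]


def flatten(record: Record, prefix: str = "") -> Record:
    """Flatten nested dicts into one level with dotted column names (assumed meaning of flattening)."""
    flat: Record = {}
    for name, value in record.items():
        column = f"{prefix}{name}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        else:
            flat[column] = value
    return flat


# 12 request records (ids 0-11) and 10 display records (ids 7-16): ids 7-11 overlap, i.e. 5 records.
requests = [{"request_id": i, "user": {"id": f"u{i}", "device": "phone"}} for i in range(12)]
displays = [{"request_id": i} for i in range(7, 17)]

intersection = filter_by_display(requests, displays)
samples = [flatten(r) for r in intersection]
print(len(intersection))  # 5
print(samples[0])  # {'request_id': 7, 'user.id': 'u7', 'user.device': 'phone'}
```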
- In some embodiments, the data management module 100 may receive, through a user interface, data table attribute information entered by the user, where the attribute information describes the number of columns in a data table and the data attribute of each column; for example, user ID is a discrete field, request time is a time field, and browsing duration is a numeric field.
- The data management module 100 may also receive, through a user interface, a table-joining scheme between data tables entered by the user, where the scheme includes the join keys used to join different data tables and, for the same join key, the quantity relationship, timing relationship, and aggregation relationship between the primary and secondary tables.
- The data management module 100 may maintain logical relationship information in the first database based on the data table attribute information and the table-joining scheme, where the logical relationship information describes the interrelationships between different data tables and includes the data table attribute information and the table-joining scheme.
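The logical relationship information could be represented as below. This is a hypothetical encoding, not the disclosure's format: the class names, the `"count"` / `"sum:<column>"` aggregation syntax, and the rule that a shared time column enforces the timing relationship are all assumptions chosen to show how the quantity, timing, and aggregation relationships might constrain one join.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

Row = Dict[str, Any]


@dataclass
class TableAttributes:
    """Data table attribute information: column name -> data attribute."""
    name: str
    columns: Dict[str, str]  # e.g. {"user_id": "discrete", "ts": "time", "watch_sec": "numeric"}


@dataclass
class JoinScheme:
    """Hypothetical encoding of a table-joining scheme between a primary and a secondary table."""
    primary: str
    secondary: str
    key: str           # join key present in both tables
    one_to_many: bool  # quantity relationship
    time_col: str      # timing relationship: secondary rows must not be later than the primary row
    aggregate: str     # aggregation relationship: "count" or "sum:<column>"


@dataclass
class LogicalRelationInfo:
    tables: Dict[str, TableAttributes]
    joins: List[JoinScheme]


def apply_join(primary_rows: List[Row], secondary_rows: List[Row], scheme: JoinScheme) -> List[Row]:
    out_col = f"{scheme.secondary}_{scheme.aggregate.replace(':', '_')}"
    joined = []
    for p in primary_rows:
        # Timing relationship: ignore secondary rows that occur after the primary row.
        matches = [s for s in secondary_rows
                   if s[scheme.key] == p[scheme.key] and s[scheme.time_col] <= p[scheme.time_col]]
        # Quantity relationship: a one-to-one scheme may match at most one row.
        if not scheme.one_to_many and len(matches) > 1:
            raise ValueError(f"key {p[scheme.key]!r} matched {len(matches)} rows in a one-to-one join")
        # Aggregation relationship: collapse the matches into a single value.
        if scheme.aggregate == "count":
            value = len(matches)
        else:
            column = scheme.aggregate.split(":", 1)[1]
            value = sum(s[column] for s in matches)
        joined.append({**p, out_col: value})
    return joined


info = LogicalRelationInfo(
    tables={
        "request": TableAttributes("request", {"user_id": "discrete", "ts": "time"}),
        "history": TableAttributes("history", {"user_id": "discrete", "ts": "time", "watch_sec": "numeric"}),
    },
    joins=[JoinScheme("request", "history", "user_id", True, "ts", "sum:watch_sec")],
)
request_rows = [{"user_id": "u1", "ts": 10}, {"user_id": "u2", "ts": 5}]
history_rows = [
    {"user_id": "u1", "ts": 3, "watch_sec": 30},
    {"user_id": "u1", "ts": 12, "watch_sec": 60},  # later than the u1 request, so excluded
    {"user_id": "u2", "ts": 1, "watch_sec": 15},
]
rows = apply_join(request_rows, history_rows, info.joins[0])
```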
- The model solution exploration module 200 is configured to explore a model solution, when a first preset condition is met, based on the data in the first database (for example, one or more of the logical relationship information, the data of the request data stream, the sample data, the data of the feedback data stream, the data of the business data stream, and the data of the display data stream).
- The first preset condition may involve at least one of data volume, time, and manual triggering. For example, the first preset condition may be that the amount of data in the first database reaches a preset amount, or that the duration of data accumulation in the first database reaches a preset duration.
- Setting the first preset condition allows the model solution exploration module 200 to update the model solution iteratively.
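A preset condition that combines data volume, accumulation time, and manual triggering might look like the sketch below. It assumes the condition is met when any configured criterion holds (the text lists them with "or"), and the thresholds are hypothetical; the same shape could also gate the second preset condition used for model self-learning.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PresetCondition:
    """Sketch of a preset condition; met when ANY configured criterion holds."""
    min_records: Optional[int] = None             # data-volume criterion
    min_accumulation_sec: Optional[float] = None  # time criterion

    def is_met(self, record_count: int, accumulation_sec: float, manual_trigger: bool = False) -> bool:
        if manual_trigger:  # manual-trigger criterion
            return True
        if self.min_records is not None and record_count >= self.min_records:
            return True
        if self.min_accumulation_sec is not None and accumulation_sec >= self.min_accumulation_sec:
            return True
        return False


# Hypothetical thresholds: 10,000 accumulated records or one week of accumulation.
exploration_trigger = PresetCondition(min_records=10_000, min_accumulation_sec=7 * 24 * 3600)
```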
- The model solution includes the following solution sub-items: a feature engineering solution, a model algorithm, and model hyperparameters.
- The feature engineering solution is explored based on the logical relationship information, so it has at least a table-joining function. Note that the table-joining scheme within the feature engineering solution may be the same as or different from the table-joining scheme entered by the user.
- The feature engineering solution may also have other functions, such as extracting features from the data for use by the model algorithm or the model.
- The model algorithm may be a commonly used machine learning algorithm, such as a supervised learning algorithm, including but not limited to LR (Logistic Regression), GBDT (Gradient Boosting Decision Tree), and DeepNN (Deep Neural Network).
- The model hyperparameters are parameters set before machine learning begins to guide model training, such as the number of clusters in a clustering algorithm, the step size of gradient descent, the number of layers of a neural network, and the learning rate for training a neural network.
- In some embodiments, the model solution exploration module 200 may generate at least two model solutions during exploration, for example based on the logical relationship information maintained in the first database, where different model solutions differ in at least one solution sub-item. In some embodiments, the model solution exploration module 200 trains models with the at least two model solutions based on the data in the first database to obtain the models' own parameters, such as the weights of a neural network, the support vectors of a support vector machine, or the coefficients of a linear or logistic regression.
- The model solution exploration module 200 may evaluate the models trained with the at least two model solutions using a machine learning model evaluation metric and, based on the evaluation results, select one of the at least two model solutions as the explored model solution.
- The machine learning model evaluation metric is, for example, the AUC (Area Under the Curve) value.
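Selecting among candidate model solutions by AUC can be sketched as follows. The AUC here is the rank-based ROC AUC (the probability that a random positive scores higher than a random negative, with ties counted as half). `ModelSolution`, `select_solution`, and the fixed scores that stand in for "train with this solution, then score held-out data" are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


@dataclass(frozen=True)
class ModelSolution:
    feature_engineering: str                          # hypothetical feature engineering solution id
    algorithm: str                                    # e.g. "LR", "GBDT", "DeepNN"
    hyperparameters: Tuple[Tuple[str, float], ...]    # (name, value) pairs; tuples keep it hashable


def select_solution(
    candidates: Iterable[ModelSolution],
    train_and_score: Callable[[ModelSolution], Sequence[float]],
    labels: Sequence[int],
) -> Tuple[ModelSolution, Dict[ModelSolution, float]]:
    """Train and score every candidate, then return the highest-AUC one with all results."""
    results = {c: auc(labels, train_and_score(c)) for c in candidates}
    best = max(results, key=results.get)
    return best, results


# Two candidates differing in one sub-item (the algorithm).
a = ModelSolution("fe_v1", "LR", (("learning_rate", 0.1),))
b = ModelSolution("fe_v1", "GBDT", (("num_trees", 100.0), ("max_depth", 6.0)))
holdout_labels = [1, 0, 1, 0]
fake_scores = {a: [0.9, 0.2, 0.8, 0.3], b: [0.4, 0.6, 0.7, 0.1]}  # stand-in for real training + scoring
best, results = select_solution([a, b], fake_scores.__getitem__, holdout_labels)
```

With these stand-in scores, candidate `a` ranks both positives above both negatives (AUC 1.0), while `b` wins three of four pairs (AUC 0.75), so `a` is selected.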
- The online model prediction service module 300 is configured to deploy the model solution explored by the model solution exploration module 200 online to provide the online model prediction service, where the online model prediction service is performed based on the related data streams of the specified business scenario acquired online through the data service interface.
- In some embodiments, the online model prediction service module 300 deploys only the model solution online and does not deploy the offline model obtained during exploration by the model solution exploration module 200. This avoids the poor prediction effect that results when an offline model is deployed online directly and the features computed online are inconsistent with those computed offline.
- In some embodiments, because the online model prediction service module 300 deploys only the model solution and not the offline model, it does not generate a prediction result when providing the online model prediction service.
- In some embodiments, a default prediction result is sent to the specified business scenario, which ignores the default prediction result on receipt. This is why, in FIG. 1, the model solution exploration module 200 points to the online model prediction service module 300 with a dashed arrow: the model solution does not provide real predictions, but a default prediction result is still returned.
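The "solution deployed, no model yet" behavior can be sketched as an endpoint that still computes and logs features but returns a default result. `PredictionEndpoint` and the `{"score": None, "is_default": True}` payload are hypothetical; the source only says a default result is returned and ignored.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Features = Dict[str, Any]


@dataclass
class PredictionEndpoint:
    """Hypothetical serving endpoint for module 300."""
    feature_fn: Callable[[Dict[str, Any]], Features]     # feature engineering from the model solution
    model: Optional[Callable[[Features], float]] = None  # None: only the model solution is deployed
    logged_features: List[Features] = field(default_factory=list)

    def handle(self, request: Dict[str, Any]) -> Dict[str, Any]:
        features = self.feature_fn(request)
        # Features are logged either way so samples with features and feedback can be built later.
        self.logged_features.append(features)
        if self.model is None:
            # No model online yet: return a default result that the business scenario ignores.
            return {"score": None, "is_default": True}
        return {"score": self.model(features), "is_default": False}


endpoint = PredictionEndpoint(feature_fn=lambda r: {"n_candidates": len(r["candidate_ids"])})
first = endpoint.handle({"candidate_ids": [11, 12]})
endpoint.model = lambda f: f["n_candidates"] / 4  # hypothetical model deployed later
second = endpoint.handle({"candidate_ids": [11, 12]})
```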
- In some embodiments, when the online model prediction service module 300 deploys the model solution online, it also deploys the offline model obtained during exploration by the model solution exploration module 200. The offline model is trained on the related data of the specified business scenario accumulated in the first database (the offline database), and once deployed online it makes predictions on the related data of the same business scenario. Therefore, although the data used for online and offline feature computation may be inconsistent, offline and online data still come from the same source.
- In some embodiments, the online model prediction service module 300 can store the related data streams of the specified business scenario acquired through the data service interface in a second database, where the second database is an online database, such as the real-time feature storage engine rtidb.
- rtidb is a distributed feature database designed for hard real-time AI scenarios, featuring efficient computation, read-write separation, high concurrency, and high-performance queries. The second database may also be another online database.
- In some embodiments, the online model prediction service module 300 performs online real-time feature calculation on the data in the second database and the received request data, based on the feature engineering solution in the deployed model solution, to obtain the feature data of the prediction sample.
- In some embodiments, when the online model prediction service module 300 receives request data, it joins the data in the second database with the received request data and performs online real-time feature calculation, based on the feature engineering solution in the deployed model solution, to obtain wide-table feature data; the resulting feature data of the prediction sample is this wide-table feature data.
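Real-time wide-table feature calculation can be illustrated with an in-memory stand-in for the second database. This is not the rtidb API, and the windowed count and sum are a hypothetical feature engineering solution. The window ends just before the request time, so no event at or after the request leaks into its features.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


class OnlineStore:
    """In-memory stand-in for the second (online) database; this is not the rtidb API."""

    def __init__(self) -> None:
        self._events: Dict[str, List[Tuple[int, float]]] = defaultdict(list)

    def put(self, user_id: str, ts: int, watch_sec: float) -> None:
        self._events[user_id].append((ts, watch_sec))

    def window(self, user_id: str, start: int, end: int) -> List[Tuple[int, float]]:
        """Events with start <= ts < end."""
        return [e for e in self._events.get(user_id, []) if start <= e[0] < end]


def compute_wide_row(store: OnlineStore, request: Dict[str, Any], window_sec: int = 3600) -> Dict[str, Any]:
    """Join the request fields with windowed aggregates computed at request time."""
    now = request["request_time"]
    recent = store.window(request["user_id"], now - window_sec, now)
    return {
        **request,
        "user_events_window": len(recent),
        "user_watch_sec_window": sum(w for _, w in recent),
    }


store = OnlineStore()
store.put("u1", 100, 30.0)    # older than the window for a request at t=4000
store.put("u1", 3000, 45.0)   # inside the window
store.put("u1", 5000, 10.0)   # after the request time, so excluded
store.put("u2", 3500, 99.0)   # different user
row = compute_wide_row(store, {"user_id": "u1", "request_time": 4000, "candidate_id": "v9"})
```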
- In some embodiments, the online model prediction service module 300 can obtain the feature data (or wide-table feature data) of the prediction sample based on the deployed model solution and join the feature data with the feedback data to generate sample data with features and feedback.
- The sample data may also include other data, such as timestamps; the feedback data comes from the feedback data stream.
- In some embodiments, before joining the feature data with the feedback data, the online model prediction service module 300 joins the feature data with the display data to obtain feature data with display data, where the display data comes from the display data stream. It then joins the feature data with display data with the feedback data to generate sample data containing display data, feature data, and feedback data.
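The two-step join (features with display data, then with feedback) can be sketched as below. It assumes all three sources share a hypothetical `request_id` key and that a displayed item with no feedback event gets label 0. Neither assumption comes from the source.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def build_samples(features: List[Row], displays: List[Row], feedback: List[Row], key: str = "request_id") -> List[Row]:
    shown = {d[key]: d for d in displays}
    reactions = {f[key]: f for f in feedback}
    samples = []
    for row in features:
        k = row[key]
        if k not in shown:
            continue  # step 1: join features with display data; undisplayed items drop out
        sample = {**row, **shown[k]}
        # Step 2: join with feedback; assumption: no feedback event means label 0.
        sample["label"] = reactions.get(k, {}).get("clicked", 0)
        samples.append(sample)
    return samples


features = [
    {"request_id": "r1", "f_len": 3},
    {"request_id": "r2", "f_len": 5},  # never displayed
    {"request_id": "r3", "f_len": 1},
]
displays = [{"request_id": "r1", "position": 0}, {"request_id": "r3", "position": 1}]
feedback = [{"request_id": "r3", "clicked": 1}]
samples = build_samples(features, displays, feedback)
```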
- The online model prediction service module 300 returns the sample data with features and feedback to the first database for model self-learning, and the self-learned model can be deployed online. This ensures that the data and feature engineering solution used in model self-learning are consistent with those used in the online model prediction service, achieving consistency between the model's self-learning effect and its prediction effect.
- In some embodiments, the data management module 100, the model solution exploration module 200, and the online model prediction service module 300 form a machine learning closed loop. The model solution is explored using the data in the first database, which is an offline database, so this data can be understood as offline data; the online model prediction service uses online data. Both the offline data and the online data are obtained from the specified business scenario through the data service interface. This ensures that the data used for model solution exploration (offline data) and the data used by the online model prediction service (online data) come from the same source, achieving homology of offline and online data.
- FIG. 2 shows another apparatus for applying machine learning provided by an embodiment of the present disclosure.
- As shown in FIG. 2, in addition to the data management module 100, the model solution exploration module 200, and the online model prediction service module 300 shown in FIG. 1, the apparatus for applying machine learning also includes a model self-learning module 400 and other components required for applying machine learning, such as an offline database and an online database.
- The model self-learning module 400 is configured to perform model self-learning based on the sample data with features and feedback in the first database when a second preset condition is met.
- The second preset condition may involve at least one of data volume, time, and manual triggering. For example, the second preset condition may be that the amount of data in the first database reaches a preset amount, or that the duration of data accumulation in the first database reaches a preset duration.
- Setting the second preset condition allows the model self-learning module 400 to update the model iteratively.
- In some embodiments, the model self-learning module 400 trains on the sample data with features and feedback using the model algorithm and model hyperparameters in the model solution to obtain a machine learning model.
- In some embodiments, when the online model prediction service module 300 deploys the model solution online, it also deploys an initial model, where the initial model is generated while the model solution exploration module 200 explores the model solution.
- The model self-learning module 400 then trains the initial model with the model algorithm and model hyperparameters in the model solution, updating the initial model's own parameter values to obtain a machine learning model.
- In other embodiments, the model self-learning module 400 trains a random model with the model algorithm and model hyperparameters in the model solution to obtain a machine learning model, where the random model is generated from the model algorithm with its own parameters set to random values.
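The two self-learning starting points, an initial model (warm start) versus a random model, can be shown with logistic regression trained by stochastic gradient descent. Logistic regression is one of the algorithms the text lists, but `train_lr`, the hyperparameter values, and the toy samples are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: List[float], x: Sequence[float]) -> float:
    # zip stops at len(x), so the bias w[-1] is added separately.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def train_lr(samples: List[Sample], lr: float, epochs: int,
             init_weights: Optional[List[float]] = None, seed: int = 0) -> List[float]:
    """SGD logistic regression; init_weights=None starts from a random model, otherwise warm-starts."""
    dim = len(samples[0][0])
    if init_weights is None:
        rng = random.Random(seed)
        w = [rng.uniform(-0.01, 0.01) for _ in range(dim + 1)]  # random model (last entry is the bias)
    else:
        w = list(init_weights)  # copy, so the deployed initial model is not mutated
    for _ in range(epochs):
        for x, y in samples:
            err = predict(w, x) - y  # gradient of log loss with respect to the logit
            for i in range(dim):
                w[i] -= lr * err * x[i]
            w[-1] -= lr * err
    return w


samples = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 20
hyper = {"lr": 0.5, "epochs": 20}                            # hypothetical model hyperparameters
initial_model = train_lr(samples, **hyper)                  # random-model path
updated = train_lr(samples, init_weights=initial_model, **hyper)  # warm-start path
```

In both paths the algorithm and hyperparameters come from the deployed model solution; only the starting parameters differ.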
- The online model prediction service module 300 can deploy the model obtained by the model self-learning module 400 online to provide the online model prediction service.
- In some embodiments, after the online model prediction service module 300 deploys the model obtained by the model self-learning module 400 online, when request data is received, it generates a prediction sample with features based on the data in the second database and the received request data, and obtains the prediction result for the sample through the deployed model.
- Unlike a deployed model solution, a deployed model can produce prediction results for prediction samples.
- The online model prediction service module 300 may send the prediction result to the specified business scenario for use or reference by that scenario.
- In some embodiments, the online model prediction service module 300 may use the model obtained by the model self-learning module 400 to replace the machine learning model already deployed online; alternatively, it may deploy the model obtained by the model self-learning module 400 online to provide the online model prediction service together with the already deployed machine learning model.
- In some embodiments, the online model prediction service module 300 may use the model solution obtained by the model solution exploration module 200 to replace the model solution already deployed online; alternatively, it may deploy the model solution obtained by the model solution exploration module 200 online without taking the already deployed model solution offline.
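The two deployment modes, replacing what is online or deploying alongside it, can be sketched with a small registry. `Deployment` is hypothetical, and returning one result per active model is an assumption: the text does not say how results from co-deployed models are combined.

```python
from typing import Any, Callable, Dict

Model = Callable[[Dict[str, Any]], float]


class Deployment:
    """Hypothetical registry of the models currently deployed online."""

    def __init__(self) -> None:
        self.active: Dict[str, Model] = {}

    def deploy(self, name: str, model: Model, replace: bool = True) -> None:
        if replace:
            self.active.clear()  # take previously deployed models offline
        self.active[name] = model

    def predict_all(self, features: Dict[str, Any]) -> Dict[str, float]:
        # Assumption: each active model returns its own result, e.g. for side-by-side comparison.
        return {name: model(features) for name, model in self.active.items()}


deployment = Deployment()
deployment.deploy("v1", lambda f: 0.25)
deployment.deploy("v2", lambda f: 0.75, replace=False)  # serve alongside v1
both = deployment.predict_all({})
deployment.deploy("v3", lambda f: 0.5)                  # replace everything online
only_v3 = deployment.predict_all({})
```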
- In some embodiments, the data management module 100, the model self-learning module 400, and the online model prediction service module 300 form a machine learning closed loop. The sample data with features and feedback that the model self-learning module 400 uses for training is generated online, after the model solution is deployed, from the data in the second database (the online database) and the received request data. After the online model prediction service module 300 deploys the model trained by the model self-learning module 400, it also provides predictions based on the data in the second database. This ensures that the data and feature engineering solution used in model self-learning are consistent with those used in the online model prediction service, achieving consistency between the model's self-learning effect and its prediction effect.
- In some embodiments, the division of modules in the apparatus for applying machine learning is only a logical functional division, and other divisions are possible in actual implementation. For example, at least two of the data management module 100, the model solution exploration module 200, the online model prediction service module 300, and the model self-learning module 400 may be implemented as one module, and the data management module 100, the model solution exploration module 200, the online model prediction service module 300, or the model self-learning module 400 may also be divided into multiple sub-modules.
- It can be understood that each module or sub-module can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application.
- FIG. 3 is an exemplary flow logic block diagram of the apparatus for applying machine learning shown in FIG. 2.
- the user can input the data flow of the specified business scenario through the user interface.
- the user can also input the attribute information of the data table and the spelling scheme through the user interface during the process of the model scheme exploration 303.
- data management 302, model self-learning 305, and model online estimation service 304 constitute a small closed loop;
- data management 302, model solution exploration 303, and model online estimation service 304 constitute a large closed loop.
- the small closed loop ensures that the data and feature engineering scheme used in model self-learning 305 are consistent, respectively, with the data and feature engineering scheme used in model online estimation service 304, achieving consistency between the model self-learning effect and the model prediction effect.
- the large closed loop guarantees that the data used by the model solution exploration 303 (referred to as offline data) and the data used by the model online estimation service 304 (referred to as online data) are of the same origin, realizing the same origin of offline and online data.
- Fig. 4 is an exemplary data flow diagram of the apparatus for applying machine learning shown in Fig. 2.
- the English words in Figure 4 are explained as follows:
- GW is the gateway of the designated business scenario
- the retain-mixer is configured to implement the function of accumulating data in the relevant data stream of the specified business scenario in the data management module 100 into the first database;
- trial1-mixer and trial2-mixer can be understood as two parallel model online estimation service modules 300;
- HDFS is the first database
- rtidb1 and rtidb2 are two second databases
- self-learn1 and self-learn2 are two model self-learning modules 400;
- fedb1 and fedb2 can be understood as feature engineering schemes in the model scheme.
- the retain-mixer obtains the request, impression, action, and BOes from the specified business scenario based on the data service interface, and adds eventTime or ingestionTime to the request, impression, and action respectively, so that the data management module 100 can maintain the data sequence relationship information in the logical relationship information.
- adding eventTime belongs to the data management function of the data management module 100.
- the retain-mixer accumulates the request in HDFS, which is convenient for subsequent operation and maintenance.
- the retain-mixer adds ingestionTime to impression, action, and BOes to obtain impression’, action’, and BOes’, and accumulates impression’, action’, and BOes’ into HDFS.
- the addition of ingestionTime belongs to the data management function of the data management module 100.
- the retain-mixer processes the request and the impression through a filter operation to obtain intersection data. For example, if impression has 10 records, request has 12 records, and request and impression share 5 identical records, the filter operation yields these 5 identical records as the intersection data and filters out the differing records; the intersection data (the 5 identical records) is then processed by a flatten operation to obtain flatten_req (sample data).
- the retain-mixer accumulates flatten_req into HDFS.
- AutoML can explore model schemes based on flatten_req, impression’, action’ and BOes’ in HDFS.
- impression', action', and BOes' are accumulated in rtidb1 and rtidb2, and user historical data, such as user behavior data, can be synchronized to rtidb1 and rtidb2.
- each time request data is received, the accumulated data is obtained from rtidb1 and rtidb2 through fedb1 and fedb2 for feature engineering, yielding enrich1 and enrich2.
- in trial1-mixer and trial2-mixer, enrich1 and enrich2 are joined (spliced) and flattened with impression and action, respectively, to obtain viewlog1 and viewlog2.
- trial1-mixer and trial2-mixer accumulate viewlog1 and viewlog2 into HDFS.
- Self-learn1 and self-learn2 perform model self-learning based on viewlog1 and viewlog2, respectively, to obtain a machine learning model.
- trial1-mixer and trial2-mixer deploy the machine learning models obtained from self-learn1 and self-learn2 respectively, and provide online model estimation services.
- the device for applying machine learning disclosed in this embodiment need not rely on importing historical offline data from other databases and can collect data from scratch.
- FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
- the electronic device includes: at least one processor 501, at least one memory 502, and at least one communication interface 503.
- the various components in the electronic device are coupled together through the bus system 504.
- the communication interface 503 is configured for information transmission with external devices. Understandably, the bus system 504 is configured to implement connection and communication between these components.
- the bus system 504 also includes a power bus, a control bus, and a status signal bus. However, for the sake of clear description, various buses are marked as the bus system 504 in FIG. 5.
- the memory 502 in this embodiment may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
- the memory 502 stores the following elements, executable units or data structures, or a subset of them, or an extended set of them: operating systems and applications.
- the operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, etc., which are configured to implement various basic services and process hardware-based tasks.
- the application programs include various applications, such as a media player (Media Player) and a browser (Browser), and are configured to implement various application services.
- a program that implements the method of applying machine learning provided by the embodiments of the present disclosure may be included in an application program.
- the processor 501 calls a program or instructions stored in the memory 502, specifically a program or instructions stored in an application program, and is configured to execute the steps of the embodiments of the method for applying machine learning provided by the embodiments of the present disclosure.
- the steps of the various embodiments of the method are described in detail below.
- the method for applying machine learning may be configured in the processor 501 or implemented by the processor 501.
- the processor 501 may be an integrated circuit chip with signal processing capability. In the implementation process, the steps of the foregoing method can be completed by an integrated logic circuit of hardware in the processor 501 or instructions in the form of software.
- the aforementioned processor 501 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
- the steps of the method for applying machine learning provided by the embodiments of the present disclosure may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software units in the decoding processor.
- the software unit may be located in a storage medium mature in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
- the storage medium is located in the memory 502, and the processor 501 reads the information in the memory 502 and completes the steps of the method in combination with its hardware.
- Fig. 6 is an exemplary flowchart of a method for applying machine learning provided by an embodiment of the disclosure.
- the main body of execution of this method is electronic equipment.
- an electronic device is used as an execution subject to illustrate the process of the method of applying machine learning.
- the electronic device may provide a user interface and, based on it, receive user-input information about the related data streams of a specified business scenario, where the related data streams include, but are not limited to: a request data stream, a display data stream, a feedback data stream, and a business data stream.
- the information about the related data flow of the specified business scenario can be understood as the fields included in the related data.
- the electronic device creates a data service interface based on the information about the related data flow of the specified business scenario, for example, request data flow, display data flow, feedback data flow, and business data flow correspond to different data service interfaces.
- the electronic device may receive data table attribute information input by the user based on the user interface, where the data table attribute information describes the number of columns included in the data table and the data attributes of each column.
- the electronic device may also receive a table-joining scheme between data tables input by the user through the user interface, where the table-joining scheme includes the join keys for joining different data tables, as well as the quantitative relationship, temporal relationship, and aggregation relationship of the same join key between the primary and secondary tables.
- the electronic device may maintain logical relationship information through the first database based on the data table attribute information and the table-joining scheme, where the logical relationship information describes the relationships between different data tables and includes the data table attribute information and the table-joining scheme.
- the electronic device online obtains the relevant data stream of the specified business scenario based on the data service interface.
- the electronic device may obtain the display data stream of the specified business scenario online based on the data service interface, where the data of the display data is the data displayed by the specified business scenario based on the requested data stream.
- the electronic device accumulates the data in the related data stream into a first database.
- the first database is an offline database.
- the electronic device processes the data of the request data stream to obtain sample data, and then accumulates the data of the request data stream, the sample data, the data of the feedback data stream, and the data of the business data stream into the first database.
- the processing method includes, for example, but not limited to: processing using a filter and flattening processing.
- the electronic device uses a filter to filter the data of the request data stream based on the data of the display data stream to obtain intersection data, and then flattens the intersection data to obtain the sample data.
- the electronic device accumulates the display data and the sample data obtained by the filtering process into the first database.
- when the first preset condition is met, the electronic device explores a model scheme based on the data in the first database (for example, the logical relationship information, the data of the request data stream, the sample data, and the data of the feedback data stream).
- the model scheme includes the following scheme sub-items: feature engineering scheme, model algorithm and model hyperparameters.
- the feature engineering scheme is explored based on the logical relationship information; therefore, it has at least a table-joining function. It should be noted that the table-joining method of the feature engineering scheme may be the same as or different from the table-joining scheme input by the user.
- the feature engineering solution may also have other functions, such as extracting features from data for use by model algorithms or models.
- the first preset condition may include at least one of data amount, time, and manual trigger.
- the first preset condition may be that the amount of data in the first database reaches the preset data amount.
- the time length of data accumulation in the first database reaches the preset time length.
- the electronic device generates at least two model solutions when the first preset condition is met.
- at least two model solutions may be generated based on the logical relationship information maintained by the first database, where different model solutions differ in at least one scheme sub-item; each of the at least two model solutions is then used for model training based on the data in the first database; the models trained with the at least two model solutions are then evaluated based on a machine learning model evaluation metric; finally, a selection is made among the at least two model solutions based on the evaluation results to obtain the explored model solution.
- the electronic device deploys the explored model solution online to provide an online model estimation service, where the online model estimation service is performed based on the related data streams of the specified business scenario obtained online through the data service interface.
- the electronic device deploys only the model solution online and does not deploy the offline model obtained during model solution exploration, which avoids the problem that an offline model deployed directly online yields a poor estimation effect because the data obtained by online feature calculation and by offline feature calculation are inconsistent.
- because only the model solution is deployed online and the offline model is not, no estimation result is generated when the online model estimation service is provided; when request data is received, a default estimation result is sent to the specified business scenario.
- the specified business scenario ignores the default estimation result after receiving it.
- in some embodiments, when the electronic device deploys the model solution online, it also deploys the offline model obtained during model solution exploration. The offline model is trained on the related data of the specified business scenario accumulated in the first database (that is, the offline database), and after being deployed online it provides estimation for the related data of the specified business scenario. Therefore, although the data obtained by online and offline feature calculations may be inconsistent, online and offline data still come from the same source.
- the data of the related data stream is stored in a second database, where the second database is an online database.
- the electronic device uses the data in the second database and the received request data to perform online real-time feature calculation based on the feature engineering solution in the deployed model solution to obtain the feature data of the estimated sample.
- after the electronic device deploys the explored model solution online, upon receiving request data it joins the data in the second database with the received request data and performs online real-time feature calculation, based on the feature engineering solution in the deployed model solution, to obtain wide-table feature data; the resulting feature data of the estimated sample is the wide-table feature data.
- the electronic device obtains the feature data (or wide table feature data) of the estimated sample based on the model solution deployed online, and splices the feature data and the feedback data to generate sample data with features and feedback.
- the sample data may also include other data, such as timestamp data; the feedback data comes from the feedback data stream.
- before joining the feature data and the feedback data, the electronic device may join the feature data with the display data, which is derived from the display data stream, to obtain feature data with display data; it then joins the feature data with display data and the feedback data to generate sample data with display data, feature data, and feedback data.
- the electronic device flows the sample data with features and feedback back into the first database and, when a second preset condition is met, performs model self-learning based on the sample data with features and feedback in the first database.
- the second preset condition may include at least one of data amount, time, and manual trigger.
- the second preset condition may be that the amount of data in the first database reaches a preset amount, or that data has been accumulating in the first database for a preset duration.
- when the second preset condition is met, the electronic device trains with the model algorithm and model hyperparameters in the model scheme, based on the sample data with features and feedback, to obtain a machine learning model.
- if, when deploying the model solution online, the electronic device also deploys an initial model online, where the initial model is an offline model generated during model solution exploration, the electronic device trains the initial model with the model algorithm and model hyperparameters in the model solution, updating the initial model's own parameter values to obtain the machine learning model.
- otherwise, the electronic device trains a random model with the model algorithm and model hyperparameters in the model solution to obtain a machine learning model, where the random model is a model generated based on the model algorithm whose own parameter values are random.
- the electronic device deploys the machine learning model online to provide online model estimation services.
- after the electronic device deploys the machine learning model online, when request data is received, it generates an estimated sample with features based on the data in the second database and the received request data, and obtains the estimation result of the estimated sample through the deployed model.
- the difference from deploying only the model solution is that the deployed model can produce the estimation result of the estimated sample.
- the electronic device may send the estimation result to the specified business scenario for use or reference in the business scenario.
- the electronic device replaces the deployed machine learning model with the model obtained by model self-learning; or deploys the model obtained by model self-learning online so that it provides the online model estimation service jointly with the deployed machine learning model.
- the electronic device replaces the deployed model solution with the explored model solution; or deploys the explored model solution online without taking the deployed model solution offline.
- the data used in model scheme exploration is the data in the first database, which is an offline database; therefore, the data used in model scheme exploration can be understood as offline data.
- the data used by the model online estimation service is online data, and both the offline data and the online data are obtained from the specified business scenario through the data service interface. Therefore, the data used for model solution exploration (offline data for short) and the data used by the model online estimation service (online data for short) are guaranteed to come from the same source, realizing the homology of offline and online data.
- the sample data with features and feedback used for model self-learning is generated online, after the model solution is deployed, based on the data in the second database (that is, the online database) and the received request data; and after the model obtained by the model self-learning module is deployed online, it also provides estimation services based on the data in the second database. This ensures that the data and feature engineering scheme used in model self-learning are consistent, respectively, with those used by the model online estimation service, achieving consistency between the self-learning effect of the model and the prediction effect of the model.
- the embodiment of the present disclosure also proposes a computer-readable storage medium that stores a program or instructions causing a computer to execute the steps of each embodiment of the method for applying machine learning; to avoid repetition, these are not described again here.
- the embodiments of the present disclosure also provide a computer program product, which includes computer program instructions.
- when the computer program instructions are run on a computer device, they can execute the method steps of the various embodiments of the present disclosure; for example, when run by a processor, they cause the processor to execute those method steps.
- the computer program product may use any combination of one or more programming languages to write program codes for performing the operations of the embodiments of the present disclosure.
- the programming languages include object-oriented programming languages, such as Java and C++, and also conventional procedural programming languages, such as the "C" language or similar programming languages.
- the program code can be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
- the solution connects directly to the business scenario, accumulates data related to the business scenario, and then explores the model solution, obtaining the model solution and an offline model; this ensures that the data used for offline model solution exploration and the data used by the model online estimation service come from the same source, realizing the homology of offline and online data.
- to avoid the problem that an offline model deployed online has a poor estimation effect, only the model solution is deployed online, and the offline model is not.
- after the model solution is deployed online, it can receive estimation requests (that is, the data of the request data stream) to obtain sample data with features and feedback, which is then used for model self-learning; the self-learned model can be deployed online, ensuring that the data and feature engineering schemes used in model self-learning are consistent with those used in the model online estimation service, so as to achieve consistency between the model self-learning effect and the model prediction effect.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
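A method and apparatus for applying machine learning, an electronic device, and a storage medium. The solution connects directly to a business scenario and accumulates data related to that scenario in order to explore a model scheme, ensuring that the data used for offline model-scheme exploration and the data used by the model online estimation service come from the same source, which achieves homology between offline and online data. To avoid the poor estimation performance that results when an offline model is deployed directly online and the data produced by online feature calculation is inconsistent with that produced by offline feature calculation, only the model scheme is deployed online, not the offline model. After the model scheme is deployed online, it receives estimation requests and obtains sample data with features and feedback, which can then be used for model self-learning; the model obtained by self-learning can be deployed online. This ensures that the data and feature engineering scheme used in model self-learning are consistent, respectively, with the data and feature engineering scheme used by the model online estimation service, achieving consistency between the model self-learning effect and the model estimation effect.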
Description
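This disclosure claims priority to Chinese patent application No. 202010415370.7, titled "Method and Apparatus for Applying Machine Learning, Electronic Device, and Storage Medium", filed with the China Patent Office on May 15, 2020, the entire contents of which are incorporated herein by reference.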
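Embodiments of the present disclosure relate to the technical field of machine learning, and in particular to a method and apparatus for applying machine learning, an electronic device, and a storage medium.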
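Applying machine learning may include, but is not limited to, the following processes: problem definition, machine learning model building (modeling for short), online model serving, feedback information collection, and iterative model updating. At present, modeling explores a model based on offline data and then determines the model's effect through offline evaluation; once the model's effect meets the standard (that is, reaches a preset requirement), IT staff deploy the model online to provide online model serving.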
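However, a model whose offline effect meets the standard may fall short of requirements once online. The inventors of the present disclosure found that this is because the data used for modeling is inconsistent with the online data, and the features computed during modeling are hard to keep consistent, so the model's online and offline effects differ greatly and fall short of expectations, which makes online model serving difficult.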
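The above description of how the problem was discovered is only intended to assist understanding of the technical solutions of the present disclosure and does not constitute an admission that the above content is prior art.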
Summary
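To solve at least one problem in the prior art, at least one embodiment of the present disclosure provides a method and apparatus for applying machine learning, an electronic device, and a storage medium.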
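In a first aspect, an embodiment of the present disclosure provides a method for applying machine learning, the method including: obtaining, online and based on a data service interface, related data streams of a specified business scenario; accumulating the data in the related data streams into a first database; when a first preset condition is met, exploring a model scheme based on the data in the first database, the model scheme including the following scheme sub-items: a feature engineering scheme, a model algorithm, and model hyperparameters; and deploying the explored model scheme online to provide a model online estimation service, where the model online estimation service is performed based on the related data streams of the specified business scenario obtained online through the data service interface.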
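In a second aspect, an embodiment of the present disclosure provides an apparatus for applying machine learning, the apparatus including: a data management module, configured to obtain, online and based on a data service interface, related data streams of a specified business scenario, and to accumulate the data in the related data streams into a first database; a model scheme exploration module, configured to explore a model scheme based on the data in the first database when a first preset condition is met, the model scheme including the following scheme sub-items: a feature engineering scheme, a model algorithm, and model hyperparameters; and a model online estimation service module, configured to deploy the model scheme obtained by the model scheme exploration module online to provide a model online estimation service, where the model online estimation service is performed based on the related data streams of the specified business scenario obtained online through the data service interface.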
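In a third aspect, an embodiment of the present disclosure provides an electronic device, including a processor and a memory; by calling a program or instructions stored in the memory, the processor is configured to perform the steps of the method for applying machine learning according to the first aspect.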
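In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium configured to store a program or instructions that cause a computer to perform the steps of the method for applying machine learning according to the first aspect.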
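In a fifth aspect, an embodiment of the present disclosure further provides a computer program product including computer program instructions which, when run on a computer device, implement the steps of the method for applying machine learning according to the first aspect.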
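It can be seen that, in at least one embodiment of the present disclosure, the solution connects directly to the business scenario, accumulates data related to the business scenario, and then explores a model scheme, obtaining the model scheme and an offline model. This ensures that the data used for offline model-scheme exploration and the data used by the model online estimation service come from the same source, achieving homology between offline and online data. To avoid the problem that, after an offline model is deployed directly online, its estimation performance is poor because the data obtained by online feature calculation is inconsistent with the data obtained by offline feature calculation, only the model scheme is deployed online, not the offline model. After the model scheme is deployed online, receiving estimation requests (that is, the data of the request data stream) yields sample data with features and feedback, which can then be used for model self-learning; the model obtained by self-learning can be deployed online. This ensures that the data and feature engineering scheme used in model self-learning are consistent, respectively, with the data and feature engineering scheme used by the model online estimation service, achieving consistency between the model self-learning effect and the model estimation effect.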
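To describe the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present disclosure, and a person of ordinary skill in the art can derive other drawings from them.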
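FIG. 1 is an exemplary architecture diagram of an apparatus for applying machine learning provided by an embodiment of the present disclosure;
FIG. 2 is an exemplary architecture diagram of another apparatus for applying machine learning provided by an embodiment of the present disclosure;
FIG. 3 is an exemplary flow logic block diagram of the apparatus for applying machine learning shown in FIG. 2;
FIG. 4 is an exemplary data flow diagram of the apparatus for applying machine learning shown in FIG. 2;
FIG. 5 is an exemplary architecture diagram of an electronic device provided by an embodiment of the present disclosure;
FIG. 6 is an exemplary flowchart of a method for applying machine learning provided by an embodiment of the present disclosure.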
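To make the above objectives, features, and advantages of the present disclosure easier to understand, the present disclosure is described in further detail below with reference to the drawings and embodiments. It can be understood that the described embodiments are some, rather than all, of the embodiments of the present disclosure. The specific embodiments described here are only used to explain the present disclosure, not to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the described embodiments fall within the protection scope of the present disclosure.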
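It should be noted that, herein, relational terms such as "first" and "second" are only used to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations.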
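Business scenarios in different industries have different business processing logic, but most of them need machine learning to process business data, which reduces the problems of processing business data manually, such as long processing time, high labor cost, and inaccuracy. To this end, FIG. 1 is an exemplary architecture diagram of an apparatus for applying machine learning provided by an embodiment of the present disclosure. The apparatus is suitable for supervised-learning AI modeling on various kinds of data, including but not limited to two-dimensional structured data, images, NLP (Natural Language Processing), and speech. The apparatus can be applied to a specified business scenario in which the information about the scenario's related data streams is predefined. The related data streams may include, but are not limited to, a request data stream, a display data stream, a feedback data stream, and a business data stream, where the data of the display data stream is the data displayed by the specified business scenario based on the request data stream. Taking a short-video application as an example: after the user performs an operation that refreshes short videos, such as swiping or tapping on the user terminal, the application backend selects a set of candidate videos, forming the request data on which model estimation is needed. The display data is which short videos the application actually showed to the user. The feedback data is, for example, whether the user clicked or watched the short videos shown by the application. The business data is data related to the business logic, such as the user's comments and likes while watching short videos.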
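The predefined information about the related data streams of the business scenario can be understood as the fields included in the related data. For example, if the related data stream is the request data stream, the predefined information about it can be understood as the fields included in its request data, such as user ID, request content, request time, and candidate item IDs.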
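Once the business scenario is specified, the apparatus for applying machine learning shown in FIG. 1 can provide a model online estimation service. As shown in FIG. 1, the apparatus may include, but is not limited to, a data management module 100, a model scheme exploration module 200, a model online estimation service module 300, and other components required for applying machine learning, such as an offline database and an online database.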
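The data management module 100 is configured to store and manage data originating from the specified business scenario and data produced by the model online estimation service module 300. The data originating from the specified business scenario is the related data streams that the data management module 100 obtains online by connecting directly to the specified business scenario through a data service interface. The data service interface is an application programming interface (API). In some embodiments, the data service interface is created by the data management module 100 based on the predefined information about the related data streams of the specified business scenario. In some embodiments, the data management module 100 may provide a user interface and receive, through it, user-input information about the related data streams of the specified business scenario; in this embodiment, the user may be an operations engineer of the specified business scenario. The data management module 100 may create the data service interface based on this user-input information. In some embodiments, data service interfaces correspond one-to-one with related data streams; for example, the request data stream, the display data stream, the feedback data stream, and the business data stream each correspond to a different data service interface.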
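The one-to-one mapping between related data streams and data service interfaces can be sketched as follows. This is a minimal illustration, not the disclosure's implementation: `StreamSpec`, `DataServiceInterface`, `create_interfaces`, the field names, and the in-memory `sink` are all hypothetical, and a real interface would be a network API rather than a method call.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class StreamSpec:
    """Predefined information about one related data stream: its name and fields."""
    name: str
    fields: List[str]


@dataclass
class DataServiceInterface:
    """Hypothetical ingest endpoint bound to exactly one data stream."""
    spec: StreamSpec
    sink: Callable[[str, Record], None]

    def push(self, record: Record) -> None:
        missing = [f for f in self.spec.fields if f not in record]
        if missing:
            raise ValueError(f"{self.spec.name}: missing fields {missing}")
        # Keep only the declared fields and hand the record to storage.
        self.sink(self.spec.name, {f: record[f] for f in self.spec.fields})


def create_interfaces(specs: List[StreamSpec],
                      sink: Callable[[str, Record], None]) -> Dict[str, DataServiceInterface]:
    """Create one interface per stream (one-to-one), keyed by stream name."""
    return {spec.name: DataServiceInterface(spec, sink) for spec in specs}


# Usage: four streams of a short-video scenario, all written to an in-memory list.
stored: List[tuple] = []
specs = [
    StreamSpec("request", ["request_id", "user_id", "request_time", "candidate_ids"]),
    StreamSpec("impression", ["request_id", "item_id"]),
    StreamSpec("action", ["request_id", "item_id", "clicked"]),
    StreamSpec("business", ["user_id", "item_id", "liked"]),
]
apis = create_interfaces(specs, lambda name, rec: stored.append((name, rec)))
apis["request"].push({"request_id": "r1", "user_id": "u1", "request_time": 1,
                      "candidate_ids": ["v1", "v2"], "debug": True})
```

Fields not declared for a stream (here `debug`) are dropped at the interface, so each stream's stored records match its predefined field list.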
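In some embodiments, the data management module 100 may accumulate the data in the related data streams of the specified business scenario into a first database, which is an offline database; for example, the offline database may be the Hadoop Distributed File System (HDFS) or another offline database. In some embodiments, the data management module 100 may process the data of the request data stream to obtain sample data, where the processing includes, but is not limited to, filtering with a filter and flattening. The data management module 100 may accumulate the data of the request data stream, the sample data, the data of the feedback data stream, and the data of the business data stream into the first database. In some embodiments, after obtaining the data of the display data stream of the specified business scenario online through the data service interface, the data management module 100 may use a filter to filter the data of the request data stream based on the data of the display data stream, obtaining intersection data. For example, if the display data stream has 10 records, the request data stream has 12 records, and 5 records appear in both, filtering yields those 5 identical records as the intersection data and discards the rest. The data management module 100 may then flatten the intersection data (these 5 identical records) to obtain sample data, and may accumulate the data of the display data stream and the sample data obtained by filtering into the first database.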
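A minimal sketch of the filter and flatten steps follows, assuming (hypothetically) that request and display records are matched on a `request_id` key and that flattening expands each request's list of candidate items into one row per item; the disclosure does not fix either detail, and the function names are illustrative.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def filter_by_impressions(requests: Iterable[Record], impressions: Iterable[Record],
                          key: str = "request_id") -> List[Record]:
    """Keep only requests that also appear in the display (impression) stream."""
    shown = {imp[key] for imp in impressions}
    return [req for req in requests if req[key] in shown]


def flatten(requests: Iterable[Record], list_field: str = "candidate_ids",
            item_field: str = "item_id") -> List[Record]:
    """Expand each request's candidate list into one row per candidate item."""
    rows: List[Record] = []
    for req in requests:
        base = {k: v for k, v in req.items() if k != list_field}
        for item in req[list_field]:
            rows.append({**base, item_field: item})
    return rows


# The 10/12/5 example: 12 requests, 10 impressions, 5 of which match a request.
requests = [{"request_id": f"r{i}", "user_id": "u1", "candidate_ids": [f"v{i}a", f"v{i}b"]}
            for i in range(12)]
impressions = ([{"request_id": f"r{i}"} for i in range(5)]
               + [{"request_id": f"x{i}"} for i in range(5)])
intersection = filter_by_impressions(requests, impressions)   # 5 requests
samples = flatten(intersection)                              # 5 requests x 2 candidates
```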
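In some embodiments, the data management module 100 may receive, through the user interface, user-input data table attribute information, which describes the number of columns in a data table and the data attribute of each column; for example, the user ID is a discrete field, the request time is a time field, and the browsing duration is a numeric field. The data management module 100 may also receive, through the user interface, a user-input table-joining scheme between data tables, where the table-joining scheme includes the join keys used to join different data tables, as well as the quantitative relationship, temporal relationship, and aggregation relationship of the same join key between the primary and secondary tables. In some embodiments, the data management module 100 may maintain logical relationship information in the first database based on the data table attribute information and the table-joining scheme; the logical relationship information describes the relationships between different data tables and includes the data table attribute information and the table-joining scheme.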
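The logical relationship information can be pictured as a few plain data structures plus one join. The sketch below assumes one possible reading of the quantitative, temporal, and aggregation relationships: a primary row may match many secondary rows (one-to-many), only secondary rows no later than the primary row and within a look-back window are used (temporal), and the matches are reduced to a count and a sum (aggregation). All class, function, and column names are illustrative, not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

Record = Dict[str, Any]


@dataclass
class TableAttributes:
    """Data table attribute information: column name -> data attribute."""
    name: str
    columns: Dict[str, str]  # e.g. "discrete", "time", "numeric"


@dataclass
class JoinScheme:
    """Hypothetical table-joining scheme between a primary and a secondary table."""
    key: str          # join key shared by both tables
    time_col: str     # time column used for the temporal relationship
    window: float     # look back at most this far from the primary row's time
    value_col: str    # numeric secondary column aggregated per primary row


@dataclass
class LogicalRelationInfo:
    """Logical relationship information = table attributes + joining schemes."""
    tables: List[TableAttributes]
    joins: List[JoinScheme]


def join_with_aggregation(primary: List[Record], secondary: List[Record],
                          js: JoinScheme) -> List[Record]:
    """For each primary row, aggregate matching secondary rows whose time lies in
    [primary time - window, primary time], so no future data leaks in."""
    out = []
    for row in primary:
        t = row[js.time_col]
        matched = [s[js.value_col] for s in secondary
                   if s[js.key] == row[js.key] and t - js.window <= s[js.time_col] <= t]
        out.append({**row,
                    f"{js.value_col}_count": len(matched),
                    f"{js.value_col}_sum": sum(matched)})
    return out


# Usage: request table (primary) joined with a viewing-history table (secondary).
info = LogicalRelationInfo(
    tables=[TableAttributes("request", {"user_id": "discrete", "ts": "time"}),
            TableAttributes("view", {"user_id": "discrete", "ts": "time", "watch_sec": "numeric"})],
    joins=[JoinScheme(key="user_id", time_col="ts", window=8, value_col="watch_sec")],
)
primary = [{"user_id": "u1", "ts": 10}, {"user_id": "u2", "ts": 10}]
secondary = [{"user_id": "u1", "ts": 3, "watch_sec": 5.0},
             {"user_id": "u1", "ts": 9, "watch_sec": 7.0},
             {"user_id": "u1", "ts": 11, "watch_sec": 100.0},  # later than the request: ignored
             {"user_id": "u2", "ts": 1, "watch_sec": 2.0}]     # outside the window: ignored
joined = join_with_aggregation(primary, secondary, info.joins[0])
```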
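The model scheme exploration module 200 is configured to explore a model scheme, when a first preset condition is met, based on the data in the first database (for example, one or more of the logical relationship information, the data of the request data stream, the sample data, the data of the feedback data stream, the data of the business data stream, and the data of the display data stream). The first preset condition may include at least one of data volume, time, and manual triggering; for example, it may be that the amount of data in the first database reaches a preset amount, or that data has been accumulating in the first database for a preset duration. Setting the first preset condition allows the model scheme exploration module 200 to update the model scheme iteratively. The model scheme includes the following scheme sub-items: a feature engineering scheme, a model algorithm, and model hyperparameters. The feature engineering scheme is explored based on the logical relationship information, so it has at least a table-joining function; note that the table-joining method of the feature engineering scheme may be the same as or different from the user-input table-joining scheme. The feature engineering scheme may also have other functions, such as extracting features from data for use by the model algorithm or the model. The model algorithm may be a commonly used machine learning algorithm, for example a supervised learning algorithm, including but not limited to LR (Logistic Regression), GBDT (Gradient Boosting Decision Tree), and DeepNN (Deep Neural Network). The model hyperparameters are parameters set before machine learning to assist model training, such as the number of clusters in a clustering algorithm, the step size of gradient descent, the number of layers of a neural network, and the learning rate for training a neural network.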
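The first preset condition (and, in the same way, the second preset condition described later) can be expressed as a small predicate. This sketch assumes the criteria are combined with OR, matching "at least one of" in the text; `PresetCondition`, its fields, and the thresholds are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class PresetCondition:
    """Met when ANY configured criterion holds: data volume, elapsed time, or manual trigger."""
    min_rows: Optional[int] = None        # preset data amount in the first database
    min_seconds: Optional[float] = None   # preset accumulation duration

    def is_met(self, row_count: int, started_at: float,
               now: Optional[float] = None, manual_trigger: bool = False) -> bool:
        if manual_trigger:
            return True
        now = time.time() if now is None else now
        if self.min_rows is not None and row_count >= self.min_rows:
            return True
        if self.min_seconds is not None and now - started_at >= self.min_seconds:
            return True
        return False


# Usage: explore once 1000 rows have accumulated or one hour has passed.
cond = PresetCondition(min_rows=1000, min_seconds=3600)
```

Passing `now` explicitly keeps the check deterministic in tests; in production it would default to the wall clock.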
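In some embodiments, when exploring a model scheme, the model scheme exploration module 200 may generate at least two model schemes, for example based on the logical relationship information maintained in the first database, where different model schemes differ in at least one scheme sub-item. In some embodiments, the model scheme exploration module 200 trains a model with each of the at least two model schemes based on the data in the first database, obtaining the model's own parameters, such as the weights in a neural network, the support vectors in a support vector machine, or the coefficients in linear or logistic regression. In some embodiments, the model scheme exploration module 200 may evaluate the models trained with the at least two model schemes based on a machine learning model evaluation metric, and then select among the at least two model schemes based on the evaluation results to obtain the explored model scheme. The evaluation metric is, for example, the AUC (Area Under Curve) value.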
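The explore, train, evaluate, and select loop can be sketched as below. Training is abstracted behind a hypothetical `train_and_score` callable, assumed to train a model with the given scheme on data from the first database and return scores for a labeled hold-out set. AUC is computed directly by pairwise comparison (equivalent to the Mann-Whitney statistic) rather than with a particular library; `ModelScheme` and `explore` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class ModelScheme:
    """The three scheme sub-items; frozen so schemes can be compared and used as keys."""
    feature_engineering: str
    algorithm: str
    hyperparams: Tuple[Tuple[str, float], ...]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: probability a positive outscores a negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def explore(schemes: List[ModelScheme],
            train_and_score: Callable[[ModelScheme], Sequence[float]],
            labels: Sequence[int]) -> Tuple[ModelScheme, Dict[ModelScheme, float]]:
    """Train one model per candidate scheme, evaluate by AUC, keep the best scheme."""
    # Distinct schemes differ in at least one sub-item.
    if len(set(schemes)) < 2 or len(set(schemes)) != len(schemes):
        raise ValueError("need at least two distinct model schemes")
    results = {s: auc(labels, train_and_score(s)) for s in schemes}
    best = max(results, key=results.__getitem__)
    return best, results


# Usage with canned hold-out scores standing in for real training runs.
labels = [1, 0, 1, 0]
a = ModelScheme("join_v1", "LR", (("learning_rate", 0.1),))
b = ModelScheme("join_v1", "GBDT", (("max_depth", 6.0),))
fake_scores = {a: [0.9, 0.1, 0.8, 0.2], b: [0.1, 0.9, 0.2, 0.8]}
best, results = explore([a, b], fake_scores.__getitem__, labels)
```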
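The model online estimation service module 300 is configured to deploy the model scheme explored by the model scheme exploration module 200 online to provide a model online estimation service, where the service is performed based on the related data streams of the specified business scenario obtained online through the data service interface. In some embodiments, the model online estimation service module 300 deploys only the model scheme online and does not deploy the offline model obtained during exploration by the model scheme exploration module 200; this avoids the problem that an offline model deployed directly online yields poor estimation performance because the data obtained by online and offline feature calculation are inconsistent. In addition, because the model online estimation service module 300 deploys only the model scheme and not the offline model, it does not generate estimation results when providing the service: on receiving request data it sends a default estimation result to the specified business scenario, which ignores it. This is why, in FIG. 1, the model scheme exploration module 200 points to the model online estimation service module 300 with a dashed arrow, indicating that the model scheme does not provide an online estimation service but still returns a default estimation result. In some embodiments, when the model online estimation service module 300 deploys the model scheme online, it also deploys online the offline model obtained during exploration by the model scheme exploration module 200. The offline model is trained on the data related to the specified business scenario accumulated in the first database (that is, the offline database), and once deployed online it provides estimation for data related to the specified business scenario. Therefore, although the data obtained by online and offline feature calculation may be inconsistent, homology between online and offline data is still achieved.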
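The behavior when only the model scheme is deployed can be sketched as a service that still runs the scheme's feature engineering but returns a placeholder score. `EstimationService` and the default value `-1.0` are hypothetical; the text only says that a default estimation result is returned and ignored by the business scenario.

```python
from typing import Any, Callable, Dict, Optional

Record = Dict[str, Any]


class EstimationService:
    """Toy model online estimation service (hypothetical, not the disclosure's API)."""

    def __init__(self, feature_fn: Callable[[Record], Record],
                 model: Optional[Callable[[Record], float]] = None,
                 default_score: float = -1.0):
        self.feature_fn = feature_fn          # feature engineering scheme of the deployed scheme
        self.model = model                    # None while only the model scheme is deployed
        self.default_score = default_score    # placeholder the business side ignores

    def estimate(self, request: Record) -> Record:
        # Features are computed in both cases so they can later be joined with
        # feedback and reflowed for self-learning.
        features = self.feature_fn(request)
        if self.model is None:
            return {"score": self.default_score, "is_default": True, "features": features}
        return {"score": self.model(features), "is_default": False, "features": features}


# Usage: start with the scheme only, then attach a model later.
svc = EstimationService(lambda r: {"x": r["x"]})
first = svc.estimate({"x": 2})
svc.model = lambda f: 0.5 * f["x"]
second = svc.estimate({"x": 2})
```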
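In some embodiments, after deploying the model scheme online, the model online estimation service module 300 may store the related data streams of the specified business scenario obtained through the data service interface into a second database, which is an online database, for example a real-time feature storage engine (rtidb). rtidb is a distributed feature database for hard real-time AI scenarios, featuring efficient computation, read-write separation, high concurrency, and high-performance queries; the second database may also be another online database. When receiving request data, the model online estimation service module 300 performs online real-time feature calculation using the data in the second database and the received request data, based on the feature engineering scheme in the deployed model scheme, to obtain the feature data of an estimation sample. In some embodiments, when receiving request data, the model online estimation service module 300 joins the data in the second database with the received request data and performs online real-time feature calculation, based on the feature engineering scheme in the deployed model scheme, to obtain wide-table feature data; the resulting feature data of the estimation sample is the wide-table feature data.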
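A toy version of online real-time feature calculation is sketched below. `OnlineStore` is an in-memory stand-in for the second database and does not reflect rtidb's actual API; the two features (`user_recent_views`, `item_seen_recently`) and the look-back window are illustrative assumptions.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Tuple

Record = Dict[str, Any]


class OnlineStore:
    """In-memory stand-in for the second (online) database; NOT rtidb's real API."""

    def __init__(self) -> None:
        self._events: DefaultDict[str, List[Tuple[float, str]]] = defaultdict(list)

    def put(self, key: str, ts: float, value: str) -> None:
        self._events[key].append((ts, value))

    def scan(self, key: str, start: float, end: float) -> List[str]:
        # .get avoids creating empty entries for unseen keys.
        return [v for t, v in self._events.get(key, []) if start <= t <= end]


def realtime_features(store: OnlineStore, request: Record, window: float) -> List[Record]:
    """Join the request with the user's recent history; one wide-table row per candidate."""
    recent = store.scan(request["user_id"], request["ts"] - window, request["ts"])
    return [{"request_id": request["request_id"], "user_id": request["user_id"],
             "item_id": item,
             "user_recent_views": len(recent),            # hypothetical feature 1
             "item_seen_recently": int(item in recent)}   # hypothetical feature 2
            for item in request["candidate_ids"]]


# Usage: display events flow into the store; a request then triggers feature calculation.
store = OnlineStore()
store.put("u1", 5, "v1")
store.put("u1", 50, "v2")
store.put("u2", 55, "v1")
rows = realtime_features(store, {"request_id": "r9", "user_id": "u1", "ts": 60,
                                 "candidate_ids": ["v1", "v2"]}, window=20)
```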
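In some embodiments, the model online estimation service module 300 may obtain the feature data (or wide-table feature data) of the estimation sample based on the deployed model scheme, and join the feature data with feedback data to generate sample data with features and feedback. The sample data may also include other data, such as timestamps; the feedback data comes from the feedback data stream. In some embodiments, before joining the feature data with the feedback data, the model online estimation service module 300 joins the feature data with display data to obtain feature data with display data, the display data coming from the display data stream; it then joins the feature data with display data and the feedback data to generate sample data with display data, feature data, and feedback data. In some embodiments, the model online estimation service module 300 flows the sample data with features and feedback back into the first database for model self-learning, and the model obtained by self-learning can be deployed online. This ensures that the data and feature engineering scheme used in model self-learning are consistent, respectively, with those used by the model online estimation service, achieving consistency between the model self-learning effect and the model estimation effect.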
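Joining feature data with display data and then with feedback data, and reflowing the result, might look like the sketch below. It assumes records are keyed by `(request_id, item_id)`, that only displayed candidates become samples, and that feedback is reduced to a binary label; these choices, the function name, and the plain list standing in for the first database are all hypothetical.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def build_training_samples(features: List[Record], impressions: List[Record],
                           actions: List[Record], offline_db: List[Record]) -> List[Record]:
    """Join features -> display data -> feedback, then reflow labeled samples offline."""
    shown = {(i["request_id"], i["item_id"]): i for i in impressions}
    acted = {(a["request_id"], a["item_id"]) for a in actions}
    for f in features:
        k = (f["request_id"], f["item_id"])
        if k not in shown:            # candidate was never displayed: no sample
            continue
        sample = {**f,
                  "shown_ts": shown[k]["ts"],   # display data (keeps a timestamp)
                  "label": int(k in acted)}     # feedback as a binary label
        offline_db.append(sample)
    return offline_db


# Usage: three candidates scored, two displayed, one of those clicked.
feats = [{"request_id": "r1", "item_id": v, "user_recent_views": 1} for v in ("v1", "v2", "v3")]
imps = [{"request_id": "r1", "item_id": "v1", "ts": 100},
        {"request_id": "r1", "item_id": "v2", "ts": 101}]
acts = [{"request_id": "r1", "item_id": "v2"}]
db = build_training_samples(feats, imps, acts, [])
```

Because the features come from the same online computation that serves requests, the reflowed samples carry exactly the features the deployed scheme produced.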
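Based on the above description of the data management module 100, the model scheme exploration module 200, and the model online estimation service module 300, and with reference to FIG. 1, these three modules form a machine learning closed loop. The data used for model scheme exploration is the data in the first database, which is an offline database, so it can be understood as offline data, whereas the data used by the model online estimation service is online data; both are obtained from the specified business scenario through the data service interface. Therefore, the data used for model scheme exploration (offline data for short) and the data used by the model online estimation service (online data for short) are guaranteed to come from the same source, achieving homology between offline and online data.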
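FIG. 2 shows another apparatus for applying machine learning provided by an embodiment of the present disclosure. In addition to the data management module 100, the model scheme exploration module 200, and the model online estimation service module 300 shown in FIG. 1, it further includes a model self-learning module 400 and other components required for applying machine learning, such as an offline database and an online database.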
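The model self-learning module 400 is configured to perform model self-learning based on the sample data with features and feedback in the first database when a second preset condition is met. The second preset condition may include at least one of data volume, time, and manual triggering; for example, it may be that the amount of data in the first database reaches a preset amount, or that data has been accumulating in the first database for a preset duration. Setting the second preset condition allows the model self-learning module 400 to update the model iteratively.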
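In some embodiments, when the second preset condition is met, the model self-learning module 400 trains with the model algorithm and model hyperparameters in the model scheme, based on the sample data with features and feedback, to obtain a machine learning model. In some embodiments, if the model online estimation service module 300 also deploys an initial model online when deploying the model scheme, where the initial model is the offline model produced while the model scheme exploration module 200 explored the model scheme, the model self-learning module 400 trains the initial model with the model algorithm and model hyperparameters in the model scheme, updating the values of the initial model's own parameters to obtain the machine learning model. In some embodiments, if the model online estimation service module 300 does not deploy the initial model when deploying the model scheme, the model self-learning module 400 trains a random model with the model algorithm and model hyperparameters in the model scheme to obtain the machine learning model, where the random model is a model generated based on the model algorithm whose own parameter values are random.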
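The two self-learning starting points, training the initial (offline) model versus training a random model, can be sketched with logistic regression, one of the algorithms named above. Plain stochastic gradient descent on log loss is assumed as the training procedure, with `learning_rate` and `epochs` standing in for the scheme's hyperparameters; `predict` and `self_learn` are hypothetical names.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def predict(model: Model, x: Sequence[float]) -> float:
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def self_learn(samples: Sequence[Tuple[Sequence[float], int]], n_features: int,
               learning_rate: float, epochs: int,
               initial: Optional[Model] = None, seed: int = 0) -> Model:
    """Logistic regression trained by plain SGD on (features, label) samples."""
    if initial is not None:
        w, b = list(initial[0]), initial[1]   # warm start: copy the initial model's parameters
    else:
        rng = random.Random(seed)             # random model: random parameter values
        w = [rng.uniform(-0.01, 0.01) for _ in range(n_features)]
        b = rng.uniform(-0.01, 0.01)
    for _ in range(epochs):
        for x, y in samples:
            g = predict((w, b), x) - y        # gradient of log loss w.r.t. the logit
            w = [wi - learning_rate * g * xi for wi, xi in zip(w, x)]
            b -= learning_rate * g
    return w, b


# Usage: reflowed samples with one feature; label 1 when the feature is positive.
data = [([1.0], 1), ([-1.0], 0)] * 20
from_random = self_learn(data, n_features=1, learning_rate=0.5, epochs=20)
from_initial = self_learn(data, n_features=1, learning_rate=0.5, epochs=20,
                          initial=([0.3], 0.0))
```

Copying the initial weights with `list(...)` means the deployed initial model is not mutated while a new version is trained.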
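The model online estimation service module 300 may deploy the model obtained by the model self-learning module 400 online to provide a model online estimation service. In some embodiments, after deploying that model online, when receiving request data, the model online estimation service module 300 generates an estimation sample with features based on the data in the second database and the received request data, and obtains an estimation result for that sample through the deployed model. The difference from deploying only the model scheme is that the deployed model can produce estimation results for estimation samples. The model online estimation service module 300 may send the estimation result to the specified business scenario for the scenario to use or reference.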
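In some embodiments, module 300 may replace the already-deployed machine learning model with the model obtained by module 400, or deploy that model alongside the already-deployed machine learning model so that both provide the online model prediction service. In some embodiments, module 300 may replace the already-deployed model scheme with the model scheme obtained by module 200, or deploy the new model scheme without taking the already-deployed model scheme offline.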
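A hypothetical registry showing the two deployment modes: replacing the active model, or adding a model so that several models serve together. How co-deployed models share traffic is not specified in the source; routing each request by a CRC32 hash of a hypothetical `request_id` field is an assumption made for this sketch.

```python
import zlib
from typing import Callable, List, Tuple

Model = Callable[[dict], float]


class ModelRegistry:
    """Hypothetical holder of the models currently serving online predictions."""

    def __init__(self) -> None:
        self._active: List[Tuple[str, Model]] = []

    def deploy(self, name: str, model: Model, replace: bool = True) -> None:
        if replace:
            self._active = [(name, model)]      # new model replaces the deployed one(s)
        else:
            self._active.append((name, model))  # new model serves alongside them

    def active_names(self) -> List[str]:
        return [name for name, _ in self._active]

    def predict(self, request: dict) -> Tuple[str, float]:
        if not self._active:
            raise RuntimeError("no model deployed")
        # Deterministic routing: the same request_id always reaches the same model.
        idx = zlib.crc32(request["request_id"].encode("utf-8")) % len(self._active)
        name, model = self._active[idx]
        return name, model(request)
```

The same two modes apply to model schemes; a scheme registry would differ only in what it stores.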
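Based on the above description of the model self-learning module 400 and the online model prediction service module 300, and as Figure 2 shows, the data management module 100, the model self-learning module 400, and the online model prediction service module 300 form a closed machine learning loop. The sample data with features and feedback that module 400 trains on is generated online, after the model scheme is deployed, from the data in the second database (i.e., the online database) and the received request data; and after module 300 deploys the model trained by module 400, that model also serves predictions based on the data in the second database. This ensures that the data and feature engineering scheme used for model self-learning are the same as those used by the online model prediction service, so that self-learning performance and online prediction performance stay consistent.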
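In some embodiments, the division of the device for applying machine learning into modules is only a logical functional division, and other divisions are possible in an actual implementation. For example, at least two of the data management module 100, the model scheme exploration module 200, the online model prediction service module 300, and the model self-learning module 400 may be implemented as one module, and any one of them may also be divided into multiple submodules. It will be understood that each module or submodule can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application.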
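Figure 3 is an exemplary flow logic block diagram of the device for applying machine learning shown in Figure 2. As shown in Figure 3, in the definition 301 of the specified business scenario, the user can enter information about the related data streams of the specified business scenario through a user interface; the user can also enter data table attribute information and a table-joining scheme through the user interface while model scheme exploration 303 is in progress. In Figure 3, data management 302, model self-learning 305, and online model prediction service 304 form a small closed loop, and data management 302, model scheme exploration 303, and online model prediction service 304 form a large closed loop. The small loop ensures that the data and feature engineering scheme used by model self-learning 305 are the same as those used by the online model prediction service 304, so that self-learning performance and online prediction performance stay consistent. The large loop ensures that the data used by model scheme exploration 303 (offline data) and the data used by the online model prediction service 304 (online data) come from the same source.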
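Figure 4 is an exemplary data flow diagram of the device for applying machine learning shown in Figure 2. The English terms in Figure 4 are explained as follows: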
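GW is the gateway of the specified business scenario;
retain-mixer is configured to implement the function of the data management module 100 that accumulates the data in the related data streams of the specified business scenario into the first database;
trial1-mixer and trial2-mixer can be understood as two parallel online model prediction service modules 300;
HDFS is the first database;
rtidb1 and rtidb2 are two second databases;
AutoML is the model scheme exploration module 200;
self-learn1 and self-learn2 are two model self-learning modules 400;
request is request data; impression is impression data; action is feedback data; BOes is business data; enrich1 and enrich2 are wide-table feature data; viewlog1 and viewlog2 are wide-table feature data with feedback;
fedb1 and fedb2 can be understood as the feature engineering schemes within the model schemes.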
Based on these terms, the data flow of the device for applying machine learning is as follows:
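retain-mixer obtains request, impression, action, and BOes from the specified business scenario through the data service interface, and adds eventTime or ingestionTime to request, impression, and action respectively, so that the data management module 100 can maintain the data timing relationship information within the logical relationship information. Adding eventTime is part of the data management function of module 100.
retain-mixer accumulates request into HDFS for later operations and maintenance. retain-mixer adds ingestionTime to impression, action, and BOes, obtaining impression', action', and BOes', and accumulates impression', action', and BOes' into HDFS. Adding ingestionTime is part of the data management function of module 100.
retain-mixer processes request and impression with a filter operation to obtain intersection data. For example, if impression has 10 records, request has 12 records, and 5 records are common to both, the filter operation keeps those 5 common records (the intersection data) and filters out the rest. retain-mixer then processes the intersection data (these 5 records) with a flatten operation to obtain flatten_req (sample data), and accumulates flatten_req into HDFS.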
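The filter and flatten steps can be sketched as follows, reproducing the 10-impression, 12-request, 5-overlap example. The source does not define `flatten` precisely; here it is assumed to mean turning nested fields into dotted column names, and matching requests to impressions by a hypothetical `request_id` field is likewise an assumption.

```python
from typing import Dict, List


def flatten_record(record: Dict[str, object], parent: str = "") -> Dict[str, object]:
    """Turn nested dicts into one flat row, e.g. {"user": {"id": 1}} -> {"user.id": 1}."""
    flat: Dict[str, object] = {}
    for key, value in record.items():
        name = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, name))
        else:
            flat[name] = value
    return flat


def filter_and_flatten(requests: List[dict], impressions: List[dict]) -> List[dict]:
    shown = {imp["request_id"] for imp in impressions}
    # Filter: keep only requests that were actually displayed (the intersection).
    intersection = [req for req in requests if req["request_id"] in shown]
    # Flatten: one flat sample row per intersecting request (flatten_req).
    return [flatten_record(req) for req in intersection]
```

With 12 requests and 10 impressions of which 5 match, the function returns exactly 5 flat sample rows.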
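AutoML can perform model scheme exploration based on flatten_req, impression', action', and BOes' in HDFS.
After trial1-mixer and trial2-mixer each deploy a different model scheme, impression', action', and BOes' are accumulated into rtidb1 and rtidb2, and historical user data, such as user behavior data, can be synchronized into rtidb1 and rtidb2.
After trial1-mixer and trial2-mixer each deploy a different model scheme, for every piece of request data received, they use fedb1 and fedb2 to fetch the accumulated data from rtidb1 and rtidb2 and perform feature engineering, obtaining enrich1 and enrich2.
trial1-mixer and trial2-mixer join enrich1 and enrich2 with impression and action and flatten the results, obtaining viewlog1 and viewlog2, which they accumulate into HDFS.
self-learn1 and self-learn2 perform model self-learning on viewlog1 and viewlog2 respectively, obtaining machine learning models. trial1-mixer and trial2-mixer deploy the models obtained by self-learn1 and self-learn2 respectively online to provide the online model prediction service.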
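As Figure 4 shows, retain-mixer, trial1-mixer, and trial2-mixer share the same data source and accumulate data into HDFS, which ensures that the data used by AutoML and the data used after the model scheme is deployed come from the same source, so that online and offline data share one source. In addition, the data and feature engineering schemes used by self-learn1 and self-learn2 are the same as those used after the models are deployed, so that self-learning performance and online prediction performance stay consistent.
Thus, the device for applying machine learning disclosed in this embodiment does not need to import historical offline data from other databases; it can start collecting data from scratch.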
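Figure 5 is a schematic structural diagram of an electronic apparatus provided by an embodiment of the present disclosure. As shown in Figure 5, the electronic apparatus includes at least one processor 501, at least one memory 502, and at least one communication interface 503. The components of the electronic apparatus are coupled together through a bus system 504. The communication interface 503 is configured to transmit information to and from external devices. The bus system 504 is configured to implement connection and communication among these components. In addition to a data bus, the bus system 504 includes a power bus, a control bus, and a status signal bus; for clarity, however, all buses are labeled as bus system 504 in Figure 5.
It will be understood that the memory 502 in this embodiment may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
In some implementations, the memory 502 stores the following elements, executable units, or data structures, or a subset or superset of them: an operating system and application programs.
The operating system contains various system programs, such as a framework layer, a core library layer, and a driver layer, and is configured to implement basic services and handle hardware-based tasks. The application programs include various applications, such as a Media Player and a Browser, and are configured to implement various application services. A program implementing the method for applying machine learning provided by embodiments of the present disclosure may be included in the application programs.
In embodiments of the present disclosure, by calling programs or instructions stored in the memory 502, specifically programs or instructions stored in the application programs, the processor 501 is configured to perform the steps of the embodiments of the method for applying machine learning provided by the present disclosure.
The method for applying machine learning provided by embodiments of the present disclosure may be applied in, or implemented by, the processor 501. The processor 501 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the method may be completed by integrated hardware logic circuits in the processor 501 or by instructions in software form. The processor 501 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor.
The steps of the method for applying machine learning provided by embodiments of the present disclosure may be executed directly by a hardware decoding processor, or by a combination of hardware and software units in a decoding processor. The software units may reside in storage media well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 502; the processor 501 reads the information in the memory 502 and completes the steps of the method in combination with its hardware.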
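Figure 6 is an exemplary flowchart of a method for applying machine learning provided by an embodiment of the present disclosure. The method is executed by an electronic apparatus; for ease of description, the following embodiments describe the flow of the method with the electronic apparatus as the executing entity.
In some embodiments, the electronic apparatus may provide a user interface and receive, through it, user-entered information about the related data streams of the specified business scenario. The related data streams include, but are not limited to, a request data stream, an impression data stream, a feedback data stream, and a business data stream. The information about the related data streams can be understood as the fields that the related data contains. The electronic apparatus then creates data service interfaces based on this information; for example, the request, impression, feedback, and business data streams each correspond to a different data service interface.
In some embodiments, the electronic apparatus may receive, through the user interface, user-entered data table attribute information, which describes the number of columns in a data table and the data attribute of each column. The electronic apparatus may also receive, through the user interface, a user-entered table-joining scheme between data tables, which includes the join keys used to join different data tables, as well as the quantity relationship, timing relationship, and aggregation relationship between the primary and secondary tables on the same join key. In some embodiments, the electronic apparatus may maintain logical relationship information through the first database based on the data table attribute information and the table-joining scheme; the logical relationship information describes the relationships between different data tables and includes the data table attribute information and the table-joining scheme.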
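A sketch of how a user-entered table-joining scheme might be represented and applied to produce a wide table: a join key, a timing relationship (secondary rows no later than, and at most `window` time units before, the primary row), and aggregations that reduce the one-to-many matches to one value each. `JoinScheme`, `join_tables`, and these window semantics are hypothetical choices for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class JoinScheme:
    join_key: str       # column shared by the primary and secondary tables
    time_column: str    # timing relationship is evaluated on this column
    window: int         # look-back span, in the same unit as time_column
    value_column: str   # secondary-table column to aggregate
    aggregations: Dict[str, Callable[[List[float]], float]]  # many rows -> one value


def join_tables(primary: List[dict], secondary: List[dict], scheme: JoinScheme) -> List[dict]:
    by_key: Dict[object, List[dict]] = defaultdict(list)
    for row in secondary:
        by_key[row[scheme.join_key]].append(row)
    wide_rows = []
    for p in primary:
        t = p[scheme.time_column]
        # Only secondary rows in [t - window, t] may contribute (no future data).
        values = [s[scheme.value_column] for s in by_key[p[scheme.join_key]]
                  if t - scheme.window <= s[scheme.time_column] <= t]
        wide = dict(p)
        for name, fn in scheme.aggregations.items():
            wide[name] = fn(values) if values else None
        wide_rows.append(wide)
    return wide_rows
```

Excluding secondary rows later than the primary row is what keeps an offline join from using information that would not yet exist at online prediction time.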
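In step 601, the electronic apparatus obtains the related data streams of the specified business scenario online through the data service interface. For example, it may obtain the impression data stream of the specified business scenario online through the data service interface, where the impression data is the data the specified business scenario displays in response to the request data stream.
In step 602, the electronic apparatus accumulates the data in the related data streams into the first database, which is an offline database. In some embodiments, the electronic apparatus processes the data of the request data stream to obtain sample data, and then accumulates the data of the request data stream, the sample data, the data of the feedback data stream, and the data of the business data stream into the first database. The processing includes, but is not limited to, filtering and flattening. In some embodiments, the electronic apparatus uses a filter to filter the request data stream data against the impression data stream data, obtaining intersection data, and then flattens the intersection data to obtain sample data. The electronic apparatus accumulates the impression data and the sample data obtained by filtering into the first database.
In step 603, when a first preset condition is met, the electronic apparatus explores a model scheme based on data in the first database (for example, one or more of the logical relationship information, the request data stream data, the sample data, the feedback data stream data, the business data stream data, and the impression data stream data). The model scheme includes the following scheme sub-items: a feature engineering scheme, a model algorithm, and model hyperparameters. Because the feature engineering scheme is explored based on the logical relationship information, it has at least a table-joining function; note that its joining method may be the same as or different from the user-entered table-joining scheme. The feature engineering scheme may also have other functions, such as extracting features from the data for use by the model algorithm or model. In some embodiments, the first preset condition may include at least one of data volume, time, and manual triggering; for example, it may be that the amount of data in the first database reaches a preset amount, or that data has been accumulating in the first database for a preset duration.
In some embodiments, when the first preset condition is met, the electronic apparatus generates at least two model schemes, for example based on the logical relationship information maintained through the first database, where different model schemes differ in at least one scheme sub-item. It then trains a model with each of the at least two model schemes on the data in the first database, evaluates the models trained under each scheme using a machine learning model evaluation metric, and finally selects among the at least two model schemes based on the evaluation results to obtain the explored model scheme.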
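The exploration loop described above (generate at least two schemes, train each, evaluate with a metric, select the best) can be sketched generically. The scheme generator takes the Cartesian product of candidate sub-items, so that, given distinct candidates, any two schemes differ in at least one sub-item. The nearest-centroid trainer and accuracy metric are toy stand-ins chosen only to make the sketch runnable; they are not the algorithms of the source, and all names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
Data = List[Tuple[Row, int]]


@dataclass(frozen=True)
class Scheme:
    features: Tuple[str, ...]                   # feature engineering scheme (columns used)
    algorithm: str                              # model algorithm name
    hyperparams: Tuple[Tuple[str, float], ...]  # model hyperparameters


def generate_schemes(feature_sets: Sequence[Sequence[str]], algorithms: Sequence[str],
                     hyperparam_grid: Sequence[Dict[str, float]]) -> List[Scheme]:
    schemes = [Scheme(tuple(f), a, tuple(sorted(h.items())))
               for f, a, h in product(feature_sets, algorithms, hyperparam_grid)]
    if len(schemes) < 2:
        raise ValueError("exploration needs at least two candidate schemes")
    return schemes


def explore(schemes: List[Scheme], trainers: Dict[str, Callable], metric: Callable,
            train_data: Data, valid_data: Data) -> Tuple[Scheme, float]:
    best, best_score = schemes[0], float("-inf")
    for scheme in schemes:
        model = trainers[scheme.algorithm](scheme, train_data)  # train under this scheme
        score = metric(model, scheme, valid_data)               # evaluate
        if score > best_score:                                  # select (first best wins)
            best, best_score = scheme, score
    return best, best_score


def _vec(row: Row, features: Tuple[str, ...]) -> List[float]:
    return [row[f] for f in features]


def train_centroid(scheme: Scheme, data: Data) -> Dict[int, List[float]]:
    # Toy algorithm: one centroid per class label (0 and 1) in the chosen feature space.
    centroids = {}
    for label in (0, 1):
        vecs = [_vec(row, scheme.features) for row, y in data if y == label]
        centroids[label] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return centroids


def accuracy(model: Dict[int, List[float]], scheme: Scheme, data: Data) -> float:
    def dist(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    hits = 0
    for row, y in data:
        v = _vec(row, scheme.features)
        pred = min(model, key=lambda lbl: dist(v, model[lbl]))  # nearest centroid
        hits += int(pred == y)
    return hits / len(data)
```

Swapping in real trainers only requires adding entries to the `trainers` mapping keyed by algorithm name; the selection logic stays unchanged.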
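In step 604, the electronic apparatus deploys the explored model scheme online to provide the online model prediction service, where the online model prediction service operates on the related data streams of the specified business scenario obtained online through the data service interface. In some embodiments, the electronic apparatus deploys only the model scheme and not the offline model obtained during model scheme exploration. This avoids the problem that, when an offline model is deployed directly, inconsistencies between the data produced by online and offline feature computation degrade its prediction quality. Moreover, because only the model scheme is deployed and no offline model, no prediction result is generated while providing the online model prediction service: upon receiving request data, a default prediction result is sent to the specified business scenario, which ignores it. In some embodiments, when deploying the model scheme, the electronic apparatus also deploys the offline model obtained during exploration. That offline model is trained on the related data of the specified business scenario accumulated in the first database (i.e., the offline database), and after deployment it serves predictions on related data of the same scenario. Therefore, although the data produced by online and offline feature computation may be inconsistent, the online and offline data still come from the same source.
In some embodiments, after deploying the explored model scheme, the electronic apparatus stores the data of the related data streams into a second database, which is an online database. Upon receiving request data, it performs online real-time feature computation on the data in the second database and the received request data according to the feature engineering scheme in the deployed model scheme, obtaining the feature data of a prediction sample. In some embodiments, after deploying the explored model scheme and upon receiving request data, the electronic apparatus joins the data in the second database with the received request data and performs online real-time feature computation according to the feature engineering scheme in the deployed model scheme, obtaining wide-table feature data; the feature data of the prediction sample is then wide-table feature data.
In some embodiments, the electronic apparatus obtains the feature data (or wide-table feature data) of a prediction sample based on the deployed model scheme, and joins the feature data with feedback data to generate sample data with features and feedback. The sample data may also include other data, such as timestamps; the feedback data comes from the feedback data stream. In some embodiments, before joining the feature data with the feedback data, the electronic apparatus joins the feature data with impression data from the impression data stream to obtain feature data with impressions, and then joins that with the feedback data to generate sample data containing impression data, feature data, and feedback data.
In some embodiments, the electronic apparatus writes the sample data with features and feedback back to the first database and, when a second preset condition is met, performs model self-learning on the sample data with features and feedback in the first database. The second preset condition may include at least one of data volume, time, and manual triggering; for example, it may be that the amount of data in the first database reaches a preset amount, or that data has been accumulating in the first database for a preset duration.
In some embodiments, when the second preset condition is met, the electronic apparatus trains on the sample data with features and feedback using the model algorithm and model hyperparameters in the model scheme, obtaining a machine learning model. In some embodiments, if the electronic apparatus also deployed an initial model when deploying the model scheme, where the initial model is an offline model produced during model scheme exploration, the electronic apparatus trains the initial model using the model algorithm and model hyperparameters in the model scheme, updating the initial model's own parameter values to obtain the machine learning model. In some embodiments, if the electronic apparatus did not deploy an initial model with the model scheme, it trains a random model using the model algorithm and model hyperparameters in the model scheme to obtain the machine learning model, where the random model is a model generated from the model algorithm whose own parameter values are random.
In some embodiments, the electronic apparatus deploys the machine learning model online to provide the online model prediction service. In some embodiments, after the machine learning model is deployed, when request data is received, the electronic apparatus generates a prediction sample with features from the data in the second database and the received request data, and obtains a prediction result for the sample through the deployed model. This is what distinguishes a deployed model from a deployed model scheme: the model can produce a prediction result for the prediction sample. The electronic apparatus may send the prediction result to the specified business scenario for use or reference.
In some embodiments, the electronic apparatus replaces the already-deployed machine learning model with the model obtained by model self-learning, or deploys that model alongside the already-deployed machine learning model so that both provide the online model prediction service. In some embodiments, the electronic apparatus replaces the already-deployed model scheme with the explored model scheme, or deploys the explored model scheme without taking the already-deployed model scheme offline.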
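Thus, in the method for applying machine learning disclosed in the above embodiments, the data used for model scheme exploration comes from the first database, which is an offline database, so it can be regarded as offline data, while the data used by the online model prediction service is online data. Because both are obtained from the specified business scenario through the data service interface, the data used for model scheme exploration (offline data) and the data used by the online model prediction service (online data) are guaranteed to come from the same source.
In addition, in the method for applying machine learning disclosed in the above embodiments, the sample data with features and feedback used for model self-learning is generated online, after the model scheme is deployed, from the data in the second database (i.e., the online database) and the received request data; and after the model obtained by self-learning is deployed, it also serves predictions based on the data in the second database. This ensures that the data and feature engineering scheme used for model self-learning are the same as those used by the online model prediction service, so that self-learning performance and online prediction performance stay consistent.
It should be noted that, for simplicity of description, the foregoing method embodiments are expressed as a series of combined actions, but those skilled in the art will understand that the embodiments of the present disclosure are not limited by the order of actions described, because according to these embodiments some steps may be performed in another order or simultaneously. In addition, those skilled in the art will understand that the embodiments described in the specification are all optional embodiments.
Embodiments of the present disclosure also provide a computer-readable storage medium storing programs or instructions that cause a computer to perform the steps of the embodiments of the method for applying machine learning; to avoid repetition, they are not described again here.
Embodiments of the present disclosure also provide a computer program product including computer program instructions which, when run on a computer device, can perform the method steps of the various embodiments of the present disclosure; for example, when run by a processor, they cause the processor to perform those method steps.
Program code for performing the operations of the embodiments of the present disclosure in the computer program product may be written in any combination of one or more programming languages, including object-oriented languages such as Java and C++, and conventional procedural languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
It should be noted that, herein, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
Those skilled in the art will understand that although some embodiments described herein include certain features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the present disclosure and to form different embodiments.
Those skilled in the art will understand that the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of other embodiments.
Although embodiments of the present disclosure have been described with reference to the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the present disclosure, and such modifications and variations fall within the scope defined by the appended claims.
In at least one embodiment of the present disclosure, the system connects directly to the business scenario and accumulates the scenario's related data to explore model schemes, obtaining a model scheme and an offline model; this ensures that the data used for offline model scheme exploration and the data used by the online model prediction service come from the same source. To avoid poor prediction quality caused by inconsistency between the data produced by online and offline feature computation when an offline model is deployed directly, only the model scheme is deployed, not the offline model. After the model scheme is deployed, receiving prediction requests (i.e., the data of the request data stream) yields sample data with features and feedback, which can be used for model self-learning, and the self-learned model can be deployed online. This ensures that the data and feature engineering scheme used for model self-learning are the same as those used by the online model prediction service, so that self-learning performance and online prediction performance stay consistent.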
Claims (43)
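- A method for applying machine learning, the method comprising: obtaining related data streams of a specified business scenario online based on a data service interface; accumulating data in the related data streams into a first database; when a first preset condition is met, exploring a model scheme based on the data in the first database; and deploying the explored model scheme online to provide an online model prediction service, wherein the online model prediction service is performed based on the related data streams of the specified business scenario obtained online through the data service interface.
- The method according to claim 1, wherein the model scheme comprises the following scheme sub-items: a feature engineering scheme, a model algorithm, and model hyperparameters.
- The method according to claim 1, wherein before the step of obtaining related data streams of a specified business scenario online based on a data service interface, the method further comprises: providing a user interface, and receiving, based on the user interface, user-entered information about the related data streams of the specified business scenario; and creating the data service interface based on the information about the related data streams of the specified business scenario.
- The method according to claim 1, wherein the related data streams comprise: a request data stream, a feedback data stream, and a business data stream.
- The method according to claim 4, wherein accumulating the data in the related data streams into the first database comprises: processing the data of the request data stream to obtain sample data; and accumulating the data of the request data stream, the sample data, the data of the feedback data stream, and the data of the business data stream into the first database.
- The method according to claim 5, wherein the related data streams further comprise an impression data stream, the data of the impression data stream being data displayed by the specified business scenario based on the request data stream; correspondingly, processing the data of the request data stream to obtain sample data comprises: filtering the data of the request data stream based on the data of the impression data stream to obtain intersection data, and processing the intersection data to obtain sample data; and correspondingly, the data of the impression data stream and the sample data are accumulated into the first database.
- The method according to any one of claims 1 to 6, wherein exploring a model scheme based on the data in the first database comprises: generating at least two model schemes, wherein different model schemes differ in at least one scheme sub-item; training models on the data in the first database using each of the at least two model schemes; evaluating, based on a machine learning model evaluation metric, the models respectively trained with the at least two model schemes; and selecting from the at least two model schemes based on the evaluation results to obtain the explored model scheme.
- The method according to claim 7, wherein the method further comprises: receiving, based on a user interface, user-entered data table attribute information and a table-joining scheme; and maintaining logical relationship information through the first database based on the data table attribute information and the table-joining scheme, the logical relationship information being information describing relationships between different data tables; correspondingly, generating at least two model schemes comprises: generating at least two model schemes based on the logical relationship information.
- The method according to claim 8, wherein the table-joining scheme comprises join keys for joining different data tables, a timing relationship, and an aggregation relationship; and the logical relationship information comprises the data table attribute information and the table-joining scheme.
- The method according to any one of claims 1 to 6, wherein after deploying the explored model scheme online, the method further comprises: storing the data of the related data streams into a second database, the second database supporting online real-time feature computation; and when request data is received, performing online real-time feature computation using the data in the second database and the received request data based on the feature engineering scheme in the deployed model scheme, to obtain feature data of a prediction sample.
- The method according to claim 10, wherein performing online real-time feature computation using the data in the second database and the received request data comprises: joining the data in the second database with the received request data and performing online real-time feature computation based on the feature engineering scheme in the deployed model scheme, to obtain wide-table feature data; correspondingly, the feature data of the prediction sample is wide-table feature data.
- The method according to claim 10, wherein the method further comprises: after deploying the explored model scheme online, when request data is received and no model has been deployed online, sending a default prediction result to the specified business scenario.
- The method according to claim 10, wherein deploying the explored model scheme online to provide an online model prediction service comprises: obtaining feature data of a prediction sample based on the deployed model scheme; joining the feature data with feedback data to generate sample data with features and feedback, the feedback data coming from a feedback data stream; writing the sample data with features and feedback back to the first database; when a second preset condition is met, performing model self-learning based on the sample data with features and feedback in the first database; and deploying the model obtained by the model self-learning online to provide the online model prediction service.
- The method according to claim 13, wherein before joining the feature data with the feedback data, the method further comprises: joining the feature data with impression data to obtain feature data with impressions, the impression data coming from an impression data stream; correspondingly, joining the feature data with impressions and the feedback data to generate sample data with impression data, feature data, and feedback data.
- The method according to claim 13, wherein performing model self-learning based on the sample data with features and feedback in the first database comprises: training, based on the sample data with features and feedback, using the model algorithm and model hyperparameters in the model scheme, to obtain a machine learning model.
- The method according to claim 15, wherein training using the model algorithm and model hyperparameters in the model scheme to obtain a machine learning model comprises: training an initial model using the model algorithm and model hyperparameters in the model scheme to obtain the machine learning model, wherein the initial model is a model produced in the process of exploring the model scheme, and the initial model is also deployed online when the explored model scheme is deployed online.
- The method according to claim 15, wherein training using the model algorithm and model hyperparameters in the model scheme to obtain a machine learning model comprises: training a random model using the model algorithm and model hyperparameters in the model scheme to obtain the machine learning model, wherein the random model is a model generated based on the model algorithm whose own parameter values are random values, and no initial model is deployed online when the explored model scheme is deployed online.
- The method according to claim 13, wherein deploying the model obtained by the model self-learning online to provide the online model prediction service comprises: after deploying the model obtained by the model self-learning online, when request data is received, generating a prediction sample with features based on the data in the second database and the received request data, and obtaining a prediction result of the prediction sample through the deployed model; and sending the prediction result to the specified business scenario.
- The method according to claim 13, wherein deploying the model obtained by the model self-learning online comprises: replacing the already-deployed machine learning model with the model obtained by the model self-learning; or deploying the model obtained by the model self-learning online to provide the online model prediction service together with the already-deployed machine learning model; and deploying the explored model scheme online comprises: replacing the already-deployed model scheme with the explored model scheme; or deploying the explored model scheme online without taking the already-deployed model scheme offline.
- The method according to claim 13, wherein the first preset condition and the second preset condition comprise at least one of: data volume, time, and manual triggering.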
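- A device for applying machine learning, the device comprising: a data management module configured to obtain related data streams of a specified business scenario online based on a data service interface and to accumulate data in the related data streams into a first database; a model scheme exploration module configured to explore a model scheme based on the data in the first database when a first preset condition is met; and an online model prediction service module configured to deploy the model scheme obtained by the model scheme exploration module online to provide an online model prediction service, wherein the online model prediction service is performed based on the related data streams of the specified business scenario obtained online through the data service interface.
- The device according to claim 21, wherein the model scheme comprises the following scheme sub-items: a feature engineering scheme, a model algorithm, and model hyperparameters.
- The device according to claim 21, wherein the data management module is further configured to: provide a user interface and receive, based on the user interface, user-entered information about the related data streams of the specified business scenario; and create the data service interface based on the information about the related data streams of the specified business scenario.
- The device according to claim 21, wherein the related data streams comprise: a request data stream, a feedback data stream, and a business data stream.
- The device according to claim 24, wherein the data management module is configured to: process the data of the request data stream to obtain sample data; and accumulate the data of the request data stream, the sample data, the data of the feedback data stream, and the data of the business data stream into the first database.
- The device according to claim 25, wherein the related data streams further comprise an impression data stream, the data of the impression data stream being data displayed by the specified business scenario based on the request data stream; the data management module is configured to: filter the data of the request data stream based on the data of the impression data stream to obtain intersection data, and process the intersection data to obtain sample data; and the data management module is further configured to accumulate the data of the impression data stream and the sample data into the first database.
- The device according to any one of claims 21 to 26, wherein the model scheme exploration module is configured to: when the first preset condition is met, generate at least two model schemes, wherein different model schemes differ in at least one scheme sub-item; train models on the data in the first database using each of the at least two model schemes; evaluate, based on a machine learning model evaluation metric, the models respectively trained with the at least two model schemes; and select from the at least two model schemes based on the evaluation results to obtain the explored model scheme.
- The device according to claim 27, wherein the data management module is further configured to: receive, based on a user interface, user-entered data table attribute information and a table-joining scheme; and maintain logical relationship information through the first database based on the data table attribute information and the table-joining scheme, the logical relationship information being information describing relationships between different data tables; correspondingly, the model scheme exploration module is configured to generate at least two model schemes based on the logical relationship information.
- The device according to claim 28, wherein the table-joining scheme comprises join keys for joining different data tables, a timing relationship, and an aggregation relationship; and the logical relationship information comprises the data table attribute information and the table-joining scheme.
- The device according to any one of claims 21 to 26, wherein the online model prediction service module is further configured to: after deploying the model scheme obtained by the model scheme exploration module online, store the data of the related data streams into a second database, the second database supporting online real-time feature computation; and when request data is received, perform online real-time feature computation using the data in the second database and the received request data based on the feature engineering scheme in the deployed model scheme, to obtain feature data of a prediction sample.
- The device according to claim 30, wherein the online model prediction service module performing online real-time feature computation using the data in the second database and the received request data comprises: joining the data in the second database with the received request data and performing online real-time feature computation based on the feature engineering scheme in the deployed model scheme, to obtain wide-table feature data; correspondingly, the feature data of the prediction sample is wide-table feature data.
- The device according to claim 30, wherein the online model prediction service module is further configured to: after deploying the model scheme obtained by the model scheme exploration module online, when request data is received and no model has been deployed online, send a default prediction result to the specified business scenario.
- The device according to claim 30, wherein the online model prediction service module is further configured to: obtain feature data of a prediction sample based on the deployed model scheme; join the feature data with feedback data to generate sample data with features and feedback, the feedback data coming from a feedback data stream; and write the sample data with features and feedback back to the first database; the device further comprises a model self-learning module configured to perform model self-learning based on the sample data with features and feedback in the first database when a second preset condition is met; and the online model prediction service module is further configured to deploy the model obtained by the model self-learning module online to provide the online model prediction service.
- The device according to claim 33, wherein the online model prediction service module is further configured to: before joining the feature data with the feedback data, join the feature data with impression data to obtain feature data with impressions, the impression data coming from an impression data stream; correspondingly, the online model prediction service module is configured to join the feature data with impressions and the feedback data to generate sample data with impression data, feature data, and feedback data.
- The device according to claim 33, wherein the model self-learning module is configured to: when the second preset condition is met, train, based on the sample data with features and feedback, using the model algorithm and model hyperparameters in the model scheme, to obtain a machine learning model.
- The device according to claim 35, wherein the model self-learning module is configured to: train an initial model using the model algorithm and model hyperparameters in the model scheme to obtain the machine learning model, wherein the initial model is a model produced while the model scheme exploration module explored the model scheme, and the online model prediction service module also deploys the initial model online when deploying the model scheme obtained by the model scheme exploration module online.
- The device according to claim 35, wherein the model self-learning module is configured to: train a random model using the model algorithm and model hyperparameters in the model scheme to obtain the machine learning model, wherein the random model is a model generated based on the model algorithm whose own parameter values are random values, and the online model prediction service module does not deploy an initial model online when deploying the model scheme obtained by the model scheme exploration module online.
- The device according to claim 33, wherein the online model prediction service module is configured to: after deploying the model obtained by the model self-learning module online, when request data is received, generate a prediction sample with features based on the data in the second database and the received request data, and obtain a prediction result of the prediction sample through the deployed model; and send the prediction result to the specified business scenario.
- The device according to claim 33, wherein the online model prediction service module is configured to: replace the already-deployed machine learning model with the model obtained by the model self-learning module, or deploy the model obtained by the model self-learning module online to provide the online model prediction service together with the already-deployed machine learning model; and the online model prediction service module is configured to: replace the already-deployed model scheme with the model scheme obtained by the model scheme exploration module, or deploy the model scheme obtained by the model scheme exploration module online without taking the already-deployed model scheme offline.
- The device according to claim 33, wherein the first preset condition and the second preset condition comprise at least one of: data volume, time, and manual triggering.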
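- An electronic apparatus, comprising a processor and a memory, wherein the processor, by calling programs or instructions stored in the memory, is configured to perform the steps of the method according to any one of claims 1 to 20.
- A computer-readable storage medium storing programs or instructions that cause a computer to perform the steps of the method according to any one of claims 1 to 20.
- A computer program product comprising computer program instructions which, when run on a computer device, implement the steps of the method according to any one of claims 1 to 20.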
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21802933.8A EP4152224A4 (en) | 2020-05-15 | 2021-05-17 | MACHINE LEARNING APPLICATION METHOD, DEVICE, ELECTRONIC APPARATUS AND STORAGE MEDIUM |
US17/925,576 US20230342663A1 (en) | 2020-05-15 | 2021-05-17 | Machine learning application method, device, electronic apparatus, and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010415370.7A CN113673707A (zh) | 2020-05-15 | 2020-05-15 | Method, device, electronic apparatus, and storage medium for applying machine learning |
CN202010415370.7 | 2020-05-15 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021228264A1 (zh) | 2021-11-18 |
Family
ID=78525199
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/094202 WO2021228264A1 (zh) | Method, device, electronic apparatus, and storage medium for applying machine learning | 2020-05-15 | 2021-05-17 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230342663A1 (zh) |
EP (1) | EP4152224A4 (zh) |
CN (1) | CN113673707A (zh) |
WO (1) | WO2021228264A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115242648A (zh) * | 2022-07-19 | 2022-10-25 | 北京百度网讯科技有限公司 | Scaling decision model training method and operator scaling method |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
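CN112036577B (zh) * | 2020-08-20 | 2024-02-20 | 第四范式(北京)技术有限公司 | Method, device, and electronic apparatus for applying machine learning based on data form |
CN112446597B (zh) * | 2020-11-14 | 2024-01-12 | 西安电子科技大学 | Tank quality evaluation method, system, storage medium, computer device, and application |
CN114238269B (zh) * | 2021-12-03 | 2024-01-23 | 中兴通讯股份有限公司 | Database parameter adjustment method, device, electronic apparatus, and storage medium |
CN116451056B (zh) * | 2023-06-13 | 2023-09-29 | 支付宝(杭州)信息技术有限公司 | Endpoint feature insight method, apparatus, and device |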
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106777088A (zh) * | 2016-12-13 | 2017-05-31 | 飞狐信息技术(天津)有限公司 | Fast-iterating search engine ranking method and system |
US20180012145A1 (en) * | 2016-07-07 | 2018-01-11 | Hcl Technologies Limited | Machine learning based analytics platform |
CN109003091A (zh) * | 2018-07-10 | 2018-12-14 | 阿里巴巴集团控股有限公司 | Risk prevention and control processing method, apparatus, and device |
CN110766164A (zh) * | 2018-07-10 | 2020-02-07 | 第四范式(北京)技术有限公司 | Method and system for executing a machine learning process |
CN111107102A (zh) * | 2019-12-31 | 2020-05-05 | 上海海事大学 | Big-data-based real-time network traffic anomaly detection method |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8533222B2 (en) * | 2011-01-26 | 2013-09-10 | Google Inc. | Updateable predictive analytical modeling |
US20160148115A1 (en) * | 2014-11-26 | 2016-05-26 | Microsoft Technology Licensing | Easy deployment of machine learning models |
US11681943B2 (en) * | 2016-09-27 | 2023-06-20 | Clarifai, Inc. | Artificial intelligence development via user-selectable/connectable model representations |
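CN107862602A (zh) * | 2017-11-23 | 2018-03-30 | 安趣盈(上海)投资咨询有限公司 | Credit decision method and system based on multi-dimensional indicator calculation, self-learning, and clustering model application |
CN110083334B (zh) * | 2018-01-25 | 2023-06-20 | 百融至信(北京)科技有限公司 | Method and device for bringing a model online |
CN110766163B (zh) * | 2018-07-10 | 2023-08-29 | 第四范式(北京)技术有限公司 | System for implementing a machine learning process |
CN110956272B (zh) * | 2019-11-01 | 2023-08-08 | 第四范式(北京)技术有限公司 | Method and system for implementing data processing |
CN111008707A (zh) * | 2019-12-09 | 2020-04-14 | 第四范式(北京)技术有限公司 | Automated modeling method, device, and electronic apparatus |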
- 2020-05-15 CN CN202010415370.7A patent/CN113673707A/zh active Pending
- 2021-05-17 WO PCT/CN2021/094202 patent/WO2021228264A1/zh active Application Filing
- 2021-05-17 US US17/925,576 patent/US20230342663A1/en active Pending
- 2021-05-17 EP EP21802933.8A patent/EP4152224A4/en active Pending
Non-Patent Citations (1)
Title |
---|
See also references of EP4152224A4 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115242648A (zh) * | 2022-07-19 | 2022-10-25 | 北京百度网讯科技有限公司 | Scaling decision model training method and operator scaling method |
CN115242648B (zh) * | 2022-07-19 | 2024-05-28 | 北京百度网讯科技有限公司 | Scaling decision model training method and operator scaling method |
Also Published As
Publication number | Publication date |
---|---|
EP4152224A4 (en) | 2024-06-05 |
US20230342663A1 (en) | 2023-10-26 |
CN113673707A (zh) | 2021-11-19 |
EP4152224A1 (en) | 2023-03-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21802933; Country of ref document: EP; Kind code of ref document: A1 |
 | WWE | WIPO information: entry into national phase | Ref document number: 2021802933; Country of ref document: EP |
 | ENP | Entry into the national phase | Ref document number: 2021802933; Country of ref document: EP; Effective date: 20221215 |
 | NENP | Non-entry into the national phase | Ref country code: DE |