
CN117745329A - Data processing method, model training method and electronic equipment - Google Patents

Data processing method, model training method and electronic equipment

Info

Publication number
CN117745329A
Authority
CN
China
Prior art keywords
domain
output
sample
module
media data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211117464.1A
Other languages
Chinese (zh)
Inventor
周晓松
陈舒
蔡庆亮
王喆
何海乾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youzhuju Network Technology Co Ltd
Lemon Inc Cayman Island
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Lemon Inc Cayman Island
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd, Lemon Inc Cayman Island filed Critical Beijing Youzhuju Network Technology Co Ltd
Priority to CN202211117464.1A priority Critical patent/CN117745329A/en
Priority to PCT/CN2023/117748 priority patent/WO2024055912A1/en
Publication of CN117745329A publication Critical patent/CN117745329A/en
Pending legal-status Critical Current

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N 3/0442 - Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N 3/0499 - Feedforward networks
    • G06N 3/08 - Learning methods
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES
    • G06Q 30/00 - Commerce
    • G06Q 30/02 - Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0201 - Market modelling; Market analysis; Collecting market data
    • G06Q 30/0202 - Market predictions or forecasting for commercial activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to a data processing method, a model training method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product. The method comprises the following steps: acquiring a pre-generated multi-domain network model, wherein the multi-domain network model comprises a bottom layer module, a central branch, and a plurality of domain branches corresponding to a plurality of domains; determining a concatenated feature of media data to be processed by inputting the media data to be processed into the bottom layer module; inputting the concatenated feature into the central branch to obtain a first output; inputting the concatenated feature into the domain branch corresponding to the domain to which the media data to be processed belongs to obtain a second output; and determining a prediction result based on the first output and the second output, the prediction result comprising a CTR and/or a CVR. In this way, CTR/CVR is predicted with a multi-domain network model: the central branch preserves the global view, while each media data item passes only through its corresponding domain branch, which makes the processing more targeted and reduces computation, so the CTR/CVR can be determined more accurately and efficiently.

Description

Data processing method, model training method and electronic equipment
Technical Field
The present disclosure relates generally to the field of computers, and more particularly to data processing methods, model training methods, and electronic devices.
Background
Click-through rate (CTR) and conversion rate (CVR) are among the important metrics of media information systems. Thus, if CTR and/or CVR can be determined more accurately, they can provide important references for providers of media information.
Media information, such as merchandise information or advertisement information, may relate to different scenes. Because of the differences between scenes, determining CTR and/or CVR in the same manner for all of them may yield results that are not accurate enough.
Disclosure of Invention
According to an example embodiment of the present disclosure, a data processing scheme based on a multi-domain network model is provided.
In a first aspect of the present disclosure, there is provided a data processing method, comprising: acquiring a pre-generated multi-domain network model, wherein the multi-domain network model comprises a bottom layer module, a central branch and a plurality of domain branches corresponding to a plurality of domains; determining a concatenated feature of media data to be processed by inputting the media data to be processed into the bottom layer module; inputting the concatenated feature into the central branch to obtain a first output; inputting the concatenated feature into a domain branch corresponding to a domain to which the media data to be processed belongs, to obtain a second output; and determining a prediction result of the media data to be processed based on the first output and the second output, the prediction result comprising a CTR and/or a CVR.
In a second aspect of the present disclosure, there is provided a model training method, comprising: acquiring a media information dataset for training, wherein the dataset comprises a plurality of samples, each sample comprising a media data sample and a corresponding label, the label indicating a CTR and/or a CVR; for each sample in the dataset: determining a concatenated feature of the sample by inputting the sample into a bottom layer module of a multi-domain network model, inputting the concatenated feature of the sample into a central branch of the multi-domain network model to obtain a first output, inputting the concatenated feature of the sample into a domain branch corresponding to a domain to which the sample belongs to obtain a second output, and determining a network output of the sample based on the first output and the second output; and generating the multi-domain network model by training based on the label of each sample in the dataset and the network output of each sample, wherein the multi-domain network model comprises the bottom layer module, the central branch and a plurality of domain branches corresponding to the plurality of domains.
In a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processing unit; at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, which when executed by the at least one processing unit, cause the electronic device to perform the method described in accordance with the first or second aspect of the present disclosure.
In a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon machine executable instructions which, when executed by a device, cause the device to perform a method according to the first or second aspect of the present disclosure.
In a fifth aspect of the present disclosure, there is provided a computer program product comprising computer executable instructions which when executed by a processor implement a method as described in accordance with the first or second aspect of the present disclosure.
In a sixth aspect of the present disclosure, there is provided an electronic device, comprising: processing circuitry configured to perform the method described according to the first or second aspect of the present disclosure.
The summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. The summary is not intended to identify key features or essential features of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, wherein like or similar reference numerals designate like or similar elements, and wherein:
FIG. 1 illustrates a schematic flow diagram of a process of model training according to some embodiments of the present disclosure;
FIG. 2 illustrates a schematic diagram of multi-domain splitting according to some embodiments of the present disclosure;
FIG. 3 illustrates an architectural diagram of a model according to some embodiments of the present disclosure;
FIG. 4 illustrates a schematic flow diagram of a process of data processing according to some embodiments of the present disclosure;
FIG. 5 illustrates a block diagram of an example training apparatus, according to some embodiments of the present disclosure;
FIG. 6 illustrates a block diagram of an example usage device, according to some embodiments of the present disclosure; and
FIG. 7 illustrates a block diagram of an example device that may be used to implement embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
In embodiments of the present disclosure, different scenarios may also be referred to as different domains, different fields, or other names, etc. By way of example, different scenarios may include, but are not limited to, different access manners, different advertisement types, different regions, different conversion events, and so forth.
For media information that spans multiple scenes, or multiple domains (multi-domain), different scenes may have different data distributions. If a single model is simply used to determine the CTR and/or CVR of media information across multiple scenes, the modeling effect may differ from scene to scene, resulting in inaccurate results.
If separate models are used for different scenes, each model may fit its scene more finely, but the data becomes isolated between scenes. Maintaining each model separately increases labor costs, and because samples are unevenly distributed across scenes, some scenes have too few samples, making their models unstable. Moreover, data isolation between scenes discards sample information of the global view: the models are independent of one another, each lacks global information, and the obtained results are inaccurate.
In view of this, embodiments of the present disclosure provide a data processing scheme based on a multi-domain network model. In this scheme, the domain branches corresponding to the individual domains make the model more targeted, while the central branch avoids complete isolation between the domain branches. Differences between domains can thus be fully considered while the global view is retained, so the obtained results are more accurate.
Fig. 1 illustrates a schematic flow diagram of a process 100 of model training according to some embodiments of the present disclosure. At block 110, a media information dataset for training is obtained, the dataset comprising a plurality of samples, each sample comprising a media data sample and a corresponding label, the label indicating a CTR and/or a CVR. At block 120, for each sample in the dataset: a concatenated feature of the sample is determined by inputting the sample into a bottom layer module of a multi-domain network model, the concatenated feature of the sample is input into a central branch of the multi-domain network model to obtain a first output, the concatenated feature of the sample is input into a domain branch corresponding to a domain to which the sample belongs to obtain a second output, and a network output of the sample is determined based on the first output and the second output. At block 130, the multi-domain network model is generated by training based on the label of each sample in the dataset and the network output of each sample, wherein the multi-domain network model includes the bottom layer module, the central branch, and a plurality of domain branches corresponding to the plurality of domains.
In some embodiments, multiple domains may be predefined and the domain to which each sample belongs determined. For example, the division of the domain may be determined based on posterior information, such as differences in posterior CTR/CVR, etc.
For example, assuming that the media information is advertisement slots, the posterior CTR/CVR differences of different scenes can be analyzed based on the application scenes of the advertisement slots to determine how to split the domains. For example, the determined splitting manner may be: (1) if the advertisement slot type is one of banner advertisement, interstitial advertisement, open-screen advertisement, or feed advertisement, the slot belongs to a first domain; (2) if the advertisement slot type is rewarded video or full-screen video and the access method is software development kit (SDK), the slot belongs to a second domain; (3) if the advertisement slot type is rewarded video or full-screen video and the access method is demand-side platform (DSP), the slot belongs to a third domain.
Fig. 2 illustrates a schematic diagram of a multi-domain split 200 according to some embodiments of the present disclosure. Assume that each advertisement slot type (ad_slot_type) is identified as in Table 1 below, and that the access method (access_method) is either SDK or DSP.
TABLE 1
Advertisement slot type       Identifier
Banner advertisement          1
Interstitial advertisement    2
Open-screen advertisement     3
Feed advertisement            5
Rewarded video                7
Full-screen video             8
It should be noted that the above example and the splitting manner of Fig. 2 are only illustrative; in a practical scenario, the specific splitting manner may be determined based on posterior information, and corresponding splits may be determined for other media information, which will not be described in detail here. For concreteness, the example splitting rule can also be expressed as a small routine, as in the sketch below.
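The following Python sketch is illustrative only: the slot-type identifiers follow Table 1, the rule bodies follow the example splitting manner above, and the function name and signature are assumptions rather than part of the disclosure.
```python
# Illustrative domain assignment following the example splitting manner above.
# Slot-type identifiers follow Table 1; "SDK" and "DSP" are the two access methods.

VIDEO_SLOTS = {7, 8}  # rewarded video, full-screen video

def assign_domain(ad_slot_type: int, access_method: str) -> int:
    """Map (ad_slot_type, access_method) to a domain index (1-based)."""
    if ad_slot_type in {1, 2, 3, 5}:  # banner, interstitial, open-screen, feed
        return 1                      # first domain
    if ad_slot_type in VIDEO_SLOTS and access_method == "SDK":
        return 2                      # second domain
    if ad_slot_type in VIDEO_SLOTS and access_method == "DSP":
        return 3                      # third domain
    raise ValueError(f"unmapped sample: {ad_slot_type=}, {access_method=}")
```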
Fig. 3 illustrates an architectural diagram of a model 300 according to some embodiments of the present disclosure. As shown in Fig. 3, model 300 includes a bottom layer module 310, a central branch 320, and a plurality of domain branches 330-1 through 330-N (which may be collectively or individually referred to as domain branches 330) corresponding to a plurality of domains, where N represents the number of domains.
Referring to Fig. 3, the bottom layer module 310 may include a plurality of sub-modules, for example a logistic regression (LR) sub-module 311, a factorization machine (FM) sub-module 312, and a vector compression (vec_compression) sub-module 313. It is to be understood that the illustration in Fig. 3 is only schematic and should not be construed as limiting the embodiments of the disclosure; for example, the bottom layer module 310 may further include other sub-modules not shown in the figure, and the vector compression sub-module 313 may also be referred to as a vector splicing sub-module, which is not limited in this disclosure.
Illustratively, the original features may be converted into high-dimensional sparse binary vectors by one-hot encoding. Illustratively, low-dimensional dense vector features may then be obtained via embedding.
Illustratively, the output of the bottom layer module 310 may be a concatenated feature; for example, the multiple outputs of the multiple sub-modules (e.g., multiple vector features) may be concatenated to obtain the concatenated feature, as in the sketch below.
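As a minimal sketch of this step (assuming PyTorch; the embedding setup and module interface are illustrative assumptions, and the LR/FM sub-modules are omitted for brevity):
```python
import torch
import torch.nn as nn

class BottomModule(nn.Module):
    """Illustrative bottom layer module: looks up dense embeddings for the
    one-hot feature ids and concatenates them into a single joint vector."""

    def __init__(self, vocab_size: int, embed_dim: int):
        super().__init__()
        # Dense low-dimensional vectors for the high-dimensional sparse ids.
        self.embedding = nn.Embedding(vocab_size, embed_dim)

    def forward(self, feature_ids: torch.Tensor) -> torch.Tensor:
        # feature_ids: (batch, num_features) integer ids of the one-hot features.
        embedded = self.embedding(feature_ids)  # (batch, num_features, embed_dim)
        # Concatenate ("splice") the per-feature vectors into one feature vector.
        return embedded.flatten(start_dim=1)    # (batch, num_features * embed_dim)
```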
In some embodiments, the central branch 320, also referred to as a central network, may include a feedforward neural network. The central branch 320 is a shared network, i.e., the network that every sample enters. Referring to Fig. 3, the central branch 320 may include a shared subnetwork 321 and a forward inference network 322 based on a learn hidden unit contributions (LHUC) structure. For example, the shared subnetwork 321 is shared among the multiple domains. Illustratively, the shared subnetwork 321 may be used to implement feature compression, e.g., deriving compressed features (e.g., 512-dimensional features) from the concatenated feature for input to the network 322. Illustratively, the network 322 may be implemented as a fully connected neural network, for example with a 512-dimensional input feature vector and a 1-dimensional output.
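A minimal LHUC-style layer might look as follows; this is a sketch based on the published LHUC idea, not the disclosure's exact network:
```python
import torch
import torch.nn as nn

class LHUCLayer(nn.Module):
    """Learn Hidden Unit Contributions: a learned per-unit amplitude applied
    to the hidden activations of the forward inference network."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.scale = nn.Parameter(torch.zeros(hidden_dim))

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # 2*sigmoid bounds each unit's contribution to (0, 2), as in the
        # original LHUC formulation; zero init gives an amplitude of 1.
        return hidden * (2.0 * torch.sigmoid(self.scale))
```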
In some embodiments, each domain branch 330 is a non-shared network; only samples belonging to the corresponding domain enter it. Taking the domain branch 330-1 of Fig. 3 as an example, and assuming that the domain branch 330-1 corresponds to the first domain, only the concatenated features of samples belonging to the first domain are input to the domain branch 330-1, as in the routing sketch below.
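In a batched implementation, this routing can be done with a per-domain index mask; the following sketch is an implementation assumption, since the disclosure does not prescribe a batching strategy:
```python
import torch

def route_by_domain(joint_features: torch.Tensor, domain_ids: torch.Tensor, num_domains: int):
    """Yield (domain_index, row_indices, sub_batch) for each domain present in
    the batch, so each sub-batch enters only its own domain branch."""
    for d in range(num_domains):
        rows = (domain_ids == d).nonzero(as_tuple=True)[0]
        if rows.numel() > 0:
            yield d, rows, joint_features[rows]
```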
Domain branch 330-1 includes an extraction network 331 and a forward inference network 332. For example, the extraction network 331 may be based on progressive layered extraction (PLE). For example, the forward inference network 332 may be implemented as a fully connected neural network, e.g., with a 512-dimensional input feature vector and a 1-dimensional output.
The extraction network 331 may include a compression subnetwork 3311 and a gating subnetwork 3312. The compression subnetwork 3311 may be used to implement feature compression, e.g., deriving compressed features (e.g., 512-dimensional) from the concatenated features of samples belonging to the first domain. The gating subnetwork 3312 may derive a weighted domain embedding feature based on the vector feature from the compression subnetwork 3311 and the same-dimension vector feature from the shared subnetwork 321. The weighted domain embedding feature can then be input to the forward inference network 332 to obtain a one-dimensional output.
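The gating computation can be sketched as follows (illustrative; the exact gate parameterization, here a softmax over two mixing weights, is an assumption):
```python
import torch
import torch.nn as nn

class GatingSubnetwork(nn.Module):
    """Illustrative gate: weighs the domain-branch vector against the
    same-dimension shared vector to produce a weighted domain embedding."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 2)  # one mixing weight per input vector

    def forward(self, domain_vec: torch.Tensor, shared_vec: torch.Tensor) -> torch.Tensor:
        gate_in = torch.cat([domain_vec, shared_vec], dim=-1)
        w = torch.softmax(self.gate(gate_in), dim=-1)  # (batch, 2)
        return w[..., :1] * domain_vec + w[..., 1:] * shared_vec
```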
Illustratively, domain branch 330-1 is described in more detail above in connection with Fig. 3; it should be understood by those skilled in the art that the other domain branches, such as domain branches 330-2 through 330-N, have similar structures, which are not repeated here for simplicity.
In some embodiments, the network output for a sample may be derived based on the one-dimensional output of the domain branch to which the sample belongs and the one-dimensional output of the central branch. For example, assume that the sample belongs to domain X (domainX), the first output of the central branch is logit_center, and the second output of the domain branch corresponding to domain X is logit_domainX; then the network output can be expressed as: logit_final = logit_center + logit_domainX. In other examples, the network output may also be obtained by weighted summation or other means, which are not listed here.
In other embodiments, the outputs of the logistic regression sub-module 311 and the factorization machine sub-module 312 in the bottom layer module 310 may also be obtained, and the network output determined based on them as well. For example, the first output of the central branch, the second output of the domain branch corresponding to domain X, the third output of the logistic regression sub-module 311, and the fourth output of the factorization machine sub-module 312 may be superimposed to obtain the network output. Alternatively, the superposition may be a weighted summation or another manner, which is not limited by the present disclosure.
In this way, by inputting each sample of the media information dataset into the multi-domain network model to be trained, a loss function may be constructed based on the differences between the network output of each sample and its corresponding label, and the trained multi-domain network model may be generated by training with this loss function, as in the sketch below.
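As an illustrative sketch of the training objective, assuming binary click/conversion labels and a sigmoid over the summed logits (the binary cross-entropy choice is an assumption; the disclosure only requires a loss built from output/label differences):
```python
import torch
import torch.nn as nn

def training_step(outputs: dict, labels: torch.Tensor, optimizer: torch.optim.Optimizer) -> float:
    """One illustrative gradient step on a batch.

    outputs: 1-d logits keyed by "center" and "domain" (and optionally the
    "lr" and "fm" sub-module outputs); labels: (batch,) float 0/1 labels.
    """
    logit_final = (outputs["center"] + outputs["domain"]
                   + outputs.get("lr", 0.0) + outputs.get("fm", 0.0))
    loss = nn.functional.binary_cross_entropy_with_logits(logit_final, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
One pass of such steps over the whole multi-domain dataset updates the shared parts and all domain branches together.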
In this way, the scheme uses a single multi-domain network model for a plurality of different domains: one training pass over the dataset yields a network model that serves all of the domains.
The multi-domain network model in the embodiments of the present disclosure includes a central branch, so the global view can be considered and complete isolation between domains is avoided; it also includes a plurality of domain branches, so each individual domain is handled in a more targeted way. The multi-domain network model can therefore serve multi-domain media information more precisely.
Fig. 4 illustrates a schematic flow diagram of a process 400 of data processing according to some embodiments of the present disclosure. At block 410, a pre-generated multi-domain network model is obtained, wherein the multi-domain network model includes a bottom layer module, a central branch, and a plurality of domain branches corresponding to a plurality of domains. At block 420, a concatenated feature of media data to be processed is determined by inputting the media data to be processed into the bottom layer module. At block 430, the concatenated feature is input into the central branch to obtain a first output. At block 440, the concatenated feature is input into a domain branch corresponding to a domain to which the media data to be processed belongs to obtain a second output. At block 450, a prediction result of the media data to be processed is determined based on the first output and the second output, the prediction result including a CTR and/or a CVR.
In some embodiments, the multi-domain network model obtained at block 410 may be the multi-domain network model generated by training as described above in connection with Figs. 1-3. As previously described, the multi-domain network model includes a bottom layer module, which may include a plurality of sub-modules; a central branch, which may include, for example, a shared subnetwork and a forward inference network based on the LHUC structure; and a plurality of domain branches, e.g., each including an extraction network and a forward inference network, where the extraction network may include a compression subnetwork and a gating subnetwork.
For example, where the bottom layer module includes a plurality of sub-modules, inputting the media data to be processed into the bottom layer module (e.g., into each of the sub-modules) correspondingly yields a plurality of outputs. These outputs may then be concatenated to obtain the concatenated feature of the media data to be processed.
Illustratively, the concatenated feature may be input separately to the central branch (the central branch 320 shown in Fig. 3) and to the domain branch corresponding to the domain to which the media data to be processed belongs (assumed to be the domain branch 330-1 shown in Fig. 3). Optionally, the concatenated feature of the media data to be processed may be input to the compression subnetwork 3311 of the domain branch 330-1 to obtain a vector feature of, for example, 512 dimensions. Optionally, the concatenated feature of the media data to be processed may be input to the shared subnetwork 321 of the central branch 320 to obtain a vector feature of, for example, 512 dimensions.
Illustratively, the output of the shared subnetwork 321 may be the input to the forward inference network 322, resulting in the first output.
Illustratively, the output of the compression subnetwork 3311 and the output of the shared subnetwork 321 may be input to the gating subnetwork 3312, resulting in a weighted domain embedding feature, which can be used as the input to the forward inference network 332 to derive the second output.
Optionally, a superposition (e.g., summation) of the first output and the second output is used as the prediction result of the media data to be processed. Optionally, a third output of the logistic regression sub-module 311 and a fourth output of the factorization machine sub-module 312 in the bottom layer module 310 may also be obtained, and a superposition (e.g., summation) of the first output, the second output, the third output, and the fourth output used as the prediction result. An end-to-end sketch follows.
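Putting the pieces together, inference for media data of one domain might look like the following sketch; the module names are the illustrative classes above, not interfaces required by the disclosure:
```python
import torch

def predict(bottom, center_branch, domain_branches, feature_ids, domain_id: int) -> torch.Tensor:
    """Illustrative forward pass: the shared center branch plus only the
    domain branch matching the input's domain."""
    joint = bottom(feature_ids)                       # concatenated feature
    logit_center = center_branch(joint)               # first output (1-d)
    logit_domain = domain_branches[domain_id](joint)  # second output (1-d)
    # Superpose the outputs and squash to a probability (predicted CTR/CVR).
    return torch.sigmoid(logit_center + logit_domain)
```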
In this way, the embodiments of the present disclosure can predict CTR/CVR with a multi-domain network model. Because the model includes a central branch, the global view can be considered and complete isolation between domains is avoided; because it includes a plurality of domain branches, media data to be processed is predicted only through its corresponding domain branch, which on one hand is more targeted and on the other hand reduces the computation of processing. The multi-domain network model can thus determine the CTR/CVR of media data of each domain more accurately and efficiently.
It can be appreciated that in embodiments of the present disclosure, the prediction result (e.g., CTR/CVR) may be used in various applications: for example, for ranking media data to implement fine-grained ranking of media data, or for media data recommendation to provide more targeted recommendation information. For example, in advertisement placement, the placement manner of each advertisement slot may be determined based on the multi-domain network model of the embodiments of the disclosure, which may, for example, reduce the overall consumption of the advertisement platform and improve value for advertisers. In testing, the multi-domain network model of embodiments of the present disclosure achieved offline AUC gains when used across multiple different domains, e.g., an improvement of about seven parts per million.
It should be understood that in embodiments of the present disclosure, "first," "second," "third," etc. are merely intended to indicate that a plurality of objects may be different, but at the same time do not exclude that two objects are identical, and should not be construed as any limitation of the embodiments of the present disclosure.
It should also be understood that the manners, cases, categories, and divisions of the embodiments of the present disclosure are for descriptive convenience only and should not be construed as limiting; features of the various manners, categories, cases, and embodiments may be combined with one another where logically consistent.
It should also be understood that the above is only intended to assist those skilled in the art in better understanding the embodiments of the present disclosure, and is not intended to limit the scope of the embodiments of the present disclosure. Various modifications, variations, combinations, etc. may be made by those skilled in the art in light of the above teachings. Such modifications, variations, or combinations are also within the scope of embodiments of the present disclosure.
It should also be appreciated that the foregoing description focuses on the differences between the various embodiments; for brevity and clarity, the same or similar features are cross-referenced rather than repeated.
Fig. 5 illustrates a schematic block diagram of an example apparatus 500, according to some embodiments of the disclosure. The apparatus 500 may be implemented in software, hardware, or a combination of both. As shown in fig. 5, the apparatus 500 includes a data set acquisition module 510, a sample input module 520, and a training module 530.
The dataset acquisition module 510 is configured to acquire a media information dataset for training, the dataset comprising a plurality of samples, each sample comprising a media data sample and a corresponding label, the label indicating a CTR and/or a CVR.
The sample input module 520 is configured to, for each sample in the dataset: determine a concatenated feature of the sample by inputting the sample into a bottom layer module of a multi-domain network model, input the concatenated feature of the sample into a central branch of the multi-domain network model to obtain a first output, input the concatenated feature of the sample into a domain branch corresponding to a domain to which the sample belongs to obtain a second output, and determine a network output of the sample based on the first output and the second output.
The training module 530 is configured to generate the multi-domain network model by training based on the label of each sample in the dataset and the network output of each sample, wherein the multi-domain network model includes the bottom layer module, the central branch, and a plurality of domain branches corresponding to the plurality of domains.
In some embodiments, the bottom layer module includes a logistic regression sub-module and a factorization machine sub-module. The sample input module 520 is configured to derive a third output and a fourth output of the sample by inputting the sample to the logistic regression sub-module and the factorization machine sub-module, respectively; the network output of the sample is then determined based on the first output, the second output, the third output, and the fourth output.
In some embodiments, the bottom layer module includes a plurality of sub-modules, and the sample input module 520 is configured to obtain a plurality of outputs by inputting the sample to the plurality of sub-modules, respectively, and to derive the concatenated feature of the sample based on a concatenation of the plurality of outputs.
Illustratively, each domain branch includes an extraction network and a forward inference network, wherein the extraction network is configured to derive, based on a first input feature input to the domain branch and a second input feature input to the central branch, a weighted domain embedding feature through a gating subnetwork, the weighted domain embedding feature being the input to the forward inference network.
Optionally, the central branch comprises a forward inference network based on the LHUC structure.
The apparatus 500 of Fig. 5 can be used to implement the processes described above in connection with Figs. 1-3, which are not repeated here for brevity.
Fig. 6 illustrates a schematic block diagram of an example apparatus 600, according to some embodiments of the disclosure. The apparatus 600 may be implemented in software, hardware, or a combination of both. As shown in Fig. 6, the apparatus 600 includes an acquisition module 610, a concatenated feature determination module 620, a first output determination module 630, a second output determination module 640, and a prediction result determination module 650.
The acquisition module 610 is configured to acquire a pre-generated multi-domain network model, wherein the multi-domain network model includes a bottom layer module, a central branch, and a plurality of domain branches corresponding to a plurality of domains. The concatenated feature determination module 620 is configured to determine a concatenated feature of media data to be processed by inputting the media data to be processed into the bottom layer module. The first output determination module 630 is configured to input the concatenated feature into the central branch to obtain a first output. The second output determination module 640 is configured to input the concatenated feature into a domain branch corresponding to a domain to which the media data to be processed belongs to obtain a second output. The prediction result determination module 650 is configured to determine, based on the first output and the second output, a prediction result of the media data to be processed, the prediction result including the click-through rate CTR and/or the conversion rate CVR.
In some embodiments, the bottom layer module includes a logistic regression sub-module and a factorization machine sub-module. The apparatus 600 may further include a bottom layer module output determination module configured to input the media data to be processed into the logistic regression sub-module and the factorization machine sub-module, respectively, to obtain a third output and a fourth output.
For example, the prediction result determination module 650 may be configured to determine the prediction result based on the first output, the second output, the third output, and the fourth output.
In some embodiments, the bottom layer module includes a plurality of sub-modules, and the concatenated feature determination module 620 includes: an output determination submodule configured to obtain a plurality of outputs by inputting the media data to be processed into the plurality of sub-modules, respectively; and a concatenation submodule configured to derive the concatenated feature based on a concatenation of the plurality of outputs.
Illustratively, the domain branch corresponding to the domain to which the media data to be processed belongs comprises an extraction network and a forward inference network, wherein the extraction network is configured to derive, based on a first input feature input to the domain branch and a second input feature input to the central branch, a weighted domain embedding feature through a gating subnetwork, the weighted domain embedding feature being the input to the forward inference network.
Optionally, the central branch comprises a forward inference network based on the LHUC structure.
The apparatus 600 of Fig. 6 can be used to implement the process described above in connection with Fig. 4, which is not repeated here for brevity.
The division of modules or units in the embodiments of the disclosure is schematic and is only a logical functional division; other divisions are possible in actual implementation. In addition, the functional units in the disclosed embodiments may be integrated in one unit, may exist alone physically, or two or more units may be integrated into one unit. The integrated units may be implemented in hardware or as software functional units.
Fig. 7 illustrates a block diagram of an example device 700 that may be used to implement embodiments of the present disclosure. It should be understood that the device 700 illustrated in Fig. 7 is merely exemplary and should not be construed as limiting the functionality and scope of the implementations described herein. For example, the processes described above with respect to Figs. 1-4 may be performed using the device 700. For example, the device 700 may be implemented as a classical computer and/or a quantum computer.
As shown in fig. 7, device 700 is in the form of a general purpose computing device. Components of computing device 700 may include, but are not limited to, one or more processors or processing units 710, memory 720, storage 730, one or more communication units 740, one or more input devices 750, and one or more output devices 760. The processing unit 710 may be an actual or virtual processor and is capable of performing various processes according to programs stored in the memory 720. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capabilities of computing device 700.
Computing device 700 typically includes a number of computer storage media. Such media can be any available media that is accessible by computing device 700, including, but not limited to, volatile and non-volatile media, removable and non-removable media. The memory 720 may be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. The storage device 730 may be removable or non-removable media and may include machine-readable media such as flash drives, magnetic disks, or any other media that can store information and/or data (e.g., training data) and that can be accessed within computing device 700.
Computing device 700 may further include additional removable/non-removable, volatile/nonvolatile storage media. Although not shown in fig. 7, a magnetic disk drive for reading from or writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data medium interfaces. Memory 720 may include a computer program product 725 having one or more program modules configured to perform the various methods or acts of the various implementations of the disclosure.
The communication unit 740 enables communication with other computing devices via a communication medium. Additionally, the functionality of the components of computing device 700 may be implemented in a single computing cluster or in multiple computing machines capable of communicating over communication connections. Thus, the computing device 700 may operate in a networked environment using logical connections to one or more other servers, a network personal computer (PC), or another network node.
The input device 750 may be one or more input devices such as a mouse, keyboard, trackball, etc. The output device 760 may be one or more output devices such as a display, speakers, printer, etc. Computing device 700 may also communicate with one or more external devices (not shown), such as storage devices, display devices, etc., as needed, through communication unit 740, with one or more devices that enable a user to interact with computing device 700, or with any device (e.g., network card, modem, etc.) that enables computing device 700 to communicate with one or more other computing devices. Such communication may be performed via an Input/Output (I/O) interface (not shown).
According to an exemplary implementation of the present disclosure, a computer-readable storage medium is provided, on which computer-executable instructions are stored, wherein the computer-executable instructions are executed by a processor to implement the method described above. According to an exemplary implementation of the present disclosure, there is also provided a computer program product, tangibly stored on a non-transitory computer-readable medium and comprising computer-executable instructions, which are executed by a processor to implement the method described above. According to an exemplary implementation of the present disclosure, there is further provided a computer program product on which a computer program is stored, the program, when executed by a processor, implementing the method described above.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus, devices, and computer program products implemented according to the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various implementations of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of implementations of the present disclosure has been provided for illustrative purposes, is not exhaustive, and is not limited to the implementations disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various implementations described. The terminology used herein was chosen in order to best explain the principles of each implementation, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand each implementation disclosed herein.

Claims (17)

1. A data processing method, comprising:
acquiring a pre-generated multi-domain network model, wherein the multi-domain network model comprises a bottom layer module, a central branch and a plurality of domain branches corresponding to a plurality of domains;
determining a concatenated feature of media data to be processed by inputting the media data to be processed into the bottom layer module;
inputting the concatenated feature into the central branch to obtain a first output;
inputting the concatenated feature into a domain branch corresponding to a domain to which the media data to be processed belongs to obtain a second output; and
determining a prediction result of the media data to be processed based on the first output and the second output, the prediction result comprising a click-through rate CTR and/or a conversion rate CVR.
2. The method of claim 1, wherein the bottom layer module comprises a logistic regression sub-module and a factorization machine sub-module, the method further comprising:
inputting the media data to be processed into the logistic regression sub-module and the factorization machine sub-module, respectively, to obtain a third output and a fourth output.
3. The method of claim 2, wherein determining the prediction result of the media data to be processed comprises:
determining the prediction result based on the first output, the second output, the third output, and the fourth output.
4. The method of claim 1, wherein the bottom layer module comprises a plurality of sub-modules, and wherein determining the concatenated feature of the media data to be processed comprises:
inputting the media data to be processed into the plurality of sub-modules, respectively, to obtain a plurality of outputs; and
deriving the concatenated feature based on a concatenation of the plurality of outputs.
5. The method of claim 1, wherein the domain branch corresponding to the domain to which the media data to be processed belongs comprises an extraction network and a forward inference network, wherein the extraction network is configured to:
derive, based on a first input feature input to the domain branch and a second input feature input to the central branch, a weighted domain embedding feature through a gating subnetwork, the weighted domain embedding feature being an input to the forward inference network.
6. The method of claim 1, wherein the central branch comprises a forward inference network based on a learn hidden unit contributions (LHUC) structure.
7. A model training method, comprising:
acquiring a media information dataset for training, the dataset comprising a plurality of samples, each sample of the plurality of samples comprising a media data sample and a corresponding label, the label indicating a click-through rate CTR and/or a conversion rate CVR;
for each sample in the dataset:
determining a concatenated feature of the sample by inputting the sample into a bottom layer module of a multi-domain network model,
inputting the concatenated feature of the sample into a central branch of the multi-domain network model to obtain a first output,
inputting the concatenated feature of the sample into a domain branch corresponding to a domain to which the sample belongs to obtain a second output, and
determining a network output of the sample based on the first output and the second output; and
the multi-domain network model is generated by training based on the labels of each sample in the dataset and the network output of each sample, wherein the multi-domain network model comprises the bottom layer module, the central branch and a plurality of domain branches corresponding to a plurality of domains.
8. The method of claim 7, wherein the bottom layer module comprises a logistic regression sub-module and a factorization machine sub-module, the method further comprising:
obtaining a third output of the sample and a fourth output of the sample by inputting the sample to the logistic regression sub-module and the factorization machine sub-module, respectively,
and wherein the network output of the sample is determined based on the first output, the second output, the third output, and the fourth output.
9. The method of claim 7, wherein the bottom layer module comprises a plurality of sub-modules, and wherein determining the concatenated feature of the sample comprises:
obtaining a plurality of outputs by inputting the sample to the plurality of sub-modules, respectively; and
deriving the concatenated feature of the sample based on a concatenation of the plurality of outputs.
10. The method of claim 7, wherein each domain branch comprises an extraction network and a forward inference network, wherein the extraction network is configured to:
derive, based on a first input feature input to the domain branch and a second input feature input to the central branch, a weighted domain embedding feature through a gating subnetwork, the weighted domain embedding feature being an input to the forward inference network.
11. The method of claim 7, wherein the central branch comprises a forward inference network based on a learn hidden unit contributions (LHUC) structure.
12. An electronic device, comprising:
at least one processing unit;
at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, the instructions when executed by the at least one processing unit cause the electronic device to perform acts comprising:
acquiring a pre-generated multi-domain network model, wherein the multi-domain network model comprises a bottom layer module, a central branch and a plurality of domain branches corresponding to a plurality of domains;
determining a concatenated feature of media data to be processed by inputting the media data to be processed into the bottom layer module;
inputting the concatenated feature into the central branch to obtain a first output;
inputting the concatenated feature into a domain branch corresponding to a domain to which the media data to be processed belongs to obtain a second output; and
determining a prediction result of the media data to be processed based on the first output and the second output, the prediction result comprising a click-through rate CTR and/or a conversion rate CVR.
13. An electronic device, comprising:
at least one processing unit;
at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, the instructions when executed by the at least one processing unit cause the electronic device to perform acts comprising:
acquiring a media information dataset for training, the dataset comprising a plurality of samples, each sample of the plurality of samples comprising a media data sample and a corresponding label, the label indicating a click-through rate CTR and/or a conversion rate CVR;
for each sample in the dataset:
determining a concatenated feature of the sample by inputting the sample into a bottom layer module of a multi-domain network model,
inputting the concatenated feature of the sample into a central branch of the multi-domain network model to obtain a first output,
inputting the concatenated feature of the sample into a domain branch corresponding to a domain to which the sample belongs to obtain a second output, and
determining a network output of the sample based on the first output and the second output; and
the multi-domain network model is generated by training based on the labels of each sample in the dataset and the network output of each sample, wherein the multi-domain network model comprises the bottom layer module, the central branch and a plurality of domain branches corresponding to a plurality of domains.
14. A data processing apparatus comprising:
an acquisition module configured to acquire a pre-generated multi-domain network model, wherein the multi-domain network model includes a bottom layer module, a center branch, and a plurality of domain branches corresponding to a plurality of domains;
a concatenated feature determination module configured to determine a concatenated feature of media data to be processed by inputting the media data to be processed into the bottom layer module;
a first output determination module configured to input the concatenated feature into the central branch to obtain a first output;
a second output determination module configured to input the concatenated feature into a domain branch corresponding to a domain to which the media data to be processed belongs to obtain a second output; and
a prediction result determination module configured to determine a prediction result of the media data to be processed based on the first output and the second output, the prediction result including a click-through rate CTR and/or a conversion rate CVR.
15. A model training apparatus comprising:
a dataset acquisition module configured to acquire a media information dataset for training, the dataset comprising a plurality of samples, each sample of the plurality of samples comprising a media data sample and a corresponding label, the label indicating a click-through rate CTR and/or a conversion rate CVR;
a sample input module configured to, for each sample in the dataset:
determine a concatenated feature of the sample by inputting the sample into a bottom layer module of the multi-domain network model,
input the concatenated feature of the sample into a central branch of the multi-domain network model to obtain a first output,
input the concatenated feature of the sample into a domain branch corresponding to a domain to which the sample belongs to obtain a second output, and
determine a network output of the sample based on the first output and the second output; and
a training module configured to generate the multi-domain network model by training based on the label of each sample in the dataset and the network output of each sample, wherein the multi-domain network model includes the bottom layer module, the central branch, and a plurality of domain branches corresponding to a plurality of domains.
16. A computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method according to any of claims 1 to 11.
17. A computer program product having a computer program stored thereon, which when executed by a processor, implements the method according to any of claims 1 to 11.
CN202211117464.1A 2022-09-14 2022-09-14 Data processing method, model training method and electronic equipment Pending CN117745329A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211117464.1A CN117745329A (en) 2022-09-14 2022-09-14 Data processing method, model training method and electronic equipment
PCT/CN2023/117748 WO2024055912A1 (en) 2022-09-14 2023-09-08 Data processing method, model training method, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211117464.1A CN117745329A (en) 2022-09-14 2022-09-14 Data processing method, model training method and electronic equipment

Publications (1)

Publication Number Publication Date
CN117745329A 2024-03-22

Family

ID=90253202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211117464.1A Pending CN117745329A (en) 2022-09-14 2022-09-14 Data processing method, model training method and electronic equipment

Country Status (2)

Country Link
CN (1) CN117745329A (en)
WO (1) WO2024055912A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9569696B1 (en) * 2015-08-12 2017-02-14 Yahoo! Inc. Media content analysis system and method
CN113762501A (en) * 2021-04-20 2021-12-07 京东城市(北京)数字科技有限公司 Prediction model training method, device, equipment and storage medium
CN113409090B (en) * 2021-07-05 2024-07-05 中国工商银行股份有限公司 Training method, prediction method and device of advertisement click rate prediction model

Also Published As

Publication number Publication date
WO2024055912A1 (en) 2024-03-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination