CN114443235A - User portrait offline data processing method, device and equipment and storage medium - Google Patents
- Publication number: CN114443235A (application number CN202011194093.8A)
- Authority: CN (China)
- Prior art keywords: processing, target, data, task, user portrait
- Legal status: Pending (as listed by Google, which notes that the status is an assumption rather than a legal conclusion and makes no representation as to its accuracy)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/20—Software design
- G06F8/24—Object-oriented
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/60—Software deployment
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Stored Programmes (AREA)
Abstract
The application provides a user portrait offline data processing method, apparatus, device and storage medium. The method comprises: receiving configuration information through a preset external interface, the configuration information configuring different processing modes for user portraits to be processed that have different processing targets; generating, according to the configuration information, a target task corresponding to the user portrait to be processed; when a start instruction for the target task is received, loading the target task into a preset processing service, which executes a preset processing flow comprising a customized processing flow and a general processing flow; and, while the preset processing service is running, executing the corresponding customized processing flow according to the target task and executing the general processing flow according to a preset data processing method, to obtain a processed user portrait. The method and apparatus can improve the efficiency and maintainability of user portrait offline data processing.
Description
Technical Field
The present application relates to internet technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing user portrait offline data.
Background
As business requirements iterate rapidly, software engineers face a wide variety of requirements for processing user portrait offline data, such as user portrait synchronization, portrait access, portrait cleaning and portrait preprocessing. At present, developers usually implement each requirement through a separate development process, which makes development costly and inefficient. Programs written by different developers to implement the same function may also differ in quality, and poorly written programs can reduce the efficiency of user portrait offline data processing. Finally, projects implemented as separate pieces of software are difficult to upgrade and maintain quickly in a unified manner, which reduces the maintainability of user portrait offline data processing.
Disclosure of Invention
The embodiment of the application provides a user portrait offline data processing method, device and storage medium, which can improve the efficiency and maintainability of user portrait offline data processing.
The technical solutions of the embodiments of the present application are implemented as follows. An embodiment of the present application provides a user portrait offline data processing method, comprising:
receiving configuration information through a preset external interface, where the configuration information is used to configure different processing modes for user portraits to be processed that have different processing targets, and the user portrait to be processed contains behavior feature information of a user;
generating, according to the configuration information, a target task corresponding to the user portrait to be processed;
when a start instruction for the target task is received, loading the target task into a preset processing service, where the preset processing service is used to execute a preset processing flow comprising a customized processing flow and a general processing flow, and the general processing flow performs the same preset data processing on user portraits to be processed that have different processing targets; and
while the preset processing service is running, executing the corresponding customized processing flow according to the target task, and executing the general processing flow according to a preset data processing method, to obtain a processed user portrait.
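The split between a fixed general processing flow and a pluggable customized flow can be illustrated with the template-method pattern. The sketch below is a hypothetical illustration, not the patent's implementation: the class names, the record layout and the two general steps (dropping records without a user id, sorting the output) are all assumptions chosen only to show where the customized hook sits.

```python
from abc import ABC, abstractmethod
from typing import Any

Record = dict[str, Any]


class PortraitTask(ABC):
    """Hypothetical base class: run() fixes the general flow, customize() is the hook."""

    def __init__(self, config: dict[str, Any]) -> None:
        # Configuration information received through the external interface.
        self.config = config

    def run(self, records: list[Record]) -> list[Record]:
        cleaned = self.preprocess(records)                  # general flow
        processed = [self.customize(r) for r in cleaned]    # customized flow
        return self.postprocess(processed)                  # general flow

    def preprocess(self, records: list[Record]) -> list[Record]:
        # Assumed general step: drop records that have no user id.
        return [r for r in records if r.get("user_id")]

    def postprocess(self, records: list[Record]) -> list[Record]:
        # Assumed general step: emit records in a stable order.
        return sorted(records, key=lambda r: r["user_id"])

    @abstractmethod
    def customize(self, record: Record) -> Record:
        """Target-specific processing supplied by each concrete task."""


class TagFilterTask(PortraitTask):
    """Hypothetical customized flow: keep only tags at or above a configured weight."""

    def customize(self, record: Record) -> Record:
        threshold = self.config.get("min_weight", 0.0)
        tags = {t: w for t, w in record["tags"].items() if w >= threshold}
        return {**record, "tags": tags}


task = TagFilterTask({"min_weight": 0.5})
result = task.run([
    {"user_id": "u2", "tags": {"sports": 0.9, "music": 0.1}},
    {"user_id": "", "tags": {"news": 1.0}},
    {"user_id": "u1", "tags": {"basketball": 0.6}},
])
print(result)
```

A new processing target then only needs a new subclass with its own `customize`, while the general steps stay shared and are maintained in one place.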
The embodiment of the present application provides a user portrait offline data processing apparatus, including:
a receiving module, configured to receive configuration information through a preset external interface, where the configuration information is used to configure different processing modes for user portraits to be processed that have different processing targets, and the user portrait to be processed contains behavior feature information of a user;
a generating module, configured to generate, according to the configuration information, a target task corresponding to the user portrait to be processed;
a loading module, configured to load the target task into a preset processing service when a start instruction for the target task is received, where the preset processing service is used to execute a preset processing flow comprising a customized processing flow and a general processing flow, and the general processing flow performs the same preset data processing on user portraits to be processed that have different processing targets;
and a processing module, configured to, while the preset processing service is running, execute the corresponding customized processing flow according to the target task and execute the general processing flow according to a preset data processing method, to obtain a processed user portrait.
In the above apparatus, the preset external interface is a task configuration interface, and the receiving module is further configured to: enter the task configuration interface in response to an operation instruction on a preset configuration entry of a task processing interface; display at least one configuration item control on the task configuration interface and receive, through the at least one configuration item control, at least one piece of externally input configuration item data; and use the at least one piece of configuration item data as the configuration information.
In the above apparatus, the generating module is further configured to: jump from the task configuration interface to the task processing interface in response to an operation instruction on a preset publish control of the task configuration interface; generate the target task according to the configuration information, the target task carrying target task data; and display the target task in a task processing list of the task processing interface.
In the above apparatus, the preset processing service includes a download process, a processing process and an output process, and the processing module is further configured to read target task data from the target task. When the preset processing service is the download process, the processing module determines the user portrait to be downloaded according to target download source information in the target task data, and downloads it locally in a target download mode specified in the target task data using a general download method, to obtain the user portrait to be processed; the general download method is the preset data download method among the preset data processing methods. When the preset processing service is the processing process, the processing module preprocesses the user portrait to be processed according to a general processing method in the general processing flow to obtain a preprocessed user portrait, and calls the processing method corresponding to a target processing method name in the target task data to process the preprocessed user portrait, obtaining the processed user portrait; the general processing method is the preset data processing method among the preset data processing methods. When the preset processing service is the output process, the processing module outputs the processed user portrait to a specified path according to target output configuration information in the target task data, completing the processing of the user portrait to be processed.
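The three processes above can be sketched as a small pipeline driven entirely by the target task data, with the customized step looked up by its target processing method name. This is a hypothetical in-memory illustration: the registry, the task-data keys (`download_file_name`, `processing_method`, `output_path`) and the whitespace-trimming "general processing method" are assumptions, and dictionaries stand in for the remote source and the output path.

```python
from typing import Callable

# Hypothetical registry mapping a "target processing method name" to customized code.
PROCESSORS: dict[str, Callable[[list[str]], list[str]]] = {}


def register(name: str):
    def wrap(fn):
        PROCESSORS[name] = fn
        return fn
    return wrap


@register("lowercase_tags")
def lowercase_tags(lines: list[str]) -> list[str]:
    return [line.lower() for line in lines]


def download(task_data: dict, source: dict[str, list[str]]) -> list[str]:
    # General download method (in-memory stand-in): take every file whose
    # name contains the target download data file name.
    wanted = task_data["download_file_name"]
    return [line for fname, lines in sorted(source.items()) if wanted in fname
            for line in lines]


def preprocess(lines: list[str]) -> list[str]:
    # General processing method (assumed): trim whitespace, drop empty lines.
    return [line.strip() for line in lines if line.strip()]


def run_task(task_data: dict, source: dict[str, list[str]], sink: dict[str, list[str]]) -> None:
    raw = download(task_data, source)                               # download process
    pre = preprocess(raw)                                           # processing: general part
    processed = PROCESSORS[task_data["processing_method"]](pre)     # processing: customized part
    sink[task_data["output_path"]] = processed                      # output process


source = {"portrait_part1.txt": ["U1\tSPORTS ", ""], "portrait_part2.txt": ["U2\tMUSIC"],
          "other.txt": ["X"]}
sink: dict[str, list[str]] = {}
run_task({"download_file_name": "portrait", "processing_method": "lowercase_tags",
          "output_path": "/out/portrait"}, source, sink)
print(sink)
```

Looking the customized method up by name is what lets a configuration page select it without code changes; only registering a new function requires development work.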
In the above apparatus, the target download source information includes a target download address and a target download data file name, and the target task data further includes a target address of a download integrity check file. The processing module is further configured to: determine at least one data file at the target download address whose name contains the target download data file name as the user portrait to be downloaded; download the user portrait to be downloaded locally in the target download mode using the general download method, to obtain target file data, where the target download mode is either single-file download or full download; perform an integrity check on the target file data using the download integrity check file at the target address; and, when the integrity check passes, use the target file data as the user portrait to be processed.
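The patent does not specify the format of the download integrity check file. The sketch below assumes, purely for illustration, an `md5sum`-style manifest with one `<md5 hex>  <file name>` line per data file, and treats a file as failing the check if it is missing locally or its MD5 digest differs.

```python
import hashlib


def parse_manifest(text: str) -> dict[str, str]:
    # Assumed check-file format, one entry per line: "<md5 hex>  <file name>".
    entries = {}
    for line in text.splitlines():
        if line.strip():
            digest, name = line.split(maxsplit=1)
            entries[name.strip()] = digest.lower()
    return entries


def check_integrity(files: dict[str, bytes], manifest_text: str) -> list[str]:
    """Return the manifest entries that are missing locally or whose MD5 differs."""
    bad = []
    for name, digest in parse_manifest(manifest_text).items():
        data = files.get(name)
        if data is None or hashlib.md5(data).hexdigest() != digest:
            bad.append(name)
    return bad


downloaded = {"part-0": b"u1,sports", "part-1": b"u2,music"}
good = hashlib.md5(b"u1,sports").hexdigest()
stale = hashlib.md5(b"u2,music,old").hexdigest()  # simulates a file that changed or was truncated
manifest = f"{good}  part-0\n{stale}  part-1\n{good}  part-2\n"
problems = check_integrity(downloaded, manifest)
print(problems)  # ['part-1', 'part-2']
```

An empty result would mean the check passes and the target file data can be used as the user portrait to be processed.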
In the above apparatus, the target task data further includes a download storage path. When the general download method supports breakpoint (resumable) download, the processing module is further configured to, before downloading the user portrait to be downloaded locally in the target download mode to obtain the target file data: determine, according to the download integrity check file at its target address, whether the user portrait to be downloaded has been completely downloaded; if not, obtain historical execution information of the target task, which records the result of the target task's last run, and determine from it whether the target task is an interrupted task; and, if the target task is an interrupted task, obtain the historical download data of the user portrait to be downloaded from the download storage path, calculate the interrupt position from the historical download data, determine the remaining download data according to the interrupt position, and take the remaining download data as the user portrait to be downloaded.
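A minimal sketch of the breakpoint download, assuming that the interrupt position is simply the number of bytes already saved at the download storage path and that the historical execution information reduces to a status string. The `remote` bytes object and its slices are stand-ins for a remote file and ranged reads; no real transfer protocol is implied.

```python
import os
import tempfile


def resume_download(remote: bytes, save_path: str, last_status: str, chunk_size: int = 4) -> int:
    """Hypothetical breakpoint-resume sketch; returns the offset it resumed from.

    `remote` stands in for the remote file; slicing it stands in for a ranged read.
    `last_status` stands in for the task's historical execution information.
    """
    if last_status == "interrupted" and os.path.exists(save_path):
        offset = os.path.getsize(save_path)      # interrupt position = bytes already saved
    else:
        offset = 0
        open(save_path, "wb").close()            # not an interrupted task: start over
    if offset > len(remote):
        raise ValueError("local data is longer than the remote file")
    with open(save_path, "ab") as f:
        for start in range(offset, len(remote), chunk_size):
            f.write(remote[start:start + chunk_size])   # only the remaining data
    return offset


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "portrait.part")
    with open(path, "wb") as f:
        f.write(b"hello ")                       # left over from the interrupted run
    resumed_from = resume_download(b"hello world", path, last_status="interrupted")
    with open(path, "rb") as f:
        final = f.read()
print(resumed_from, final)  # 6 b'hello world'
```

In the described method the completeness test would come from the integrity check file rather than from comparing lengths; the length guard here only protects the sketch against a corrupt partial file.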
In the above apparatus, when the general download method supports retry on download failure, the processing module is further configured to: if the i-th data file of the at least one data file fails to download, obtain the number of download retries for the i-th data file, where i is a positive integer; when the number of retries exceeds a preset retry threshold, record the i-th data file in a download failure file list and start downloading the (i + 1)-th file; and when the number of retries does not exceed the preset retry threshold, download the i-th data file again in the target download mode using the general download method, until the download of the at least one data file is completed, to obtain the target file data.
In the above apparatus, when the general download method supports a global download failure alarm, the processing module is further configured to, before obtaining the number of download retries for the i-th data file, obtain the number of files in the download failure file list; when that number exceeds a preset failure threshold, terminate the download of the user portrait to be downloaded and raise an alarm.
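The per-file retry and the global failure alarm described in the two paragraphs above can be combined in one loop. This is a sketch under assumptions: `fetch` is a hypothetical callable that raises `OSError` on failure, the thresholds are illustrative defaults, and raising `DownloadAborted` stands in for the alarm prompt.

```python
from typing import Callable


class DownloadAborted(RuntimeError):
    """Raised by the (hypothetical) global download-failure alarm."""


def download_all(names: list[str], fetch: Callable[[str], bytes],
                 max_retries: int = 2, max_failed_files: int = 1):
    """Download each file with retries; returns (downloaded, failure_list)."""
    downloaded: dict[str, bytes] = {}
    failed: list[str] = []
    for name in names:
        retries = 0
        while True:
            try:
                downloaded[name] = fetch(name)
                break
            except OSError:
                # Global alarm: checked before the retry count of this file is consulted.
                if len(failed) > max_failed_files:
                    raise DownloadAborted(f"too many failed files: {failed}")
                retries += 1
                if retries > max_retries:
                    failed.append(name)   # record in the failure list, move to file i + 1
                    break
                # Otherwise loop and download the same file again.
    return downloaded, failed


attempts: dict[str, int] = {}


def flaky_fetch(name: str) -> bytes:
    attempts[name] = attempts.get(name, 0) + 1
    if name == "bad.txt" or (name == "flaky.txt" and attempts[name] < 2):
        raise OSError(f"cannot fetch {name}")
    return name.encode()


files, failed = download_all(["ok.txt", "flaky.txt", "bad.txt"], flaky_fetch)
print(sorted(files), failed, attempts["bad.txt"])  # ['flaky.txt', 'ok.txt'] ['bad.txt'] 3


def always_fail(name: str) -> bytes:
    raise OSError(name)


try:
    download_all(["b1", "b2", "b3"], always_fail, max_retries=0, max_failed_files=1)
    aborted = False
except DownloadAborted:
    aborted = True
print(aborted)  # True
```

With `max_retries=2`, a persistently failing file is attempted three times (one initial try plus two retries) before it is added to the failure list.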
In the above apparatus, the general processing method includes a task allocation strategy and a result monitoring strategy. The processing module is further configured to: obtain the user portrait to be processed through a preset data processing interface; divide the at least one data file in the user portrait to be processed into batches according to the task allocation strategy, to obtain the preprocessed user portrait, which comprises at least one batch of data files; call a target data processing method through the target processing method name and process the at least one batch of data files concurrently, to obtain a processing result for each batch; count the overall processing results of the at least one batch of data files according to the result monitoring strategy; and, when the number of failed results among the overall processing results does not exceed a preset processing failure threshold, generate the processed user portrait from the processing result of each batch.
In the above apparatus, the processing module is further configured to, after counting the overall processing results of the at least one batch of data files according to the result monitoring strategy, refrain from generating the processed user portrait and raise an alarm when the number of failed results exceeds the preset processing failure threshold.
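Batching, concurrent processing and result monitoring can be sketched with the standard `concurrent.futures.ThreadPoolExecutor`. The fixed batch size (as the task allocation strategy), the exception-per-failed-batch convention and raising `RuntimeError` in place of the alarm are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def make_batches(items: list, batch_size: int) -> list[list]:
    # Assumed task allocation strategy: fixed-size batches.
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process_concurrently(items: list, process_batch: Callable[[list], list],
                         batch_size: int = 2, max_failed_batches: int = 0,
                         workers: int = 4) -> list:
    batches = make_batches(items, batch_size)
    results, failures = [], 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_batch, b) for b in batches]
        for fut in futures:                      # result monitoring, in batch order
            try:
                results.extend(fut.result())
            except Exception:
                failures += 1
    if failures > max_failed_batches:
        # Stand-in for the alarm: no processed portrait is produced.
        raise RuntimeError(f"{failures} of {len(batches)} batches failed")
    return results


def upper_batch(batch: list[str]) -> list[str]:
    if "bad" in batch:
        raise ValueError("malformed record")
    return [s.upper() for s in batch]


full = process_concurrently(["a", "b", "c", "d", "e"], upper_batch)
partial = process_concurrently(["a", "bad", "c"], upper_batch, batch_size=1, max_failed_batches=1)
print(full)     # ['A', 'B', 'C', 'D', 'E']
print(partial)  # ['A', 'C']
```

Collecting futures in submission order keeps the output in batch order even though the batches themselves run concurrently.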
In the above apparatus, the loading module is further configured to, after loading the target task into the preset processing service in response to the start instruction: when a start instruction for a second task is received while the target task is running in the preset processing service, load the second task into the preset processing service, so that the target task and the second task run simultaneously in the preset processing service.
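Running a second task alongside the target task only requires that loading a task does not block the service. The sketch below models the preset processing service as a thread pool, which is an assumption; the class and method names are hypothetical. The demo is built so that the target task succeeds only if the second task actually runs while it is still in progress.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class ProcessingService:
    """Hypothetical long-running service that keeps accepting tasks while others run."""

    def __init__(self, workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._tasks: dict[str, Future] = {}
        self._lock = threading.Lock()

    def load(self, task_id: str, fn: Callable, *args) -> Future:
        # Called on a start instruction; returns at once so further tasks can be loaded.
        fut = self._pool.submit(fn, *args)
        with self._lock:
            self._tasks[task_id] = fut
        return fut

    def running(self) -> list[str]:
        with self._lock:
            return [tid for tid, fut in self._tasks.items() if not fut.done()]

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


ready = threading.Event()


def target_task() -> bool:
    # Only returns True if second_task runs while this task is still running.
    return ready.wait(timeout=2)


def second_task() -> str:
    ready.set()
    return "second done"


service = ProcessingService(workers=2)
first = service.load("target", target_task)
second = service.load("second", second_task)
outcome = (first.result(), second.result())
print(outcome)  # (True, 'second done')
service.shutdown()
```

With `workers=1` the target task would time out and return `False`, which is a quick way to see that the two tasks really share the service concurrently.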
In the above apparatus, the at least one configuration item data further includes task execution period information, process deployment information, and host deployment information, where the task execution period information is used to configure a repeated execution period of the target task; the process deployment information is used for deploying a target process for executing the target task in the preset processing service; the host deployment information is used for deploying a target host address for running the preset processing service.
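The three additional configuration items can be held in a small configuration object, with the execution period used to compute when the target task repeats. The field names, the minute-based period and the example process name and host address are hypothetical placeholders; the source only names the three kinds of information.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DeploymentConfig:
    # Field names are hypothetical; the source names only the kinds of information.
    period_minutes: int   # task execution period information
    process_name: str     # process deployment information
    host: str             # host deployment information

    def __post_init__(self) -> None:
        if self.period_minutes <= 0:
            raise ValueError("execution period must be positive")

    def next_runs(self, first_run: datetime, count: int) -> list[datetime]:
        """Start times of the next `count` repetitions of the target task."""
        step = timedelta(minutes=self.period_minutes)
        return [first_run + i * step for i in range(count)]


cfg = DeploymentConfig(period_minutes=24 * 60, process_name="portrait-worker-1", host="10.0.0.8")
runs = cfg.next_runs(datetime(2020, 10, 30, 2, 0), 3)
print([r.strftime("%Y-%m-%d %H:%M") for r in runs])
# ['2020-10-30 02:00', '2020-10-31 02:00', '2020-11-01 02:00']
```

Validating the period when the configuration is received means an unusable schedule is rejected at configuration time rather than when the task is started.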
An embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the user portrait offline data processing method provided by the embodiment of the application when the executable instructions stored in the memory are executed.
The embodiment of the application provides a storage medium, which stores executable instructions and is used for causing a processor to execute the executable instructions so as to realize the user portrait offline data processing method provided by the embodiment of the application.
The embodiment of the application has the following beneficial effects:
In user portrait offline data processing, a target task can be generated from received configuration information, and the target task drives a customized processing flow that processes user portraits in a differentiated way to meet each specific processing target. In addition, the user portrait offline data processing apparatus uses the preset data processing methods in the general processing flow to implement the steps that are common to user portrait offline data processing with different processing targets. This simplifies turning a processing target into a concrete implementation, improves the efficiency of user portrait offline data processing, integrates the data processing flow more tightly, and thereby improves its maintainability.
Drawings
FIG. 1 is a schematic diagram of a process for implementing a user portrait offline data processing requirement according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a summary analysis of user portrait offline data processing requirements according to an embodiment of the present application;
FIG. 3 is an optional block diagram of a user portrait offline data processing software architecture according to an embodiment of the present application;
FIG. 4 is an optional architecture diagram of a user portrait offline data processing system according to an embodiment of the present application;
FIG. 5 is an optional schematic structural diagram of a user portrait offline data processing apparatus according to an embodiment of the present application;
FIG. 6 is an optional schematic structural diagram of a user portrait offline data processing apparatus according to an embodiment of the present application;
FIG. 7 is an optional schematic flowchart of a user portrait offline data processing method according to an embodiment of the present application;
FIG. 8 is an optional schematic flowchart of a user portrait offline data processing method according to an embodiment of the present application;
FIG. 9 is a schematic diagram of the display effect of a task configuration interface according to an embodiment of the present application;
FIG. 10 is an optional schematic flowchart of a user portrait offline data processing method according to an embodiment of the present application;
FIG. 11 is a schematic diagram of the display effect of a task management interface according to an embodiment of the present application;
FIG. 12 is an optional schematic flowchart of a user portrait offline data processing method according to an embodiment of the present application;
FIG. 13 is an optional schematic flowchart of a user portrait offline data processing method according to an embodiment of the present application;
FIG. 14 is an optional schematic flowchart of a user portrait offline data processing method according to an embodiment of the present application;
FIG. 15 is an optional schematic flowchart of a user portrait offline data processing method according to an embodiment of the present application;
FIG. 16 is an optional schematic flowchart of a user portrait offline data processing method according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions and advantages of the present application clearer, the present application is described below in further detail with reference to the accompanying drawings. The described embodiments should not be regarded as limiting the present application; all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the following description, "some embodiments" describes subsets of all possible embodiments; "some embodiments" may refer to the same subset or to different subsets of all possible embodiments, and these may be combined with each other where no conflict arises.
In the following description, the terms "first/second/third" are used only to distinguish similar objects and do not denote a particular order. Where permitted, the specific order or sequence indicated by "first/second/third" may be interchanged, so that the embodiments described herein can be practiced in an order other than that illustrated or described.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before the embodiments of the present application are described in further detail, the terms and expressions used in them are explained; the following explanations apply.
1) Hadoop Distributed File System (HDFS): a distributed file system designed to run on commodity hardware, offering high fault tolerance and high throughput.
2) User portrait: an effective tool for depicting target users and connecting user needs with design direction; it links a user's attributes and behaviors with expected data to form a virtual representation of the real user. For example, in a recommendation system, users and content are usually quantified, and both are tagged and classified, which facilitates model training, in-depth user mining and the like. For instance, a male user who regularly consumes basketball content, such as basketball articles with pictures and basketball videos, is tagged with "basketball", "sports" and similar labels in a feed recommendation system; when content is recommended to him, candidates are recalled, scored and finally recommended according to the matched tags.
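The tagging example above can be made concrete with a toy tag-weight portrait. The weighting rule (the share of consumed items carrying each tag) and the matching score (the sum of the user's weights over a candidate's tags) are simple assumptions chosen for illustration, not the recommendation system's actual models.

```python
from collections import Counter


def build_portrait(consumed: list[dict]) -> dict[str, float]:
    """Hypothetical weighting: share of consumed items that carry each tag."""
    counts = Counter(tag for item in consumed for tag in set(item["tags"]))
    return {tag: round(n / len(consumed), 2) for tag, n in counts.items()}


def match_score(portrait: dict[str, float], content_tags: list[str]) -> float:
    # Score a candidate by summing the user's weights for its matched tags.
    return sum(portrait.get(tag, 0.0) for tag in content_tags)


history = [
    {"kind": "video", "tags": ["basketball", "sports"]},
    {"kind": "article", "tags": ["basketball", "sports"]},
    {"kind": "image", "tags": ["sports", "football"]},
    {"kind": "article", "tags": ["finance"]},
]
portrait = build_portrait(history)
candidates = {"nba_highlights": ["basketball", "sports"], "stock_news": ["finance"],
              "cooking": ["food"]}
ranked = sorted(candidates, key=lambda c: match_score(portrait, candidates[c]), reverse=True)
print(portrait)
print(ranked)  # ['nba_highlights', 'stock_news', 'cooking']
```

Here the user ends up with weights such as "sports" 0.75 and "basketball" 0.5, so basketball content is recalled and scored first.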
3) Offline data synchronization: an information recommendation system mainly comprises an online part and an offline part. The online part comprises three steps: resource adaptation, which mainly processes user portrait information and behavior information; feature extraction, which mainly covers feature design, feature indexing and feature encoding; and scoring and ranking, which computes click-through rate scores from the extracted features using a CTR prediction model. The offline part mainly trains the model: it extracts features from the joined logs and then trains the model. The offline part usually involves many offline data synchronization and processing tasks, such as user portrait synchronization, portrait access, portrait cleaning and portrait preprocessing.
4) ZooKeeper: an open-source distributed application coordination service that provides consistency services for distributed applications.
5) etcd: a distributed key-value store that serves as a highly available, strongly consistent repository for service discovery.
At present, in software projects such as a user portrait offline data processing project, developers usually implement each periodically iterated software requirement on its own, and release it online after testing, joint debugging and verification that the expected results are met. When a project receives a new requirement, it is often difficult to combine it with earlier ones, so each new but similar requirement must go through its own development, testing, joint debugging, verification and release. For example, requirement 1 may require that, for user recommendation, a game portrait be imported from game big data for all current users and that offline processing of these user portraits be completed. For requirement 1, a developer may specifically develop implementation 1 shown in FIG. 1: first obtain and download the data files, then process the downloaded data, and finally store the processed data in a database; implementation 1 also handles various abnormal situations, such as download failures.
However, after implementing requirement 1, the developer may receive a new requirement 2, for example importing the current users' music and video portraits from music or video big data and processing them further, so as to mine users' deeper interests and preferences. Although requirement 2 is similar to requirement 1, each step is processed differently, so the implementation of requirement 1 cannot be reused and the developer must develop new code to support requirement 2. The usual approach to such new requirements is first to check whether an existing program can be quickly reused and, if not, to develop support for it. In practice, however, requirements change frequently and most of the functions each requirement needs are different, so many requirements cannot reuse existing code. The current approach therefore develops each such requirement independently, which leads to long development time, high cost (in extreme cases development time grows linearly with the number of requirements) and a high error rate, lowering work efficiency and slowing business iteration. Moreover, different developers write different code for the same function, so the code for different requirements is full of similar, duplicated functionality, such as data acquisition and data downloading, which greatly increases the risk of code failures and harms the maintainability of data processing.
To save development cost, improve work efficiency and accelerate agile development, the present application takes the problems of the prior art as its starting point, studies a large number of requirement cases, repeatedly sorts out and abstracts the requirements, and distills a set of solutions suited to real software development and production environments. The design principles and ideas of the present application are described in detail below, taking user portrait requirement processing as an example.
Based on abstracting and analyzing a large number of requirements, the present application identifies the following dimensions of each requirement:
1) Data input source: different requirements may obtain user portraits from different paths;
2) Data acquisition mode, such as a one-time download or multiple downloads;
3) Data processing logic: different requirements call for different processing, such as format preprocessing, filtering, splicing, encryption, compression or portrait cleaning, whose logic differs and is usually hard to unify;
4) Data output mode, such as storing the processed user portrait in a database or outputting it to a model training module;
5) Task running period: different requirements need their processing tasks executed on different cycles;
6) Task running monitoring: scheduled tasks need different monitoring granularity for different requirements.
The above analysis can be summarized as the requirement analysis result shown in FIG. 2.
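One way to turn these six dimensions into configuration items is a task configuration with one field per dimension, validated when a target task is generated from it. All field names, the two acquisition modes (borrowed from the single-file and full download modes mentioned earlier) and the example values are hypothetical.

```python
import itertools
from dataclasses import dataclass, field

_task_ids = itertools.count(1)


@dataclass
class TaskConfig:
    # One field per requirement dimension; all names and values are hypothetical.
    input_source: str            # 1) data input source
    acquisition_mode: str        # 2) data acquisition mode
    processing_method: str       # 3) name of the customized processing logic
    output_target: str           # 4) data output destination
    period_minutes: int          # 5) task running period
    max_failed_batches: int = 0  # 6) monitoring threshold before alarming

    def __post_init__(self) -> None:
        if self.acquisition_mode not in {"single_file", "full"}:
            raise ValueError(f"unknown acquisition mode: {self.acquisition_mode}")
        if self.period_minutes <= 0:
            raise ValueError("running period must be positive")


@dataclass
class TargetTask:
    config: TaskConfig
    task_id: int = field(default_factory=lambda: next(_task_ids))
    status: str = "pending"      # waits in the task list for a start instruction


task_list = [TargetTask(TaskConfig(
    input_source="hdfs://example/game_portrait/",
    acquisition_mode="full",
    processing_method="clean_game_tags",
    output_target="db://portrait_store/game",
    period_minutes=1440,
))]
print(task_list[0].task_id, task_list[0].status)  # 1 pending
```

Only the third dimension points at code that must be written per requirement; the other five are plain data, which is what allows them to be filled in on a configuration page.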
Based on the requirement analysis result of FIG. 2, the present application decouples the differences from the commonalities across a large number of requirements and provides a generalized solution through the functional architecture shown in FIG. 3. This architecture has three functional layers: an input layer, a processing layer and an output layer. Based on the analysis in FIG. 2, each layer abstracts and encapsulates the commonalities of the requirements it must realize and provides general methods to support them. The differentiated requirements are exposed as configuration items: a configuration page or reserved function-call interfaces are provided for developers to configure and use, and the specific implementation logic is plugged in through these interfaces. Combining the configuration page with the reserved function-call interfaces completes the offline processing of user portraits.
Based on these technical principles and ideas, the user portrait offline data processing method, apparatus, device and storage medium of the embodiments of the present application are not limited to user portrait offline data processing; they also apply to a variety of data processing scenarios involving data downloading, processing and output. For example, in an online image processing model training scenario, sample images can be downloaded with the method of the embodiments, enhanced in different ways, and output to different image processing models. The method can also process other types of data, such as text, which is not limited in the embodiments of the present application.
Next, a specific implementation of the embodiment of the present application when applied to a device will be described.
The embodiments of the present application provide a user portrait offline data processing method, apparatus, device and storage medium that can improve the efficiency and maintainability of data processing. Exemplary applications of the electronic device provided in the embodiments are described below, taking the case in which the device is implemented as a server.
Referring to fig. 4, fig. 4 is an alternative architecture diagram of an information recommendation system provided in the embodiment of the present application. The information recommendation system can be used for supporting recommendation scenes of various kinds of information, such as application scenes for recommending news, application scenes for recommending commodities, application scenes for recommending videos, and the like. The information recommendation system 100 includes a terminal 400, a network 300, a model generation server 600, a database 500, a server 200, and an information recommendation server 100, where the terminal 400, the model generation server 600, and the information recommendation server 100 are connected through the network 300, and the model generation server 600, the database 500, the server 200, and the information recommendation server 100 may be connected through the network, or two or more of them may be deployed on the same server host. The recommendation method in the information recommendation system in the embodiment of the present application may include an offline part and an online part, as shown in fig. 5, where the offline part in fig. 5 may be implemented by the model generation server 600, the database 500, and the server 200 in fig. 4, and the online part may be implemented by the information recommendation server 100 in fig. 4. 
The offline part mainly consists of user behavior collection, user portrait calculation, and Click-Through Rate (CTR) prediction model training. User behavior collection mainly covers clicks, display exposures (impressions), likes, watch duration, and the like. User portrait calculation derives the user's interests along different dimensions, such as tags and categories, from the user behavior; a user portrait may be a tag interest, a video category, and so on, where tag interests may be, for example, "Lakers" or "Real Madrid", and the value attached to each user portrait represents its weight or importance within the overall feature information. The CTR prediction model performs feature extraction and model training based on the user behavior and the user portraits. The online part mainly consists of candidate recall, ranking and scoring, and diversity display. Candidate recall retrieves items based on the user behavior and the user portraits; ranking and scoring performs feature extraction and click-through-rate prediction scoring with the offline-trained model; and diversity display applies a diversity model on top of the ranking scores.
In fig. 4, the terminal 400 reports the user behavior collected while the user uses the client to the model generation server 600, so that the model generation server 600 generates a user portrait or a feature model corresponding to the user and stores it in the database 500. The server 200 is the server that performs offline data processing on user portraits. It is configured to: acquire a user portrait from the database 500 as the user portrait to be processed and receive configuration information through a preset external interface, where the configuration information configures different processing modes for user portraits to be processed with different processing targets, and the user portrait to be processed contains behavior feature information of the user; generate, according to the configuration information, a target task corresponding to the user portrait to be processed; load the target task into a preset processing service when a start instruction for the target task is received, where the preset processing service executes a preset processing flow comprising a customized processing flow and a general processing flow, and the general processing flow performs the same preset data processing on user portraits to be processed with different processing targets; during the running of the preset processing service, execute the corresponding customized processing flow according to the target task and the general processing flow according to a preset data processing method, to obtain a processed user portrait; and send the processed user portrait to the information recommendation server 100. The information recommendation server 100 generates recommendation information from the processed user portrait and pushes it through the network 300 to the user of the terminal 400, so that content the user is likely to be interested in, recommended on the basis of the user's behavior, is presented to the user. The network 300 may be a wide area network, a local area network, or a combination thereof.
In some embodiments, the server 200 may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN, and big data and artificial intelligence platforms. The terminal 400 may be, but is not limited to, a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in the embodiments of the present application.
Referring to fig. 6, fig. 6 is a schematic structural diagram of the server 200 according to an embodiment of the present application. The server 200 shown in fig. 6 includes at least one processor 410, a memory 450, at least one network interface 420, and a user interface 430. The components of the server 200 are coupled together by a bus system 440, which enables communication among them. In addition to a data bus, the bus system 440 includes a power bus, a control bus, and a status signal bus; for clarity of illustration, however, all buses are labeled as the bus system 440 in fig. 6.
The processor 410 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor may be a microprocessor or any conventional processor.
The user interface 430 includes one or more output devices 431, including one or more speakers and/or one or more visual displays, that enable the presentation of media content. The user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 450 optionally includes one or more storage devices physically located remote from processor 410.
The memory 450 may include volatile memory, nonvolatile memory, or both. The nonvolatile memory may be a Read Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 450 described in the embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 450 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.
An operating system 451, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
a network communication module 452 for communicating with other computing devices via one or more (wired or wireless) network interfaces 420, exemplary network interfaces 420 including Bluetooth, Wireless Fidelity (Wi-Fi), Universal Serial Bus (USB), and the like;
a presentation module 453 for enabling presentation of information (e.g., user interfaces for operating peripherals and displaying content and information) via one or more output devices 431 (e.g., display screens, speakers, etc.) associated with user interface 430;
an input processing module 454 for detecting one or more user inputs or interactions from one of the one or more input devices 432 and translating the detected inputs or interactions.
In some embodiments, the user portrait offline data processing apparatus provided by the embodiments of the present application may be implemented in software. Fig. 6 shows a user portrait offline data processing apparatus 455 stored in the memory 450, which may be software in the form of programs, plug-ins, and the like, and includes the following software modules: a receiving module 4551, a generating module 4552, a loading module 4553, and a processing module 4554. These modules are logical and may therefore be arbitrarily combined or further split according to the functions implemented.
The functions of the respective modules will be explained below.
In other embodiments, the user portrait offline data processing apparatus provided in the embodiments of the present application may be implemented in hardware. For example, it may be a processor in the form of a hardware decoding processor, programmed to execute the user portrait offline data processing method provided in the embodiments of the present application. Such a processor may employ one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
The user portrait offline data processing method provided by the embodiment of the present application will be described in conjunction with exemplary applications and implementations of the server provided by the embodiment of the present application.
Referring to fig. 7, fig. 7 is an alternative flowchart of a user portrait offline data processing method according to an embodiment of the present application, which will be described with reference to the steps shown in fig. 7.
S101, receiving configuration information through a preset external interface; the configuration information is used for configuring different processing modes for user portraits to be processed with different processing targets; the user portrait to be processed contains behavior feature information of the user.
In the embodiment of the application, the user portrait offline data processing device can receive the configuration information for the user portrait to be processed through the preset external interface.
In the embodiment of the application, different processing methods can be specified, through different configuration information, for user portraits to be processed with different processing targets. In some embodiments, based on the requirement analysis diagram of fig. 2, the variable parts of the requirements, such as the data input source (including the different download addresses and integrity check addresses of different portrait types), the portrait data processing, the portrait output path, and the task monitoring policy, are treated as configurable items, and the configuration information corresponding to these items is received through the preset external interface.
In the embodiment of the application, the preset external interface may be a configuration control on a visual graphical interface, in which case the configuration information is obtained from an operator's operations on that control; alternatively, configuration information passed in from outside may be received as parameters through a program interface. The choice depends on the actual situation, which is not limited in the embodiments of the present application.
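As a minimal sketch of the program-interface option, assuming the configuration arrives as a JSON string, the receiving side could parse and validate it as follows. The field names are borrowed from the configuration items in Table 1 of this application; which items are treated as required, the `ConfigInfo` class, and `receive_config` are hypothetical choices made for this illustration.

```python
import json
from dataclasses import dataclass

# Hypothetical choice of mandatory configuration items for this sketch.
REQUIRED_KEYS = ("hdfs_addr", "local_save_dir", "task_handler")


@dataclass
class ConfigInfo:
    hdfs_addr: str        # where the user portrait to be processed is downloaded from
    local_save_dir: str   # where the downloaded data is stored locally
    task_handler: str     # name of the customized processing logic to call
    description: str = ""


def receive_config(payload: str) -> ConfigInfo:
    """Parse configuration information passed in through a program interface (JSON assumed)."""
    data = json.loads(payload)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing configuration items: {missing}")
    return ConfigInfo(
        hdfs_addr=data["hdfs_addr"],
        local_save_dir=data["local_save_dir"],
        task_handler=data["task_handler"],
        description=data.get("description", ""),
    )
```

A graphical configuration control would end up producing the same `ConfigInfo`; only the way the values are collected differs.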
S102, generating a target task corresponding to the user portrait to be processed according to the configuration information.
In the embodiment of the application, when the user portrait offline data processing device receives the configuration information, it generates the corresponding target task. The target task associates the user portrait to be processed with the configuration information configured for it, and the data processing of the user portrait to be processed is executed and managed by executing the target task.
In the embodiment of the application, because the target task is generated from the configuration information, the differentiated processing mode specified for the user portrait to be processed can be determined from the target task.
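A minimal sketch of S102, under the assumption that a target task is simply a record carrying a copy of its configuration plus an auto-incremented primary key, as the `id` field described for Table 1 suggests. `TargetTask` and `TaskGenerator` are hypothetical names, not the application's implementation.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class TargetTask:
    id: int                     # primary key, incremented for each generated task
    task_data: Dict[str, Any]   # the configuration information carried by the task
    status: str = "created"


class TaskGenerator:
    """Hypothetical generator: each call produces a task with the next primary key."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.tasks: Dict[int, TargetTask] = {}

    def generate(self, config: Dict[str, Any]) -> TargetTask:
        # Copy the configuration so later edits to the caller's dict do not
        # silently change a task that has already been generated.
        task = TargetTask(id=next(self._ids), task_data=dict(config))
        self.tasks[task.id] = task
        return task
```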
S103, loading the target task into a preset processing service when a start instruction for the target task is received; the preset processing service is used for executing a preset processing flow comprising a customized processing flow and a general processing flow; the general processing flow is used for performing the same data processing on user portraits to be processed with different processing targets.
In the embodiment of the application, when a starting instruction for a target task is received, the user portrait offline data processing device loads the target task into a preset processing service, so that a processing method specified in the target task is executed and realized through the preset processing service.
In the embodiment of the application, for a target task configured in the configuration information to start immediately after being issued, the user portrait offline data processing device determines that a start instruction for the target task has been received when the task is issued. For a target task configured with a preset execution period, the device determines that a start instruction has been received each time the preset execution period is reached. For a target task configured with a scheduled start time or other start conditions, the device determines that a start instruction has been received when the start time arrives or the start conditions are detected to be met.
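The three ways of deciding that a start instruction has been received can be sketched as a single predicate. This is an illustrative assumption: `StartPolicy` and `should_start` are hypothetical, and treating the first run of a periodic task as due immediately is a design choice made only for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class StartPolicy:
    start_on_publish: bool = False                   # start immediately once issued
    period: Optional[timedelta] = None               # preset execution period
    start_at: Optional[datetime] = None              # scheduled start time
    condition: Optional[Callable[[], bool]] = None   # any other start condition


def should_start(policy: StartPolicy, now: datetime, published: bool,
                 last_run: Optional[datetime]) -> bool:
    """Return True when a start instruction for the task is considered received."""
    if policy.start_on_publish and published and last_run is None:
        return True
    if policy.period is not None:
        # Assumption: a periodic task that has never run is due now.
        if last_run is None or now - last_run >= policy.period:
            return True
    if policy.start_at is not None and last_run is None and now >= policy.start_at:
        return True
    if policy.condition is not None and policy.condition():
        return True
    return False
```

A scheduler loop would call `should_start` for each loaded task on every tick and hand the due tasks to the preset processing service.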
In the embodiment of the application, the preset processing service is a service that executes the preset processing flow. When the user portrait offline data processing device receives a start instruction for a target task, the preset processing service may already be running, so that tasks can be hot-loaded. The preset processing service may process multiple target tasks through one or more process instances, and multiple preset processing services may be deployed on one or more data processing devices, such as hosts, to make full use of machine resources. The deployment mode may be chosen according to how the tasks are processed: for example, a task with a heavy data processing load may be deployed alone, while tasks with small data volumes and the same data processing may share one process instance. The specific deployment mode is selected according to the actual situation, which is not limited in the embodiments of the present application.
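Hot loading means the service is already running when a task arrives. One way to illustrate this, assuming a single worker thread fed by a queue (the application does not prescribe a mechanism), is the hypothetical `ProcessingService` below.

```python
import queue
import threading
from typing import Callable, Optional


class ProcessingService:
    """Sketch of a long-running service that accepts tasks while it runs."""

    def __init__(self, run_task: Callable[[dict], None]) -> None:
        self._queue: "queue.Queue[Optional[dict]]" = queue.Queue()
        self._run_task = run_task
        self._worker = threading.Thread(target=self._loop, daemon=True)

    def start(self) -> None:
        self._worker.start()

    def load(self, task: dict) -> None:
        # Called when a start instruction arrives; the service is not restarted.
        self._queue.put(task)

    def stop(self) -> None:
        self._queue.put(None)   # sentinel: finish queued tasks, then exit
        self._worker.join()

    def _loop(self) -> None:
        while True:
            task = self._queue.get()
            if task is None:
                break
            self._run_task(task)
```

Loading a task only enqueues it; the worker keeps running between tasks. Running several such instances, one per process or host, corresponds to the multi-instance deployments described above.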
In the embodiment of the application, the configuration information may include process deployment information and host deployment information. The process deployment information specifies the target process, within the preset processing service, that executes the target task; the host deployment information specifies the address of the target host that runs the preset processing service. When deploying according to this information, the target host may be addressed by its IP address and the name of the running service, or the configuration may be stored and distributed through distributed components such as ZooKeeper and etcd, which enables more flexible task scheduling and forwarding.
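With host and process deployment information attached to each task, a running service instance can pick out the tasks meant for it. The sketch below assumes tasks are plain dictionaries using the `execute_host_addr` and `instance_name` field names from Table 1, and treats an empty value as a wildcard, which is an assumption of this example; a ZooKeeper- or etcd-based distribution would replace the in-memory list.

```python
from typing import Dict, Iterable, List


def tasks_for_instance(tasks: Iterable[Dict[str, str]], host_addr: str,
                       instance_name: str) -> List[Dict[str, str]]:
    """Keep only the tasks deployed to this host address and process instance.

    An empty or missing execute_host_addr / instance_name is treated here
    (an assumption) as "any host" / "any instance".
    """
    selected = []
    for task in tasks:
        host_ok = task.get("execute_host_addr", "") in ("", host_addr)
        instance_ok = task.get("instance_name", "") in ("", instance_name)
        if host_ok and instance_ok:
            selected.append(task)
    return selected
```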
In the embodiment of the application, the preset processing flow includes a customized processing flow and a general processing flow. The general processing flow performs the same data processing on user portraits to be processed with different processing targets. For example, regardless of the processing target, every user portrait to be processed goes through the same steps of data downloading, exception handling (retry after a failed download, resumable downloading, and the like), data processing, and data output. The user portrait offline data processing device can execute the general processing flow through the preset processing service and directly apply the pre-packaged general processing methods to implement these common steps.
In the embodiment of the application, the customized processing flow covers the steps in which user portraits to be processed with different processing targets must be handled differently. When the preset processing service executes the customized processing flow, it determines and applies the processing mode of the user portrait to be processed according to the method or information specified in the target task. Illustratively, the customized processing flow may include different implementations of parsing, splicing, encrypting, compressing, cleaning, formatting, and merging the user portrait for different processing targets. Through a preset calling interface, the customized processing flow can invoke and run a program written by a developer to implement a specific function.
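The split between general and customized flows resembles a plug-in registry: the general flow is written once and looks up customized steps by name, much as `task_handler` names the program implementing a task's logic. The sketch below is hypothetical; the registry, the decorator, and the two example handlers are illustrative only.

```python
from typing import Callable, Dict, List

Records = List[str]
Handler = Callable[[Records], Records]

# Hypothetical registry: handler name (compare task_handler) -> customized step.
HANDLERS: Dict[str, Handler] = {}


def register(name: str) -> Callable[[Handler], Handler]:
    def decorator(func: Handler) -> Handler:
        HANDLERS[name] = func
        return func
    return decorator


@register("clean")
def clean(records: Records) -> Records:
    # Customized step: trim whitespace and drop empty records.
    return [r.strip() for r in records if r.strip()]


@register("dedupe")
def dedupe(records: Records) -> Records:
    # Customized step: drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(records))


def run_flow(records: Records, handler_names: List[str]) -> Records:
    """General flow: identical for every task; only the handler names differ."""
    unknown = [n for n in handler_names if n not in HANDLERS]
    if unknown:
        raise KeyError(f"unknown handlers: {unknown}")
    for name in handler_names:
        records = HANDLERS[name](records)
    return records
```

For example, `run_flow([" a ", "", "b", "a"], ["clean", "dedupe"])` trims, drops the empty record, and removes the duplicate, leaving `["a", "b"]`.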
S104, during the running of the preset processing service, executing the corresponding customized processing flow according to the target task and executing the general processing flow according to the preset data processing method, to obtain the processed user portrait.
In the embodiment of the application, while the preset processing service is running, the user portrait offline data processing device executes the data processing corresponding to the customized processing flow according to the target task, and also processes the user portrait to be processed with the preset data processing method of the general processing flow. The user portrait to be processed is thus handled by both the general and the customized processing flows, yielding the processed user portrait.
In the embodiment of the present application, the preset processing service runs the preset processing flow based on the framework of input layer, processing layer, and output layer shown in fig. 3. For the input layer, which acquires and downloads the user portrait to be processed, the user portrait offline data processing device reads the download address and download mode specified in the target task and, using the preset general underlying download method and general download strategy, downloads the data files at the specified address in the specified mode, so that the input layer is implemented by the general and customized processing flows together. For the processing layer, which implements the specific data processing logic, the device executes the customized processing flow by calling the processing methods specified in the target task, such as cleaning, encryption, and merging, and combines them with the general preprocessing methods of the general processing flow to process the user portrait to be processed and obtain the processed user portrait. For the output layer, which outputs data, the device calls the general data output method and writes the processed data to the target output path specified in the target task. In this way, the entire processing of the user portrait to be processed is completed by the customized and general processing flows within the preset processing flow.
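The three layers can be illustrated end to end with local files. In this sketch, copying files from a local directory stands in for the general download method, stripping blank lines stands in for general preprocessing, and the `custom` callable stands in for the method named in the target task; all function names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def input_layer(source_dir: Path, file_name: str, local_dir: Path) -> List[Path]:
    """General download step, approximated by copying matching files locally."""
    local_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(source_dir.glob(f"*{file_name}*")):
        dst = local_dir / src.name
        shutil.copyfile(src, dst)
        copied.append(dst)
    return copied


def processing_layer(files: List[Path], custom: Callable[[str], str]) -> List[str]:
    """General preprocessing (skip blank lines), then the task's customized method."""
    out = []
    for f in files:
        for line in f.read_text(encoding="utf-8").splitlines():
            if line.strip():
                out.append(custom(line.strip()))
    return out


def output_layer(records: List[str], target_path: Path) -> Path:
    """General output step: write the processed records to the target output path."""
    target_path.parent.mkdir(parents=True, exist_ok=True)
    target_path.write_text("\n".join(records) + "\n", encoding="utf-8")
    return target_path
```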
In some embodiments of the application, the processed user portrait is sent to an information recommendation server, so that the information recommendation server generates recommendation information according to the processed user portrait, and then the recommendation information is pushed to the user through the information recommendation server.
In the embodiment of the application, the user portrait offline data processing device sends the processed user portrait to the information recommendation server. The information recommendation server, which runs online in real time, analyzes the processed user portrait algorithmically and generates recommendation information that the user represented by the processed user portrait is likely to be interested in; it then pushes the recommendation information to the user, presenting content of interest.
It can be understood that, when processing user portrait offline data, the user portrait offline data processing device generates a target task from the received configuration information and uses the target task to drive the customized processing flow, which processes the user portrait in a differentiated way to achieve the specific processing target. In addition, the device uses the preset data processing steps of the general processing flow to implement the processing that is common to user portrait offline data processing across different processing targets. This simplifies turning a processing target into a concrete implementation, improves the efficiency of user portrait offline data processing, and, because the data processing flow is more highly integrated, also improves its maintainability.
In some embodiments, referring to fig. 8, fig. 8 is an optional flowchart of the user portrait offline data processing method provided in the embodiments of the present application; S101 in fig. 7 may be implemented through S1011 to S1013, which are described step by step below.
S1011, entering the task configuration interface in response to an operation instruction on the preset configuration interface entry of the task processing interface.
In the embodiment of the application, the user portrait offline data processing device can make data configuration visual through a preset task processing interface, which further improves the configuration efficiency of operators. The task processing interface includes a preset configuration interface entry; when an operation instruction on this entry is received, the task processing interface jumps to the task configuration interface, through which target tasks are created, modified, and published.
S1012, displaying at least one configuration item control on the task configuration interface, and receiving at least one piece of externally input configuration item data through the at least one configuration item control.
S1013, taking the at least one piece of configuration item data as the configuration information.
In the embodiment of the application, the user portrait offline data processing device displays at least one configuration item control on the task configuration interface, and receives external input aiming at the configuration item control through each configuration item control in the at least one configuration item control to obtain configuration item data corresponding to the configuration item control, so as to obtain the at least one configuration item data.
In some embodiments, the task configuration interface may be as shown in fig. 9, which displays at least one configuration item control; for example, a description of each configuration item is displayed in front of its control.
In some embodiments, at least one configuration item control in fig. 9 may be implemented correspondingly through a task configuration table as shown in table 1, as follows:
TABLE 1
In Table 1, id is the primary key of the target task and is incremented by one for each target task generated. hdfs_finish_checked_file is the data integrity check file path, which configures the path at which the integrity check file is generated; when the integrity check file appears under this path, the data of the user portrait to be processed is complete. hdfs_addr is the output address of the data on HDFS and configures the address from which the user portrait to be processed is downloaded from HDFS. local_save_dir is the local save path, whose default format 1 is usually "./data/${table_name}/${YYYYMMDD}", and configures where the user portrait to be processed is saved after it is downloaded locally. place_var is the data update period variable, whose default format 2 is ${YYYYMMDD}, and configures the period at which data updates are initiated. description is a brief description of the task. at_once_exec determines, for a periodically executed task, whether the target task is executed automatically in each subsequent execution cycle. need_download configures whether the user portrait to be processed is downloaded again after it has already been downloaded. which_day_download specifies the download date of the incremental data when incremental data is downloaded. delete_download_file is used to delete the locally saved downloaded data after the user portrait to be processed has been processed. task_handler is the name of the task processing structure and is used to call the program that implements the specific data processing logic; the core logic required by the processing target is implemented in the program corresponding to task_handler. batch_size specifies the number of records per batch when at least one data file of the user portrait to be processed is processed in parallel batches.
in_use specifies whether the target task starts executing immediately after it is issued. instance_name is the running instance name, which specifies the process instance that runs the target task in a single-machine multi-instance deployment. execute_host_addr is the IP of the host on which the task runs, which specifies the host address of the target task in a multi-host deployment.
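Several of the Table 1 fields lend themselves to a typed configuration record. The sketch below is an assumption, not the application's schema: it keeps a subset of the fields, adds a hypothetical `table_name` to feed the `${table_name}` placeholder of the default `local_save_dir`, expands the placeholders with Python's `string.Template`, and shows how `batch_size` could split records into batches.

```python
from dataclasses import dataclass
from datetime import date
from string import Template
from typing import List


@dataclass
class TaskConfig:
    # A subset of the Table 1 fields; the defaults are illustrative assumptions.
    id: int
    hdfs_addr: str
    hdfs_finish_checked_file: str
    task_handler: str
    table_name: str  # hypothetical: value substituted into ${table_name}
    local_save_dir: str = "./data/${table_name}/${YYYYMMDD}"
    batch_size: int = 1000
    in_use: bool = False
    need_download: bool = True
    delete_download_file: bool = False
    instance_name: str = ""
    execute_host_addr: str = ""

    def resolved_save_dir(self, day: date) -> str:
        """Expand the ${table_name} and ${YYYYMMDD} placeholders of local_save_dir."""
        return Template(self.local_save_dir).substitute(
            table_name=self.table_name, YYYYMMDD=day.strftime("%Y%m%d"))


def batches(items: List[str], batch_size: int) -> List[List[str]]:
    """Split records into consecutive batches of at most batch_size for parallel processing."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```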
In the embodiment of the application, the user portrait offline data processing device takes at least one configuration item data as configuration information.
In some embodiments, referring to fig. 10, fig. 10 is an optional flowchart of the user portrait offline data processing method provided in the embodiment of the present application, and S102 in fig. 8 may be specifically implemented by S1021-S1022, which will be described with reference to the steps.
S1021, jumping from the task configuration interface to the task processing interface in response to an operation instruction on the preset publish control of the task configuration interface.
In the embodiment of the present application, the task configuration interface includes a preset publish control, for example, the control 80 in fig. 9. When the user portrait offline data processing device receives an operation instruction on the preset publish control, this indicates that the operator wants to generate a target task from the configuration item information entered on the current task configuration interface; in response, the device jumps from the task configuration interface to the task processing interface.
S1022, generating a target task according to the configuration information, wherein the target task carries target task data; and displaying the target task in a task processing list of the task processing interface.
In the embodiment of the application, the user portrait offline data processing device takes the configuration information as target task data to be carried in the target task, and assigns a new primary key id to the target task corresponding to the configuration information so as to identify the target task, and then displays the target task and the target task data corresponding to the target task in a task processing list of a task processing interface.
In some embodiments, as shown in fig. 11, the task processing list contains 12 target tasks, and each entry displays the target task data of that task, that is, the configuration information entered for it on the task configuration interface. Through the task processing interface, the user portrait offline data processing device can subsequently manage each target task in the list, for example by starting, deleting, or re-editing a target task, or by exporting the target task list.
It can be understood that, in the embodiment of the application, the user portrait offline data processing device may receive and manage configuration information that needs to be configured for different processing targets through a visual interface, so that convenience and friendliness of data configuration operation are improved, and further, efficiency of user portrait offline data processing is improved.
In some embodiments, the preset processing service includes a downloading process, a processing process, and an output process, and S104 may be implemented through S1041 to S1043, which are described step by step below.
S1041, reading the target task data from the target task.
In the embodiment of the application, when the target task is executed through the preset processing service, the target task data is read from the target task loaded into the preset processing service.
S1042, when the preset processing service is the downloading process, determining the user portrait to be downloaded according to the target download source information in the target task data, and downloading it locally in the target download mode specified in the target task data by using the general download method, to obtain the user portrait to be processed; the general download method is the preset data downloading method among the preset data processing methods.
In an embodiment of the present application, the preset processing service includes a downloading process, a processing process, and an output process. When the preset processing service is currently executing the downloading process, the user portrait offline data processing device determines the data source and the download mode of the user portrait to be processed from the target download source information in the target task data, and downloads the user portrait to be processed locally by using the general download method of the general processing flow.
In some embodiments, referring to fig. 12, fig. 12 is an optional flowchart of the data processing method provided in the embodiments of the present application. Based on fig. 10, the target download source information includes a target download address and a target download data file name, and the target task data further includes a target address of the integrity check file. The download may be implemented through S201 to S203, which are described step by step below.
S201, determining at least one data file containing a target download data file name in a target download address as a user portrait to be downloaded; downloading the user image to be downloaded to the local in a target downloading mode by using a downloading general method to obtain target file data; the target downloading mode comprises any one of single file downloading and full downloading.
In the embodiment of the present application, the user portrait offline data processing device applies the general download method to the target download address configured in the task data, for example hdfs_addr in Table 1: it uses the general underlying download routines (establishing a data transmission channel with the target download address, starting the download, saving the downloaded data, and so on) to download the target file data corresponding to the target download data file name to the local device in the target download mode.
In some embodiments, the target download mode may be single-file download or full download. The user portrait to be processed usually consists of many data files, ranging in total from several GB to several TB, and the data source usually stores it partitioned into small files. Depending on actual needs, the user portrait offline data processing device may therefore select single-file download, downloading and processing one file at a time and fetching the next file once the current one has been processed, or select full download, downloading all data at once and then processing it. The device may also use other download modes, selected according to the actual situation; the embodiments of the present application are not limited in this respect.
In some embodiments, the target download address may be an HDFS path, a database, a message queue, or a local file, selected according to the actual situation; the embodiments of the present application are not limited in this respect.
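The two download modes can be illustrated with a minimal Go sketch. It assumes a hypothetical `Source` interface standing in for any of the download addresses listed above (HDFS, database, message queue, local file); `Source`, `memSource`, `DownloadMode`, and `Download` are illustrative names, not part of the described device.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// DownloadMode mirrors the two target download modes described in the text.
type DownloadMode int

const (
	SingleFile DownloadMode = iota // fetch one file, process it, then fetch the next
	Full                           // fetch every file first, then process them all
)

// Source is a hypothetical abstraction over a download address.
type Source interface {
	List(prefix string) ([]string, error)
	Fetch(name string) ([]byte, error)
}

// memSource is an in-memory Source used only to make the sketch runnable.
type memSource map[string][]byte

func (m memSource) List(prefix string) ([]string, error) {
	var names []string
	for name := range m {
		if strings.HasPrefix(name, prefix) {
			names = append(names, name)
		}
	}
	sort.Strings(names) // deterministic order for partitioned files
	return names, nil
}

func (m memSource) Fetch(name string) ([]byte, error) {
	data, ok := m[name]
	if !ok {
		return nil, fmt.Errorf("file %q not found", name)
	}
	return data, nil
}

// Download fetches every file whose name starts with prefix and hands each one to process.
func Download(src Source, prefix string, mode DownloadMode, process func(name string, data []byte) error) error {
	names, err := src.List(prefix)
	if err != nil {
		return err
	}
	if len(names) == 0 {
		return errors.New("no data file matches the target download file name")
	}
	switch mode {
	case SingleFile:
		for _, n := range names {
			data, err := src.Fetch(n)
			if err != nil {
				return err
			}
			if err := process(n, data); err != nil { // processed before the next fetch
				return err
			}
		}
	case Full:
		all := make(map[string][]byte, len(names))
		for _, n := range names {
			data, err := src.Fetch(n)
			if err != nil {
				return err
			}
			all[n] = data
		}
		for _, n := range names { // processing starts only after everything is local
			if err := process(n, all[n]); err != nil {
				return err
			}
		}
	default:
		return fmt.Errorf("unknown download mode %d", mode)
	}
	return nil
}
```

Single-file mode keeps local disk usage bounded by one partition, which suits multi-TB portraits; full mode trades disk space for a simpler processing step.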
In the embodiment of the present application, the target download data file name is the file name, specified in the target task data, of the user portrait to be processed that is to be downloaded. In some embodiments, the target download data file name may also be configured externally through the corresponding portrait field list control on the task configuration interface.
In the embodiment of the present application, the local address to which the target file data is downloaded is specified in the target task data, for example the address configured in local_save_dir in Table 1.
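Table 1 itself is not reproduced here, but several of its field names appear in the text. The following sketch shows one possible Go view of the target task data, assuming the configuration is stored as JSON. Only `hdfs_addr`, `local_save_dir`, `hdfs_finish_checked_file`, `task_handler`, `at_once_exec`, and `task_cron` come from the text; the `file_names` and `download_mode` keys, all Go types, and `ParseTaskConf` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TaskConf is a hypothetical Go view of the target task data.
type TaskConf struct {
	HDFSAddr          string   `json:"hdfs_addr"`                // target download address
	FileNames         []string `json:"file_names"`               // assumed key: target download data file names
	DownloadMode      string   `json:"download_mode"`            // assumed key: "single" or "full"
	LocalSaveDir      string   `json:"local_save_dir"`           // local download save path
	FinishCheckedFile string   `json:"hdfs_finish_checked_file"` // download integrity check file path
	TaskHandler       string   `json:"task_handler"`             // target processing method name
	AtOnceExec        int      `json:"at_once_exec"`             // 1 means execute immediately
	TaskCron          string   `json:"task_cron"`                // timed task expression
}

// ParseTaskConf decodes one task configuration record.
func ParseTaskConf(raw []byte) (*TaskConf, error) {
	var c TaskConf
	if err := json.Unmarshal(raw, &c); err != nil {
		return nil, fmt.Errorf("parse task conf: %w", err)
	}
	return &c, nil
}
```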
In some embodiments, referring to fig. 13, fig. 13 is an optional flowchart of a user portrait offline data processing method provided in the embodiments of the present application. Based on fig. 12, the target task data further includes a download save path, and in S201, before the user portrait to be downloaded is downloaded to the local device in the target download mode by using the general download method to obtain the target file data, S001 to S004 may be executed, which are described step by step below.
S001, when the general download method includes breakpoint (resumable) download, checking the target address of the download integrity check file to determine whether the user portrait to be downloaded has been completely downloaded.
In the embodiment of the present application, when the general download method includes breakpoint download, the user portrait offline data processing device needs to determine, before starting the download process, whether to start breakpoint download for the current target task.
In the embodiment of the present application, once the user portrait to be downloaded has been completely downloaded, a corresponding download integrity check file is generated at the target address of the download integrity check file. The user portrait offline data processing device can therefore access that address, check whether the corresponding file has been generated, and thereby determine whether the user portrait to be downloaded has been completely downloaded.
S002, when the user portrait to be downloaded has not been completely downloaded, obtaining historical execution information of the target task and further determining, according to the historical execution information, whether the target task is an interrupted task; the historical execution information records the result of the last run of the target task.
In the embodiment of the present application, when the user portrait to be downloaded has not been completely downloaded, the user portrait offline data processing device can obtain the historical execution information of the target task from the run log of the target task, learn the result of the last run, and then determine from that information whether the target task is an interrupted task.
In some embodiments, when the last run of the target task is interrupted, interrupt identification information is written to the run log of the target task, so that the user portrait offline data processing device can determine whether the target task is an interrupted task.
S003, when the target task is an interrupted task, obtaining the historical download data of the user portrait to be downloaded from the download save path, calculating an interruption position according to the historical download data, and determining the remaining download data of the user portrait to be downloaded according to the interruption position.
In the embodiment of the present application, when the target task is determined to be an interrupted task, the user portrait offline data processing device determines the historical download data, i.e. the data already downloaded, in the download save path specified in the target task data, calculates the interruption position from the historical download data and the total size of the user portrait to be downloaded, and takes the part of the user portrait to be downloaded after the interruption position as the remaining download data.
S004, determining the remaining download data as the user portrait to be downloaded.
In the embodiment of the present application, the user portrait offline data processing device takes the remaining download data as the user portrait to be downloaded, downloads it to the local device in the target download mode in the target task data by using the general download method, and merges it with the data already downloaded to obtain the user portrait to be processed, completing the download.
S202, performing an integrity check on the target file data through the target address of the download integrity check file.
In the embodiment of the application, the user portrait offline data processing device can check the integrity of the target file data by accessing the target address of the download integrity check file in the target task data and checking whether the integrity check file exists in the directory corresponding to the address.
S203, when the integrity check passes, the target file data is used as the user portrait to be processed.
In the embodiment of the application, when the integrity check file is generated in the target address of the downloaded integrity check file, the user portrait offline data processing device determines that the integrity check is passed, which indicates that all the user portraits to be processed are downloaded as local target file data, and the user portrait offline data processing device takes the target file data as the user portrait to be processed.
It can be understood that, in the embodiment of the present application, by executing breakpoint download as part of the general download method, the user portrait offline data processing device provides breakpoint download support for the download process of the user portraits to be processed for different processing targets. This saves the cost of developing download logic separately for each requirement, and the generalized breakpoint download improves the flexibility and efficiency of user portrait offline data processing.
In some embodiments, referring to fig. 14, fig. 14 is an optional flowchart of a user portrait offline data processing method provided in an embodiment of the present application. Based on the general download method in fig. 10 or fig. 12, in S201, downloading the target file data corresponding to the target download data file name to the local in the target download manner from the target download address may be further specifically implemented through S301 to S303, which will be described with reference to the steps.
S301, when the general download method includes download failure retry, if the i-th data file in the at least one data file fails to download, obtaining the download retry count of the i-th data file, where i is a positive integer.
In this embodiment of the application, when the i-th data file in the at least one data file fails to download, the user portrait offline data processing device may obtain the download retry count of the i-th data file from the download retry record maintained in real time, and determine whether it exceeds a preset retry number threshold.
S302, when the download retry count exceeds the preset retry number threshold, recording the i-th data file in a download failure file list and starting to download the (i+1)-th file.
In the embodiment of the present application, when the download retry count exceeds the preset retry number threshold, the user portrait offline data processing device no longer retries the i-th data file; it records the i-th data file in the download failure file list and starts downloading the (i+1)-th file.
S303, when the download retry count does not exceed the preset retry number threshold, downloading the i-th data file again in the target download mode by using the general download method, until the download of the at least one data file is complete, to obtain the target file data.
In this embodiment of the present application, when the download retry count does not exceed the preset retry number threshold, the user portrait offline data processing device increments the download retry count in the download retry record of the i-th data file by one and retries the download of the i-th data file in the same target download mode. The device applies the same download handling to the i-th file, the (i+1)-th file, and so on, until the download of the at least one data file is complete and the target file data is obtained.
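S301 to S303 amount to a bounded retry loop per file. In this minimal Go sketch, `DownloadWithRetry` is a hypothetical name and `fetch` is a hypothetical callback standing in for the general download method; it reports whether one download attempt succeeded.

```go
package main

// DownloadWithRetry downloads files in order. A failing file is retried until
// maxRetries retries have been used; it is then recorded in the download failure
// file list and the next file is started. It returns that failure list.
func DownloadWithRetry(files []string, maxRetries int, fetch func(name string) bool) (failed []string) {
	for _, name := range files {
		for retries := 0; ; retries++ {
			if fetch(name) {
				break // downloaded: move on to the (i+1)-th file
			}
			if retries >= maxRetries {
				failed = append(failed, name) // retries used up (S302)
				break
			}
			// otherwise retry the same file in the same download mode (S303)
		}
	}
	return failed
}
```

With `maxRetries = 2`, a file that always fails is attempted three times in total: the original attempt plus two retries.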
In some embodiments, based on fig. 14, before S302, S401-S402 may also be performed, which will be described in conjunction with the steps.
S401, when the general download method includes a global download failure alarm, obtaining the number of download failure files in the download failure file list.
In the embodiment of the present application, the user portrait offline data processing device can also monitor the number of download failure files as part of the general data processing flow. When the i-th data file fails to download, before determining whether its download retry count exceeds the preset retry number threshold, the device obtains the number of download failure files in the download failure file list and determines whether it exceeds a preset failure number threshold.
S402, when the number of download failure files exceeds the preset failure number threshold, stopping the download of the user portrait to be downloaded and issuing an alarm.
In the embodiment of the present application, when the number of download failure files exceeds the preset failure number threshold, the user portrait offline data processing device promptly stops downloading the user portrait to be downloaded and prompts an operator to troubleshoot the fault causing the large number of download failures.
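The global failure alarm in S401 and S402 can be sketched as a small guard around the download failure file list. `FailureGuard`, `ErrTooManyFailures`, and the `Alarm` hook are hypothetical; in practice the hook might send an e-mail or an instant message to the responsible person.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTooManyFailures tells the caller to stop the whole download (S402).
var ErrTooManyFailures = errors.New("download failure count exceeds threshold")

// FailureGuard tracks the download failure file list.
type FailureGuard struct {
	Threshold int              // preset failure number threshold
	Failed    []string         // download failure file list
	Alarm     func(msg string) // hypothetical alarm hook
}

// Record adds a file to the download failure file list.
func (g *FailureGuard) Record(name string) { g.Failed = append(g.Failed, name) }

// Check raises the alarm and returns ErrTooManyFailures once the number of
// failed files exceeds the threshold; otherwise it returns nil.
func (g *FailureGuard) Check() error {
	if len(g.Failed) > g.Threshold {
		if g.Alarm != nil {
			g.Alarm(fmt.Sprintf("%d files failed to download (threshold %d)", len(g.Failed), g.Threshold))
		}
		return ErrTooManyFailures
	}
	return nil
}
```

Calling `Check` before each retry decision, as S401 describes, lets a widespread outage stop the task early instead of exhausting retries on every remaining file.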
It can be understood that, in the embodiment of the present application, the user portrait offline data processing device may implement and package breakpoint download, download failure retry with a retry count limit, and the download failure file count alarm in advance in the general processing flow of the download process. User portrait offline data processing tasks with different processing targets can then reuse these prepackaged download functions in their own download processes, so that the uniform, fixed steps of downloading are implemented once, which improves the efficiency and maintainability of user portrait offline data processing.
S1043, when the preset processing service is in the processing process, preprocessing the user portrait to be processed according to the general processing method in the general processing flow to obtain a preprocessed user portrait, and calling the processing method corresponding to the target processing method name in the target task data to process the preprocessed user portrait to obtain a processed user portrait, so that when the preset processing service is in the output process, the processed user portrait is output to a specified path according to the target output configuration information in the target task data; the general processing method is the data processing method among the preset data processing methods.
In some embodiments, referring to fig. 15, fig. 15 is an optional flowchart of a data processing method provided in the embodiments of the present application. Based on fig. 10, the general processing policy includes a task allocation policy and a result monitoring policy, and S1043 may be implemented through S501 to S505, which are described step by step below.
S501, acquiring a user portrait to be processed through a preset data processing interface.
In the embodiment of the application, the user portrait offline data processing device can acquire the user portrait to be processed, which is transmitted from the download layer, through the preset data processing interface.
In some embodiments, the preset data processing interface may be implemented by a data processor interface, as shown in code example 1:
In code example 1, DataHandler is the data processor interface through which the user portrait to be processed is passed to the data processor DataHandler. Handle is a data processing method contained in the data processor; a DataHandler can contain Handles with different functions to process the user portrait in different ways, for example Handle1 for cleaning user portraits, Handle2 for filtering user portraits, and Handle3 for encrypting user portraits. The batch data is the user portrait to be processed, and TaskConf is the target task data; TaskConf includes the target processing method name, which specifies the target data processing method, i.e. the target Handle, among the Handles in the DataHandler. interface{} is the method structure contained in each data processing method Handle and implements its specific data processing logic.
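Code example 1 itself is not reproduced in this text. The following Go sketch is consistent with the description above but is not the original code. The method signature, the `BatchData` type (one record per string), and the `CleanHandler` and `FilterHandler` implementations are all assumptions.

```go
package main

import "strings"

// BatchData is one batch of user portrait records (one record per string here).
type BatchData []string

// TaskConf holds the target task data; only the target processing method name is modelled.
type TaskConf struct {
	TaskHandler string // selects which Handle processes the batch
}

// DataHandler is an assumed shape for the preset data processing interface.
type DataHandler interface {
	Handle(batch BatchData, conf *TaskConf) (BatchData, error)
}

// CleanHandler trims whitespace and drops empty records (the "Handle1" role).
type CleanHandler struct{}

func (CleanHandler) Handle(batch BatchData, _ *TaskConf) (BatchData, error) {
	out := make(BatchData, 0, len(batch))
	for _, r := range batch {
		if r = strings.TrimSpace(r); r != "" {
			out = append(out, r)
		}
	}
	return out, nil
}

// FilterHandler keeps only records with the given prefix (the "Handle2" role).
type FilterHandler struct{ Prefix string }

func (f FilterHandler) Handle(batch BatchData, _ *TaskConf) (BatchData, error) {
	out := make(BatchData, 0, len(batch))
	for _, r := range batch {
		if strings.HasPrefix(r, f.Prefix) {
			out = append(out, r)
		}
	}
	return out, nil
}
```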
In some embodiments, the target processing method name may be configured through the task_handler field in at least one configuration item.
S502, batching the at least one data file in the user portrait to be processed according to the task allocation policy to obtain a preprocessed user portrait; the preprocessed user portrait includes at least one batch of data files.
In the embodiment of the application, in order to improve the data processing efficiency, the user portrait offline data processing device may batch at least one data file in the user portrait to be processed according to the task allocation policy, so as to obtain a preprocessed user portrait including at least one batch of data files.
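The text leaves the task allocation policy configurable. A fixed batch size is the simplest such policy and is assumed in this Go sketch; `BatchFiles` is a hypothetical name.

```go
package main

// BatchFiles splits files into batches of at most batchSize files each.
// The returned batches are sub-slices that share the input's backing array.
func BatchFiles(files []string, batchSize int) [][]string {
	if batchSize <= 0 {
		batchSize = 1 // guard against a misconfigured policy
	}
	var batches [][]string
	for start := 0; start < len(files); start += batchSize {
		end := start + batchSize
		if end > len(files) {
			end = len(files) // last batch may be smaller
		}
		batches = append(batches, files[start:end])
	}
	return batches
}
```

Other policies, such as balancing batches by total file size, would replace only the split rule while keeping the same output shape.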
It should be noted that, in the embodiment of the present application, the user portrait offline data processing apparatus may also receive the task allocation policy through at least one configuration item control, that is, the task allocation policy is used as configuration information and configured by a developer.
S503, calling a target data processing method through the target processing method name, and performing concurrent processing on at least one batch of data files to obtain the processing result of each batch of data files.
In the embodiment of the application, the user portrait offline data processing device can call the target data processing method through the name of the target processing method, and process at least one batch of data files in parallel through at least one target processing method to obtain the processing result of each batch of data files.
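Concurrent processing of the batches maps naturally onto goroutines. In this Go sketch, `ProcessConcurrently` and `Result` are hypothetical names, and `handle` stands in for the target data processing method selected by name.

```go
package main

import "sync"

// Result is the processing result of one batch.
type Result struct {
	Batch int
	Out   []string
	Err   error
}

// ProcessConcurrently runs handle on every batch in its own goroutine and
// returns one Result per batch, in batch order.
func ProcessConcurrently(batches [][]string, handle func([]string) ([]string, error)) []Result {
	results := make([]Result, len(batches))
	var wg sync.WaitGroup
	for i, b := range batches {
		wg.Add(1)
		go func(i int, b []string) {
			defer wg.Done()
			out, err := handle(b)
			results[i] = Result{Batch: i, Out: out, Err: err} // each goroutine writes only its own slot
		}(i, b)
	}
	wg.Wait()
	return results
}
```

Writing into a preallocated slice by index avoids a mutex. If the number of batches is large, a semaphore channel could bound how many run at once.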
In some embodiments, the target processing method name may be Handle in code example 1, and the target data processing method may be the interface{} method structure: through Handle, the user portrait offline data processing device calls the corresponding interface{} method structure and runs its ExecuteTask method function to process the user portrait to be processed, as shown in code example 2:
type Tasker interface {
	// ExecuteTask performs the data processing for one task
	ExecuteTask(ctx *cat.Context, conf *TaskConf) error
}
In the above example, ExecuteTask is the method function implemented in the interface{} structure of each data processing method to realize its particular processing logic; after the user portrait offline data processing device processes the user portrait to be processed through ExecuteTask, the processing result is returned to the data processor DataHandler.
In the embodiment of the present application, when ExecuteTask successfully processes the user portrait to be processed, the processed file is returned; otherwise, failure information is returned.
In the embodiment of the present application, the preset data processing interface and the target processing method name need to be registered in the user portrait offline data processing device in advance, so that the device can call them when executing the target task. In some embodiments, the preset data processing interface and the target processing method name may be registered with the following TaskHandler registration call:
TaskHandler pool.RegisterHandlerProxy("toufangBasicPortraitHandler",api.NewTaskHandleProxy(NewXXXTaskHandler()))
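The role of the registration call can be illustrated with a hypothetical in-process registry keyed by the target processing method name. `Pool`, `Register`, and `Run` are illustrative and are not the `pool.RegisterHandlerProxy` API above. A plain `TaskConf` stands in for the unspecified `cat.Context` and `TaskConf` types.

```go
package main

import (
	"fmt"
	"sync"
)

// TaskConf carries the target processing method name (task_handler).
type TaskConf struct{ TaskHandler string }

// Tasker mirrors the shape of code example 2 without the cat.Context argument.
type Tasker interface {
	ExecuteTask(conf *TaskConf) error
}

// Pool maps target processing method names to registered handlers.
type Pool struct {
	mu       sync.RWMutex
	handlers map[string]Tasker
}

func NewPool() *Pool { return &Pool{handlers: make(map[string]Tasker)} }

// Register adds a handler; registering the same name twice is rejected.
func (p *Pool) Register(name string, t Tasker) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, dup := p.handlers[name]; dup {
		return fmt.Errorf("handler %q already registered", name)
	}
	p.handlers[name] = t
	return nil
}

// Run looks up the handler named by conf.TaskHandler and executes it.
func (p *Pool) Run(conf *TaskConf) error {
	p.mu.RLock()
	t, ok := p.handlers[conf.TaskHandler]
	p.mu.RUnlock()
	if !ok {
		return fmt.Errorf("no handler registered for %q", conf.TaskHandler)
	}
	return t.ExecuteTask(conf)
}

// countingTasker is a trivial Tasker used for illustration.
type countingTasker struct{ runs int }

func (c *countingTasker) ExecuteTask(*TaskConf) error { c.runs++; return nil }
```

Rejecting duplicate names at registration time surfaces configuration mistakes at startup rather than when a task runs.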
It should be noted that, in the embodiment of the present application, ExecuteTask method functions with different functions may also be implemented on the basis of a template, and may be utilized
S504, counting the total processing result of at least one batch of data files according to the result monitoring strategy.
In the embodiment of the application, when the result monitoring is needed, the user portrait offline data processing device counts the total processing result of at least one batch of data files according to the result monitoring strategy.
S505, when the number of failure results in the total processing result does not exceed a preset processing failure threshold, generating the processed user portrait according to the processing result of each batch of data files, so that when the preset processing service is in the output process, the processed user portrait is output to a specified path according to the target output configuration information in the target task data.
In the embodiment of the present application, the user portrait offline data processing device counts the processing results according to the preset result monitoring policy, and when the number of failure results does not exceed the preset processing failure threshold, generates the processed user portrait according to the processing result of each batch of data files.
In the embodiment of the present application, in the output process of the preset processing service, the user portrait offline data processing device outputs the processed user portrait according to the target output configuration information in the target task data, for example to a storage database or a KV store, as a recommendation recall index, or for model training, selected according to the actual situation; the embodiments of the present application are not limited in this respect.
In some embodiments, after the processing results are counted according to the result monitoring policy, S506 may be performed instead of S505, as follows:
S506, when the number of failure results exceeds the preset processing failure threshold, not generating the processed user portrait and issuing an alarm.
In the embodiment of the application, when the number of the failure results exceeds the preset processing failure threshold value, it is indicated that too many error results are generated in the current customized data processing flow, and the user portrait offline data processing device does not generate the processed user portrait and gives an alarm to prompt an operator so that the operator can perform troubleshooting in time.
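S504 to S506 reduce to counting failed batches and comparing the count with the preset processing failure threshold. In this Go sketch, `BatchResult` and `Summarize` are hypothetical names, and each batch is modelled as either succeeded or failed.

```go
package main

import "fmt"

// BatchResult is the processing result of one batch.
type BatchResult struct {
	Out []string
	OK  bool
}

// Summarize applies the result monitoring policy: if failed batches do not exceed
// maxFailures it merges the successful outputs into the processed portrait (S505);
// otherwise it returns no portrait and an error for the alarm path (S506).
func Summarize(results []BatchResult, maxFailures int) ([]string, error) {
	failures := 0
	var merged []string
	for _, r := range results {
		if !r.OK {
			failures++
			continue
		}
		merged = append(merged, r.Out...)
	}
	if failures > maxFailures {
		return nil, fmt.Errorf("%d of %d batches failed (threshold %d): processed portrait not generated",
			failures, len(results), maxFailures)
	}
	return merged, nil
}
```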
It can be understood that the user portrait offline data processing device can encapsulate different implementation logic in the customized processing flow of the processing layer through the preset data processing interface (the DataHandler data processor interface) and the target processing method name, while the common processing of the processing layer is handled by the general task allocation policy and result monitoring policy, thereby improving the efficiency and maintainability of user portrait offline data processing.
In some embodiments, based on fig. 7, after S102, S601 may be further included, as follows:
S601, while the target task is running in the preset processing service, when a start instruction for a second task is received, loading the second task into the preset processing service and running the target task and the second task simultaneously in the preset processing service.
In the embodiment of the present application, when a processing target is added to a preset processing service that is already running, new configuration information for the new processing target can be received through the task configuration interface to generate a second task, and the preset processing service can automatically hot-load and execute the second task without being restarted or redeployed, which improves the efficiency of user portrait offline data processing.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
Referring to fig. 16, fig. 16 is an alternative flowchart illustrating a user portrait offline data processing method according to an embodiment of the present application. In fig. 16, the task corresponds to the target task, and the task arrangement information corresponds to the target task data, and the description will be given with reference to the respective steps.
S701, starting the process.
In S701, the user portrait offline data processing device starts a process in the preset processing service, through which the tasks to be loaded are run.
S702, checking whether a new task is added.
In S702, while the process is starting or running the loaded tasks, the user portrait offline data processing device periodically checks whether a new task needs to be added to the process for execution. When a new task is detected, S703 is executed to hot-load the new task; when no new task is added, S734 is executed to continue running the currently loaded tasks.
S703, performing task hot loading.
In S703, the user portrait offline data processing device adds the new task directly to the processing queue of the currently running process and processes it through that process without restarting the process, thereby hot-loading the task.
S704, starting a coroutine to load task configuration information.
In S704, when the process takes the new task from the processing queue and starts processing it, the user portrait offline data processing device starts a coroutine in the process with the new task as the current task, and loads the task configuration information of the current task through the coroutine.
S705, determining whether the task is to be executed immediately.
In S705, the user portrait offline data processing device determines, based on a field in the loaded task configuration information, whether the current task needs to be executed immediately. For example, when the at_once_exec field in Table 1 is set to 1, the current task needs to be executed immediately.
In S705, when the task needs to be executed immediately, go to S706; otherwise, for the task started at regular time, the process jumps to S707.
S706, starting to execute the task.
In step S706, when the current task needs to be executed immediately, the user portrait offline data processing apparatus starts executing the current task and starts a timing task corresponding to the current task at the same time. The timing task is described in S707.
And S707, starting the timing task according to the timing task expression.
In S707, the user portrait offline data processing apparatus reads a timed task expression, such as a task _ cron field in the identification table 1, from the task configuration information of the current task, and starts the timed task according to the timed task expression.
Illustratively, when a user portrait on the HDFS needs to be downloaded and processed periodically, a timed task expression configured as weekly or monthly can be received through a data update period configuration item of a task configuration interface; the user portrays the off-line data processing device, so that the timing task corresponding to the current task which is started periodically every week or every month can be correspondingly set and started according to the timing task expression in the task configuration information.
S708, checking whether the task configuration is valid.
In S708, the user portrait offline data processing apparatus performs validity check on the loaded task configuration information to avoid unrecognizable illegal fields or unreasonable values in the task configuration information. When the validity check passes, S710 is performed, otherwise, S709 is performed.
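A validity check such as S708 can collect every problem in one pass so the operator sees them together. The field set and rules in this Go sketch are assumptions, for example that task_cron uses the standard five-field cron syntax and that at_once_exec is 0 or 1.

```go
package main

import (
	"fmt"
	"strings"
)

// TaskConf holds the fields checked in this sketch.
type TaskConf struct {
	HDFSAddr     string
	LocalSaveDir string
	TaskHandler  string
	AtOnceExec   int
	TaskCron     string
}

// Validate returns nil when the configuration is usable, otherwise one error
// listing every problem found.
func (c *TaskConf) Validate() error {
	var problems []string
	if c.HDFSAddr == "" {
		problems = append(problems, "hdfs_addr is empty")
	}
	if c.LocalSaveDir == "" {
		problems = append(problems, "local_save_dir is empty")
	}
	if c.TaskHandler == "" {
		problems = append(problems, "task_handler is empty")
	}
	if c.AtOnceExec != 0 && c.AtOnceExec != 1 {
		problems = append(problems, fmt.Sprintf("at_once_exec must be 0 or 1, got %d", c.AtOnceExec))
	}
	if c.TaskCron != "" && len(strings.Fields(c.TaskCron)) != 5 {
		problems = append(problems, fmt.Sprintf("task_cron %q must have 5 fields", c.TaskCron))
	}
	if len(problems) > 0 {
		return fmt.Errorf("invalid task configuration: %s", strings.Join(problems, "; "))
	}
	return nil
}
```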
S709, exiting the task.
In S709, when the validity check fails, the user portrait offline data processing device cannot execute the current task with task configuration information that is not valid, and exits the current task.
S710, locking the current task.
In S710, when the task configuration information passes the validity check, the user portrait offline data processing device locks the current task to protect the resources the current task uses while running. The device then parses the download address of the user portrait to be downloaded from the task configuration information of the current task and calculates the actual data storage path of the user portrait to be downloaded from that address.
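Within one process, the task lock in S710 can be sketched as a mutex-protected set of running task names. Whether the device uses an in-process or a distributed lock is not stated; this in-process version, and the `TaskLocks` name, are assumptions.

```go
package main

import "sync"

// TaskLocks prevents the same task from running twice at once in one process.
type TaskLocks struct {
	mu      sync.Mutex
	running map[string]bool
}

func NewTaskLocks() *TaskLocks { return &TaskLocks{running: make(map[string]bool)} }

// TryLock marks task as running and reports false if it is already running.
func (l *TaskLocks) TryLock(task string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.running[task] {
		return false
	}
	l.running[task] = true
	return true
}

// Unlock releases task so that a later run can proceed.
func (l *TaskLocks) Unlock(task string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.running, task)
}
```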
S711, check whether the download has been completed.
In S711, before formally starting the download, the user portrait offline data processing device first reads the data integrity check file path contained in the task configuration information, such as hdfs_finish_checked_file in Table 1, and determines whether the user portrait to be downloaded for the current task has already been downloaded by checking whether an integrity check file has been generated under that path. When the download has been completed, S712 is executed; otherwise, S714 is executed.
S712, determining whether re-execution is needed.
In S712, when the user portrait to be downloaded has already been downloaded, the user portrait offline data processing device determines whether the current task needs to be re-executed according to the re-download field in the task configuration information. If so, S717 is executed to download the user portrait for the current task again and process the data; otherwise, S713 is executed.
S713, exiting the task.
In step S713, when the current task has been downloaded and does not need to be re-executed, the user portrait offline data processing apparatus exits the task.
S714, determining whether the task is an interrupted task.
In S714, when the user portrait to be downloaded for the current task has not been completely downloaded, the user portrait offline data processing device obtains the last run result of the current task from its run log. When the last run result indicates that the task was interrupted before completion, the device determines that the current task is an interrupted task and executes S715; otherwise, it executes S716.
S715, calculating the interruption position and locating the next file to be downloaded and the list of files still to be downloaded.
In S715, the user portrait offline data processing apparatus obtains historical download data in the download saving path according to the download saving path included in the task configuration information, determines an interruption position where the current task was last interrupted according to the historical download data and the user portrait to be downloaded, and locates a next file to be downloaded in the user portrait to be downloaded and a list of remaining files to be downloaded according to the interruption position.
S716, acquiring all file lists needing to be downloaded.
In S716, if the task is not an interrupted task, all files of the user portrait to be downloaded need to be downloaded, and the user portrait offline data processing device obtains the list of all files to be downloaded.
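The branching in S711 to S716 can be summarized as a small decision function. `Decide` and the `Action` values are hypothetical names. The three inputs correspond to the integrity check result (S711), the re-download field (S712), and the last run result from the run log (S714).

```go
package main

// Action is the next step chosen in S711 to S716.
type Action int

const (
	Exit        Action = iota // downloaded and no re-execution requested (S713)
	DownloadAll               // fetch the full file list and traverse it (S716, or S717 on re-execution)
	Resume                    // continue from the interruption position (S715)
)

// Decide picks the next step from the integrity check result, the re-download
// flag, and whether the last run was interrupted.
func Decide(downloaded, reexecute, lastRunInterrupted bool) Action {
	if downloaded {
		if reexecute {
			return DownloadAll
		}
		return Exit
	}
	if lastRunInterrupted {
		return Resume
	}
	return DownloadAll
}
```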
Next, a description will be given taking as an example that the target download method in the task configuration information is single file download.
S717, traversing all the files.
In S717, the user portrait offline data processing device handles each file of the user portrait to be downloaded through the flow of S717 to S732, applying the same processing method and flow to every file, until all files of the user portrait to be downloaded have been traversed.
S718, downloading a single file.
In S718, the user portrait offline data processing apparatus first starts downloading a file of the user portrait to be downloaded, and uses the file as the current file.
And S719, judging whether the downloading is successful.
In step S719, the user portrait offline data processing apparatus determines whether the current file is successfully downloaded. If yes, go to 724; otherwise, S720 is performed.
S720, judging whether the number of files that failed to download exceeds a preset failure number threshold.
In S720, when the current file fails to download, the user portrait offline data processing apparatus obtains the number of failed files from the download failure file list, which is updated in real time, and determines whether that number exceeds the preset failure number threshold. If so, S721 is executed; otherwise, S722 is executed.
S721, raising an alarm to notify a responsible person.
In S721, a number of failed downloads exceeding the preset failure number threshold indicates an abnormal situation in which a large number of files cannot be downloaded. The user portrait offline data processing apparatus therefore stops downloading the user portrait to be downloaded and raises an alarm to notify a responsible person, such as a developer or an operation and maintenance engineer, so that troubleshooting can be performed in time.
S722, judging whether the preset retry number threshold is exceeded.
In S722, when the number of files that failed to download does not exceed the preset failure number threshold, the user portrait offline data processing apparatus determines whether the number of download retries for the current file exceeds the preset retry number threshold. If so, S723 is executed; otherwise, a download retry is initiated and the flow returns to S718 to download the current file again.
S723, recording the file in the download failure file list.
In S723, when the number of download retries for the current file exceeds the preset retry number threshold, the retries for the current file have been used up. The user portrait offline data processing apparatus no longer retries the current file, records it in the download failure file list, and starts downloading and processing the next file.
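The download loop of S717-S723 (single file download, per-file retries, and the global download failure alarm) might be sketched as below. `download_all`, `TooManyFailures`, and the injected `download` and `alarm` callables are hypothetical names; a real implementation would plug in its own transport and alerting. The sketch assumes that `max_retries` is the number of retries allowed per file (so the budget is "used up" once that many retries have been made) and that "exceeds" for the failed-file count means strictly greater than.

```python
from typing import Callable, Iterable, List, Tuple


class TooManyFailures(Exception):
    """Raised when the download failure file list exceeds the global threshold (S720/S721)."""


def download_all(
    files: Iterable[str],
    download: Callable[[str], bool],
    max_retries: int,
    max_failed_files: int,
    alarm: Callable[[str], None],
) -> Tuple[List[str], List[str]]:
    """Download files one by one; return (succeeded, download failure file list)."""
    succeeded: List[str] = []
    failed: List[str] = []  # download failure file list, updated in real time
    for name in files:  # S717: traverse all files
        retries = 0
        while True:
            if download(name):  # S718/S719: download the current file
                succeeded.append(name)
                break
            if len(failed) > max_failed_files:  # S720: global failure check
                alarm(f"{len(failed)} files failed to download; stopping")  # S721
                raise TooManyFailures(name)
            if retries >= max_retries:  # S722: retry budget used up
                failed.append(name)  # S723: record it and move on to the next file
                break
            retries += 1  # retry: return to S718
    return succeeded, failed
```

With `max_retries=2`, a file that always fails is attempted three times (one initial attempt plus two retries) before being recorded in the failure list.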
S724, passing the file to the data processor through the data processor interface, calling the target data processing method in the data processor through the target data processing method handle, and receiving the processing result returned by the target data processing method.
In S724, when the current file has been downloaded successfully, the user portrait offline data processing apparatus passes the current file to the data processor through the data processor interface.
In S724, the data processor interface corresponds to the preset data processing interface, and the target data processing method handle corresponds to the target processing method name. The data processor may contain a plurality of data processing methods, each of which can be invoked through its own handle. The data processor determines which of its data processing methods to call according to the target data processing method handle included in the task configuration information, for example, the interface{} structure in the task_handler field of Table 1.
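A minimal sketch of the data processor of S724 follows, assuming the handle is simply a string key (for example, the value read from the task_handler field of Table 1) mapped to a registered Python function. The description specifies the handle as an interface{} structure, so a plain string key is a simplification here. `DataProcessor`, `register`, `process`, and the example method `clean_portrait` are hypothetical names.

```python
from typing import Callable, Dict


class DataProcessor:
    """Holds several data processing methods, each callable through its handle."""

    def __init__(self) -> None:
        self._methods: Dict[str, Callable[[str], str]] = {}

    def register(self, handle: str) -> Callable[[Callable[[str], str]], Callable[[str], str]]:
        """Decorator that registers a processing method under the given handle."""
        def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
            self._methods[handle] = func
            return func
        return decorator

    def process(self, handle: str, content: str) -> dict:
        """Data processor interface: dispatch the file content to the method named by handle."""
        method = self._methods.get(handle)
        if method is None:
            return {"ok": False, "error": f"unknown handle: {handle}"}
        try:
            return {"ok": True, "result": method(content)}
        except Exception as exc:  # report a processing failure instead of crashing the task
            return {"ok": False, "error": str(exc)}


processor = DataProcessor()


@processor.register("clean_portrait")
def clean_portrait(content: str) -> str:
    """Example requirement-specific method: drop blank lines and trim whitespace."""
    return "\n".join(line.strip() for line in content.splitlines() if line.strip())
```

For instance, `processor.process("clean_portrait", "  u1,a \n\nu2,b\n")` returns `{"ok": True, "result": "u1,a\nu2,b"}`. Adding a new requirement only means registering another method, while the calling flow stays unchanged.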
S725, processing the file with the target data processing method and returning the processing result.
In S725, the user portrait offline data processing apparatus applies the requirement-specific (differentiated) processing of the target data processing method to the current file, such as screening, portrait cleaning, or portrait stitching, obtains a processing result, and returns it to the data processor. When processing succeeds, the processed file is returned; when processing fails, processing failure information is returned.
In addition, in the case of full download, the user portrait offline data processing apparatus may download the entire user portrait to be downloaded for the current task in one pass, obtaining at least one downloaded file. According to the batch file number contained in the task configuration information (for example, the batch_size field in Table 1), the apparatus divides the downloaded files into at least one batch, each batch containing at most the number of files specified by the batch file number, and then processes the batches concurrently to obtain the processed user portrait corresponding to the current task.
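For the full download case, batching by batch_size and processing the batches concurrently might look like the sketch below. The thread pool (`concurrent.futures.ThreadPoolExecutor`) and the `max_workers` default are assumptions made for illustration, and `make_batches` and `process_in_batches` are hypothetical names. The last batch may contain fewer files than batch_size.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def make_batches(files: Sequence[T], batch_size: int) -> List[List[T]]:
    """Split the downloaded files into batches of at most batch_size files."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(files[i:i + batch_size]) for i in range(0, len(files), batch_size)]


def process_in_batches(
    files: Sequence[T],
    batch_size: int,
    process_batch: Callable[[List[T]], R],
    max_workers: int = 4,
) -> List[R]:
    """Process all batches concurrently; results keep the original batch order."""
    batches = make_batches(files, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_batch, batches))
```

As a toy example, `process_in_batches(list(range(7)), 3, sum)` forms the batches `[0, 1, 2]`, `[3, 4, 5]`, `[6]` and returns `[3, 12, 6]`.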
S726, counting the numbers of data processing successes and failures.
In S726, the user portrait offline data processing apparatus collects statistics on the processing results received from the data processor, so as to monitor the data processing process.
S727, judging whether the number of files that failed processing exceeds a preset processing failure threshold.
In S727, the user portrait offline data processing apparatus determines whether the number of files that failed processing exceeds the preset processing failure threshold, which may be expressed, for example, as a failure percentage threshold. If so, S728 is executed; otherwise, S729 is executed.
S728, raising an alarm to notify the responsible person.
In S728, a number of failed files exceeding the preset processing failure threshold indicates an abnormal situation in which a large number of files fail processing. The user portrait offline data processing apparatus stops processing the current task and raises an alarm to notify a responsible person, such as a developer or an operation and maintenance engineer, so that troubleshooting can be performed in time.
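The statistics of S726 and the threshold check of S727 can be sketched as a small counter. The preset processing failure threshold is assumed here to be a failure ratio, following the percentage example given in S727. `ProcessingStats` and `exceeds_failure_threshold` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ProcessingStats:
    """Running counts of processing results received from the data processor (S726)."""
    succeeded: int = 0
    failed: int = 0

    def record(self, ok: bool) -> None:
        if ok:
            self.succeeded += 1
        else:
            self.failed += 1

    @property
    def failure_ratio(self) -> float:
        total = self.succeeded + self.failed
        return self.failed / total if total else 0.0


def exceeds_failure_threshold(stats: ProcessingStats, max_failure_ratio: float) -> bool:
    """S727: True means stop the task and alarm (S728); False means continue (S729)."""
    return stats.failure_ratio > max_failure_ratio
```

After recording three successes and one failure, the failure ratio is 0.25, so a 20% threshold triggers the alarm while a 25% threshold does not.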
S729, judging whether to retain the current downloaded file.
In S729, when processing of the current file is completed, the user portrait offline data processing apparatus determines whether the downloaded file corresponding to the currently processed file needs to be retained, according to the field in the task configuration information that specifies whether to delete the downloaded data after processing is completed (for example, the delete_download_file field in Table 1). If so, S731 is executed; otherwise, S730 is executed.
S730, deleting the file.
In S730, the user portrait offline data processing apparatus deletes the downloaded file corresponding to the processed file.
S731, judging whether all files have been processed.
In S731, the user portrait offline data processing apparatus may determine, based on the number of files processed so far, whether all files corresponding to the current task have been processed. If so, S732 is executed; otherwise, the flow returns to S717 to start downloading and processing the next file.
S732, releasing the lock and finishing the task processing.
In S732, when all files corresponding to the current task have been processed, the user portrait offline data processing apparatus releases the task lock of the current task and marks the execution result of the task as completed.
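One possible way to hold and release the task lock of S732 is an exclusive lock file, sketched below. The lock-file approach, the `task_lock` context manager, the `run_with_lock` helper, and the `status` dictionary are assumptions for illustration. A local lock file only protects against concurrent runs on the same host; deployments spanning several hosts would need a lock shared across hosts, which this sketch does not provide.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Dict, Iterator


@contextmanager
def task_lock(lock_dir: Path, task_id: str) -> Iterator[Path]:
    """Hold an exclusive lock file for task_id; raises FileExistsError if already held."""
    lock_path = lock_dir / f"{task_id}.lock"
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owning process
        yield lock_path
    finally:
        os.close(fd)
        os.remove(lock_path)  # S732: release the task lock


def run_with_lock(
    lock_dir: Path, task_id: str, body: Callable[[], None], status: Dict[str, str]
) -> None:
    """Run the task body under its lock, then mark the execution result as completed."""
    with task_lock(lock_dir, task_id):
        body()
    status[task_id] = "completed"
```

If the task body raises, the lock is still released by the `finally` clause, but the task is not marked as completed.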
It can be understood that, through the method flow shown in fig. 16, the user portrait offline data processing apparatus implements, in S724 and S725, the requirement-specific processing logic that is difficult to unify across different requirements, and implements the complex, multi-step data processing work through the general processing flow of the remaining steps. This greatly reduces the workload of converting requirements into implementation schemes, avoids the additional failures and maintenance difficulty that may arise when different implementation schemes are used for the same general processing function, and improves the efficiency, accuracy, and maintainability of user portrait offline data processing. In production practice, applying the method provided by the embodiments of the present application shortened the development period of user portrait offline data processing from 2 days to less than 0.5 day, while also reducing the risk of failures.
Continuing with the exemplary structure of the user portrait offline data processing apparatus 455 provided by the embodiments of the present application, implemented as software modules, in some embodiments, as shown in fig. 6, the software modules of the user portrait offline data processing apparatus 455 stored in the memory 450 may include:
a receiving module 4551, configured to receive configuration information through a preset external interface, where the configuration information is used to configure different processing modes for user portraits to be processed that have different processing targets, and the user portrait to be processed contains behavior feature information of the user;
a generating module 4552, configured to generate, according to the configuration information, a target task corresponding to the user portrait to be processed;
a loading module 4553, configured to load the target task into a preset processing service when a start instruction for the target task is received, where the preset processing service is used to execute a preset processing flow comprising a customized processing flow and a general processing flow, and the general processing flow executes the same preset data processing process on user portraits to be processed that have different processing targets;
and a processing module 4554, configured to, while the preset processing service is running, execute the corresponding customized processing flow according to the target task and execute the general processing flow according to a preset data processing method, so as to obtain a processed user portrait.
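The division of labour among these modules can be illustrated with the following sketch, in which the general part of the flow (fetching each file) is shared by every task and only the handler named in the configuration differs. `TaskSpec`, `generate_task`, and `ProcessingService` are hypothetical names chosen for this example, and the `fetch` callable stands in for the download general method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TaskSpec:
    """Configuration carried by a target task (hypothetical minimal fields)."""
    name: str
    handler: str  # target processing method name
    source_files: List[str]


def generate_task(config: TaskSpec) -> TaskSpec:
    """Generating module: here the task is simply the validated configuration."""
    if not config.source_files:
        raise ValueError("a task needs at least one source file")
    return config


class ProcessingService:
    """Preset processing service: general flow shared by all tasks, custom step per task."""

    def __init__(self, fetch: Callable[[str], str], handlers: Dict[str, Callable[[str], object]]) -> None:
        self._fetch = fetch  # general step, e.g. the download general method
        self._handlers = handlers  # customized steps, selected by name
        self._tasks: List[TaskSpec] = []

    def load(self, task: TaskSpec) -> None:
        """Loading module: add a started task to the service."""
        self._tasks.append(task)

    def run(self) -> Dict[str, List[object]]:
        """Processing module: run the shared fetch step, then each task's own handler."""
        results: Dict[str, List[object]] = {}
        for task in self._tasks:
            handler = self._handlers[task.handler]
            results[task.name] = [handler(self._fetch(f)) for f in task.source_files]
        return results
```

Because `fetch` is shared, changing it once changes the general behaviour of every loaded task, which mirrors the maintainability argument made for the general processing flow.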
In some embodiments, the preset external interface is a task configuration interface, and the receiving module 4551 is further configured to, on the task processing interface, enter the task configuration interface in response to an operation instruction for a preset configuration interface entry; displaying at least one configuration item control on the task configuration interface, and receiving at least one externally input configuration item data through the at least one configuration item control; taking the at least one configuration item data as the configuration information.
In some embodiments, the generating module 4552 is further configured to jump from the task configuration interface to the task processing interface in response to an operation instruction for a preset issue control on the task configuration interface; generating the target task according to the configuration information, wherein the target task carries target task data; and displaying the target task in a task processing list of the task processing interface.
In some embodiments, the preset processing service comprises a download process, a processing process, and an output process. The processing module 4554 is further configured to: read target task data from the target task; when the preset processing service is in the download process, determine the user portrait to be downloaded according to target download source information in the target task data, and download it locally with a download general method according to the target download mode in the target task data to obtain the user portrait to be processed, the download general method being the preset data download method among the preset data processing methods; and when the preset processing service is in the processing process, preprocess the user portrait to be processed according to a processing general method in the general processing flow to obtain a preprocessed user portrait, and call the processing method corresponding to the target processing method name in the target task data to process the preprocessed user portrait to obtain a processed user portrait, so that, when the preset processing service is in the output process, the processed user portrait is output to a specified path according to target output configuration information in the target task data, thereby completing the processing of the user portrait to be processed; the processing general method being the preset data processing method among the preset data processing methods.
In some embodiments, the target download source information comprises a target download address and a target download data file name, and the target task data further includes a download integrity check file target address. The processing module 4554 is further configured to: determine at least one data file in the target download address whose name contains the target download data file name as the user portrait to be downloaded; download the user portrait to be downloaded locally in the target download mode by using the download general method to obtain target file data, the target download mode being either single file download or full download; perform an integrity check on the target file data through the download integrity check file target address; and, when the integrity check passes, take the target file data as the user portrait to be processed.
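As a sketch of the integrity check, assume the download integrity check file is a manifest listing one `file_name md5_hex` pair per line. The description does not specify the format of this file, so the manifest layout and the functions `parse_manifest`, `md5_of`, and `integrity_ok` are assumptions.

```python
import hashlib
from pathlib import Path
from typing import Dict


def parse_manifest(text: str) -> Dict[str, str]:
    """Parse 'file_name md5_hex' lines from the (assumed) integrity check file."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        if line.strip():
            name, digest = line.split()
            entries[name] = digest.lower()
    return entries


def md5_of(path: Path) -> str:
    """Compute the MD5 of a file in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def integrity_ok(download_dir: Path, manifest_text: str) -> bool:
    """True only if every listed file exists locally with the expected checksum."""
    expected = parse_manifest(manifest_text)
    if not expected:
        return False  # an empty manifest cannot confirm a complete download
    return all(
        (download_dir / name).is_file() and md5_of(download_dir / name) == digest
        for name, digest in expected.items()
    )
```

The same check can serve the breakpoint download case: if it fails before downloading begins, the user portrait to be downloaded has not been completely downloaded.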
In some embodiments, the target task data further comprises a download save path. When the download general method includes breakpoint download, the processing module 4554 is further configured to, before downloading the user portrait to be downloaded locally in the target download mode by using the download general method to obtain the target file data: determine, according to the download integrity check file target address, whether the user portrait to be downloaded has been completely downloaded; when it has not, acquire historical execution information of the target task, which records the last operation result of the target task, and further determine according to that information whether the target task is an interrupted task; when the target task is an interrupted task, obtain historical download data of the user portrait to be downloaded from the download save path, calculate the interruption position according to the historical download data, and determine, according to the interruption position, the remaining download data corresponding to the user portrait to be processed; and determine the remaining download data as the user portrait to be downloaded.
In some embodiments, the processing module 4554 is further configured to, when the download general method includes download failure retry and the download of the ith data file among the at least one data file fails, obtain the number of download retries for the ith data file, where i is a positive integer greater than or equal to 1; when the number of download retries exceeds a preset retry number threshold, record the ith data file in the download failure file list and start downloading the (i+1)th file; and when the number of download retries does not exceed the preset retry number threshold, re-download the ith data file in the target download mode by using the download general method, until the download of the at least one data file is completed, so as to obtain the target file data.
In some embodiments, the processing module 4554 is further configured to, when the download general method includes a global download failure alarm, obtain the number of download failure files in the download failure file list before obtaining the number of download retries for the ith data file; and when that number exceeds a preset failure number threshold, terminate the download of the user portrait to be downloaded and raise an alarm.
In some embodiments, the processing general method comprises a task allocation strategy and a result monitoring strategy. The processing module 4554 is further configured to: obtain the user portrait to be processed through a preset data processing interface; batch the at least one data file in the user portrait to be processed according to the task allocation strategy to obtain the preprocessed user portrait, which comprises at least one batch of data files; call the target data processing method through the target processing method name and process the at least one batch of data files concurrently to obtain a processing result for each batch; count the total processing result of the at least one batch of data files according to the result monitoring strategy; and, when the number of failure results in the total processing result does not exceed a preset processing failure threshold, generate the processed user portrait according to the processing result of each batch of data files.
In some embodiments, the processing module 4554 is further configured to, after the statistics is performed on the total processing results of the at least one batch of data files according to the result monitoring policy, when the number of the failure results exceeds a preset processing failure threshold, not generate the processed user portrait and perform an alarm prompt.
In some embodiments, the loading module 4553 is further configured to, after the target task is loaded into a preset processing service when the start instruction for the target task is received, load a second task into the preset processing service when the start instruction for the second task is received in the process of running the target task through the preset processing service, and run the target task and the second task simultaneously in the preset processing service.
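Running a second task alongside the target task in the same preset processing service can be sketched with a shared thread pool. `ConcurrentProcessingService` and its `start` and `shutdown` methods are hypothetical names; the sketch only shows that a second task can be submitted and executed while the first is still running.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class ConcurrentProcessingService:
    """Preset processing service that can run several loaded tasks at the same time."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._running: Dict[str, Future] = {}
        self._lock = threading.Lock()  # protects _running when tasks start from several threads

    def start(self, task_id: str, run: Callable[[], object]) -> Future:
        """Load a task on its start instruction without waiting for other tasks."""
        with self._lock:
            future = self._pool.submit(run)
            self._running[task_id] = future
            return future

    def shutdown(self) -> None:
        """Wait for all running tasks to finish and release the worker threads."""
        self._pool.shutdown(wait=True)
```

With at least two workers, a target task that is still waiting does not block a second task started afterwards.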
In some embodiments, the at least one configuration item data further includes task execution cycle information, process deployment information, and host deployment information, where the task execution cycle information is used to configure a repeated execution cycle of the target task; the process deployment information is used for deploying a target process for executing the target task in the preset processing service; the host deployment information is used for deploying a target host address for running the preset processing service.
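The configuration items named above, together with the Table 1 fields referred to in the method description (task_handler, batch_size, delete_download_file), could be carried in a record like the one below. The field names `cycle`, `process_name`, and `host`, the validation rules, and the `next_run` helper are hypothetical; only the three Table 1 field names come from the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TaskConfig:
    task_handler: str           # Table 1: handle of the target data processing method
    batch_size: int             # Table 1: batch file number
    delete_download_file: bool  # Table 1: delete downloaded data after processing
    cycle: timedelta            # hypothetical: repeated execution cycle
    process_name: str           # hypothetical: target process within the service
    host: str                   # hypothetical: host address running the service

    def __post_init__(self) -> None:
        if self.batch_size <= 0:
            raise ValueError("batch_size must be positive")
        if self.cycle <= timedelta(0):
            raise ValueError("cycle must be positive")

    def next_run(self, last_run: datetime) -> datetime:
        """Time at which the periodic task should execute again."""
        return last_run + self.cycle
```

For a daily task last run at 02:00 on 30 October 2020, `next_run` returns 02:00 on 31 October 2020.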
It should be noted that the above description of the apparatus embodiments is similar to the description of the method embodiments and has similar beneficial effects. For technical details not disclosed in the apparatus embodiments of the present application, refer to the description of the method embodiments of the present application.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the user portrait offline data processing method described in this embodiment of the present application.
Embodiments of the present application provide a computer-readable storage medium storing executable instructions that, when executed by a processor, cause the processor to perform the user portrait offline data processing method provided by the embodiments of the present application, for example, the methods shown in figs. 7-8, 10, and 12-16.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may, but need not, correspond to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In summary, according to the embodiments of the present application, the user portrait offline data processing apparatus receives different configuration information for different processing targets through the preset external interface and generates a target task from the configuration information. It then performs the common data processing through the general processing flow within the preset processing flow, and performs the corresponding differentiated processing through the target task, thereby decoupling the common data processing methods from the differentiated ones. When target tasks with different requirements are encountered, the parts the requirements have in common are implemented uniformly by the general processing flow; consequently, when the general processing flow needs a unified upgrade or maintenance, modifying the preset general data processing method once is enough to propagate the change to user portrait offline data processing tasks with different processing targets, which improves the maintainability of user portrait offline data processing. Moreover, since only the configuration data for the differentiated parts of each requirement is needed to generate the target task, the amount of work required to convert a requirement into an implementation is reduced and the conversion is faster, which improves the efficiency of user portrait offline data processing.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.
Claims (15)
1. A user portrait offline data processing method is characterized by comprising the following steps:
receiving configuration information through a preset external interface, wherein the configuration information is used for configuring different processing modes for user portraits to be processed of different processing targets, and the user portrait to be processed contains behavior feature information of the user;
generating a target task corresponding to the user portrait to be processed according to the configuration information;
when a start instruction for the target task is received, loading the target task into a preset processing service, wherein the preset processing service is used for executing a preset processing flow, the preset processing flow comprises a customized processing flow and a general processing flow, and the general processing flow is used for executing the same preset data processing process on user portraits to be processed of different processing targets;
and in the running process of the preset processing service, executing a corresponding customized processing flow according to the target task, and executing the general processing flow according to a preset data processing method to obtain a processed user portrait.
2. The method of claim 1, wherein the preset external interface is a task configuration interface, and the receiving configuration information through the preset external interface comprises:
on a task processing interface, entering the task configuration interface in response to an operation instruction for a preset configuration interface entry;
displaying at least one configuration item control on the task configuration interface, and receiving at least one externally input configuration item data through the at least one configuration item control;
taking the at least one configuration item data as the configuration information.
3. The method according to claim 2, wherein the generating a target task corresponding to the user portrait to be processed according to the configuration information comprises:
on the task configuration interface, responding to an operation instruction aiming at a preset issuing control, and jumping from the task configuration interface to the task processing interface;
generating the target task according to the configuration information, wherein the target task carries target task data; and displaying the target task in a task processing list of the task processing interface.
4. The method of claim 3, wherein the preset processing service comprises a download process, a processing process, and an output process; and the executing, in the running process of the preset processing service, a corresponding customized processing flow according to the target task, and executing the general processing flow according to a preset data processing method to obtain a processed user portrait, comprises:
reading target task data from the target task;
when the preset processing service is the downloading process, determining a user portrait to be downloaded according to target downloading source information in the target task data; downloading the user portrait to be downloaded to the local according to a target downloading mode in the target task data by using a downloading general method to obtain the user portrait to be processed; the downloading general method is a preset data downloading method in the preset data processing methods;
when the preset processing service is the processing process, preprocessing the user portrait to be processed according to a processing general method in the general processing flow to obtain a preprocessed user portrait; calling a processing method corresponding to a target processing method name in the target task data to process the pre-processed user portrait to obtain a processed user portrait, so that when the preset processing service is the output process, the processed user portrait is output to a specified path according to target output configuration information in the target task data; the processing general method is a preset data processing method in the preset data processing methods.
5. The method of claim 4, wherein the target download source information comprises a target download address and a target download data file name; the target task data further includes a download integrity check file target address; and the determining a user portrait to be downloaded according to target download source information in the target task data, and downloading the user portrait to be downloaded locally according to a target download mode in the target task data by using a download general method to obtain the user portrait to be processed, comprises:
determining at least one data file containing the target download data file name in the target download address as the user portrait to be downloaded;
downloading the user portrait to be downloaded locally in the target download mode by using the download general method to obtain target file data, wherein the target download mode comprises either single file download or full download;
performing an integrity check on the target file data through the download integrity check file target address;
and when the integrity check is passed, taking the target file data as the user portrait to be processed.
6. The method of claim 5, wherein the target task data further comprises a download save path; and before the downloading the user portrait to be downloaded locally in the target download mode by using the download general method to obtain the target file data, the method further comprises:
when the download general method comprises breakpoint download, determining, according to the download integrity check file target address, whether the user portrait to be downloaded has been completely downloaded;
when the user portrait to be downloaded has not been completely downloaded, acquiring historical execution information of the target task, and further judging, according to the historical execution information, whether the target task is an interrupted task, wherein the historical execution information records the last operation result of the target task;
when the target task is an interrupted task, obtaining historical download data of the user portrait to be downloaded from the download save path, calculating an interruption position according to the historical download data, and determining, according to the interruption position, remaining download data corresponding to the user portrait to be processed;
and determining the residual download data as the user portrait to be downloaded.
7. The method according to claim 5 or 6, wherein the downloading the user portrait to be downloaded locally in the target download mode by using the download general method to obtain target file data comprises:
when the download general method comprises download failure retry, if the ith data file in the at least one data file fails to download, acquiring the number of download retries corresponding to the ith data file, wherein i is a positive integer greater than or equal to 1;
when the number of download retries exceeds a preset retry number threshold, recording the ith data file in a download failure file list, and starting to download the (i+1)th file;
and when the number of download retries does not exceed the preset retry number threshold, re-downloading the ith data file in the target download mode by using the download general method, until the download of the at least one data file is completed, so as to obtain the target file data.
8. The method of claim 7, wherein before obtaining the download retry number corresponding to the ith data file, the method further comprises:
when the download general method comprises a global download failure alarm, acquiring the number of download failure files in the download failure file list;
and when the number of the download failure files exceeds a preset failure number threshold value, terminating the download of the user portrait to be downloaded and giving an alarm prompt.
9. The method according to any one of claims 4-8, wherein the processing general method comprises a task allocation strategy and a result monitoring strategy; and the preprocessing the user portrait to be processed according to a processing general method in the general processing flow to obtain a preprocessed user portrait, and calling a processing method corresponding to the target processing method name in the target task data to process the preprocessed user portrait to obtain a processed user portrait, comprise:
acquiring the user portrait to be processed through a preset data processing interface;
batching the at least one data file in the user portrait to be processed according to the task allocation strategy to obtain the preprocessed user portrait, the preprocessed user portrait comprising at least one batch of data files;
calling a target data processing method by the target processing method name, and processing the at least one batch of data files concurrently to obtain a processing result for each batch of data files;
counting the total processing results of the at least one batch of data files according to the result monitoring strategy;
and when the number of failed results in the total processing results does not exceed a preset processing failure threshold, generating the processed user portrait from the processing result of each batch of data files.
10. The method of claim 9, wherein after counting the total processing results of the at least one batch of data files according to the result monitoring strategy, the method further comprises:
when the number of failed results exceeds the preset processing failure threshold, not generating the processed user portrait and issuing an alarm prompt.
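Claims 9 and 10 describe batching, lookup of a processing method by name, concurrent batch processing, and a failure-count gate on the result. A minimal sketch follows, assuming fixed-size batches as the task allocation strategy, a hypothetical in-process registry `PROCESSING_METHODS` for the name lookup, a thread pool for concurrency, and "count failed batches" as the result monitoring strategy; none of these choices is specified by the claims.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical registry: "target processing method name" -> callable on one batch.
PROCESSING_METHODS: Dict[str, Callable[[List[str]], List[str]]] = {}


def register(name: str):
    """Decorator that records a processing method under a name."""
    def wrap(fn):
        PROCESSING_METHODS[name] = fn
        return fn
    return wrap


@register("uppercase_tags")
def uppercase_tags(batch: List[str]) -> List[str]:
    # Placeholder for task-specific work on one batch of data files.
    return [item.upper() for item in batch]


def make_batches(files: List[str], batch_size: int) -> List[List[str]]:
    """Task allocation strategy (assumed): split files into fixed-size batches."""
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]


def process_portrait(
    files: List[str],
    method_name: str,
    batch_size: int = 2,
    max_failed_batches: int = 0,  # assumed preset processing failure threshold
    workers: int = 4,
) -> Optional[List[str]]:
    method = PROCESSING_METHODS[method_name]  # call the method by its name
    batches = make_batches(files, batch_size)

    def run(batch: List[str]) -> Tuple[bool, Any]:
        try:
            return True, method(batch)
        except Exception as exc:  # record the failure instead of crashing the pool
            return False, exc

    # Process all batches concurrently; map() returns results in batch order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run, batches))

    # Result monitoring strategy (assumed): count the failed batches.
    failures = sum(1 for ok, _ in results if not ok)
    if failures > max_failed_batches:
        logging.warning("processing alarm: %d of %d batches failed", failures, len(batches))
        return None  # claim 10: no processed portrait is generated
    return [item for ok, out in results if ok for item in out]
```

Returning `None` plus a log warning is one way to express "not generated and an alarm prompt is given"; a real service might raise or notify an operator instead.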
11. The method according to any one of claims 1 to 10, wherein after the target task is loaded into a preset processing service upon receiving a start instruction for the target task, the method further comprises:
while the target task is running in the preset processing service, when a start instruction for a second task is received, loading the second task into the preset processing service and running the target task and the second task simultaneously in the preset processing service.
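Claim 11 requires the processing service to accept a second task while the target task is still running. One way to sketch this, assuming each task is a plain Python callable and that one thread per task is an acceptable concurrency model (the claim names neither), is a hypothetical `ProcessingService` that starts a new thread for every start instruction:

```python
import threading
from typing import Callable, Dict, List, Optional


class ProcessingService:
    """Hypothetical long-running service that hosts several tasks at once."""

    def __init__(self) -> None:
        self._tasks: Dict[str, threading.Thread] = {}
        self._lock = threading.Lock()

    def start(self, task_id: str, work: Callable[[], None]) -> None:
        """Handle a start instruction: load the task and run it beside existing ones."""
        with self._lock:
            current = self._tasks.get(task_id)
            if current is not None and current.is_alive():
                raise ValueError(f"task {task_id} is already running")
            thread = threading.Thread(target=work, name=task_id, daemon=True)
            self._tasks[task_id] = thread
            thread.start()  # other tasks keep running; nothing is stopped here

    def running(self) -> List[str]:
        """IDs of tasks whose threads are still alive."""
        with self._lock:
            return sorted(tid for tid, t in self._tasks.items() if t.is_alive())

    def join_all(self, timeout: Optional[float] = None) -> None:
        with self._lock:
            threads = list(self._tasks.values())
        for thread in threads:
            thread.join(timeout)
```

A separate process per task would give stronger isolation; threads keep the sketch short.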
12. The method according to any one of claims 4 to 8, wherein the at least one piece of configuration item data further includes task execution cycle information, process deployment information and host deployment information; the task execution cycle information is used to configure the repeated execution cycle of the target task; the process deployment information is used to deploy, in the preset processing service, a target process for executing the target task; and the host deployment information is used to deploy the target host address on which the preset processing service runs.
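The three configuration items in claim 12 could be carried in a single record. The sketch below is illustrative only: the class name `TaskDeploymentConfig`, the input keys (`task_name`, `cycle_hours`, `process`, `host`) and the use of a `timedelta` for the execution cycle are all assumptions, since the patent does not define a configuration format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, Dict, List


@dataclass(frozen=True)
class TaskDeploymentConfig:
    """Hypothetical container for the claim-12 configuration items."""
    task_name: str
    cycle: timedelta      # task execution cycle information (repeat interval)
    process_name: str     # process deployment information (process inside the service)
    host_address: str     # host deployment information (where the service runs)

    def __post_init__(self) -> None:
        if self.cycle <= timedelta(0):
            raise ValueError("execution cycle must be positive")

    @classmethod
    def from_items(cls, items: Dict[str, Any]) -> "TaskDeploymentConfig":
        # Key names are assumptions; the patent does not define a format.
        return cls(
            task_name=items["task_name"],
            cycle=timedelta(hours=float(items["cycle_hours"])),
            process_name=items["process"],
            host_address=items["host"],
        )

    def next_runs(self, start: datetime, count: int) -> List[datetime]:
        """The next `count` execution times after `start`, one cycle apart."""
        return [start + i * self.cycle for i in range(1, count + 1)]
```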
13. A user portrait offline data processing apparatus, comprising:
a receiving module, configured to receive configuration information through a preset external interface, the configuration information being used to configure different processing modes for user portraits to be processed of different processing targets, and the user portrait to be processed containing behavior feature information of the user;
a generating module, configured to generate, according to the configuration information, a target task corresponding to the user portrait to be processed;
a loading module, configured to load the target task into a preset processing service when a start instruction for the target task is received, the preset processing service being used to execute a preset processing flow comprising a customized processing flow and a general-type processing flow, where the general-type processing flow executes the same preset data processing process on user portraits to be processed of different processing targets;
and a processing module, configured to execute, while the preset processing service is running, the corresponding customized processing flow according to the target task and the general-type processing flow according to a preset data processing method, to obtain a processed user portrait.
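The four modules of claim 13 can be pictured as one small pipeline. In this hypothetical sketch, a portrait is a list of per-user feature dicts, the general-type flow merely drops empty records, and the customized flow is a callable supplied in the configuration. The order (general flow first, then the task-specific step) is an assumption borrowed from claim 9's preprocessing-then-processing sequence; claim 13 itself does not fix an order.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Portrait = List[dict]  # hypothetical: one behaviour-feature record per user


@dataclass
class TargetTask:
    name: str
    custom_step: Callable[[Portrait], Portrait]  # customized processing flow


def general_flow(portrait: Portrait) -> Portrait:
    """General-type flow shared by every target; here it just drops empty records."""
    return [record for record in portrait if record]


class PortraitApparatus:
    def __init__(self) -> None:
        self.loaded: Dict[str, TargetTask] = {}  # tasks held by the processing service

    def receive(self, config: dict) -> dict:
        """Receiving module: validate configuration from the external interface."""
        missing = {"name", "custom_step"} - set(config)
        if missing:
            raise ValueError(f"missing configuration items: {sorted(missing)}")
        return config

    def generate(self, config: dict) -> TargetTask:
        """Generating module: turn configuration into a target task."""
        return TargetTask(config["name"], config["custom_step"])

    def load(self, task: TargetTask) -> None:
        """Loading module: called when a start instruction arrives."""
        self.loaded[task.name] = task

    def process(self, task_name: str, portrait: Portrait) -> Portrait:
        """Processing module: general flow first, then the task's customized step."""
        task = self.loaded[task_name]
        return task.custom_step(general_flow(portrait))
```

Because the general-type flow is a plain shared function, every loaded task reuses it unchanged, which mirrors the claim's split between customized and general processing.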
14. An electronic device, comprising:
a memory for storing executable instructions;
a processor for implementing the method of any one of claims 1 to 12 when executing executable instructions stored in the memory.
15. A storage medium storing executable instructions for performing the method of any one of claims 1 to 12 when executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011194093.8A CN114443235A (en) | 2020-10-30 | 2020-10-30 | User portrait offline data processing method, device and equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114443235A | 2022-05-06 |
Family ID: 81357858
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011194093.8A (Pending) | User portrait offline data processing method, device and equipment and storage medium | 2020-10-30 | 2020-10-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114443235A (en) |
- 2020-10-30: CN202011194093.8A filed in CN (CN114443235A, active, pending)
Similar Documents
Publication | Title |
---|---|
CN110928529B (en) | Method and system for assisting operator development |
US11467952B2 (en) | API driven continuous testing systems for testing disparate software |
US11934301B2 (en) | System and method for automated software testing |
CN105164644B (en) | Hook frame |
US11681512B2 (en) | Industrial automation smart object inheritance |
CN111580926A (en) | Model publishing method, model deploying method, model publishing device, model deploying device, model publishing equipment and storage medium |
CN110727575B (en) | Information processing method, system, device and storage medium |
CN115454869A (en) | Interface automation test method, device, equipment and storage medium |
CN117251189A (en) | Script program upgrading method, script program upgrading device, computer equipment and storage medium |
US20230152790A1 (en) | System model smart object configuration |
CN111523676B (en) | Method and device for assisting machine learning model to be online |
KR20240047468A (en) | ECU upgrade method and device, and readable storage medium |
CN112000334A (en) | Page development method, device, server and storage medium |
CN111245917B (en) | Katalon-based work order entry device and implementation method thereof |
CN114443235A (en) | User portrait offline data processing method, device and equipment and storage medium |
CN117149266A (en) | Task processing method and device, storage medium and electronic equipment |
CN111258618A (en) | File configuration method and device, computer equipment and storage medium |
US20220292457A1 (en) | Industrial automation smart object inheritance break and singleton creation |
CN113610242A (en) | Data processing method and device and server |
CN117707917A (en) | Service testing method, device, medium and product |
CN114327709A (en) | Control page generation method and device, intelligent device and storage medium |
Islam et al. | Framework for automation of cloud-application testing using Selenium (FACTS) |
CN116627392B (en) | Model development method and system based on interactive IDE |
US20240370236A1 (en) | Managing an app, developing an app including an event artifact, method, and system |
US20230046732A1 (en) | Industrial automation smart object parent/child data collection propagation |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |