CN114219967A

CN114219967A - Production data processing method, device, equipment and storage medium

Info

Publication number: CN114219967A
Application number: CN202111390312.4A
Authority: CN
Inventors: 许晓文; 聂磊; 程鸿; 朱名发
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-11-22
Filing date: 2021-11-22
Publication date: 2022-03-22

Abstract

The disclosure provides a method, an apparatus, a device, and a storage medium for processing production data, and relates to the field of computer data processing, in particular to the field of production data processing. The specific implementation scheme is as follows: from the production data, collecting production data generated under a specified scene, together with target production data and potential target production data for which the output result of the production model falls within a target threshold interval; and then summarizing and clustering the collected production data to obtain training data that can be used for training the production model. Because the training data comprise data generated under the specified scene as well as target production data and potential target production data determined according to the target threshold interval, the production model can be trained in a more targeted manner; collecting the potential target production data further expands the distribution range of the training data and enhances the generalization capability of the production model.

Description

Production data processing method, device, equipment and storage medium

Technical Field

The present disclosure relates to the field of artificial intelligence technology, and further relates to the technical fields of deep learning, industrial internet, industrial safety production, data backflow, and industrial inspection, and in particular, to a method, an apparatus, a device, and a storage medium for processing production data.

Background

As deep learning technology continues to develop and mature, advanced deep learning techniques are gradually being applied to traditional industrial scenes, where they are used for detection and analysis in scenes such as industrial plant safety precaution and production operation standard inspection.

The main technical logic in the field is to train a deep learning model on collected production data and then apply the trained model to a specific production scene. The scenes faced in safety production are mostly open-set environments, which places high demands on the generalization capability of the deep learning model in practical application. Therefore, actively flowing the data generated in the production environment back to the model training end provides data support for improving the performance of the deep learning model, helps improve its generalization capability, and improves the overall effect of industrial safety production analysis.

In the related technology, a data reflow method based on error samples is mainly used: the production data are first screened according to the image detection result; the samples with image detection errors are then checked out manually; the samples confirmed manually as recognition errors are labeled; and finally directional optimization is carried out on the re-labeled error samples, so that the ability of the model to recognize such samples is improved. However, this method only returns error samples, so both the number and the variety of returned samples are relatively small, and in an application scene with a complex environment and a large amount of data it is difficult to further improve the generalization capability of the deep learning model.

Disclosure of Invention

The present disclosure provides a method, apparatus, device, and storage medium for production data processing.

According to an aspect of the present disclosure, there is provided a method of production data processing, comprising: determining production data generated under a specified scene from the production data input into the production model to obtain first data; determining production data which enables the output result of the production model to be in a target threshold interval from the production data input into the production model to obtain second data; and summarizing and clustering the first data and the second data to obtain third data.

According to another aspect of the present disclosure, there is provided an apparatus for production data processing, including: the scene data determining module is used for determining production data generated under a specified scene from the production data input into the production model to obtain first data; the target production data determining module is used for determining production data which enables the output result of the production model to be in a target threshold interval from the production data input into the production model to obtain second data; and the summarizing and clustering module is used for summarizing and clustering the first data and the second data to obtain third data.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform any one of the above methods of production data processing.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the above-described method of production data processing.

According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements any of the above-described methods of production data processing.

The present disclosure provides a method, apparatus, device, and storage medium for production data processing. The method comprises: collecting, from the production data, production data generated under a specified scene, together with target production data and potential target production data for which the output result of the production model falls within a target threshold interval; and then summarizing and clustering the collected production data to obtain training data that can be used for training the production model. Because the training data comprise data generated under the specified scene as well as target production data and potential target production data determined according to the target threshold interval, the production model can be trained in a more targeted manner; collecting the potential target production data further expands the distribution range of the training data and enhances the generalization capability of the production model.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a schematic flow chart diagram of a method for implementing production data processing according to a first embodiment of the present disclosure;

FIG. 2 is a flow chart diagram of a method for implementing production data processing according to a second embodiment of the present disclosure;

FIG. 3 is a flow chart diagram of a method for implementing production data processing according to a third embodiment of the present disclosure;

FIG. 4 is a schematic flow chart of the third embodiment of the present disclosure for collecting production data at the detection module;

FIG. 5 is a schematic flow chart of the third embodiment of the present disclosure for collecting production data at the attribute module;

FIG. 6 is a schematic flow chart of the third embodiment of the present disclosure for collecting production data at the tracking smoothing module;

FIG. 7 is a schematic flow chart of the third embodiment of the present disclosure for collecting production data at the quality module;

FIG. 8 is a schematic flow chart of collecting production data at an alarm module according to a third embodiment of the present disclosure;

FIG. 9 is a schematic flow chart of clustering production data according to a third embodiment of the present disclosure;

FIG. 10 is a schematic structural diagram of a production data processing apparatus according to a first embodiment of the present disclosure;

FIG. 11 is a block diagram of an electronic device used to implement methods of embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Fig. 1 shows a flow of a method for implementing production data processing according to an embodiment of the present disclosure. With reference to fig. 1, the method includes: operation S110, determining production data generated in a specified scene from the production data input to the production model, to obtain first data; operation S120, determining production data which makes an output result of the production model fall within a target threshold interval from the production data input to the production model, to obtain second data; and operation S130, summarizing and clustering the first data and the second data to obtain third data.

In operation S110, the production data input to the production model may be raw production data collected in a production environment; production data that has been preprocessed by cleaning, denoising, filtering, and the like; or vectorized production data obtained by feature extraction and the like.

To determine the production data generated under a specified scene from the production data input into the production model and thereby obtain the first data, the data generated under the specified scene are generally analyzed in advance to obtain certain characteristics, or a certain rule, that distinguish them from data generated under other scenes; whether the production data input into the production model were generated under the specified scene is then judged according to those characteristics or that rule.

For example, a new product is introduced in a production line, and production data for producing the product does not exist in actual production. At this time, the production data generated using a specific scenario in a non-production environment, such as a pilot production line, may be used as input to the production model.

Generally, the specified scenario is a scenario for which the production model is to be optimized, such as a scenario with relatively little data or a scenario with relatively low prediction accuracy. The production model can therefore be optimized in a targeted manner for the specified scenario so as to enhance its application effect in that scenario.

In operation S120, the output result of the production model is a predicted value calculated according to the input data and the model algorithm, which may be a boolean value, a score, a set of values with a probability for each value, and so on. The target threshold interval is a range of values corresponding to the output of the production model. For example, if the output of the production model is a boolean value, the target threshold interval may be 1 or 0; if the output result of the production model is a score, the target threshold interval is a numerical range, for example, greater than n, less than m, or greater than or equal to n and less than or equal to m, where n and m are specific numerical values; if the output of the production model is a set of values and a probability for each value, the target threshold interval is correspondingly a set of values and a probability range for each value.

Different target threshold intervals generally represent different types of data. In the present disclosure, target threshold intervals are used not only to collect production data of a target type, but also to collect potential target production data that is close to the target type data.

For example, assume that the production model scores the degree of compliance of production, that its output result is the compliance score of the production process, and that a score of less than 60 points is determined to indicate a non-compliant production process. If the goal of training the model is to improve the recognition rate of non-compliant production processes, production data with a compliance score of less than 65 points may be collected; in addition to the non-compliant data below 60 points, this also collects some compliant data near the compliance threshold. In this way, the range of candidate training data can be appropriately expanded, and the generalization capability and adaptability of the production model are improved.
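As a non-limiting illustration, the score-based collection in the example above can be sketched as follows. The names `Record`, `NON_COMPLIANT_BELOW`, `COLLECT_BELOW`, and `collect_second_data` are hypothetical and do not appear in the disclosure; only the values 60 and 65 are taken from the example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    record_id: str           # hypothetical identifier of one piece of production data
    compliance_score: float  # output of the production model, assumed to be 0-100


NON_COMPLIANT_BELOW = 60.0  # decision threshold from the example
COLLECT_BELOW = 65.0        # widened target threshold interval from the example


def collect_second_data(records):
    """Split collected records into target data and potential target data."""
    target, potential = [], []
    for record in records:
        if record.compliance_score < NON_COMPLIANT_BELOW:
            target.append(record)       # non-compliant: the target type
        elif record.compliance_score < COLLECT_BELOW:
            potential.append(record)    # compliant but close to the threshold
    return target, potential


records = [Record("a", 42.0), Record("b", 61.5), Record("c", 65.0), Record("d", 90.0)]
target, potential = collect_second_data(records)
```

Under these assumptions, record "a" is collected as target production data, record "b" as potential target production data, and records "c" and "d" are not collected.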

As a result, the production data collected in operations S110 and S120 may contain a large amount of redundant data, which not only wastes computing resources but may also make the subsequent data processing load abnormally large, so that the processing speed becomes unacceptable or no result can be obtained at all.

Therefore, in operation S130, the collected production data is summarized and clustered, and similar data are merged according to the clustering result, so as to save the computing resources and reduce the data amount of the subsequent operations on the premise of not affecting the data distribution.

In addition, clustering can also group data with the same or similar service attributes into one class, so that data with a certain service attribute can be acquired more accurately. For example, clustering the production data may yield production data grouped by different types of transport vehicles, such as sports cars, trucks, and automobiles. In this way, corresponding subsequent processing, such as adding classification labels, can be performed without manual classification and identification.

As such, the embodiment of the present disclosure collects production data generated under a specified scenario from the production data through operation S110, and collects target production data and potential target production data for which the output result of the production model falls within a target threshold interval through operation S120; then, the collected production data is summarized and clustered through operation S130 to obtain training data that can be used for training the production model. Because the training data comprise data generated under the specified scenario as well as target production data and potential target production data determined according to the target threshold interval, the production model can be trained in a more targeted manner; collecting the potential target production data further expands the distribution range of the training data and enhances the generalization capability of the production model.
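As a non-limiting illustration of merging similar data according to a clustering result, the following sketch clusters feature vectors with a minimal density-based (DBSCAN-style) procedure written in plain Python and keeps one representative per cluster. The choice of algorithm at this point, the parameters `eps` and `min_samples`, the rule of keeping the first member of each cluster, and the function names are all assumptions made for illustration.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # noise reached from a core point becomes a border point
            if labels[j] is not UNVISITED:
                continue                 # already assigned, do not expand again
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


def merge_similar(points, eps=0.5, min_samples=2):
    """Keep the first member of every cluster and every noise point."""
    labels = dbscan(points, eps, min_samples)
    seen, kept = set(), []
    for index, label in enumerate(labels):
        if label == -1:
            kept.append(index)
        elif label not in seen:
            seen.add(label)
            kept.append(index)
    return kept, labels


points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (10, 10)]
kept, labels = merge_similar(points)
```

Keeping isolated (noise) points is a deliberate choice in this sketch: they are rare samples, and dropping them would narrow the data distribution that the merging step is meant to preserve.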

However, in most cases, the data amount of the production data even after being processed through operation S130 is very large.

For this reason, in another embodiment of the present disclosure shown in fig. 2, the following is added after operation S230: operation S240, screening the third data based on an active learning strategy to obtain fourth data to be labeled, where the active learning strategy includes at least one of the following query strategies:

an Uncertainty Sampling query strategy, used for selecting production data that the production model finds difficult to distinguish and submitting it to professionals for labeling, so that the algorithm effect improves more quickly;

a Query-By-Committee based query strategy, used for selecting production data that is difficult to distinguish by having a plurality of sub-models vote;

an Expected Model Change based query strategy, used for selecting production data that maximizes the gradient change;

an Expected Error Reduction based query strategy, used for selecting production data that minimizes the loss function;

a Variance Reduction based query strategy, used for selecting production data that minimizes the variance;

a Density-Weighted Methods based query strategy, used for selecting among different strategies according to the business scene, so that data which are both dense and difficult to distinguish are selected.

Therefore, the data volume of the data to be labeled can be greatly reduced, the efficiency of training data is improved, and the labor cost of labeling the data is reduced.
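As a non-limiting illustration of the first of these query strategies, the following sketch ranks samples by the entropy of the predicted class probabilities and keeps the most uncertain ones for labeling. Entropy is only one common uncertainty measure; the function names, the `budget` parameter, and the dictionary layout of `predictions` are assumptions made for illustration.

```python
import math


def entropy(probabilities):
    """Shannon entropy of a probability vector; higher means less certain."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def select_for_labeling(predictions, budget):
    """Return the ids of the `budget` samples the model is least certain about.

    predictions maps a sample id to the list of class probabilities
    output by the production model for that sample.
    """
    ranked = sorted(predictions, key=lambda sid: entropy(predictions[sid]), reverse=True)
    return ranked[:budget]


predictions = {
    "a": [0.99, 0.01],  # confident prediction
    "b": [0.50, 0.50],  # maximally uncertain
    "c": [0.70, 0.30],  # moderately uncertain
}
fourth_data = select_for_labeling(predictions, budget=2)
```

Here samples "b" and "c" are selected, and the confidently predicted sample "a" is left unlabeled.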

It should be noted that the above embodiments shown in fig. 1 and fig. 2 are only basic embodiments of the method for processing production data according to the present disclosure, and an implementer may further refine and expand the above embodiments to form new embodiments in various applicable application scenarios.

Fig. 3 to 9 illustrate another embodiment of the present disclosure, which is applied in a scenario of safety compliance detection of a production process of industrial production.

In the application scenario, the embodiment of the disclosure applies a safety compliance detection model to detect an image acquired in a production process so as to determine whether a violation problem exists in the production process.

For example, an image of a working picture of a worker is collected, whether the worker wears a safety cap or not is judged according to the image, if the worker does not wear the safety cap, the production process is considered to have a violation problem, and the corresponding image is identified as a violation image.

In the embodiment of the present disclosure, the production data are images acquired in the production process. The acquisition environment of these images is a typical open environment: the target to be analyzed (for example, the head of an operator) is easily affected by changes in various factors such as background, distance, and sudden or interfering objects, and the acquired sample set may also contain unpredictable situations, for example, an insect landing on the camera so that the complete production process cannot be photographed.

In the embodiment of the present disclosure, when performing safety compliance detection, the safety compliance detection model uses a model inference process including a plurality of processing stages, each processing stage corresponds to one processing module, for example, the detection module, the attribute module, the tracking smoothing module, the quality module, and the alarm module shown in fig. 3.

The detection module is realized by a target detection submodel and is used for outputting the position of a target object (such as a worker or a transport vehicle) in an image and the probability of the target object appearing at that position, wherein this probability is the detection score of the target detection submodel;

the attribute module is realized through an attribute submodel and is used for judging, according to the target object obtained by the detection module, the safety compliance attributes of the target object, such as whether a safety helmet is worn or whether a protective suit is worn; the score output by the attribute submodel is the probability that the safety compliance attribute is met, and represents the degree of safety compliance of the worker;

the tracking smoothing module is realized by a tracking smoothing processing submodel and is used for removing or processing, according to the output of the detection module and the output of the attribute module, the non-smooth frames, that is, frames whose attributes are inconsistent between the preceding and following frames, so that the finally obtained production data have consistent attributes across adjacent frames and are logically consistent.

The quality module is realized through a quality submodel, and the output of the quality submodel is a quality score which indicates whether the quality of the block diagram of the target object output by the target detection submodel is good or bad.

The alarm module is realized through an alarm submodel and is used for judging the safety compliance of the production data processed by the above processing modules; if the production data is non-compliant production data (a violation image), the alarm module gives an alarm and provides the non-compliant image.

Through the processing flow, the safety compliance detection model can send the detected violation images and alarm events to related personnel for processing in time.

Referring to fig. 3, the collection of production data according to the embodiment of the present disclosure is performed synchronously with a process of detecting an image collected in a production process by using a safety compliance detection model, wherein a solid line is a flow line of industrial safety production analysis, and a dashed line represents a summary of collected data at each stage, and a main process of the collection includes:

operation S3010, input a video frame;

the video frames are pictures of a production workshop captured by a camera in a production environment, and each video frame corresponds to an image.

Operation S3020, collecting data at the detection module;

the specific process of collecting data at the detection module is shown in fig. 4, and mainly includes:

step S410, receiving input data (video frame);

step S420, obtaining a detection score through a detection sub-model;

step S430, determining whether the scene is a burst scene, if so, continuing to step S450, and if not, continuing to step S440;

here, the burst scene refers to an abnormal production scene, and data generated in the burst scene may also be abnormal production data.

Because normal production data under normal production conditions are generally used when the safety compliance detection model is trained, the safety compliance detection model may give alarms frequently and continuously when it detects abnormal production data generated under a burst scene.

Therefore, when continuous alarms are received in the production process, the number of the device which continuously sends the alarms and the corresponding time period can be recorded as conditions for judging whether the scene which generates the production data is a burst scene.

For example, the device number and the acquisition time of the current production data may be acquired; if the device number of the production data is the same as the device number recorded for the burst scene and the acquisition time of the current production data also falls within the time period recorded for the burst scene, it is determined that the current production data is data generated in the burst scene.

The same cause of a burst scene often lasts for a period of time, and it is usually not necessary to keep all the production data collected during this period of time.

Therefore, in this embodiment, a selection policy for saving production data in a burst scene is further set, for example, a limit on the number of saved pictures. After the current production data is determined to be production data generated in the burst scene, only the production data conforming to the selection policy is determined as the scene data of the burst scene. In this way, an unmanageable flood of data can be avoided, and data collection becomes more efficient.
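As a non-limiting illustration, the burst scene judgment and the selection policy described above can be sketched as follows. The class names `BurstRecord` and `BurstCollector`, the field names, and the per-burst limit `max_saved_per_burst` are hypothetical; the disclosure only specifies matching on device number and time period and gives a limit on the number of saved pictures as one example of a policy.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class BurstRecord:
    device_id: str   # number of the device that sent continuous alarms
    start: datetime  # start of the recorded time period
    end: datetime    # end of the recorded time period


@dataclass
class BurstCollector:
    bursts: list
    max_saved_per_burst: int = 3  # assumed selection policy: cap on saved pictures
    saved_counts: dict = field(default_factory=dict)

    def match(self, device_id, captured_at):
        """Return the index of the burst record that covers this frame, if any."""
        for index, burst in enumerate(self.bursts):
            if burst.device_id == device_id and burst.start <= captured_at <= burst.end:
                return index
        return None

    def should_save(self, device_id, captured_at):
        """True if the frame belongs to a burst scene and the cap is not yet reached."""
        index = self.match(device_id, captured_at)
        if index is None:
            return False
        count = self.saved_counts.get(index, 0)
        if count >= self.max_saved_per_burst:
            return False
        self.saved_counts[index] = count + 1
        return True


bursts = [BurstRecord("cam-7", datetime(2021, 11, 22, 9, 0), datetime(2021, 11, 22, 9, 30))]
collector = BurstCollector(bursts, max_saved_per_burst=2)
decisions = [collector.should_save("cam-7", datetime(2021, 11, 22, 9, m)) for m in (1, 2, 3)]
```

With a cap of two pictures, the first two frames from device "cam-7" inside the recorded period are saved and the third is skipped.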

Step S440, judging whether the detection score is in an effective interval, if so, continuing to step S450, and if not, ending the execution;

the valid interval of the detection score (the target threshold interval) here mainly refers to a dual-threshold region bounded by a first threshold above a set threshold and a second threshold below the set threshold, where the set threshold is the probability value at which the target can be considered to be detected.

In the disclosed embodiments, a first threshold above the set threshold and a second threshold below the set threshold may be configured (including set or modified) as needed for production model optimization. Therefore, the model can be converged more quickly, and the expected precision is achieved or the expected generalization capability is improved.
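As a non-limiting illustration, a configurable dual-threshold valid interval can be sketched as follows. The class name `ValidInterval`, the example values, and the choice to treat both bounds as inclusive are assumptions; the disclosure does not state whether the bounds are inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidInterval:
    second_threshold: float  # lower bound, below the set threshold
    set_threshold: float     # score at which the target is considered detected
    first_threshold: float   # upper bound, above the set threshold

    def __post_init__(self):
        # The two configurable thresholds must bracket the set threshold.
        if not (self.second_threshold < self.set_threshold < self.first_threshold):
            raise ValueError("expected second_threshold < set_threshold < first_threshold")

    def contains(self, score: float) -> bool:
        """True if the score lies in the dual-threshold region (bounds assumed inclusive)."""
        return self.second_threshold <= score <= self.first_threshold


# Hypothetical configuration for a detection submodel.
detection_interval = ValidInterval(second_threshold=0.3, set_threshold=0.5, first_threshold=0.7)
scores = [0.2, 0.45, 0.65, 0.9]
collected = [s for s in scores if detection_interval.contains(s)]
```

Scores on both sides of the set threshold are collected, so that near misses and low-confidence detections both reach the training end, while clearly negative and clearly positive samples are skipped.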

Step S450, saving the original image.

Because the production data acquisition environment of the embodiment of the present disclosure is an open-set environment, the detection result is interfered with by factors such as background and distance. Therefore, in order to better analyze the detection results, the data collected in the detection module is the original image as captured.

In this way, the production data collected in the detection module can be obtained.

Operation S3030, collecting data at the attribute module;

the specific process of collecting data at the attribute module is shown in fig. 5, and mainly includes:

step S510, receiving input data;

the input data of the attribute module is a block diagram, that is, the image region obtained by matting (cropping) the original image according to the region of the target object output by the detection module.

Step S520, obtaining an attribute score through an attribute sub-model;

wherein the attribute refers to a safety compliance attribute, for example, a condition for judging safety compliance such as whether the dress of a worker meets the safety standard of the production environment; and the attribute score refers to the probability that the input data meets the corresponding safety compliance attribute.

Step S530, determining whether the scene is a burst scene, if so, continuing to step S550, and if not, continuing to step S540;

the determination is similar to step S430, and therefore is not described in detail.

Step S540, judging whether the attribute score is in the effective interval, if so, continuing to step S550, and if not, ending the execution;

here, the valid interval is also defined by a dual threshold; in the attribute module, the set threshold typically represents the attribute score at which the target object can be determined to be non-compliant.

In step S550, the block diagram is saved.

As for the attribute module, the attribute module is only responsible for the block diagram obtained by matting the target object. Therefore, in step S550, only the block diagram including the target object needs to be saved.

In this manner, the production data collected in the attribute module is obtained.

Operation S3040, collecting data at the tracking smoothing module;

the specific process of collecting data at the tracking smoothing module is shown in fig. 6, and mainly includes:

step S610, receiving input data;

the input data received by the tracking smoothing module are the original images saved by the detection module and the block diagrams saved by the attribute module.

Step S620, a smooth queue is obtained through a tracking smooth module;

combining the attribute result and the attribute score obtained in the attribute module, smoothing analysis is performed on the preceding and following frames in the queue of original images and block diagrams, and frames whose attributes are inconsistent between the preceding and following frames are determined to be non-smooth frames and removed. After the above smoothing analysis and processing, a smooth queue containing only smooth frames can be obtained.

Step S630, determining whether the current frame is in the smooth queue, if yes, continuing to step S640, and if not, ending the execution;

step S640, determining whether the attribute results of the previous frame and the next frame are consistent, if yes, returning to step S630 and continuing to obtain the next frame, and if not, continuing to step S650;

and step S650, saving the block diagram and the original image, and then continuing to acquire the next frame.

In this way, the production data collected in the tracking smoothing module can be obtained.
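As a non-limiting illustration of the consistency check in step S640, the following sketch takes the per-frame attribute results of one tracked target object and returns the indices of frames whose result differs from that of the preceding frame. Comparing each frame only with its immediate predecessor is a simplifying assumption about how "consistent" is judged; the function name and the label strings are hypothetical.

```python
def inconsistent_frames(attribute_results):
    """Indices of frames whose attribute result differs from the previous frame."""
    return [
        index
        for index in range(1, len(attribute_results))
        if attribute_results[index] != attribute_results[index - 1]
    ]


# Attribute results of one tracked worker over five consecutive frames.
results = ["helmet", "helmet", "no_helmet", "helmet", "helmet"]
to_save = inconsistent_frames(results)
```

In this example the frames at indices 2 and 3 are flagged, since the attribute result flips at each of them; under the flow of fig. 6, the block diagram and the original image of such frames would be saved.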

Operation S3050, collecting data at the quality module;

the specific process of collecting data at the quality module is shown in fig. 7, and mainly includes:

step S710, receiving input data;

the input data is the original image or the block diagram in the smooth queue.

Step S720, obtaining a quality score through a quality sub-model;

the quality score represents the score of the captured video frame image quality being good or bad.

Step S730, judging whether the quality score is in an effective interval, if so, continuing to step S740, and if not, ending the execution;

here, the valid interval is also defined by a dual threshold; in the quality module, the set threshold generally represents the quality score at which the image can be determined to be of poor quality.

Step S740, saving the original image or the block diagram.

In this way, the production data collected in the quality module can be obtained.

Operation S3060, collecting data at the alarm module;

the specific process of collecting data at the alarm module is shown in fig. 8, and mainly includes:

step S810, receiving input data;

the input data of the alarm module are the original images or block diagrams collected by the detection module, the attribute module, the tracking smoothing module, and the quality module.

Step S820, obtaining alarm information through the alarm module;

the original image or the block diagram is identified and judged to determine whether it shows an illegal operation; if so, an alarm is given together with the alarm reason and the original image or the block diagram.

Step S830, asking the client to confirm the alarm information;

step S840, judging whether the alarm is wrong; if it is a false alarm, continuing to step S850; if it is a false detection, continuing to step S860; otherwise, ending the execution;

wherein a false detection refers to the condition that the target object is not included in the region output by target detection; a false alarm refers to the condition that the target object is included, but the judgment of whether the target object violates the rules is wrong.

Step S850, for a false alarm, saving the block diagram;

step S860, for a false detection, saving the original image;

and step S870, data summarization is carried out.

In this way, production data (original images or block diagrams) which cause false detection or false alarm when the safety compliance detection model is applied to actual production can be obtained.

Operation S3070, summarizing to obtain a production data collection;
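As a non-limiting illustration, the routing of confirmed alarms in steps S840 to S860 can be sketched as follows. The enum `Feedback`, its members, and the function `image_to_save` are hypothetical names; the rule itself (block diagram for a false alarm, original image for a false detection) follows the steps above.

```python
from enum import Enum


class Feedback(Enum):
    CORRECT = "correct"                  # the client confirms the alarm
    FALSE_ALARM = "false_alarm"          # target present, violation judgment wrong
    FALSE_DETECTION = "false_detection"  # detected region does not contain the target


def image_to_save(feedback, original_image, block_diagram):
    """Return the image to collect for this alarm, or None if nothing is collected."""
    if feedback is Feedback.FALSE_ALARM:
        return block_diagram   # the attribute judgment failed on the cropped region
    if feedback is Feedback.FALSE_DETECTION:
        return original_image  # the detector failed on the full frame
    return None


# File names below are placeholders for illustration only.
saved = image_to_save(Feedback.FALSE_ALARM, "frame_0001.jpg", "frame_0001_crop.jpg")
```

The choice of image mirrors which submodel made the mistake: a false detection is an error of the detection submodel, which sees the whole frame, whereas a false alarm is an error made on the cropped target region.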

and merging the production data collected in the detection module, the attribute module, the tracking smoothing module, the quality module and the alarm module to obtain a production data collection.

Operation S3080, transmitting data using a data transmission module;

In order to ensure the performance of the online model and allocate computing power reasonably, the embodiment of the disclosure uploads the production data preliminarily screened by the safety compliance detection model through the data transmission module, so as to support subsequent processing such as data labeling, data screening and model iteration.

Specifically, to avoid the peak periods of on-site task operation, a timed task is first created, and the preliminarily screened data of the current day is automatically uploaded to the online storage space at a fixed time each day. The server then sends a request to process the data in the storage space, performs a series of image preprocessing operations on the data, and sends the result to the clustering module.
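As a non-limiting illustration, the fixed daily upload time can be computed as sketched below. The function name `next_upload_time` and the example upload time of 02:00 are assumptions made for illustration only; the disclosure does not specify when the timed task runs or how it is scheduled.

```python
from datetime import datetime, time, timedelta


def next_upload_time(now: datetime, upload_at: time) -> datetime:
    """Return the next moment at which the daily upload task should run."""
    candidate = datetime.combine(now.date(), upload_at)
    if candidate <= now:
        # Today's slot has already passed, so schedule for tomorrow.
        candidate += timedelta(days=1)
    return candidate


# Example: an off-peak slot at 02:00 (assumed value).
run_at = next_upload_time(datetime(2021, 11, 22, 14, 0), time(2, 0))
```

A scheduler would sleep until `run_at`, upload the data screened that day, and then call the function again for the following day.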

Operation S3090, clustering the production data using a clustering module;

because the data collected in the above steps has the characteristics of large data volume and data redundancy, the clustering module is used for clustering the production data in the embodiment of the disclosure. The specific steps are shown in fig. 9, and mainly include:

step S910, receiving input data;

The input data received by the clustering module is the production data that has been preliminarily screened by the safety compliance detection model and then subjected to a series of image preprocessing operations; it includes both block diagrams and original images. If the input is a block diagram, step S920 is continued; if the input is an original image, step S940 is continued.

Step S920, if the input data is a block diagram, the input data is directly input into a feature extraction model for feature extraction;

step S930, clustering the extracted features by using a dbscan algorithm to obtain a block diagram cluster;

step S940, if the image is an original image, the original image is cut to obtain a target image;

step S950, the target image is sent to a feature extraction module for feature extraction;

step S960, clustering the extracted features by using a dbscan algorithm to obtain a cluster of the target image;

and step S970, obtaining the clustering cluster of the original image according to the mapping relation.

Thus, clustered production data are obtained. Then, according to the clustering result, a portion of typical production data is selected from each cluster as candidate frames to be labeled.

Through the clustering operation, redundancy in the production data can be greatly reduced, which in turn reduces the amount of data to be processed subsequently.
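As a non-limiting illustration, the clustering of steps S930 and S960, the mapping of step S970 and the selection of typical data from each cluster can be sketched as follows. This is a minimal pure-Python DBSCAN written for illustration, not the implementation used in the disclosure. The function names, the use of Euclidean distance and the rule of taking the first members of each cluster as "typical" data are all assumptions.

```python
import math
from collections import defaultdict

NOISE = -1


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN over feature vectors; returns one cluster label per point."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


def pick_representatives(labels, per_cluster=1):
    """Select the first `per_cluster` members of each cluster (assumed rule)."""
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        if label != NOISE:
            members[label].append(idx)
    return {label: idxs[:per_cluster] for label, idxs in members.items()}


def clusters_of_originals(labels, target_to_original):
    """Step S970: map each target-image label back to its original image."""
    result = defaultdict(set)
    for idx, label in enumerate(labels):
        result[target_to_original[idx]].add(label)
    return dict(result)
```

Here `points` stands for the feature vectors produced by the feature extraction model, and `target_to_original` is a hypothetical list recording, for each target image, the original image it was cut from. The values of `eps` and `min_samples` would need to be tuned to the feature space actually used.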

Operation S3100, selecting a frame to be labeled based on an active learning strategy;

because data labeling is costly and time-consuming, in the embodiment of the disclosure, a batch of production data that is most helpful for subsequent model training and iteration is screened out from the mass of production data by means of active learning and then labeled, so that the labeling cost is reduced and time is saved.

The core of the active learning strategy is the selection of the query strategy, and one or more of the query strategies may be selected from the multiple query strategies described in the foregoing operation S240 according to the requirements of the business scenario.
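As a non-limiting illustration of one such query strategy, uncertainty sampling can be sketched as follows. The sketch assumes that each candidate frame has a single model score in the range 0 to 1 and that a score near 0.5 means the model finds the frame difficult to distinguish. The function name `uncertainty_sample` and the labeling budget are hypothetical.

```python
def uncertainty_sample(scores, budget):
    """Return the indices of the `budget` frames whose score is closest to 0.5."""
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))
    return ranked[:budget]


# Example: five candidate frames, two of which may be sent for labeling.
to_label = uncertainty_sample([0.95, 0.52, 0.10, 0.45, 0.70], budget=2)
```

For multi-class outputs, the distance from 0.5 would typically be replaced by another uncertainty measure such as the margin between the two highest class scores.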

Operation S3110, labeling the frame to be labeled to form training data;

At this time, the number of frames to be labeled selected by the active learning strategy is greatly reduced, and a manual or automatic labeling method can be adopted for labeling to obtain training data. For example, a small number of frames are labeled manually, a labeling model is then trained on these frames, and the remaining frames to be labeled are labeled by the labeling model.

Operation S3120, optimizing the model in a targeted manner.

After the labeled training data is obtained, the original safety compliance detection model can be trained with it to obtain a trained safety compliance detection model, which is then applied to perform safety compliance detection on actual production data. Because the trained model has been optimized specifically for burst scenes, non-compliant production data and production data close to non-compliance, it can capture more burst events and non-compliance events, with higher accuracy and a better application effect.

In the embodiment of the disclosure, data collection is embedded into the sub-models of multiple processing stages of the safety compliance detection model, so that a larger amount and more types of data can be collected from each processing stage. Compared with collecting production data end to end in a single step, this better improves the generalization capability of the safety compliance detection model and makes it easy to trace a problem back to a specific stage.

In the embodiment of the disclosure, a data collection mechanism for burst scenes is introduced, so that data generated in a burst scene can be collected in a targeted manner. This provides effective data support for root cause analysis of burst scenes. In some cases, the same root cause produces multiple burst scenes; if root cause analysis can be carried out, the various burst scenes arising from the same root cause can be quickly converged according to that root cause, which reduces the number of burst scenes that actually need to be handled and makes the subsequent handling of burst scenes more efficient.

In the embodiment of the disclosure, target production data (e.g., non-compliant production data) and potential target production data in the various processing stages are also collected by means of the valid intervals. Therefore, the number and types of candidate training data are greatly expanded, the detection rate of the safety compliance detection model for non-compliance problems can be improved, and the generalization capability of the safety compliance detection model is improved.

In the embodiment of the disclosure, the collection of production data is performed synchronously with the actual application of the safety compliance detection model, so that the whole production data backflow process can be greatly simplified, the convergence of the model is accelerated, and the iteration period of the model is shortened.

Further, the present disclosure also provides an apparatus for production data processing, as shown in fig. 10, the apparatus 100 includes: a scene data determination module 1001, configured to determine, from the production data input to the production model, production data generated in a specified scene to obtain first data; a target production data determining module 1002, configured to determine, from the production data input to the production model, production data that enables an output result of the production model to be within a target threshold interval, to obtain second data; and a summarizing and clustering module 1003, configured to summarize and cluster the first data and the second data to obtain third data.

In another embodiment of the present disclosure, the production model includes at least two sub models, each sub model corresponds to a processing stage, and correspondingly, the scenario data determining module 1001 is further configured to determine production data generated in a specified scenario from production data input into each sub model to obtain first data; accordingly, the target production data determining module 1002 is further configured to determine, from the production data input into each sub-model, production data having a score of the corresponding sub-model in the target threshold interval, to obtain second data.

In another embodiment of the present disclosure, the scene data determination module 1001 includes: the scene data determining submodule is used for determining production data generated under a specified scene from input data; and the scene data screening submodule is used for determining the production data which accords with the data selection strategy from the production data generated under the specified scene to obtain first data.

In another embodiment of the present disclosure, the specified scene is a burst scene, and accordingly, the scene data determination module 1001 includes: the burst scene determining submodule, used for determining, according to the alarm frequency and the duration, the equipment number of the equipment on which the burst scene occurs and the time period in which it occurs; and the burst scene data determining submodule, used for determining the production data generated under the burst scene from the production data input into the production model according to the equipment number and the time period, to obtain first data.
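As a non-limiting illustration, the two submodules can be sketched as follows. The disclosure states only that the equipment number and the time period are determined from the alarm frequency and the duration; the concrete rule used here (a run of alarms on one device with gaps no longer than `max_gap`, containing at least `min_alarms` alarms and lasting at least `min_duration` seconds) is an assumption, as are all function and parameter names.

```python
from collections import defaultdict


def find_burst_periods(alarms, max_gap, min_alarms, min_duration):
    """alarms: iterable of (equipment_number, timestamp_seconds).

    Returns a list of (equipment_number, start, end) burst periods.
    """
    by_device = defaultdict(list)
    for device, ts in alarms:
        by_device[device].append(ts)

    periods = []
    for device, stamps in by_device.items():
        stamps.sort()
        # Split the alarm timestamps into runs separated by long gaps.
        runs, run = [], [stamps[0]]
        for ts in stamps[1:]:
            if ts - run[-1] <= max_gap:
                run.append(ts)
            else:
                runs.append(run)
                run = [ts]
        runs.append(run)
        # Keep runs that are frequent enough and last long enough.
        for r in runs:
            if len(r) >= min_alarms and r[-1] - r[0] >= min_duration:
                periods.append((device, r[0], r[-1]))
    return periods


def select_burst_data(records, periods):
    """records: iterable of (equipment_number, timestamp_seconds, payload)."""
    return [
        rec for rec in records
        if any(dev == rec[0] and start <= rec[1] <= end
               for dev, start, end in periods)
    ]
```

The output of `select_burst_data` corresponds to the first data in this embodiment; the thresholds would be chosen according to the alarm behavior observed on site.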

In another embodiment of the present disclosure, the target threshold interval includes a dual threshold region formed by a first threshold above a given threshold and a second threshold below the given threshold, and the apparatus 100 further includes: a target threshold interval configuration module, used for configuring the first threshold and the second threshold according to the requirement of production model optimization.
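As a non-limiting illustration, a configurable dual threshold region can be sketched as follows. The class name `TargetInterval`, its field names and the choice of closed interval bounds are assumptions made for illustration; the disclosure requires only that the first threshold lies above the given threshold and the second threshold below it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetInterval:
    given: float   # the given threshold of the sub-model
    first: float   # first threshold, above the given threshold
    second: float  # second threshold, below the given threshold

    def __post_init__(self):
        if not self.second < self.given < self.first:
            raise ValueError("expected second < given < first")

    def contains(self, score: float) -> bool:
        """True if the score lies in the dual threshold region (bounds included)."""
        return self.second <= score <= self.first

    def select(self, scored_items):
        """Keep the (item, score) pairs whose score lies in the region."""
        return [(item, s) for item, s in scored_items if self.contains(s)]


# Example configuration (assumed values): collect scores between 0.3 and 0.7.
interval = TargetInterval(given=0.5, first=0.7, second=0.3)
second_data = interval.select([("f1", 0.10), ("f2", 0.35), ("f3", 0.69), ("f4", 0.90)])
```

Widening the region collects more potential target production data at the cost of a larger volume to cluster and label, so the two thresholds can be adjusted according to the requirement of production model optimization.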

In another embodiment of the present disclosure, the apparatus 100 further comprises: and the data screening module is used for screening the third data based on the active learning strategy to obtain fourth data to be labeled.

In another embodiment of the present disclosure, the apparatus 100 further comprises: the data labeling module is used for labeling the fourth data to obtain training data; the model training module is used for training the production model by using the training data to obtain a trained production model; and the model replacement module is used for applying the trained production model to predict the actual production data.

In the technical solution of the disclosure, the acquisition, storage, application and the like of the personal information of the users involved all comply with the provisions of relevant laws and regulations, and do not violate public order and good customs.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 11 shows a schematic block diagram of an example electronic device 1100 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 11, the device 1100 comprises a computing unit 1101, which may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the device 1100 may also be stored. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.

A number of components in device 1100 connect to I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, and the like; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108 such as a magnetic disk, optical disk, or the like; and a communication unit 1109 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 1101 can be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 1101 performs the respective methods and processes described above, such as the method of production data processing. For example, in some embodiments, the method of production data processing may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 1100 via ROM 1102 and/or communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the method of production data processing described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured by any other suitable means (e.g., by means of firmware) to perform the method of production data processing.

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A method of production data processing, comprising:

determining production data generated under a specified scene from the production data input into the production model to obtain first data;

determining production data which enables the output result of the production model to be in a target threshold interval from the production data input into the production model to obtain second data;

and summarizing and clustering the first data and the second data to obtain third data.

2. The method of claim 1, wherein the production model comprises at least two sub-models, one for each processing stage,

correspondingly, the determining the production data generated under the specified scene from the production data input into the production model to obtain first data comprises:

determining production data generated under a specified scene from the production data input into each sub-model to obtain first data;

accordingly, the determining, from the data input into the production model, the production data that causes the output result of the production model to be within the target threshold interval, resulting in second data, includes:

and determining the production data of which the score of the corresponding submodel is in the target threshold interval from the production data input into each submodel to obtain second data.

3. The method of claim 1, wherein determining production data generated under a specified scenario from the production data input to the production model, resulting in first data, comprises:

determining production data generated under a specified scene from the production data input into the production model;

and determining the production data which accords with the data selection strategy from the production data generated under the specified scene to obtain first data.

4. The method of claim 1 or 3, wherein the specified scene is a burst scene, and accordingly, the determining production data generated under a specified scene from the production data input into the production model to obtain first data comprises:

determining, according to the alarm frequency and the duration, the equipment number and the time period of the burst scene;

and determining the production data generated under the burst scene from the production data input into the production model according to the equipment number and the time period to obtain first data.

5. The method of claim 1, wherein the target threshold interval comprises a dual threshold region formed by a first threshold above a given threshold and a second threshold below the given threshold, the method further comprising:

and configuring the first threshold value and the second threshold value according to the requirement of the production model optimization.

6. The method of claim 1, further comprising:

and screening the third data based on an active learning strategy to obtain fourth data to be labeled.

7. The method of claim 6, wherein the active learning strategy comprises at least one of the following query strategies:

an uncertainty sampling query strategy used for selecting production data which are difficult to distinguish by the production model;

a committee-based query strategy for voting by a plurality of sub-models to select production data that is difficult to distinguish;

a query strategy based on model change expectation for selecting production data that maximizes gradient change;

a reduced-error based query strategy for selecting production data that minimizes the loss function;

a reduced variance based query strategy for selecting production data that results in the most reduced variance;

and a density-weight-based query strategy for selecting high-density data which are difficult to distinguish.

8. The method of claim 6, further comprising:

labeling the fourth data to obtain training data;

training the production model by using the training data to obtain a trained production model;

and applying the trained production model to predict the actual production data.

9. An apparatus for production data processing, comprising:

the scene data determining module is used for determining production data generated under a specified scene from the production data input into the production model to obtain first data;

the target production data determining module is used for determining production data which enables the output result of the production model to be in a target threshold interval from the production data input into the production model to obtain second data;

and the summarizing and clustering module is used for summarizing and clustering the first data and the second data to obtain third data.

10. The apparatus of claim 9, wherein the production model comprises at least two sub-models, one for each processing stage,

correspondingly, the scene data determining module is also used for determining the production data generated under the appointed scene from the production data input into each sub-model to obtain first data;

correspondingly, the target production data determining module is further used for determining the production data which enables the score of the corresponding sub-model to be in the target threshold interval from the production data input into each sub-model, and obtaining the second data.

11. The apparatus of claim 9, wherein the scene data determination module comprises:

the scene data determining submodule is used for determining the production data generated under the specified scene from the production data input into the production model;

and the scene data screening submodule is used for determining the production data which accords with the data selection strategy from the production data generated under the specified scene to obtain first data.

12. The apparatus according to claim 9 or 11, wherein the specified scene is a burst scene, and accordingly the scene data determining module comprises:

the burst scene determining submodule is used for determining, according to the alarm frequency and the duration, the equipment number and the time period of the burst scene;

and the burst scene data determining submodule is used for determining the production data generated under the burst scene from the production data input into the production model according to the equipment number and the time period to obtain first data.

13. The apparatus of claim 9, wherein the target threshold interval comprises a dual threshold region formed by a first threshold above a given threshold and a second threshold below the given threshold, the apparatus further comprising:

and the target threshold interval configuration module is used for configuring the first threshold and the second threshold.

14. The apparatus of claim 9, further comprising:

and the data screening module is used for screening the third data based on an active learning strategy to obtain fourth data to be labeled.

15. The apparatus of claim 14, further comprising:

the data labeling module is used for labeling the fourth data to obtain training data;

the model training module is used for training the production model by using the training data to obtain a trained production model;

and the model replacement module is used for applying the trained production model to predict the actual production data.

16. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.

17. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.

18. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.