CN115310129A - Data scheduling method, device, equipment and readable storage medium - Google Patents

Info

Publication number
CN115310129A
Authority
CN
China
Prior art keywords
data
batch
scheduling
scheduled
proportion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210956574.0A
Other languages
Chinese (zh)
Inventor
孙子文
韩旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Weride Technology Co Ltd
Original Assignee
Guangzhou Weride Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Weride Technology Co Ltd
Priority to CN202210956574.0A
Publication of CN115310129A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a data scheduling method, apparatus, device and readable storage medium. The method comprises: when a data to-be-scheduled signal is received, acquiring a first unit time length for desensitization processing of the current batch of data; comparing the first unit time length with a pre-stored second unit time length for desensitization processing of the previous batch of data, and determining the data volume scheduling proportion of the next batch of data in a pre-established batch data planning scheduling record; adjusting the next batch of data according to that proportion to obtain the batch data to be scheduled; and scheduling the batch data to be scheduled. The processing result of the current batch is thus compared with that of the immediately preceding batch to control the data volume of the next batch, which avoids overloading the data desensitization system; the batch data to be scheduled is finally scheduled to the data desensitization processor, so that massive image data can be desensitized efficiently.

Description

Data scheduling method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of data scheduling, and more particularly, to a method, an apparatus, a device, and a readable storage medium for data scheduling.
Background
With the development of information technology, people's demand for data keeps growing; for example, people learn about the world around them by acquiring images and text. Such data inevitably involves sensitive information. For example, driverless vehicles need to capture images of the environment, and these images capture sensitive information such as pedestrians and license plates; likewise, the personal resumes collected by a recruitment system contain a good deal of personal information. Such sensitive data needs to be desensitized to remove the private information it contains.
Because the data volume to be desensitized is huge, in the process of data desensitization, the data to be desensitized needs to be dispatched to a desensitization system batch by batch for desensitization, and the efficiency of the data desensitization process can be improved by optimizing the efficiency of the data dispatching process.
The current common scheduling method for data desensitization is to set a partition threshold, divide a mass of images into a plurality of packets, and then hand each packet to a system for desensitization execution.
Dynamically adjusting the data volume of the data to be scheduled can avoid high system load, so that massive image data can be desensitized efficiently.
Disclosure of Invention
In view of the above problems, the present application provides a method, an apparatus, a device and a readable storage medium for data scheduling, which avoid high system load and desensitize massive image data efficiently.
In order to achieve the above object, the following specific solutions are proposed:
a method of data scheduling, comprising:
when a data signal to be scheduled is received, acquiring a first unit time length for desensitization processing of current batch data;
comparing the first unit time length with a second unit time length for desensitization processing of the previous batch of data of the current batch of data stored in advance, and determining a data quantity scheduling proportion of the next batch of data of the current batch of data in a preset batch data planning scheduling record;
and adjusting the next batch of data according to a data amount scheduling proportion corresponding to the next batch of data to obtain batch data to be scheduled, and scheduling the batch data to be scheduled.
Optionally, the process of establishing the batch data planning scheduling record includes:
dividing each data to be distributed according to the local directory address of each data to be distributed to obtain a plurality of batches of data, and determining the data volume of each batch of data;
determining a data scheduling sequence of each batch of data in the plurality of batches of data;
and establishing a batch data planning and scheduling record according to the data quantity of each batch data and the data scheduling sequence of each batch data in the plurality of batch data.
Optionally, comparing the first unit time length with a second unit time length for desensitization processing of the previous batch of data of the current batch of data stored in advance, and determining a data quantity scheduling proportion of the next batch of data of the current batch of data in a preset batch data scheduling record, includes:
if the first unit time length is longer than a second unit time length of the previous batch of data of the current batch of data which is stored in advance and is subjected to desensitization processing, determining that the data quantity scheduling proportion of the next batch of data of the current batch of data in the existing batch data planning scheduling record is a first proportion;
and if the first unit time length is not greater than a second unit time length of the previous batch of data of the current batch of data which is stored in advance and is subjected to desensitization processing, determining that the data quantity scheduling proportion of the next batch of data of the current batch of data in the existing batch data planning scheduling record is a second proportion.
Optionally, adjusting the next batch of data according to the data amount scheduling proportion corresponding to the next batch of data to obtain batch of data to be scheduled, including:
and when the data volume scheduling proportion corresponding to the next batch of data is a first proportion, selecting a first part of data in the next batch of data as batch data to be scheduled, wherein the data volume of the first part of data is a result of multiplying the data volume of the next batch of data by the first proportion.
Optionally, after selecting the first part of data in the next batch of data as the batch data to be scheduled, the method further includes:
and determining the data except the first part of data in the next batch of data as load suspension execution data, wherein the load suspension execution data is data after the first part of data in the scheduling sequence.
Optionally, adjusting the next batch of data according to the data amount scheduling proportion corresponding to the next batch of data to obtain batch of data to be scheduled, including:
and when the data volume scheduling proportion corresponding to the next batch of data is a second proportion, selecting a second part of data in the next batch of data as batch data to be scheduled, wherein the data volume of the second part of data is a result of multiplying the data volume of the next batch of data by the second proportion.
Optionally, the scheduling the batch data to be scheduled includes:
and adding the batch data to be scheduled to an existing task queue, so that a data desensitization processor for data desensitization acquires the batch data to be scheduled from the task queue.
Optionally, acquiring the first unit time length for desensitization processing of the current batch data includes:
acquiring the data volume and the total processing time of desensitization processing of the current batch data;
and taking the ratio of the total processing time to the data volume as the first unit time length for desensitizing the current batch data.
Optionally, taking a ratio of the total processing time to the data size as a first unit duration of desensitization processing performed on the current batch of data, includes:
determining a ratio of the total processing time to the data volume;
rounding the ratio at the thousandths place, keeping the estimated value made up of the digits before the thousandths place, and taking the time length corresponding to the estimated value as the first unit time length of desensitization processing of the current batch data.
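The unit time length calculation described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `unit_duration`, the use of seconds, and the choice of round-half-up are assumptions; the text only states that the ratio of total processing time to data volume is rounded at the thousandths place.

```python
from decimal import Decimal, ROUND_HALF_UP


def unit_duration(total_seconds, item_count):
    """Average desensitization time per item, kept to two decimal places.

    Hypothetical helper: the name, the unit (seconds) and the
    round-half-up rule are assumptions made for illustration.
    """
    if item_count <= 0:
        raise ValueError("item_count must be positive")
    # Ratio of total processing time to data volume.
    ratio = Decimal(str(total_seconds)) / Decimal(item_count)
    # The thousandths digit decides the rounding; digits before it are kept.
    return ratio.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For instance, a batch of 100 items processed in 37.5 seconds has a ratio of 0.375, which this sketch rounds to 0.38.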
An apparatus for data scheduling, comprising:
the unit duration obtaining unit is used for obtaining the first unit duration for desensitization processing of the current batch data when a data signal to be scheduled is received;
a scheduling proportion determining unit, configured to compare the first unit time length with a second unit time length for performing desensitization processing on the previous batch data of the pre-stored current batch of data, and determine a data amount scheduling proportion of the next batch of data of the current batch of data in a pre-established batch data planning scheduling record;
the to-be-scheduled data determining unit is used for adjusting the next batch of data according to the data amount scheduling proportion corresponding to the next batch of data to obtain to-be-scheduled batch of data;
and the data scheduling unit to be scheduled is used for scheduling the batch data to be scheduled.
Optionally, the apparatus further comprises:
the first scheduling record establishing unit is used for dividing each data to be distributed according to the local directory address of each data to be distributed to obtain a plurality of batches of data and determining the data volume of each batch of data;
the second scheduling record establishing unit is used for determining the data scheduling sequence of each batch of data in the plurality of batches of data;
and the third scheduling record establishing unit is used for establishing a batch data planning scheduling record according to the data quantity of each batch data and the data scheduling sequence of each batch data in the plurality of batch data.
Optionally, the scheduling ratio determining unit includes:
a first proportion determining unit, configured to determine, if the first unit duration is longer than a second unit duration for performing desensitization processing on previous batch data of the pre-stored current batch data, a data amount scheduling proportion of next batch data of the current batch data in an existing batch data scheduling record is a first proportion;
a second proportion determining unit, configured to determine, if the first unit duration is not greater than a second unit duration for performing desensitization processing on previous batch data of the pre-stored current batch data, that a data volume scheduling proportion of next batch data of the current batch data in an existing batch data scheduling record is a second proportion.
Optionally, the unit for determining data to be scheduled includes:
and the first proportion multiplying unit is used for selecting a first part of data in the next batch of data as the batch data to be scheduled when the data quantity scheduling proportion corresponding to the next batch of data is a first proportion, wherein the data quantity of the first part of data is a result of multiplying the data quantity of the next batch of data by the first proportion.
Optionally, the apparatus further comprises:
and a load data determining unit, configured to determine data other than the first part of data in the next batch of data as load suspension execution data, where the load suspension execution data is data after the first part of data in the scheduling order.
Optionally, the unit for determining data to be scheduled includes:
and the second proportion multiplying unit is used for selecting a second part of data in the next batch of data as the batch data to be scheduled when the data volume scheduling proportion corresponding to the next batch of data is a second proportion, and the data volume of the second part of data is a result of multiplying the data volume of the next batch of data by the second proportion.
Optionally, the unit for scheduling data to be scheduled includes:
and the queue data adding unit is used for adding the batch data to be scheduled to the existing task queue so that the data desensitization processor for data desensitization acquires the batch data to be scheduled from the task queue.
Optionally, the unit duration obtaining unit includes:
the processing information acquisition unit is used for acquiring the data volume and the total processing time of desensitization processing of the current batch data;
and the unit duration calculation unit is used for taking the ratio of the total processing time to the data volume as the first unit duration for desensitizing the current batch data.
Optionally, the unit duration calculating unit includes:
a time length result determining unit, configured to determine a ratio of the total processing time to the data amount;
and the duration result reduction unit is used for rounding the ratio at the thousandths place, keeping the estimated value made up of the digits before the thousandths place, and taking the duration corresponding to the estimated value as the first unit duration for desensitizing the current batch data.
An apparatus for data scheduling, comprising a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the data scheduling method.
A readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for data scheduling as described above.
By means of the above technical solution, when a data to-be-scheduled signal is received, the first unit time length for desensitization processing of the current batch of data is acquired; the first unit time length is compared with the pre-stored second unit time length for desensitization processing of the previous batch of data, and the data volume scheduling proportion of the next batch of data in the preset batch data planning scheduling record is determined; the next batch of data is then adjusted according to that proportion to obtain the batch data to be scheduled, and the batch data to be scheduled is scheduled. The processing result of the current batch is thus compared with that of the immediately preceding batch to control the data volume of the next batch, which avoids overloading the data desensitization system; the batch data to be scheduled is finally scheduled to the data desensitization processor, so that massive image data can be desensitized efficiently.
Drawings
Various additional advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a schematic flowchart of data scheduling provided in an embodiment of the present application;
fig. 2 is a schematic structural diagram of a data scheduling system according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an apparatus for data scheduling according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a data scheduling device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The scheme can be realized based on the terminal with the data processing capacity, the terminal can be a data scheduler, and the specific form of the terminal can be a computer, a server, a cloud terminal and the like.
Next, as described in conjunction with fig. 1, the method for scheduling data of the present application may include the following steps:
step S110, when a data signal to be scheduled is received, acquiring the first unit time length of desensitization processing of the current batch data.
Specifically, the signal to be scheduled may be received after the desensitization processing of the current batch data is finished. The first unit time length of desensitization processing performed on the current batch data may represent an average time length of desensitization processing performed on each data in the current batch data, or may represent a total time length taken for desensitization processing on the current batch data.
The terminal can acquire the first unit time length of desensitization processing of the current batch data from a module that monitors data desensitization.
And S120, comparing the first unit time length with a second unit time length for desensitizing the previous batch of data of the pre-stored current batch of data, and determining the data quantity scheduling proportion of the next batch of data of the current batch of data in a pre-established batch data planning scheduling record.
Specifically, the batch data planning and scheduling record may be pre-established before scheduling each batch of data, and each batch of data may be scheduled according to an order of each batch of data in the batch data planning and scheduling record.
The previous batch data is the batch immediately before the current batch data in the scheduling order, and the next batch data is the batch immediately after the current batch data in the scheduling order.
It is to be understood that the change from the second unit time length to the first unit time length may reflect a change in the load of the system: when the first unit time length is not greater than the second unit time length, the load of the system is not getting heavier; when the first unit time length is greater than the second unit time length, the load of the system is getting heavier and there may be a risk of excessive load. Based on this, the data volume scheduling proportion of the next batch data of the current batch data in the pre-established batch data planning scheduling record may be determined.
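The comparison in step S120 can be sketched as a small function. This is an illustrative sketch under stated assumptions: the function name `scheduling_proportion` and the default values 0.5 and 1.0 are placeholders chosen for the example, since the proportions are configurable.

```python
def scheduling_proportion(first_unit, second_unit, reduced=0.5, normal=1.0):
    """Pick the data volume scheduling proportion for the next batch.

    first_unit:  unit time length of the current batch.
    second_unit: pre-stored unit time length of the previous batch.
    reduced / normal: assumed example proportions, not prescribed values.
    """
    if first_unit > second_unit:
        # Per-item time went up: load is getting heavier, shrink the next batch.
        return reduced
    # Per-item time is unchanged or lower: load is not getting heavier.
    return normal
```

Only a strict increase triggers the reduced proportion; equal unit time lengths fall into the "not greater than" case.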
Step S130, adjusting the next batch data according to the data quantity scheduling proportion corresponding to the next batch data to obtain batch data to be scheduled, and scheduling the batch data to be scheduled.
Specifically, the next batch data may be temporarily stored locally by the terminal in advance, or may be obtained by querying from a database according to information of the next batch data of the current batch data in the preset batch data planning and scheduling record.
It can be understood that, because the comparison between the first unit time length and the second unit time length can have different results, the next batch of data needs to be adjusted according to the data volume scheduling proportion that corresponds to the result. For example, when the first unit time length is not greater than the second unit time length, the load of the system is normal or not high, and the next batch of data can be reduced slightly or not at all; when the first unit time length is greater than the second unit time length, the system is at risk of excessive load, and the next batch of data can be reduced substantially.
For example, fig. 2 shows the system architecture of a data scheduling system, which may include a data scheduler, a task queue, a data desensitization processor, and a static queue. After the data desensitization processor finishes processing the current batch of data, it can send the first unit time length for desensitization processing of the current batch of data, together with a data to-be-scheduled signal, to the static queue. The data scheduler can receive the data to-be-scheduled signal from the static queue and acquire the first unit time length, compare the first unit time length with the pre-stored second unit time length, determine the data volume scheduling proportion of the next batch of data, and adjust the next batch of data. Finally, the adjusted next batch of data is taken as the batch data to be scheduled and is scheduled to the data desensitization processor through the task queue.
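One scheduler round in the architecture of fig. 2 can be sketched with two in-process queues. This is a simplified illustration, not the described system: `run_round`, the example proportions, and the use of Python's `queue.Queue` for both the static queue and the task queue are assumptions, and the leading items of the batch are taken for brevity instead of a random selection.

```python
import queue


def run_round(static_queue, task_queue, next_batch, previous_unit,
              reduced=0.5, normal=1.0):
    """One round of the (hypothetical) data scheduler.

    The processor's report on the static queue doubles as the
    to-be-scheduled signal and carries the first unit time length.
    """
    current_unit = static_queue.get()
    proportion = reduced if current_unit > previous_unit else normal
    # Truncating to a whole number of items is an assumption.
    count = int(len(next_batch) * proportion)
    to_schedule, deferred = next_batch[:count], next_batch[count:]
    task_queue.put(to_schedule)  # the processor later takes it from here
    return current_unit, deferred


static_queue = queue.Queue()
task_queue = queue.Queue()
static_queue.put(0.42)  # processor reports 0.42 s per item for the current batch
stored_unit, deferred = run_round(
    static_queue, task_queue, list(range(10)), previous_unit=0.30
)
```

The returned `stored_unit` would be kept as the second unit time length for the following round, and `deferred` holds the items held back from this round.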
In the data scheduling method provided in this embodiment, when a data to-be-scheduled signal is received, the first unit time length for desensitization processing of the current batch of data is acquired; the first unit time length is compared with the pre-stored second unit time length for desensitization processing of the previous batch of data, and the data volume scheduling proportion of the next batch of data in the preset batch data planning scheduling record is determined; the next batch of data is adjusted according to that proportion to obtain the batch data to be scheduled, and the batch data to be scheduled is scheduled. The processing result of the current batch is thus compared with that of the immediately preceding batch to control the data volume of the next batch, which avoids overloading the data desensitization system; the batch data to be scheduled is finally scheduled to the data desensitization processor, so that massive image data can be desensitized efficiently.
In some embodiments of the present application, a process for creating a batch data planning and scheduling record mentioned in the above embodiments is described, where the process for creating a batch data planning and scheduling record may include:
s1, dividing each piece of data to be distributed according to a local directory address of each piece of data to be distributed to obtain a plurality of pieces of batch data, and determining the data volume of each piece of batch data.
Specifically, all the data to be allocated may be temporarily stored in the local storage, each data to be allocated may have its own directory address, the data to be allocated at the same directory address is classified and batched, a plurality of batches of data may be obtained, and the number of the data to be allocated in each batch of data may be determined.
The directory address may be a directory address selected or fixed when the data to be allocated is uploaded.
And S2, determining the data scheduling sequence of each batch of data in the plurality of batches of data.
Specifically, the data scheduling sequence may be a time sequence of generation time of each batch of data, or may be an uploading time sequence of each batch of data during uploading.
And S3, establishing a batch data planning and scheduling record according to the data quantity of each batch data and the data scheduling sequence of each batch data in the plurality of batch data.
Specifically, the batch data planning scheduling record may include the data amount of each batch data and the data scheduling order of each batch data.
In the data scheduling method provided by this embodiment, a plurality of batch data are obtained by dividing each to-be-allocated data, and a scheduling order of each batch data is determined, so as to generate a batch data scheduling record for a data scheduler to query when scheduling the batch data.
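Steps S1 to S3 can be sketched as follows. The sketch assumes that each piece of data is identified by a POSIX-style path, that the directory address is the path's parent directory, and that batches are ordered by their earliest upload time; `BatchEntry`, `build_plan` and the field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class BatchEntry:
    directory: str  # shared directory address of the batch
    items: list     # paths of the data in the batch
    order: int      # position in the data scheduling order

    @property
    def data_volume(self):
        return len(self.items)


def build_plan(upload_times):
    """Build a batch data planning scheduling record.

    upload_times maps each path to its upload timestamp (assumed input shape).
    """
    # S1: group data that share a directory address into one batch.
    groups = defaultdict(list)
    for path in upload_times:
        groups[str(PurePosixPath(path).parent)].append(path)
    # S2: order batches by the earliest upload time they contain.
    ordered = sorted(
        groups.items(),
        key=lambda kv: min(upload_times[p] for p in kv[1]),
    )
    # S3: record the data volume (via items) and scheduling order of each batch.
    return [
        BatchEntry(directory, sorted(items), order)
        for order, (directory, items) in enumerate(ordered)
    ]
```

Ordering by generation time instead of upload time would only change the sort key.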
In some embodiments of the present application, a process of determining a data volume scheduling proportion of a next batch of data of the current batch of data in a pre-established batch data scheduling record by comparing the first unit duration with a second unit duration pre-stored for desensitization of a previous batch of data of the current batch of data in step S120 is described, where the process may be divided into the following two cases:
firstly, if the first unit time length is longer than a second unit time length for desensitizing the previous batch of data of the current batch of data stored in advance, determining that the data volume scheduling proportion of the next batch of data of the current batch of data in the existing batch data planning scheduling record is a first proportion.
Specifically, when the first unit duration is longer than the second unit duration for desensitization of the previous batch of data of the current batch of data stored in advance, the system load is increasing and there may be a risk of excessive load; the determined first ratio may be a ratio that reduces the next batch of data substantially.
The first ratio can be customized, and an example is that the first ratio is 50%.
In this case, the process in step S130 of adjusting the next batch data according to the data amount scheduling ratio corresponding to the next batch data to obtain the batch data to be scheduled is described, where the process may include:
and when the data volume scheduling proportion corresponding to the next batch of data is a first proportion, selecting a first part of data in the next batch of data as batch data to be scheduled, wherein the data volume of the first part of data is a result of multiplying the data volume of the next batch of data by the first proportion.
Specifically, the selection of the first part of data in the next batch of data may be random.
For example, if the next batch of data contains 100 data, then when the first ratio is 50%, 100 × 50% =50 data may be randomly selected from the 100 data of the next batch of data as the batch of data to be scheduled.
It can be understood that when the data volume scheduling ratio corresponding to the next batch of data is the first ratio, not all data of the next batch can be scheduled to the data desensitization processor at one time, in order to prevent high system load. Data other than the first part of data therefore remains in the next batch, for example 100-50=50 data. This remaining data may be regarded as load suspension execution data, which is scheduled after the first part of data has been scheduled; in other words, the load suspension execution data may be regarded as the next batch of data after the first part of data.
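The split into a first part and load suspension execution data can be sketched like this. It is an illustration only: `split_batch` is a hypothetical name, the selection is random as the text allows, and truncating the product to a whole number of items is an assumption.

```python
import random


def split_batch(batch, proportion, rng=None):
    """Randomly pick proportion * len(batch) items to schedule now.

    Returns (to_schedule, suspended); both keep the original item order.
    """
    rng = rng or random.Random()
    count = int(len(batch) * proportion)  # assumed: fractional items are dropped
    chosen = set(rng.sample(range(len(batch)), count))
    to_schedule = [item for i, item in enumerate(batch) if i in chosen]
    # Everything not chosen becomes load suspension execution data,
    # to be scheduled after the first part.
    suspended = [item for i, item in enumerate(batch) if i not in chosen]
    return to_schedule, suspended
```

With a batch of 100 items and a first ratio of 50%, this yields 50 items to schedule and 50 suspended items, matching the worked example in the text.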
And secondly, if the first unit time length is not greater than a second unit time length for desensitizing the previous batch of data of the current batch of data stored in advance, determining that the data volume scheduling proportion of the next batch of data of the current batch of data in the existing batch data planning scheduling record is a second proportion.
Specifically, when the first unit duration is not greater than the second unit duration for desensitizing the previous batch of data of the current batch of data stored in advance, the system load is normal or not high, and the determined second ratio may be a ratio that reduces the next batch of data slightly or not at all.
The second ratio may be customized, and the second ratio may be greater than the first ratio, for example, the second ratio is 95% or 100%.
In this case, the process in step S130 of adjusting the next batch data according to the data amount scheduling ratio corresponding to the next batch data to obtain the batch data to be scheduled is described, where the process may include:
and when the data volume scheduling proportion corresponding to the next batch of data is a second proportion, selecting a second part of data in the next batch of data as the batch of data to be scheduled, wherein the data volume of the second part of data is a result of multiplying the data volume of the next batch of data by the second proportion.
Specifically, the second part of data selected from the next batch of data may be randomly selected.
For example, if the next batch contains 100 data, then when the second ratio is 95%, 100 × 95% =95 data may be randomly selected from the 100 data of the next batch as the batch to be scheduled.
The remaining data of the next batch of data, other than the second part of data, may be scheduled as the batch immediately following the second part of data in the scheduling order, or may be added to the second batch following the current batch of data, so that the remaining data is scheduled together with that batch.
Further, for example, if the next batch of data contains 100 data, when the second proportion is 100%, all data of the next batch of data can be directly used as the batch of data to be scheduled.
According to the data scheduling method provided by this embodiment, the first unit time length is compared with the second unit time length to analyze whether the system is at risk of excessive load. If not, the batch data is scheduled according to the normal scheduling proportion; if so, only a reduced portion (for example, half) of the original batch data is scheduled, so that desensitization remains efficient across the whole system.
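The comparison can be sketched as follows. The constants `FIRST_RATIO_PERCENT` and `SECOND_RATIO_PERCENT` and the function name are hypothetical; 50% and 100% are taken from the example values used in the description, and the application allows other customized values.

```python
# Assumed example proportions; the description allows these to be customized.
FIRST_RATIO_PERCENT = 50    # reduced proportion when load appears to be rising
SECOND_RATIO_PERCENT = 100  # normal proportion (95 is another example value)


def choose_ratio_percent(first_unit_duration, second_unit_duration):
    """Pick the scheduling proportion for the next batch.

    first_unit_duration:  per-item desensitization time of the current batch
    second_unit_duration: per-item desensitization time of the previous batch
    """
    if first_unit_duration > second_unit_duration:
        # Processing has slowed down: treat it as a high-load risk.
        return FIRST_RATIO_PERCENT
    # Not greater than: load is normal or not high.
    return SECOND_RATIO_PERCENT


print(choose_ratio_percent(0.46, 0.45))
print(choose_ratio_percent(0.45, 0.45))
```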
In some embodiments of the present application, a process of scheduling the batch data to be scheduled, which is mentioned in the foregoing embodiments, is described, where the process may include:
and adding the batch data to be scheduled to an existing task queue so that a data desensitization processor for data desensitization acquires the batch data to be scheduled from the task queue.
Specifically, scheduling of tasks can be arranged between the data scheduler and the data desensitization processor through the task queue, the data scheduler can add batch data to be scheduled to the task queue, and the data desensitization processor can obtain the batch data to be scheduled, which needs desensitization processing, from the task queue.
According to the data scheduling method provided by the embodiment, the batch data to be scheduled are added to the task queue, so that the data desensitization processor can access the task queue, the batch data to be scheduled, which needs desensitization processing, is obtained, and the data scheduler is prevented from being directly accessed, so that the operation efficiency of the whole system is improved.
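One way to picture this decoupling is a producer and a consumer sharing a queue. The sketch below uses Python's standard `queue.Queue` and a worker thread; the `mask` function is only a placeholder for real desensitization, and the `None` sentinel used to stop the worker is an assumption made for the example.

```python
import queue
import threading

task_queue = queue.Queue()   # shared between scheduler and processor
processed = []               # results collected by the processor


def mask(item):
    """Placeholder desensitization: replace every character with '*'."""
    return "*" * len(item)


def desensitization_worker():
    """Data desensitization processor: pulls batches from the task queue."""
    while True:
        batch = task_queue.get()
        if batch is None:            # sentinel: no more batches
            task_queue.task_done()
            break
        processed.append([mask(item) for item in batch])
        task_queue.task_done()


worker = threading.Thread(target=desensitization_worker)
worker.start()

# Data scheduler side: it only adds batches to the queue and is never
# called directly by the processor.
task_queue.put(["alice", "bob"])
task_queue.put(None)

worker.join()
print(processed)
```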
In some embodiments of the present application, the process of acquiring the first unit time length for desensitization processing of the current batch data, which is mentioned in the above embodiments, is described, and the process may include:
S1, acquiring the data volume and the total processing time of the current batch data for desensitization processing.
Specifically, the data volume and the total processing time of the current batch of data for desensitization processing may be obtained from a static queue used for monitoring the data desensitization state of the data desensitization processor.
When the data desensitization processor finishes processing each batch of data, the data desensitization processor can add the data volume information of the batch of data and the information of the total processing time to the static queue for the data scheduler to obtain from the static queue.
And S2, taking the ratio of the total processing time to the data volume as the first unit time length for carrying out desensitization processing on the current batch data.
It will be appreciated that the first unit time length represents the average time taken to process one unit of data of the current batch of data, so the ratio of the total processing time to the data volume may be used as the first unit time length for desensitizing the current batch of data.
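Steps S1 and S2 can be sketched as below. A `collections.deque` stands in for the static queue, and the record layout (`data_volume`, `total_seconds`) and function names are assumptions made for illustration only.

```python
from collections import deque

# Stand-in for the "static queue" that the processor writes to and the
# scheduler reads from (an assumption; the application does not fix its form).
stats_queue = deque()


def report_batch(data_volume, total_seconds):
    """Processor side: record the data volume and total time of a finished batch."""
    stats_queue.append({"data_volume": data_volume, "total_seconds": total_seconds})


def unit_duration():
    """Scheduler side: S1 read the record, S2 divide total time by data volume."""
    record = stats_queue.popleft()
    if record["data_volume"] <= 0:
        raise ValueError("data volume must be positive")
    return record["total_seconds"] / record["data_volume"]


report_batch(data_volume=100, total_seconds=45.0)
first_unit_duration = unit_duration()
print(first_unit_duration)
```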
Specifically, considering that the ratio of the total processing time to the data volume may be a non-terminating decimal rather than an exact time value, so that an estimate is needed, the step S2 of taking the ratio of the total processing time to the data volume as the first unit time length for desensitization processing of the current batch of data may include:
and S21, determining the ratio of the total processing time to the data volume.
It will be appreciated that the ratio of the total processing time to the data volume may be a non-terminating decimal, and the ratio may first be kept to a preset number of decimal places, for example, ten decimal places.
S22, rounding the ratio at the thousandths place, keeping the estimated value up to the hundredths place, and taking the time length corresponding to the estimated value as the first unit time length for desensitization processing of the current batch data.
For example, if the ratio of the total processing time to the data volume is 0.450023(s), the first unit time length after rounding at the thousandths place may be 0.45(s); if the ratio is 0.455023(s), the first unit time length after rounding at the thousandths place may be 0.46(s).
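The rounding in S22 can be reproduced with Python's `decimal` module, as a sketch. The function name is illustrative, and passing the ratio as a string is an assumption made so that the decimal digits are exact; Python's built-in `round` is avoided here because it works on binary floats and rounds ties to even.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_unit_duration(ratio):
    """Round the ratio at the thousandths place, keeping two decimal places."""
    return Decimal(ratio).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(round_unit_duration("0.450023"))
print(round_unit_duration("0.455023"))
```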
In addition, so that the comparison of the first unit time length with the second unit time length errs on the side of safety, the first unit time length may be estimated as a value slightly larger than the actual value.
Specifically, when any digit after the hundredths place of the ratio is non-zero, 1 may be added to the hundredths digit and all digits after the hundredths place discarded; the resulting estimated value is the first unit time length for desensitization processing of the current batch data.
For example, if the ratio is 0.450023(s), the hundredths digit is 5 and a non-zero digit exists after it, so the final first unit time length is 0.46(s).
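This conservative estimate amounts to rounding up to two decimal places. A minimal sketch, again assuming the ratio is supplied as a decimal string and using an illustrative function name:

```python
from decimal import Decimal, ROUND_CEILING


def ceil_unit_duration(ratio):
    """Round the ratio up to two decimal places.

    If any digit after the hundredths place is non-zero, the hundredths
    digit is increased by 1; otherwise the value is left unchanged.
    """
    return Decimal(ratio).quantize(Decimal("0.01"), rounding=ROUND_CEILING)


print(ceil_unit_duration("0.450023"))
print(ceil_unit_duration("0.45"))
```

Overestimating the current per-item time makes the "greater than" branch slightly more likely, which biases the scheduler toward the reduced proportion.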
The data scheduling method provided by this embodiment calculates the ratio of the total processing time for desensitizing the current batch data to the data volume of the current batch data, and either rounds the ratio or rounds it up for a safer comparison with the second unit time length, thereby obtaining a first unit time length with higher accuracy and a higher safety factor.
The following describes the apparatus for implementing data scheduling provided in the embodiment of the present application, and the apparatus for implementing data scheduling described below and the method for implementing data scheduling described above may be referred to correspondingly.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an apparatus for implementing data scheduling disclosed in the embodiment of the present application.
As shown in fig. 3, the apparatus may include:
a unit duration obtaining unit 11, configured to obtain a first unit duration for performing desensitization processing on current batch data when a data signal to be scheduled is received;
a scheduling proportion determining unit 12, configured to compare the first unit time length with a second unit time length for performing desensitization processing on the previous batch of data of the pre-stored current batch of data, and determine a data amount scheduling proportion of the next batch of data of the current batch of data in a pre-established batch data scheduling record;
a to-be-scheduled data determining unit 13, configured to adjust the next batch of data according to a data amount scheduling proportion corresponding to the next batch of data, to obtain to-be-scheduled batch of data;
and the data scheduling unit 14 to be scheduled is used for scheduling the batch data to be scheduled.
Optionally, the apparatus further comprises:
the first scheduling record establishing unit is used for dividing each piece of data to be distributed according to the local directory address of each piece of data to be distributed to obtain a plurality of pieces of batch data and determining the data volume of each piece of batch data;
the second scheduling record establishing unit is used for determining the data scheduling sequence of each batch of data in the plurality of batches of data;
and the third scheduling record establishing unit is used for establishing a batch data planning scheduling record according to the data quantity of each batch data and the data scheduling sequence of each batch data in the plurality of batch data.
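The three record establishing units can be illustrated with a short sketch. It assumes that each piece of data is identified by a POSIX-style path, that data sharing a directory forms one batch, and that batches are ordered by directory name; the function name and record fields are hypothetical.

```python
import posixpath
from collections import defaultdict


def build_schedule_record(paths):
    """Group data by directory address and record volume and order per batch."""
    batches = defaultdict(list)
    for path in paths:
        # Data under the same directory is assigned to the same batch.
        batches[posixpath.dirname(path)].append(path)

    record = []
    # Assumed ordering rule: schedule batches in sorted directory order.
    for order, directory in enumerate(sorted(batches), start=1):
        record.append({
            "order": order,
            "directory": directory,
            "data_volume": len(batches[directory]),
            "items": batches[directory],
        })
    return record


record = build_schedule_record(["/data/b/3.bin", "/data/a/1.bin", "/data/a/2.bin"])
for entry in record:
    print(entry["order"], entry["directory"], entry["data_volume"])
```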
Optionally, the scheduling ratio determining unit 12 includes:
a first proportion determining unit, configured to determine, if the first unit duration is greater than a second unit duration for performing desensitization processing on previous batch data of the pre-stored current batch of data, that a data amount scheduling proportion of next batch data of the current batch of data in an existing batch data scheduling record is a first proportion;
and a second proportion determining unit, configured to determine, if the first unit duration is not greater than a second unit duration for performing desensitization processing on previous batch data of the pre-stored current batch of data, that a data volume scheduling proportion of next batch data of the current batch of data in an existing batch data scheduling record is a second proportion.
Optionally, the unit for determining data to be scheduled 13 includes:
and the first proportion multiplying unit is used for selecting a first part of data in the next batch of data as the batch data to be scheduled when the data quantity scheduling proportion corresponding to the next batch of data is a first proportion, wherein the data quantity of the first part of data is a result of multiplying the data quantity of the next batch of data by the first proportion.
Optionally, the apparatus further comprises:
and a load data determining unit, configured to determine data other than the first part of data in the next batch of data as load suspension execution data, where the load suspension execution data is data after the first part of data in the scheduling order.
Optionally, the unit for determining data to be scheduled 13 includes:
and the second proportion multiplying unit is used for selecting a second part of data in the next batch of data as the batch data to be scheduled when the data volume scheduling proportion corresponding to the next batch of data is a second proportion, and the data volume of the second part of data is a result of multiplying the data volume of the next batch of data by the second proportion.
Optionally, the data scheduling unit 14 to be scheduled includes:
and the queue data adding unit is used for adding the batch data to be scheduled to the existing task queue so that the data desensitization processor for data desensitization acquires the batch data to be scheduled from the task queue.
Optionally, the unit duration obtaining unit 11 includes:
the processing information acquisition unit is used for acquiring the data volume and the total processing time of desensitization processing of the current batch data;
and the unit duration calculating unit is used for taking the ratio of the total processing time to the data volume as the first unit duration for desensitizing the current batch data.
Optionally, the unit duration calculating unit includes:
a time length result determining unit, configured to determine a ratio of the total processing time to the data amount;
and the duration result reduction unit is used for rounding off the thousandths of the ratio, keeping an estimated value before the thousandths of the ratio, and taking the duration corresponding to the estimated value as the first unit duration for desensitization of the current batch data.
The data scheduling device provided by the embodiment of the application can be applied to data scheduling equipment, such as a terminal, for example a mobile phone or a computer. Optionally, fig. 4 shows a block diagram of a hardware structure of the data scheduling equipment; referring to fig. 4, the hardware structure of the data scheduling equipment may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;
in the embodiment of the application, the number of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 is at least one, and the processor 1, the communication interface 2 and the memory 3 complete mutual communication through the communication bus 4;
the processor 1 may be a central processing unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application, etc.;
the memory 3 may include a high-speed RAM memory, and may further include a non-volatile memory, such as at least one disk memory;
wherein the memory stores a program and the processor can call the program stored in the memory, the program for:
when a data signal to be scheduled is received, acquiring a first unit time length for desensitization processing of current batch data;
comparing the first unit time length with a second unit time length for desensitization processing of the previous batch of data of the current batch of data stored in advance, and determining a data quantity scheduling proportion of the next batch of data of the current batch of data in a preset batch data planning scheduling record;
and adjusting the next batch of data according to a data amount scheduling proportion corresponding to the next batch of data to obtain batch data to be scheduled, and scheduling the batch data to be scheduled.
Optionally, for the detailed functions and extended functions of the program, refer to the above description.
Embodiments of the present application further provide a storage medium, where a program suitable for execution by a processor may be stored, where the program is configured to:
when a data signal to be scheduled is received, acquiring a first unit time length for desensitization processing of current batch data;
comparing the first unit time length with a second unit time length for desensitizing the previous batch of data of the current batch of data stored in advance, and determining a data quantity scheduling proportion of the next batch of data of the current batch of data in a preset batch data planning scheduling record;
and adjusting the next batch of data according to a data amount scheduling proportion corresponding to the next batch of data to obtain batch data to be scheduled, and scheduling the batch data to be scheduled.
Optionally, for the detailed functions and extended functions of the program, refer to the above description.
Finally, it should also be noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, the embodiments may be combined as needed, and the same and similar parts may be referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A method of scheduling data, comprising:
when a data signal to be scheduled is received, acquiring a first unit time length for desensitization processing of current batch data;
comparing the first unit time length with a second unit time length for desensitizing the previous batch of data of the current batch of data stored in advance, and determining a data quantity scheduling proportion of the next batch of data of the current batch of data in a preset batch data planning scheduling record;
and adjusting the next batch of data according to the data amount scheduling proportion corresponding to the next batch of data to obtain batch data to be scheduled, and scheduling the batch data to be scheduled.
2. The method of claim 1, wherein the batch data planning scheduling record creation process comprises:
dividing each data to be distributed according to the directory address of each local data to be distributed to obtain a plurality of batches of data, and determining the data volume of each batch of data;
determining a data scheduling sequence of each batch of data in the plurality of batches of data;
and establishing a batch data planning and scheduling record according to the data quantity of each batch data and the data scheduling sequence of each batch data in the plurality of batch data.
3. The method of claim 1, wherein comparing the first duration per unit to a second duration per unit for desensitization to a previously stored previous batch of data of the current batch of data to determine a data volume scheduling proportion of a next batch of data of the current batch of data in a pre-established batch data scheduling record comprises:
if the first unit time length is longer than a second unit time length of the previous batch of data of the current batch of data which is stored in advance and is subjected to desensitization processing, determining that the data quantity scheduling proportion of the next batch of data of the current batch of data in the existing batch data planning scheduling record is a first proportion;
and if the first unit time length is not greater than a second unit time length for desensitizing the previous batch of data of the current batch of data, determining that the data volume scheduling proportion of the next batch of data of the current batch of data in the existing batch data planning scheduling record is a second proportion.
4. The method of claim 3, wherein adjusting the next batch data according to a data amount scheduling proportion corresponding to the next batch data to obtain batch data to be scheduled comprises:
and when the data volume scheduling proportion corresponding to the next batch of data is a first proportion, selecting a first part of data in the next batch of data as batch data to be scheduled, wherein the data volume of the first part of data is a result of multiplying the data volume of the next batch of data by the first proportion.
5. The method of claim 4, wherein after selecting the first portion of data in the next batch of data as the batch of data to be scheduled, further comprising:
and determining data except the first part of data in the next batch of data as load suspension execution data, wherein the load suspension execution data is data after the first part of data in the scheduling sequence.
6. The method of claim 3, wherein adjusting the next batch data according to a data amount scheduling proportion corresponding to the next batch data to obtain batch data to be scheduled comprises:
and when the data volume scheduling proportion corresponding to the next batch of data is a second proportion, selecting a second part of data in the next batch of data as batch data to be scheduled, wherein the data volume of the second part of data is a result of multiplying the data volume of the next batch of data by the second proportion.
7. The method of claim 1, wherein scheduling the batch of data to be scheduled comprises:
and adding the batch data to be scheduled to an existing task queue so that a data desensitization processor for data desensitization acquires the batch data to be scheduled from the task queue.
8. The method of claim 1, wherein obtaining a first unit processing time duration for desensitization processing of the current batch of data comprises:
acquiring the data volume and the total processing time of desensitization processing of the current batch data;
and taking the ratio of the total processing time to the data volume as the first unit time length for desensitizing the current batch data.
9. The method of claim 8, wherein the determining a ratio of the total processing time to the data volume as a first unit time length for desensitization processing of the current batch of data comprises:
determining a ratio of the total processing time to the data volume;
rounding off the thousandths of the ratio, keeping an estimated value before the thousandths of the ratio, and taking the time length corresponding to the estimated value as the first unit time length for desensitization processing of the current batch data.
10. An apparatus for data scheduling, comprising:
a unit duration obtaining unit, configured to obtain a first unit duration for performing desensitization processing on current batch data when a data signal to be scheduled is received;
a scheduling proportion determining unit, configured to compare the first unit time length with a second unit time length for performing desensitization processing on the previous batch data of the pre-stored current batch of data, and determine a data amount scheduling proportion of the next batch of data of the current batch of data in a pre-established batch data planning scheduling record;
the to-be-scheduled data determining unit is used for adjusting the next batch of data according to the data amount scheduling proportion corresponding to the next batch of data to obtain to-be-scheduled batch of data;
and the data scheduling unit to be scheduled is used for scheduling the batch data to be scheduled.
11. An apparatus for data scheduling, comprising a memory and a processor;
the memory is used for storing programs;
the processor, configured to execute the program, implementing the steps of the method of data scheduling according to any one of claims 1 to 9.
12. A readable storage medium, having stored thereon a computer program, characterized in that the computer program, when being executed by a processor, carries out the steps of the method of data scheduling according to any one of the claims 1-9.
CN202210956574.0A 2022-08-10 2022-08-10 Data scheduling method, device, equipment and readable storage medium Pending CN115310129A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210956574.0A CN115310129A (en) 2022-08-10 2022-08-10 Data scheduling method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210956574.0A CN115310129A (en) 2022-08-10 2022-08-10 Data scheduling method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN115310129A true CN115310129A (en) 2022-11-08

Family

ID=83860252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210956574.0A Pending CN115310129A (en) 2022-08-10 2022-08-10 Data scheduling method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN115310129A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117407919A (en) * 2023-10-31 2024-01-16 国网青海省电力公司信息通信公司 Sensitive data processing method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination