CN115238596B - Data processing method and device, readable storage medium and electronic equipment - Google Patents
- Publication number
- CN115238596B (application CN202211158663.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- concentration
- aerosol
- contribution
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2119/00—Details relating to the type or aim of the analysis or the optimisation
- G06F2119/08—Thermal analysis or thermal optimisation
Abstract
The disclosure relates to a data processing method, a data processing device, a readable storage medium and electronic equipment, and relates to the technical field of computers. The method comprises the following steps: acquiring environmental data at different moments within a first preset time length; predicting a plurality of aerosol concentrations corresponding to the environmental data at the different moments through a pollutant prediction model; determining a marginal contribution value of the environmental data to the aerosol concentration at a first moment according to the aerosol concentration at the first moment and an average value of the aerosol concentrations. By using the data processing method provided by the disclosure, the marginal contribution value of different environmental data to the aerosol concentration can be determined, so that the interpretability of the pollutant prediction model for predicting the aerosol concentration is enhanced.
Description
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method and apparatus, a readable storage medium, and an electronic device.
Background
Secondary organic aerosol (SOA) is a product generated by the oxidation of volatile organic compounds emitted from natural and artificial sources in the atmospheric environment, and is an important component of atmospheric fine particulate matter. Environmental data of different types and in different proportions may either promote or reduce the generation of secondary organic aerosol.
In the prior art, the concentration of the secondary organic aerosol is simulated, but the interpretability of the simulated concentration of the secondary organic aerosol is low.
Disclosure of Invention
The present disclosure is directed to a data processing method, an apparatus, a readable storage medium, and an electronic device, so as to solve the above technical problems.
In order to achieve the above object, a first aspect of the embodiments of the present disclosure provides a data processing method, including:
acquiring environmental data at different moments within a first preset time length;
predicting a plurality of aerosol concentrations corresponding to the environmental data at the different times through a pollutant prediction model;
determining a marginal contribution value of the environmental data to the aerosol concentration at a first time according to the aerosol concentration at the first time and an average value of the plurality of aerosol concentrations.
Optionally, the environmental data comprises at least one of meteorological pollution data and differential data, the meteorological pollution data comprising meteorological data and pollutant data, the differential data comprising meteorological differential data and pollutant differential data;
the weather difference data is a difference value between the weather data at the previous moment and the weather data at the current moment, and the pollutant difference data is a difference value between the pollutant data at the previous moment and the pollutant data at the current moment.
Optionally, the pollutant prediction model is obtained by training:
and training the first model by taking the environmental sample within a second preset time as training data and taking the target aerosol concentration sample as a label to obtain the pollutant prediction model.
Optionally, the target aerosol concentration sample is determined by:
determining a ratio interval formed by a first target ratio and a second target ratio from a plurality of first ratios between the organic carbon concentration and the elemental carbon concentration, wherein the first target ratio is smaller than the second target ratio;
determining a plurality of candidate aerosol concentration samples according to a plurality of third ratios in the ratio interval, and the elemental carbon concentrations and the total organic carbon concentrations corresponding to the plurality of third ratios;
determining the target aerosol concentration sample from the plurality of candidate aerosol concentration samples.
Optionally, the determining, from a plurality of first ratios between the organic carbon concentration and the elemental carbon concentration, a ratio interval formed by a first target ratio and a second target ratio includes:
determining a plurality of second ratios of the comprehensive pollutants to the primary pollutants at different times;
determining a third target ratio and a fourth target ratio from the plurality of second ratios, wherein the third target ratio is smaller than the fourth target ratio;
and taking an interval formed between a first target ratio at the same time as the third target ratio and a second target ratio at the same time as the fourth target ratio as the ratio interval.
Optionally, said determining said target aerosol concentration sample from said plurality of candidate aerosol concentration samples comprises:
determining a plurality of correlations between the plurality of candidate aerosol concentration samples and the elemental carbon concentration;
and determining a candidate aerosol concentration sample corresponding to the minimum correlation from the plurality of correlations as the target aerosol concentration sample.
Optionally, the determining a plurality of candidate aerosol concentration samples according to a plurality of third ratios in the ratio interval, and the elemental carbon concentrations and the total organic carbon concentrations corresponding to the plurality of third ratios includes:
determining a plurality of primary-emission organic carbon concentrations according to a plurality of third ratios in the ratio interval and the elemental carbon concentrations corresponding to the third ratios;
determining a plurality of candidate aerosol concentration samples according to the plurality of primary-emission organic carbon concentrations and the plurality of total organic carbon concentrations, the total organic carbon covering both primary emission and secondary generation.
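The selection of the target aerosol concentration sample described above can be sketched as follows. This is only an illustration under stated assumptions, not the method as claimed: it assumes that the primary-emission organic carbon is the third ratio multiplied by the elemental carbon concentration, that a candidate sample is the total organic carbon minus that primary part, and that "minimum correlation" is measured by the squared Pearson coefficient. The function names and the `steps` parameter are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def select_target_sample(oc, ec, ratio_low, ratio_high, steps=100):
    """Scan third ratios in [ratio_low, ratio_high]; keep the candidate
    whose correlation with elemental carbon (EC) is smallest.

    Assumptions (not stated in the source): primary OC = ratio * EC,
    candidate = total OC - primary OC, correlation measured as r squared.
    Returns (r_squared, ratio, candidate_sample).
    """
    best = None
    for k in range(steps + 1):
        ratio = ratio_low + (ratio_high - ratio_low) * k / steps
        candidate = [o - ratio * e for o, e in zip(oc, ec)]
        r2 = pearson_r(candidate, ec) ** 2
        if best is None or r2 < best[0]:
            best = (r2, ratio, candidate)
    return best


# Synthetic data: total OC built as 2 * EC plus a part uncorrelated with EC.
ec = [1.0, 2.0, 3.0, 4.0]
oc = [4.0, 5.0, 7.0, 10.0]
r2, ratio, target = select_target_sample(oc, ec, 1.0, 3.0)
```

With this synthetic input the scan settles on the ratio used to build the data, because only then does the remaining organic carbon carry no elemental carbon signal.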
Optionally, after determining the marginal contribution value of the environmental data to the aerosol concentration at the first time instant, the method comprises:
and dividing the environmental data at different moments and the marginal contribution value of the environmental data to the aerosol concentration into different data sets.
Optionally, the dividing environmental data at different time instants and the marginal contribution value of the environmental data to the aerosol concentration into different data sets includes:
storing the environmental data at different moments into a first data set;
for the environmental data at any moment in the first data set, under the condition that the concentration of meteorological pollution data in the environmental data is smaller than a preset concentration, dividing the environmental data into a second data set, and under the condition that the concentration of meteorological pollution data in the environmental data is greater than the preset concentration, dividing the environmental data into a third data set;
for the environment data at any moment in the first data set, under the condition that the increment of the difference data in the environment data is smaller than a preset increment, the environment data is divided into a fourth data set, and under the condition that the increment of the difference data in the environment data is larger than the preset increment, the environment data is divided into a fifth data set.
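A minimal sketch of this division into five data sets is given below. The record layout (a dict with keys `concentration` and `increment`) and the function name are hypothetical; the text only specifies the "smaller than" and "greater than" cases, so a value exactly equal to a threshold is left out of both subsets here.

```python
def partition(records, preset_concentration, preset_increment):
    """Split environmental records into the five data sets.

    Each record is a dict with hypothetical keys:
      "concentration": a meteorological pollution value at that moment
      "increment":     the corresponding differential value
    """
    first = list(records)  # first data set: every moment
    second = [r for r in first if r["concentration"] < preset_concentration]
    third = [r for r in first if r["concentration"] > preset_concentration]
    fourth = [r for r in first if r["increment"] < preset_increment]
    fifth = [r for r in first if r["increment"] > preset_increment]
    return first, second, third, fourth, fifth


records = [
    {"concentration": 10, "increment": -1},
    {"concentration": 50, "increment": 3},
    {"concentration": 30, "increment": 0},  # equal to both thresholds
]
first, second, third, fourth, fifth = partition(records, 30, 0)
```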
Optionally, in a case where the marginal contribution value of the differential data in the first data set is less than the marginal contribution value of the meteorological pollution data, after said determining the marginal contribution value of the environmental data to the aerosol concentration at the first time instant, the method comprises:
obtaining a first marginal contribution difference value of the target type data to the aerosol concentration within the first preset time period under a first data level according to the contribution average value of the target type data in the second data set and the contribution average value of the target type data in the first data set;
obtaining a second marginal contribution difference value of the target type data to the aerosol concentration within the first preset time period under a second data level according to the contribution average value of the target type data in the third data set and the contribution average value of the target type data in the first data set;
the second data level is greater than the first data level.
Optionally, in a case where the marginal contribution value of the differential data in the first data set is greater than the marginal contribution value of the meteorological pollution data, after said determining the marginal contribution value of the environmental data to the aerosol concentration at the first time instant, the method comprises:
obtaining a third marginal contribution difference value of the target type data to the aerosol concentration within the first preset time period under a first differential level according to the contribution average value of the target type data in the fourth data set and the contribution average value of the target type data in the first data set;
obtaining a fourth marginal contribution difference value of the target type data to the aerosol concentration within the first preset time period under a second differential level according to the contribution average value of the target type data in the fifth data set and the contribution average value of the target type data in the first data set;
the second differential level is greater than the first differential level.
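The marginal contribution difference values above can be illustrated with a short sketch. It assumes the difference is taken as the contribution average in the subset minus the contribution average in the first data set; the text says only that the value is obtained "according to" the two averages, so the sign convention and the function names are assumptions.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def contribution_difference(subset_contributions, all_contributions):
    """Contribution average in a subset minus the average in the first data set.

    Assumption: the sign convention (subset minus full set) is not given
    in the source.
    """
    return mean(subset_contributions) - mean(all_contributions)


# Hypothetical marginal contribution values of one target type of data.
first_set = [1.0, 2.0, 3.0, 6.0]   # all moments
low_level = [1.0, 2.0]             # e.g. second or fourth data set
high_level = [3.0, 6.0]            # e.g. third or fifth data set

low_diff = contribution_difference(low_level, first_set)
high_diff = contribution_difference(high_level, first_set)
```

Under this convention a positive difference means the target type data contributes more than usual at that level, and a negative difference means it contributes less.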
According to a second aspect of embodiments of the present disclosure, there is provided a data processing apparatus, the apparatus comprising:
the acquisition module is configured to acquire environmental data at different moments within a first preset time length;
a prediction module configured to predict, by a pollutant prediction model, a plurality of aerosol concentrations corresponding to the environmental data at the different times;
a contribution determination module configured to determine a marginal contribution of the environmental data to the aerosol concentration at a first time based on the aerosol concentration at the first time and an average of the plurality of aerosol concentrations.
According to a third aspect of embodiments of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the data processing method provided by the first aspect of embodiments of the present disclosure.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the data processing method provided by the first aspect of the embodiments of the present disclosure.
According to the technical scheme, the aerosol concentrations at different moments can be predicted through the pollutant prediction model, and the marginal contribution value of the environmental data to the aerosol concentration at the first moment is obtained according to the aerosol concentration at the first moment and the average value of the aerosol concentrations. After the marginal contribution values of different environmental data to the aerosol concentration are obtained, the main factors and the secondary factors influencing the aerosol concentration can be known, when the aerosol concentration is predicted by using a pollutant prediction model in the subsequent process, workers determine whether the environmental data are the main factors or the secondary factors influencing the aerosol concentration according to the type and the content of the subsequently obtained environmental data, and the interpretability of the pollutant prediction model for outputting the aerosol concentration is enhanced.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
Fig. 1 is a flowchart illustrating steps of a data processing method according to an exemplary embodiment of the present disclosure.
Fig. 2 is a logic diagram illustrating a data processing method according to an exemplary embodiment of the present disclosure.
Fig. 3 is a schematic diagram illustrating partitioning different data sets according to an exemplary embodiment of the present disclosure.
Fig. 4 is a block diagram of a data processing apparatus according to an exemplary embodiment of the present disclosure.
Fig. 5 is a block diagram of an electronic device shown in an exemplary embodiment of the present disclosure.
Fig. 6 is a block diagram of an electronic device shown in an exemplary embodiment of the present disclosure.
Detailed Description
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
It should be noted that all actions of acquiring signals, information or data in the present disclosure are performed under the premise of complying with the corresponding data protection regulation policy of the country of the location and obtaining the authorization given by the owner of the corresponding device.
Referring to fig. 1, a data processing method is shown, which includes the following steps:
in step S11, environmental data at different times within a first preset duration is obtained.
In the present disclosure, the first preset time period may be one week, one month, one year, two years, etc., and the time period between different times may be one hour, two hours, three hours, four hours, etc., which is not limited herein.
The environmental data is data contributing to the aerosol concentration, and environmental data of different types and contents can influence the aerosol concentration. The environmental data includes at least one of meteorological pollution data and differential data.
The meteorological pollution data includes meteorological data and pollution data. The meteorological data includes temperature, relative humidity, barometric pressure, precipitation, wind speed, wind direction, radiation conditions, etc., and the pollution data includes VOCs data (for example, alkanes, alkenes, alkynes, aromatics, etc.), O3 concentration, etc.
The differential data includes meteorological differential data and pollution differential data. The meteorological differential data is a difference between the meteorological data at the previous moment and the meteorological data at the current moment, for example, a difference between the temperature at the previous moment and the temperature at the current moment, or a difference between the relative humidity at the previous moment and the relative humidity at the current moment; the pollutant differential data is a difference between the pollutant data at the previous moment and the pollutant data at the current moment, for example, a difference between the O3 concentration at the previous moment and the O3 concentration at the current moment, or a difference between the VOCs data at the previous moment and the VOCs data at the current moment.
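As an illustration, the differential data for one variable could be derived from a time-ordered series as sketched below. The sign convention (current value minus previous value) and the use of `None` for the first moment, which has no previous value, are assumptions; the text only says "a difference between" the two moments.

```python
def differential(series):
    """Current-minus-previous differences for a time-ordered series.

    The first moment has no previous value, so its entry is None.
    The sign convention is an assumption, not taken from the source.
    """
    return [None] + [cur - prev for prev, cur in zip(series, series[1:])]


# Hypothetical hourly temperatures in degrees Celsius.
temperature = [20.0, 22.5, 21.0]
temperature_diff = differential(temperature)
```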
Wherein different areas each have one or more sites, which belong to an environmental monitoring network and are used to obtain environmental data. Therefore, the present disclosure may obtain environmental data of a single site for subsequent data analysis, and may also obtain environmental data of multiple sites.
In step S12, a plurality of aerosol concentrations corresponding to the environmental data at the different times are predicted by the pollutant prediction model.
In the present disclosure, the pollutant prediction model is configured to predict, after receiving the environmental data, a plurality of aerosol concentrations corresponding to the environmental data at a plurality of different times, and the pollutant prediction model may be any one of a random forest model, an XGBoost model (a regression prediction model), a LightGBM model (Light Gradient Boosting Machine, a gradient boosting decision tree algorithm), and a CatBoost model (a gradient boosting algorithm with categorical feature support).
For example, the first preset time duration is one year, and the interval time duration between different moments is one hour, the pollutant prediction model can obtain environmental data of each hour in one year, and predict the aerosol concentration of each hour according to the environmental data of each hour, so as to obtain different aerosol concentrations of each hour in one year.
Wherein the aerosol is a secondary organic aerosol, which is a product generated by the oxidation of volatile organic compounds (VOCs) emitted from natural and artificial sources in the atmospheric environment, and the pollutant O3 is an important oxidant for aerosol formation; thus, VOCs and O3 can affect aerosol generation. Meteorological data such as temperature, humidity, radiation conditions, air pressure and wind speed also influence aerosol generation to different degrees. For example, the temperature affects the saturated vapor pressure of organic components, which directly affects the distribution of the oxidation products of VOCs between the gas phase and the solid phase and in turn affects aerosol generation; an increase in humidity may promote aerosol generation and may also reduce it.
It can be seen that meteorological pollution data can affect aerosol generation, and thus, the present disclosure takes meteorological pollution data as part of environmental data as input to a pollutant prediction model to predict aerosol concentration.
In addition, in order to guarantee the accuracy of the predicted aerosol concentration, differential data are added, the difference value between the meteorological pollution data at the previous moment and the meteorological pollution data at the current moment is used as the input of a pollutant prediction model, and the differential data are used as the influence factors of the pollutant prediction model.
Wherein the aerosol is generated by the oxidation of VOCs in the atmosphere by oxidants (including OH radicals, O3, or NO3 radicals). The aerosol has strong polarity, hygroscopicity and solubility, and affects the climate system by influencing the global radiation balance through direct or indirect radiative effects. The aerosol concentration cannot be obtained directly in the related art; therefore, the present disclosure adopts a pollutant prediction model to predict the aerosol concentration.
In step S13, a marginal contribution value of the environmental data to the aerosol concentration at a first time is determined according to the aerosol concentration at the first time and an average value of the aerosol concentrations.
In the present disclosure, the marginal contribution value of each environmental data to the aerosol concentration at the first time may be obtained according to the aerosol concentration at the first time and an average value of a plurality of aerosol concentrations.
The first time may be any moment within the first preset time length, and may also be understood as the current moment; the marginal contribution value refers to the contribution made by the environmental data in the process of affecting the aerosol concentration.
The marginal contribution value of each environmental data to the aerosol concentration can be determined through an interpretable model (a Shapley model). While the pollutant prediction model is being trained, or after it has been trained, the interpretable model reads the environmental data received by the pollutant prediction model and the aerosol data output by the pollutant prediction model, and calculates the marginal contribution value of each environmental data to the aerosol data.
In particular, the interpretable model may determine the marginal contribution of each environmental data to the aerosol concentration according to equation (1) below.
$$f(x) = \phi_0 + \sum_{i=1}^{m} \phi_i \qquad (1)$$
In formula (1), $f(x)$ is the aerosol concentration predicted by the pollutant prediction model at the first moment; $\phi_0$ is the average value of the aerosol concentrations at a plurality of different moments predicted by the pollutant prediction model with the environmental samples as input, and $\phi_0$ may also be taken from the data output according to the training data when the pollutant prediction model is trained; $\sum_{i=1}^{m} \phi_i$ is the total of the marginal contribution values of a plurality of different environmental data to the aerosol concentration predicted by the pollutant prediction model at the first moment; and m is the number of the environmental data.
As can be seen from equation (1), subtracting the average value $\phi_0$ of the aerosol concentrations at a plurality of different times from the aerosol concentration $f(x)$ at the first time yields the sum $\sum_{i=1}^{m} \phi_i$ of the marginal contribution values of the environmental data to the aerosol concentration at the first moment.
Based on the formula (1), the marginal contribution value of different environmental data to the aerosol concentration can be obtained by using the following formula (2):
$$\phi_i = \sum_{k=1}^{M} w_k \left( f(S_k) - \phi_0 \right) \qquad (2)$$
In formula (2), M is the number of combinations of a single environmental data with other environmental data; $S_k$ is the k-th such combination and $|S_k|$ is the number of environmental data contained in it; i represents environmental data of one of the target types; $f(S_k)$ is the marginal contribution of the combination $S_k$ to the aerosol concentration; $w_k$ is the weight of the combination $S_k$, which depends on $|S_k|$; $\phi_0$ is the average value of the aerosol concentrations at a plurality of different moments predicted by the pollutant prediction model with the environmental samples as input; and $\phi_i$ is the marginal contribution of a single environmental data to the aerosol concentration.
As can be seen from equation (2), for each single combination relating to environmental data of a target type, the average value $\phi_0$ of the aerosol concentrations at a plurality of different times is subtracted from the marginal contribution $f(S_k)$ of the combination to the aerosol concentration, and the result is multiplied by the weight $w_k$ of the combination to obtain the value of that combination; finally, the values of the plurality of combinations are added to obtain the marginal contribution value of the environmental data of the target type.
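For comparison, the sketch below computes exact Shapley values in their standard textbook form, where each feature's value is a weighted sum, over all subsets of the other features, of the change caused by adding that feature. This is the standard formulation and is not claimed to be term-for-term identical to formula (2); the toy value function and all names are hypothetical. It does reproduce the additive property of formula (1): the contributions sum to the prediction minus the baseline.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values for a set function value(frozenset) -> float."""
    m = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(m):
            # Standard Shapley weight for subsets of this size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


# Toy additive value function: a baseline plus one fixed effect per feature.
EFFECTS = {"O3": 1.0, "temperature": 1.0, "VOCs": 3.0}
BASELINE = 30.0


def toy_value(subset):
    return BASELINE + sum(EFFECTS[f] for f in subset)


features = ["O3", "temperature", "VOCs"]
phi = shapley_values(features, toy_value)
```

Enumerating every subset costs time exponential in the number of features, so this direct form is only practical for a handful of environmental variables.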
For example, O3, temperature and VOCs data can form 7 combinations: O3; temperature; VOCs data; O3-temperature; O3-VOCs data; temperature-VOCs data; and O3-temperature-VOCs data. To obtain the marginal contribution value of O3 to the aerosol concentration, the 4 combinations related to O3 are screened out of the 7 combinations: O3; O3-temperature; O3-VOCs data; and O3-temperature-VOCs data. For each of the 4 combinations, the average value of the aerosol concentrations predicted by the pollutant prediction model from the environmental samples is subtracted from the marginal contribution of the combination to the aerosol concentration to obtain the difference for that combination, and the difference is multiplied by the weight of the combination to obtain the value of that combination. Finally, the values of the 4 combinations are summed to obtain the marginal contribution of O3 to the aerosol concentration.
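The counting in this example can be reproduced with a few lines of Python. The feature labels are taken from the example; the function name is hypothetical.

```python
from itertools import combinations


def all_combinations(features):
    """Every non-empty combination of the given features."""
    return [
        combo
        for size in range(1, len(features) + 1)
        for combo in combinations(features, size)
    ]


features = ["O3", "temperature", "VOCs"]
combos = all_combinations(features)            # 7 combinations in total
with_o3 = [c for c in combos if "O3" in c]     # the 4 that involve O3
```

In general, m features give 2^m - 1 non-empty combinations, of which 2^(m-1) contain any one chosen feature.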
Take as an example an aerosol concentration at the first time of 35 μg/m³, an average value of the aerosol concentrations at a plurality of different times of 30 μg/m³, and environmental data including O3 concentration, temperature and VOCs data. Formula (1) gives a sum of the marginal contribution values of the O3 concentration, the temperature and the VOCs data to the organic aerosol at the first moment of 5 μg/m³, and formula (2) gives a contribution of 3 μg/m³ for the VOCs data and 1 μg/m³ each for the O3 concentration and the temperature.
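The arithmetic of this example can be checked directly against the additive relation of formula (1). The sketch below uses only the numbers stated in the example; the ranking step at the end is an illustrative way of separating main from secondary factors and is not prescribed by the text.

```python
# Values from the example, in micrograms per cubic metre.
prediction = 35.0
contributions = {"VOCs": 3.0, "O3": 1.0, "temperature": 1.0}

# Formula (1): prediction = average + sum of marginal contributions.
total_contribution = sum(contributions.values())
implied_average = prediction - total_contribution

# Illustrative ranking: the largest contribution is the main factor.
ranked = sorted(contributions, key=contributions.get, reverse=True)
main_factor = ranked[0]
```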
According to the data processing method provided by the disclosure, the aerosol concentrations at different moments can be predicted through a pollutant prediction model, and the marginal contribution value of the environmental data to the aerosol concentration at the first moment is obtained according to the aerosol concentration at the first moment and the average value of the aerosol concentrations.
Under the condition that the staff cannot know the judgment basis of the aerosol concentration output by the pollutant prediction model, the staff cannot definitely determine how the environmental data influence the pollutant prediction model to predict the aerosol concentration, and the interpretability of the aerosol concentration output by the pollutant prediction model can be considered to be low. After the marginal contribution values of different environmental data to the aerosol concentration are obtained, the main factors and the secondary factors influencing the aerosol concentration can be known, when the aerosol concentration is predicted by using a pollutant prediction model in the subsequent process, workers determine whether the environmental data are the main factors or the secondary factors influencing the aerosol concentration according to the type and the content of the subsequently obtained environmental data, and the interpretability of the pollutant prediction model for outputting the aerosol concentration is enhanced.
In a possible implementation, please refer to fig. 2, the environmental sample within a second preset time period may be used as training data, and the target aerosol concentration sample is used as a label to train the first model, so as to obtain the pollutant prediction model.
Specifically, the first model is trained on environmental samples and target aerosol concentration samples at different moments to obtain a pollutant prediction model.
The second preset time period may be greater than or equal to the first preset time period, and the second preset time period may be one year, two years, three years, or the like, which is not limited in this disclosure.
Wherein the type of data contained in the environmental sample is the same as the type of data contained in the environmental data. The environmental sample is the data used to train the first model to obtain the pollutant prediction model; the environmental data is the input data when the pollutant prediction model actually makes predictions. The environmental sample comprises a meteorological pollution sample and a differential sample, and the meteorological pollution sample comprises a meteorological sample and a pollutant sample.
The meteorological samples include temperature, relative humidity, air pressure, precipitation, wind speed, wind direction, etc., and the pollutant samples include VOCs samples (such as alkanes, alkenes, alkynes, aromatic hydrocarbons, etc.), O3 concentration, etc.
The differential samples include a meteorological differential sample and a pollutant differential sample. The meteorological differential sample is a difference between the meteorological sample at the previous moment and the meteorological sample at the current moment, for example, a difference between the temperature at the previous moment and the temperature at the current moment, or a difference between the relative humidity at the previous moment and the relative humidity at the current moment; the pollutant differential sample is a difference between the pollutant sample at the previous moment and the pollutant sample at the current moment, for example, a difference between the O3 concentration at the previous moment and the O3 concentration at the current moment, or a difference between the VOCs sample at the previous moment and the VOCs sample at the current moment.
In the related art, the first ratio between the organic carbon concentration and the elemental carbon concentration is calculated directly from the two concentrations. Because of defects in this calculation mode, the reliability of the first ratio is low, and so is the reliability of the target aerosol concentration sample calculated from it.
In order to improve the reliability of the calculated target aerosol concentration sample, the target aerosol concentration sample can be obtained by the following steps:
in step S21, a ratio interval formed by a first target ratio and a second target ratio is determined from a plurality of first ratios between the organic carbon concentration and the elemental carbon concentration, where the first target ratio is smaller than the second target ratio.
The first ratio between the organic carbon concentration and the elemental carbon concentration characterizes primary emission information, and the organic carbon (OC) concentration and the elemental carbon (EC) concentration are component data of PM2.5.
When the first target ratio and the second target ratio are determined, the first target ratio and the second target ratio can be determined according to the second ratio of the comprehensive pollutants to the primary pollutants at a plurality of different moments. Specifically, the ratio interval formed by the first target ratio and the second target ratio can be determined by the following steps:
in sub-step A1, a plurality of second ratios of the comprehensive pollutant to the primary pollutant at different moments are determined.
The comprehensive pollutant is a comprehensive parameter that includes both primary emission and secondary generation, such as the PM2.5 concentration; the primary pollutant is a product of primary emission, such as the CO concentration, so the second ratio may be the ratio of the PM2.5 concentration to the CO concentration. Primary emission refers to substances discharged directly into the atmospheric environment in the course of human activities, and secondary generation refers to new products formed when the substances of primary emission react chemically with other substances in the atmospheric environment.
In sub-step A2, from the plurality of second ratios, a third target ratio and a fourth target ratio are determined, the third target ratio being smaller than the fourth target ratio.
After the second ratios at different moments are obtained, the second ratios can be arranged in ascending order; the minimum of the second ratios is then used as the third target ratio, and the second ratio at a preset quantile (for example, the 1/4 quantile or the 1/2 quantile) is used as the fourth target ratio.
For example, where the plurality of second ratios are 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7 and 0.8, 0.1 is the third target ratio and 0.2, which lies at the 1/4 quantile, is the fourth target ratio.
In sub-step A3, an interval formed between a first target ratio at the same time as the third target ratio and a second target ratio at the same time as the fourth target ratio is used as the ratio interval.
The organic carbon concentration, elemental carbon concentration, comprehensive pollutant, primary pollutant and environmental data at different moments are acquired, and from them the first ratio between the organic carbon concentration and the elemental carbon concentration at each moment and the second ratio between the comprehensive pollutant and the primary pollutant at each moment are obtained. The organic carbon concentration, elemental carbon concentration, comprehensive pollutant, primary pollutant, environmental data, first ratio and second ratio at the same moment correspond to one another one-to-one.
On this basis, the first target ratio, which is at the same moment as the third target ratio, and the second target ratio, which is at the same moment as the fourth target ratio, can be determined from the plurality of first ratios. That is, from the plurality of first ratios, the first ratio at the same moment as the minimum second ratio and the first ratio at the same moment as the second ratio at the preset quantile are determined, and the interval formed by these two first ratios is used as the ratio interval.
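Sub-steps A1 to A3 can be sketched as follows. This is only an illustration under stated assumptions: the function name `ratio_interval` is hypothetical, the quantile position uses a nearest-rank rule (chosen because it reproduces the 0.1 to 0.8 example, where 0.2 lies at the 1/4 quantile), and the two selected first ratios are ordered so that the lower bound is the smaller one.

```python
import math


def ratio_interval(first_ratios, second_ratios, quantile=0.25):
    """Pick the OC/EC ratio interval from time-aligned ratio lists.

    first_ratios[i] and second_ratios[i] belong to the same moment i.
    Assumption: nearest-rank quantile, i.e. position ceil(q * n) in the
    ascending order of the second ratios.
    """
    # Moments ordered by ascending second ratio (e.g. PM2.5 / CO).
    order = sorted(range(len(second_ratios)), key=lambda i: second_ratios[i])
    i_min = order[0]                                   # third target ratio
    k = max(math.ceil(quantile * len(order)) - 1, 0)
    i_q = order[k]                                     # fourth target ratio
    a, b = first_ratios[i_min], first_ratios[i_q]
    # Assumption: order the pair so the first target ratio is the smaller one.
    return min(a, b), max(a, b)


# Made-up, time-aligned ratios for eight moments.
second = [0.5, 0.1, 0.3, 0.2, 0.8, 0.4, 0.7, 0.6]
first = [3.0, 0.4, 2.0, 1.2, 5.0, 2.5, 4.0, 3.5]
print(ratio_interval(first, second))  # (0.4, 1.2)
```

Here the minimum second ratio (0.1) and the 1/4-quantile second ratio (0.2) occur at the second and fourth moments, so the first ratios at those moments, 0.4 and 1.2, bound the interval.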
In the present disclosure, the second ratio between the comprehensive pollutant and the primary pollutant reflects the balance between secondarily generated information and primary emission information. A smaller second ratio indicates that the comprehensive pollutant accounts for a smaller share and the primary pollutant for a larger share, so primary emission information carries more weight at that moment. Therefore, when the first target ratio and the second target ratio corresponding to the third target ratio and the fourth target ratio are selected to form the ratio interval, the range between the third target ratio and the fourth target ratio is small, and the plurality of third ratios in the corresponding ratio interval can be taken to characterize primary emission information.
The first ratio and the second ratio are in a directly proportional relation: the smaller the second ratio, the smaller the first ratio, and the better the first ratio represents primary emission information.
For the first target ratio, two digits after the decimal point are retained. If the second digit after the decimal point of the first target ratio is within a first preset range, the second digit takes the minimum value in the first preset range; if the second digit after the decimal point is within a second preset range, the second digit takes the minimum value in the second preset range. The values in the first preset range are greater than the values in the second preset range; for example, the first preset range is [5, 9] and the second preset range is [0, 5].
For example, when the first target ratio is 0.3678, the second digit after the decimal point, 6, is within the first preset range, so the first target ratio becomes 0.35; when the first target ratio is 0.3278, the second digit after the decimal point, 2, is within the second preset range, so the first target ratio becomes 0.30.
For the second target ratio, one digit after the decimal point is retained. If the first digit after the decimal point of the second target ratio is within a third preset range, the first digit takes the maximum value in the third preset range; if the first digit after the decimal point is within a fourth preset range, the first digit takes the maximum value in the fourth preset range. The values in the third preset range are greater than the values in the fourth preset range; for example, the third preset range is [5, 9] and the fourth preset range is [0, 5].
For example, when the second target ratio is 2.3579, the first digit after the decimal point, 3, is within the fourth preset range, so the second target ratio becomes 2.5; when the second target ratio is 2.6579, the first digit after the decimal point, 6, is within the third preset range, so the second target ratio becomes 2.9.
It can be seen that, by adjusting the retained decimal digits of the first target ratio and the second target ratio in this way, the first target ratio becomes smaller and the second target ratio becomes larger, so the range of the ratio interval is enlarged.
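The digit-snapping rule for the two interval bounds can be sketched as below. The function names `widen_lower` and `widen_upper` are hypothetical, and one point is an assumption: because the example ranges both contain the digit 5, this sketch treats 5 as belonging to the upper range [5, 9].

```python
from decimal import Decimal, ROUND_DOWN


def widen_lower(ratio):
    """Keep two decimals; snap the second decimal digit down to 5 or 0."""
    d = Decimal(str(ratio)).quantize(Decimal("0.01"), rounding=ROUND_DOWN)
    digit = int(d * 100) % 10
    snapped = 5 if digit >= 5 else 0      # minimum of [5, 9] or of [0, 5]
    return float(d - Decimal(digit - snapped) / 100)


def widen_upper(ratio):
    """Keep one decimal; snap the first decimal digit up to 9 or 5."""
    d = Decimal(str(ratio)).quantize(Decimal("0.1"), rounding=ROUND_DOWN)
    digit = int(d * 10) % 10
    snapped = 9 if digit >= 5 else 5      # maximum of [5, 9] or of [0, 5]
    return float(d + Decimal(snapped - digit) / 10)


print(widen_lower(0.3678), widen_lower(0.3278))  # 0.35 0.3
print(widen_upper(2.3579), widen_upper(2.6579))  # 2.5 2.9
```

Decimal arithmetic is used rather than binary floats so that the individual decimal digits can be read off exactly.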
In step S22, a plurality of candidate aerosol concentration samples are determined according to a plurality of third ratios in the ratio interval, and the elemental carbon concentrations and the total organic carbon concentrations corresponding to the plurality of third ratios.
In this disclosure, after the ratio interval, the total organic carbon concentration, and the elemental carbon concentration are obtained, the candidate aerosol concentration samples at a plurality of different times may be calculated according to a plurality of third ratios in the ratio interval and the total organic carbon concentrations and the elemental carbon concentrations corresponding to the plurality of third ratios.
The plurality of third ratios in the ratio interval are primary emission information at different moments, the total organic carbon concentration is the sum of the organic carbon concentrations of primary emission and secondary generation, and the elemental carbon concentration is the elemental carbon concentration of primary emission.
A plurality of primary-emission organic carbon concentrations can be determined according to the plurality of third ratios in the ratio interval and the elemental carbon concentrations corresponding to them; the plurality of candidate aerosol concentration samples are then determined according to these primary-emission organic carbon concentrations and the corresponding total organic carbon concentrations of primary emission and secondary generation.
In particular, a plurality of candidate aerosol concentration samples may be determined according to the following equation (3):
$$\mathrm{SOA} = \mathrm{OC} - \left(\frac{\mathrm{OC}}{\mathrm{EC}}\right)_{\mathrm{pri}} \times \mathrm{EC} \tag{3}$$
In equation (3), $(\mathrm{OC}/\mathrm{EC})_{\mathrm{pri}}$ is a third ratio, EC is the elemental carbon concentration of primary emission, OC is the total organic carbon concentration, and SOA is a candidate aerosol concentration sample.
As can be seen from equation (3), multiplying the third ratio $(\mathrm{OC}/\mathrm{EC})_{\mathrm{pri}}$ by the elemental carbon concentration EC of primary emission gives the organic carbon concentration of primary emission; subtracting the organic carbon concentration of primary emission from the total organic carbon concentration of primary emission and secondary generation gives the organic carbon concentration of secondary generation. This secondarily generated organic carbon concentration is the candidate aerosol concentration sample, a pollutant produced in the secondary generation process.
When the candidate aerosol concentration samples are calculated, third ratios can be selected from the ratio interval at a preset step length, and a candidate aerosol concentration sample is then calculated for each selected third ratio. The preset step length may be 0.02, 0.05, etc., which is not limited in this disclosure.
Because the candidate aerosol concentration samples are calculated at the preset step length, candidate samples need not be calculated for every third ratio in the ratio interval, only for the third ratios selected at the preset step length, which reduces the amount of data processing.
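Step S22 can be sketched as a sweep over the ratio interval at a fixed step. This is a minimal illustration: the function name `candidate_soa` and the sample values are made up, and the calculation follows the relation described above (total organic carbon minus the third ratio times elemental carbon).

```python
def candidate_soa(oc, ec, lower, upper, step=0.05):
    """Map each sampled third ratio to its candidate aerosol series.

    oc, ec: time-aligned total organic carbon and elemental carbon series.
    For each ratio r sampled from [lower, upper], the candidate series is
    oc[t] - r * ec[t] for every moment t.
    """
    n_steps = int(round((upper - lower) / step))
    # Rounding guards against float drift such as 0.30000000000000004.
    ratios = [round(lower + k * step, 10) for k in range(n_steps + 1)]
    return {r: [o - r * e for o, e in zip(oc, ec)] for r in ratios}


# Made-up concentrations at two moments.
oc = [10.0, 12.0]
ec = [2.0, 4.0]
for ratio, soa in candidate_soa(oc, ec, 0.5, 1.0, step=0.25).items():
    print(ratio, soa)
# 0.5 [9.0, 10.0]
# 0.75 [8.5, 9.0]
# 1.0 [8.0, 8.0]
```

A larger step produces fewer candidate series, which is the reduction in data processing that the preset step length is meant to achieve.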
In step S23, the target aerosol concentration sample is determined from the plurality of candidate aerosol concentration samples.
In the present disclosure, the plurality of candidate aerosol concentration samples all carry secondarily generated information, but among them there is a target candidate aerosol concentration sample that is most representative of secondary generation.
Thus, the correlation between each of the plurality of candidate aerosol concentration samples and the elemental carbon concentration may be determined, and the candidate aerosol concentration sample corresponding to the minimum correlation is taken as the target aerosol concentration sample.
The greater the correlation between a candidate aerosol concentration sample and the elemental carbon concentration, the closer the correlation is to 1; the smaller the correlation, the closer it is to 0.
Since the elemental carbon concentration represents a product of primary emission, the smaller the correlation between a candidate aerosol concentration sample and the elemental carbon concentration, the more that candidate sample reflects products of secondary generation. The target aerosol concentration sample, having the smallest correlation with the elemental carbon concentration, is therefore the candidate most representative of secondarily generated information.
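Step S23 can be sketched as a minimum-correlation search. The source does not name the correlation measure; because it describes values between 0 and 1, this sketch assumes the squared Pearson coefficient. The names `r_squared` and `pick_target` and the sample data are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return (sxy * sxy) / (sxx * syy)


def pick_target(candidates, ec):
    """Return the third ratio whose candidate series correlates least with EC."""
    return min(candidates, key=lambda r: r_squared(candidates[r], ec))


# Made-up data: the candidate for ratio 0.5 tracks EC exactly,
# the candidate for ratio 1.0 does not.
ec = [1.0, 2.0, 3.0, 4.0]
candidates = {0.5: [1.0, 2.0, 3.0, 4.0], 1.0: [2.0, 1.0, 2.0, 1.0]}
print(pick_target(candidates, ec))  # 1.0
```

The series stored under the returned ratio would then serve as the target aerosol concentration sample.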
According to the data processing method provided by the disclosure, a third target ratio and a fourth target ratio are determined among the plurality of second ratios between the comprehensive pollutant and the primary pollutant; a first target ratio corresponding to the third target ratio and a second target ratio corresponding to the fourth target ratio are then determined from the plurality of first ratios between the organic carbon concentration and the elemental carbon concentration; finally, candidate aerosol concentration samples corresponding to the third ratios in the ratio interval formed by the first target ratio and the second target ratio are determined, and the target aerosol concentration sample with the lowest correlation with the elemental carbon concentration is screened from the candidate aerosol concentration samples.
In this process, since the third target ratio and the fourth target ratio are among the smaller second ratios, and the second ratios and the first ratios are in a proportional relationship, the first target ratio and the second target ratio corresponding to them are also ratios with smaller values, and the ratio interval formed by the first target ratio and the second target ratio is correspondingly small. The third ratios in the ratio interval therefore correspond to moments at which the comprehensive pollutant accounts for a small share and the primary pollutant for a large share, so the third ratios can represent primary emission information. Subtracting the primary emission information from the total organic carbon concentration then preliminarily removes it, giving candidate aerosol concentration samples that represent secondarily generated information.
Then, from the plurality of candidate aerosol concentration samples, the target aerosol concentration sample with the lowest correlation with the elemental carbon concentration, which represents primary emission, is screened out. This further eliminates the influence of primary emission information, so that the reliability of the obtained target aerosol concentration sample is high.
In the disclosure, the environmental samples and target aerosol concentration samples generated in different time periods within the second preset duration differ considerably. To make the aerosol concentration associated with the environmental samples more reliable, the aerosol concentration within the second preset duration may be calculated separately for different time periods.
Take the second preset duration as one year, comprising the four time periods of spring, summer, autumn and winter. When determining the target aerosol concentration for a season, the first ratio between the organic carbon concentration and the elemental carbon concentration for each hour in the season may be determined, the target aerosol concentration for each hour may be calculated according to the first ratio, and finally the average of the hourly target aerosol concentrations may be used as the target aerosol concentration of the season. In this way, the target aerosol concentrations for the four seasons of the year can be obtained.
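The seasonal averaging can be sketched as a group-by over hourly values. The month-to-season mapping below is an assumption (northern-hemisphere meteorological seasons), since the source does not define the season boundaries, and `seasonal_means` is a hypothetical name.

```python
from collections import defaultdict

# Assumption: northern-hemisphere meteorological seasons by month.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_means(hourly):
    """Average hourly target aerosol concentrations per season.

    hourly: iterable of (month, concentration) pairs, one per hour.
    """
    groups = defaultdict(list)
    for month, value in hourly:
        groups[SEASON_OF_MONTH[month]].append(value)
    return {season: sum(v) / len(v) for season, v in groups.items()}


# Made-up hourly values: two winter hours and one summer hour.
print(seasonal_means([(1, 2.0), (2, 4.0), (7, 10.0)]))
# {'winter': 3.0, 'summer': 10.0}
```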
After the target aerosol concentration in each season is determined, a correspondence can be established between the environmental samples and the target aerosol concentration at the same moment in the same season, and the environmental samples of the four time periods are then combined into the environmental sample for the second preset duration. Specifically, the correspondence among the meteorological data, pollution data, meteorological differential data, pollutant differential data and target aerosol concentration at the same moment within the second preset duration is established, so that staff can conveniently check the correlation among the data used to train the pollutant prediction model.
In a possible implementation, the foregoing describes calculating the marginal contribution value of the environmental data to the aerosol concentration at a first moment. In some scenarios, staff may need the marginal contribution value of the environmental data to the aerosol concentration over a period of time in order to perform contribution analysis on the environmental data. To obtain this, after the environmental data is acquired, the present disclosure may further divide the environmental data at different moments, together with their marginal contribution values to the aerosol concentration, into different data sets. Specifically, the present disclosure further includes the following steps:
in step S31, the environmental data at different times are stored in the first data set.
In the present disclosure, after the environmental data at different moments within the first preset duration are obtained, they may all be stored in the first data set, so that the first data set holds all of the original data.
The first data set comprises the environmental data, the aerosol concentration, and the marginal contribution values of the environmental data to the aerosol concentration at different moments.
Taking environmental data that include O₃ as an example, and with reference to Table 1, the first data set holds, at the time point of 12, the O₃ concentration and the corresponding VOCs concentration, the marginal contribution values of O₃ and VOCs to the aerosol concentration, and the aerosol concentration; it likewise holds, at the time point of 13, the O₃ concentration and the corresponding VOCs concentration, the marginal contribution values of O₃ and VOCs to the aerosol concentration, and the aerosol concentration.
In step S32, for the environmental data at any time in the first data set, the environmental data is classified into a second data set when the concentration of the weather pollution data in the environmental data is less than a preset concentration, and the environmental data is classified into a third data set when the concentration of the weather pollution data in the environmental data is greater than the preset concentration.
In the disclosure, when the concentration of the meteorological pollution data is less than the preset concentration, the concentration level of the meteorological pollution data is lower, and at this time, the environmental data can be divided into the second data set; when the concentration of the meteorological pollution data is greater than the preset concentration, the concentration level of the data is higher, and at this time, the environmental data can be divided into a third data set.
Taking O₃ as the meteorological pollution data, for example, and referring to FIG. 3: when the O₃ concentration at the current moment is less than 200 μg/m³, all the environmental data at the current moment in the first data set (e.g., O₃ concentration, VOCs concentration, temperature, air pressure, relative humidity, wind speed, temperature differential data, O₃ concentration differential data, O₃ marginal contribution value, etc.) are divided into the second data set, which represents a low ozone concentration state; when the O₃ concentration is greater than 200 μg/m³, all the environmental data at the current moment (e.g., O₃ concentration, VOCs concentration, temperature, air pressure, relative humidity, wind speed, temperature differential data, O₃ concentration differential data, O₃ concentration marginal contribution value, etc.) are divided into the third data set, which represents a high ozone concentration state.
Taking meteorological pollution data as an example of temperature, when the temperature at the current moment is less than 40 ℃, dividing all environmental data in the first data set at the current moment into a second data set in a low-temperature state; and when the temperature at the current moment is higher than 40 ℃, dividing all the environmental data at the current moment into a third data set in a high-temperature state.
The data in the second data set and the third data set at the different moments together make up the first data set. Taking the meteorological pollution data as an example of temperature, when the temperature of 12; when the temperature of 13.
As can be seen, in the above process, if the first data set is divided by meteorological pollution data of the O₃ type, the environmental data at different moments in the first data set can be divided into a second data set and a third data set in different ozone states; if the first data set is divided by meteorological pollution data of the temperature type, the environmental data can likewise be divided into a second data set and a third data set in different temperature states.
Of course, the first data set may be divided into the second data set and the third data set according to different types of environmental data, such as air pressure, concentration of VOCs, etc., and the disclosure is not limited thereto.
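The partition in step S32 can be sketched as a threshold split over per-moment records. The record layout, field names and values below are made up for illustration, and `split_by_threshold` is a hypothetical helper. The source only covers values strictly below or strictly above the preset concentration, so a record exactly at the threshold falls into neither subset here.

```python
def split_by_threshold(records, key, threshold):
    """Split per-moment records into (low, high) subsets by one field."""
    low = [r for r in records if r[key] < threshold]
    high = [r for r in records if r[key] > threshold]
    return low, high


# Made-up first data set: one record per moment.
first_set = [
    {"time": "t1", "o3": 120.0, "vocs_contribution": 1.0},
    {"time": "t2", "o3": 150.0, "vocs_contribution": 2.0},
    {"time": "t3", "o3": 250.0, "vocs_contribution": 3.0},
]

second_set, third_set = split_by_threshold(first_set, "o3", 200.0)
print([r["time"] for r in second_set])  # ['t1', 't2']
print([r["time"] for r in third_set])   # ['t3']
```

Each record is moved as a whole, so all environmental data and marginal contribution values for a moment stay together in whichever subset that moment falls into.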
In step S33, for the environment data at any time in the first data set, when an increment in the difference data in the environment data is smaller than a preset increment, the environment data is divided into a fourth data set, and when the increment in the difference data in the environment data is larger than the preset increment, the environment data is divided into a fifth data set.
In the present disclosure, the difference data is a difference between the previous time meteorological pollution data and the current time meteorological pollution data of the same type, and the difference data corresponds to the current time meteorological pollution data.
Taking the environmental data shown in Table 1 as an example, the differential data may be the difference, 10, between the O₃ concentration at 12 and the O₃ concentration at 13, and this difference of 10 is at the same moment as the O₃ concentration at 13; the difference 10 between the concentration of VOCs at 12.
When the increment represented by the differential data is smaller than the preset increment, the increase amplitude of the environmental data at the current moment is slow, and at the moment, the environmental data can be divided into a fourth data set; in the case that the increment represented by the differential data is larger than the preset increment, which indicates that the increase of the environmental data at the current moment is relatively fast, the environmental data may be divided into the fifth data set.
Taking O₃ differential data as the differential data, for example, and referring to FIG. 3: when the difference obtained by subtracting the O₃ concentration at the previous moment from the O₃ concentration at the current moment is smaller than a preset increment, all the environmental data at the current moment in the first data set are divided into the fourth data set with low ozone increment; when this difference is larger than the preset increment, all the environmental data at the current moment in the first data set are divided into the fifth data set with high ozone increment.
Taking the difference data as an example of temperature difference data, when the difference value obtained by subtracting the temperature at the previous moment from the temperature at the current moment is smaller than a preset increment, dividing all environmental data at the current moment in the first data set into a fourth data set with low temperature increment; and when the difference value of the temperature at the current moment minus the temperature at the previous moment is larger than a preset increment, dividing all the environmental data in the first data set at the current moment into a fifth data set with high temperature increment.
It can be seen that, in the above process, if the first data set is divided by differential data of the O₃ type, it can be divided into a fourth data set and a fifth data set in different ozone increment states; if the first data set is divided by differential data of the temperature type, it can be divided into a fourth data set and a fifth data set in different temperature increment states.
Of course, the first data set may be divided into the fourth data set and the fifth data set according to different types of differential data, such as air pressure difference, VOCs concentration difference, and the like, and the disclosure is not limited herein.
In steps S31 to S33, if the marginal contribution value of the differential data in the first data set is smaller than the marginal contribution value of the meteorological pollution data, a first marginal contribution difference value of the target type data to the aerosol concentration within the first preset duration at a first data level is obtained according to the contribution average value of the target type data in the second data set and the contribution average value of the target type data in the first data set, where the contribution average value of the target type data in the second data set is the marginal contribution value of the target type data to the aerosol concentration within the first preset duration at the first data level. Likewise, a second marginal contribution difference value of the target type data to the aerosol concentration within the first preset duration at a second data level is obtained according to the contribution average value of the target type data in the third data set and the contribution average value of the target type data in the first data set, where the contribution average value of the target type data in the third data set is the marginal contribution value of the target type data to the aerosol concentration within the first preset duration at the second data level. The second data level is greater than the first data level.
The differential data and meteorological pollution data in the first data set are data at different moments within the first preset duration. If the average of the marginal contribution values of the differential data within the first preset duration in the first data set is smaller than the average of the marginal contribution values of the meteorological pollution data, the marginal contribution value of the differential data is considered smaller than that of the meteorological pollution data.
When determining the magnitude relationship between the marginal contribution value of the differential data in the first data set and that of the meteorological pollution data, differential data are compared with meteorological pollution data of the same type; e.g., O₃ concentration is compared with O₃ differential data, temperature with temperature differential data, air pressure with air pressure differential data, and so on.
The target type data may be the remaining environmental data, excluding the meteorological pollution data that was compared with the differential data. For example, if O₃ concentration is compared with O₃ differential data, the target type data is the remaining environmental data excluding O₃ concentration; if temperature is compared with temperature differential data, the target type data is the remaining environmental data excluding temperature.
Taking the first preset duration as one year, the meteorological pollution data as O₃ concentration and the differential data as O₃ differential data, for example: if, over the year, the average of the marginal contribution values of O₃ concentration to the aerosol concentration is larger than the average of the marginal contribution values of the O₃ differential data, the contribution of O₃ concentration to the aerosol concentration is the more important one.
At this point, the difference between the average of the marginal contribution values of the remaining environmental data other than O₃ concentration in the second data set and the average of the marginal contribution values of the same data in the first data set may be taken, giving the first marginal contribution difference value of those data at the first data level of low ozone concentration. For example, taking the difference between the average of the marginal contribution values of VOCs concentration in the second data set and the average of the marginal contribution values of VOCs concentration in the first data set gives the first marginal contribution difference value of VOCs concentration at low ozone concentration.
Similarly, the difference between the average of the marginal contribution values of the remaining environmental data other than O₃ concentration in the third data set and the average of the marginal contribution values of the same data in the first data set may be taken, giving the second marginal contribution difference value of those data at the second data level of high ozone concentration. For example, taking the difference between the average over one year of the marginal contribution values of VOCs concentration in the third data set and the average over one year of the marginal contribution values of VOCs concentration in the first data set gives the second marginal contribution difference value of VOCs concentration at high ozone concentration.
Finally, from the difference between the first marginal contribution difference value and the second marginal contribution difference value, it is determined how the marginal contributions of the remaining environmental data other than O₃ concentration differ, within the first preset duration, at different data levels, i.e., at different ozone concentration levels. For example, it is determined how VOCs data, temperature, humidity and other environmental data other than O₃ concentration contribute to the aerosol concentration at different ozone concentration levels. When the first marginal contribution difference value is smaller than the second marginal contribution difference value, the contribution of the other environmental data to the aerosol concentration is determined to be higher at the second data level of high ozone concentration; when the first marginal contribution difference value is larger than the second marginal contribution difference value, the contribution of the other environmental data to the aerosol concentration is determined to be higher at the first data level of low ozone concentration.
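The comparison of marginal contribution differences across ozone levels can be sketched as below. Everything here is illustrative: the field names and numbers are made up, `contribution_difference` is a hypothetical helper, and the sign convention (subset average minus full-set average) is an assumption, since the source wording leaves the direction of the subtraction ambiguous.

```python
def mean(values):
    return sum(values) / len(values)


def contribution_difference(subset, full_set, key):
    """Average marginal contribution in a subset minus that in the full set.

    Assumption: the difference is taken as (subset - full set).
    """
    return mean([r[key] for r in subset]) - mean([r[key] for r in full_set])


# Made-up first data set: O3 concentration and the marginal contribution
# of VOCs concentration to the aerosol concentration at each moment.
first_set = [
    {"o3": 100.0, "vocs_contribution": 1.0},
    {"o3": 150.0, "vocs_contribution": 2.0},
    {"o3": 250.0, "vocs_contribution": 3.0},
    {"o3": 300.0, "vocs_contribution": 6.0},
]
second_set = [r for r in first_set if r["o3"] < 200.0]   # low ozone
third_set = [r for r in first_set if r["o3"] > 200.0]    # high ozone

first_diff = contribution_difference(second_set, first_set, "vocs_contribution")
second_diff = contribution_difference(third_set, first_set, "vocs_contribution")
print(first_diff, second_diff)  # -1.5 1.5
```

With these made-up numbers the first difference is smaller than the second, which under the rule above would indicate that VOCs concentration contributes more to the aerosol concentration at the high ozone level.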
In steps S31 to S33, if the marginal contribution value of the differential data in the first data set is greater than that of the meteorological pollution data, a third marginal contribution difference value of the target type data to the aerosol concentration within the first preset time period at a first differential level is obtained according to the contribution average value of the target type data in the fourth data set and the contribution average value of the target type data in the first data set, where the contribution average value of the target type data in the fourth data set is the marginal contribution value of the target type data to the aerosol concentration within the first preset time period at the first differential level; and a fourth marginal contribution difference value of the target type data to the aerosol concentration within the first preset time period at a second differential level is obtained according to the contribution average value of the target type data in the fifth data set and the contribution average value of the target type data in the first data set, where the contribution average value of the target type data in the fifth data set is the marginal contribution value of the target type data to the aerosol concentration within the first preset time period at the second differential level. The second differential level is greater than the first differential level.
If, in the first data set, the average of the marginal contribution values of the differential data within the first preset time period is larger than the average of the marginal contribution values of the meteorological pollution data, the marginal contribution value of the differential data is regarded as larger than that of the meteorological pollution data.
Take one year as the first preset time period, the O3 concentration as the meteorological pollution data, and the O3 differential data as the differential data. If the average of the marginal contribution values of the O3 concentration to the aerosol concentration over the year is less than the average of the marginal contribution values of the O3 differential data to the aerosol concentration, the O3 differential data contributes more to the aerosol concentration.
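The choice between analysing by concentration level and analysing by differential level can be sketched as below. The function name `choose_split` and the sample values are hypothetical; the rule (compare the two period averages) follows the text, while the handling of a tie is an assumption, since the text does not cover equality.

```python
from statistics import mean

def choose_split(concentration_contribs, differential_contribs):
    """Pick the quantity to stratify on from the larger average marginal contribution."""
    if mean(differential_contribs) > mean(concentration_contribs):
        return "differential"   # compare the fourth and fifth data sets
    # Assumption: a tie is treated like the concentration case.
    return "concentration"      # compare the second and third data sets

# Hypothetical marginal contribution values over the first preset time period.
o3_concentration_contribs = [0.1, 0.2, 0.3]
o3_differential_contribs = [0.4, 0.5, 0.6]

print(choose_split(o3_concentration_contribs, o3_differential_contribs))
```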
When the O3 differential data contributes more, the average of the marginal contribution values of the environmental data other than the O3 differential data in the first data set may be subtracted from the corresponding average in the fourth data set, to obtain a third marginal contribution difference value of that environmental data at the first differential level of low ozone concentration increment. For example, the average of the marginal contribution values of the VOCs concentration in the first data set over one year is subtracted from the average of the marginal contribution values of the VOCs concentration in the fourth data set over one year, to obtain the third marginal contribution difference value of the VOCs concentration at a low ozone concentration increment.
Similarly, the average of the marginal contribution values of the environmental data other than the O3 differential data in the first data set may be subtracted from the corresponding average in the fifth data set, to obtain a fourth marginal contribution difference value of that environmental data at the second differential level of high ozone concentration increment. For example, the average of the marginal contribution values of the VOCs concentration in the first data set over one year is subtracted from the average of the marginal contribution values of the VOCs concentration in the fifth data set over one year, to obtain the fourth marginal contribution difference value of the VOCs concentration at a high ozone concentration increment.
Finally, the difference between the third marginal contribution difference value and the fourth marginal contribution difference value is used to determine how the marginal contribution values of the environmental data other than the O3 differential data differ between differential levels within the first preset time period. If the third marginal contribution difference value is smaller than the fourth marginal contribution difference value, the other environmental data is determined to contribute more to the aerosol concentration at the second differential level of high ozone concentration increment; if the third marginal contribution difference value is larger than the fourth marginal contribution difference value, the other environmental data is determined to contribute more to the aerosol concentration at the first differential level of low ozone concentration increment.
As can be seen, by dividing the environmental data in the first data set, together with the marginal contribution values of the environmental data to the aerosol concentration at the different times within the first preset time period, two data sets at different concentration levels (the second data set and the third data set) and two data sets at different differential levels (the fourth data set and the fifth data set) can be obtained.
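The division into five data sets can be sketched as follows. The record fields (`o3`, `o3_diff`, `voc_contrib`), the function name `partition` and both threshold values are hypothetical. Records that exactly equal a threshold are placed in neither subset here, because the text only specifies the "smaller than" and "larger than" cases.

```python
def partition(records, preset_concentration, preset_increment):
    """Split the first data set by concentration level and by differential level."""
    first = list(records)
    second = [r for r in first if r["o3"] < preset_concentration]   # low concentration
    third = [r for r in first if r["o3"] > preset_concentration]    # high concentration
    fourth = [r for r in first if r["o3_diff"] < preset_increment]  # low increment
    fifth = [r for r in first if r["o3_diff"] > preset_increment]   # high increment
    return first, second, third, fourth, fifth

# Hypothetical environmental data with a marginal contribution value per moment.
records = [
    {"time": "t1", "o3": 80.0, "o3_diff": 5.0, "voc_contrib": 0.2},
    {"time": "t2", "o3": 170.0, "o3_diff": 30.0, "voc_contrib": 0.9},
    {"time": "t3", "o3": 120.0, "o3_diff": 25.0, "voc_contrib": 0.4},
    {"time": "t4", "o3": 200.0, "o3_diff": 2.0, "voc_contrib": 1.1},
]

first, second, third, fourth, fifth = partition(
    records, preset_concentration=160.0, preset_increment=10.0
)
for name, subset in [("second", second), ("third", third),
                     ("fourth", fourth), ("fifth", fifth)]:
    print(name, [r["time"] for r in subset])
```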
Subtracting the average of the contribution values of the environmental data in the first data set from the average of the contribution values of the environmental data in the data sets at different concentration levels gives the difference in the contribution of the environmental data to the aerosol concentration at different concentration levels within the first preset time period. This helps staff determine, over the macroscopic time range of the first preset time period, how the contribution of the environmental data to the aerosol concentration differs between concentration levels.
Likewise, subtracting the average of the contribution values of the environmental data in the first data set from the average of the contribution values of the environmental data in the data sets at different differential levels gives the difference in the contribution of the environmental data to the aerosol concentration at different differential levels within the first preset time period. This helps staff determine, over the macroscopic time range of the first preset time period, how the contribution of the environmental data to the aerosol concentration differs between differential levels.
With the data processing method provided by the present disclosure, the marginal contribution value of the environmental data to the aerosol concentration at the first moment can be obtained, which helps staff determine at a finer granularity which environmental data contributes to the aerosol concentration at which moment. The marginal contribution value of the environmental data to the aerosol concentration within the first preset time period can also be obtained, together with the difference in that contribution at different concentration levels or different differential levels, which helps staff determine macroscopically in which time period the environmental data contributes to the aerosol concentration and how that contribution differs.
When the pollutant prediction model is trained, the environmental samples are processed in the same way as the environmental data in steps S31 to S33, that is, divided into different data sets, and the pollutant prediction model is trained separately on the different data sets.
Based on the same inventive concept, and referring to Fig. 4, the present disclosure further provides a data processing apparatus 120, which includes: an obtaining module 121, a prediction module 122 and a contribution value determination module 123;
the obtaining module 121 is configured to obtain environmental data at different times within a first preset time length;
a prediction module 122 configured to predict, by a pollutant prediction model, a plurality of aerosol concentrations corresponding to the environmental data at the different time instants;
a contribution determination module 123 configured to determine a marginal contribution of the environmental data to the aerosol concentration at a first time instant according to the aerosol concentration at the first time instant and an average of the plurality of aerosol concentrations.
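The three modules can be read as a predict-then-attribute pipeline. The sketch below uses an exact Shapley-value attribution whose baseline is the average prediction over all moments, which is a standard technique that matches the "aerosol concentration at the first time versus the average" wording. It is an assumption that this is the intended computation, since this section does not name the method. `toy_model`, `shapley_values` and the sample data are hypothetical stand-ins for the pollutant prediction model and the environmental data.

```python
from itertools import combinations
from math import factorial
from statistics import mean

def shapley_values(model, x, background):
    """Exact Shapley values for each feature of x, relative to the mean prediction on background."""
    n = len(x)

    def value(subset):
        # Average prediction when features in `subset` take x's values
        # and the remaining features come from each background row in turn.
        preds = []
        for row in background:
            mixed = [x[i] if i in subset else row[i] for i in range(n)]
            preds.append(model(mixed))
        return mean(preds)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(len(others) + 1):
            # Weight of every combination of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = set(combo)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(features):
    """Hypothetical predictor: features are [O3, VOCs, temperature]."""
    return 2.0 * features[0] + 0.5 * features[1] * features[2]

# Hypothetical environmental data at three moments; the first row is the "first time".
environmental_data = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [5.0, 6.0, 1.0]]
predictions = [toy_model(row) for row in environmental_data]
phi = shapley_values(toy_model, environmental_data[0], environmental_data)
print(phi, sum(phi), predictions[0] - mean(predictions))
```

In this formulation the per-feature contributions add up to the prediction at the first moment minus the average prediction, which is why the average of the predicted aerosol concentrations serves as the reference point. The exact enumeration grows exponentially with the number of features, so it is only practical for a handful of them.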
Optionally, the environmental data comprises at least one of meteorological pollution data and differential data, the meteorological pollution data comprising meteorological data and pollutant data, the differential data comprising meteorological differential data and pollutant differential data;
the meteorological difference data is a difference value between meteorological data at a previous moment and meteorological data at a current moment, and the pollutant difference data is a difference value between pollutant data at the previous moment and pollutant data at the current moment.
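The differential data can be sketched as a first difference of a time series. The sign convention (current value minus previous value) and the sample O3 readings are assumptions; the text only says that the difference is taken between the previous moment and the current moment.

```python
def differential_series(values):
    """Current-minus-previous differences; the first moment has no predecessor and is skipped."""
    return [cur - prev for prev, cur in zip(values, values[1:])]

# Hypothetical O3 concentrations at four consecutive moments.
o3_concentration = [40.0, 55.0, 50.0, 70.0]
o3_differential = differential_series(o3_concentration)
print(o3_differential)  # [15.0, -5.0, 20.0]
```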
Optionally, the data processing device 120 comprises:
a training module configured to train a first model by taking environmental samples within a second preset time period as training data and taking the target aerosol concentration sample as a label, to obtain the pollutant prediction model.
Optionally, the data processing device 120 comprises:
a ratio interval determination module configured to determine a ratio interval formed by a first target ratio and a second target ratio from a plurality of first ratios between the organic carbon concentration and the elemental carbon concentration, wherein the first target ratio is smaller than the second target ratio;
a candidate aerosol concentration determination module configured to determine a plurality of candidate aerosol concentration samples according to a plurality of third ratios in the ratio interval, and elemental carbon concentrations and total organic carbon concentrations corresponding to the plurality of third ratios;
a target aerosol concentration determination module configured to determine the target aerosol concentration sample from the plurality of candidate aerosol concentration samples.
Optionally, the ratio interval determination module comprises:
a second ratio determination module configured to determine a plurality of second ratios of the combined pollutant to the primary pollutant at different times;
a third and fourth target ratio determination module configured to determine a third target ratio and a fourth target ratio from the plurality of second ratios, the third target ratio being less than the fourth target ratio;
a first ratio interval determination module configured to determine, as the ratio interval, an interval formed between a first target ratio at the same time as the third target ratio and a second target ratio at the same time as the fourth target ratio.
Optionally, the target aerosol concentration determination module comprises:
a correlation determination module configured to determine a plurality of correlations between the plurality of candidate aerosol concentration samples and the elemental carbon concentration;
a first target aerosol concentration determination module configured to determine, from the plurality of correlations, a candidate aerosol concentration sample corresponding to a smallest correlation as the target aerosol concentration sample.
Optionally, the candidate aerosol concentration determination module comprises:
an organic carbon concentration determination module configured to determine a plurality of organic carbon concentrations of primary emission according to a plurality of third ratios in the ratio interval and elemental carbon concentrations corresponding to the plurality of third ratios;
a first candidate aerosol concentration determination module configured to determine a plurality of candidate aerosol concentration samples from the organic carbon concentrations of the plurality of primary emissions and a plurality of primary and secondary generated total organic carbon concentrations.
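The candidate-and-select procedure of these modules can be sketched as below. Several points are assumptions rather than statements from the text: primary organic carbon is taken as ratio times elemental carbon, each candidate sample is taken as total organic carbon minus primary organic carbon, and "smallest correlation" is read as the smallest squared Pearson correlation. All names and numbers are hypothetical.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient; assumes neither series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def select_target_sample(total_oc, ec, candidate_ratios):
    """Return the ratio and candidate sample least correlated with elemental carbon."""
    best = None
    for ratio in candidate_ratios:
        primary_oc = [ratio * e for e in ec]                         # assumed: ratio x EC
        candidate = [oc - p for oc, p in zip(total_oc, primary_oc)]  # assumed: total OC - primary OC
        r_squared = pearson_r(candidate, ec) ** 2
        if best is None or r_squared < best[0]:
            best = (r_squared, ratio, candidate)
    return best[1], best[2]

# Hypothetical series: total OC was built as 2.0 * EC plus a part uncorrelated with EC.
elemental_carbon = [1.0, 2.0, 3.0, 4.0]
total_organic_carbon = [5.0, 5.0, 7.0, 11.0]
ratios_in_interval = [1.0, 1.5, 2.0, 2.5, 3.0]

target_ratio, target_sample = select_target_sample(
    total_organic_carbon, elemental_carbon, ratios_in_interval
)
print(target_ratio, target_sample)
```

The idea behind picking the least correlated candidate is that elemental carbon tracks primary emissions, so the remainder that no longer co-varies with it is the most plausible secondary component.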
Optionally, the prediction module 122 comprises:
the first dividing module is configured to divide the environmental data and the marginal contribution value of the environmental data to the aerosol concentration at different moments into different data sets.
Optionally, the first partitioning module comprises:
the first storage module is configured to store the environmental data at different moments into a first data set;
the second storage module is configured to divide the environmental data into a second data set when the concentration of meteorological pollution data in the environmental data is smaller than a preset concentration and divide the environmental data into a third data set when the concentration of meteorological pollution data in the environmental data is larger than the preset concentration for the environmental data at any moment in the first data set;
the third storage module is configured to, for the environment data in the first data set at any time, partition the environment data into a fourth data set when an increment of differential data in the environment data is smaller than a preset increment, and partition the environment data into a fifth data set when the increment of the differential data in the environment data is larger than the preset increment.
Optionally, in a case that the marginal contribution value of the differential data in the first data set is smaller than the marginal contribution value of the meteorological pollution data, the data processing apparatus 120 includes:
a first marginal contribution value determining module configured to obtain, according to the average contribution value of the data of the target type in the second data set and the average contribution value of the data of the target type in the first data set, a first marginal contribution difference value of the data of the target type to the aerosol concentration within the first preset time period at a first data level;
a second marginal contribution value determining module configured to obtain a second marginal contribution difference value of the data of the target type to the aerosol concentration within the first preset time period under a second data level according to the average contribution value of the data of the target type in the third data set and the average contribution value of the data of the target type in the first data set;
the second data level is greater than the first data level.
Optionally, in a case that a marginal contribution value of the differential data in the first data set is greater than a marginal contribution value of the meteorological pollution data, the data processing apparatus 120 includes:
a third marginal contribution value determining module configured to obtain, according to the contribution average value of the data of the target type in the fourth data set and the contribution average value of the data of the target type in the first data set, a third marginal contribution difference value of the data of the target type to the aerosol concentration within the first preset time period at a first differential level;
a fourth marginal contribution value determining module configured to obtain, according to the contribution average value of the data of the target type in the fifth data set and the contribution average value of the data of the target type in the first data set, a fourth marginal contribution difference value of the data of the target type to the aerosol concentration within the first preset time period at a second differential level;
the second differential level is greater than the first differential level.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 5 is a block diagram illustrating an electronic device 700 according to an example embodiment. As shown in fig. 5, the electronic device 700 may include: a first processor 701 and a first memory 702. The electronic device 700 may also include one or more of a multimedia component 703, a first input/output (I/O) interface 704, and a first communication component 705.
The first processor 701 is configured to control the overall operation of the electronic device 700, so as to complete all or part of the steps in the data processing method. The first memory 702 is used to store various types of data to support operation at the electronic device 700, such as instructions for any application or method operating on the electronic device 700 and application-related data, such as contact data, messages, pictures, audio, video, and the like. The first memory 702 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk. The multimedia component 703 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving an external audio signal. The received audio signal may further be stored in the first memory 702 or transmitted through the first communication component 705. The audio component also includes at least one speaker for outputting audio signals. The first input/output interface 704 provides an interface between the first processor 701 and other interface modules, such as a keyboard, a mouse, buttons, etc. These buttons may be virtual buttons or physical buttons. The first communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or a combination of one or more of them, which is not limited herein.
Accordingly, the first communication component 705 may comprise a Wi-Fi module, a Bluetooth module, an NFC module, etc.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described data processing method.
In another exemplary embodiment, there is also provided a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the data processing method described above. For example, the computer readable storage medium may be the first memory 702 comprising program instructions executable by the first processor 701 of the electronic device 700 to perform the data processing method described above.
Fig. 6 is a block diagram illustrating an electronic device 1900 according to an example embodiment. For example, the electronic device 1900 may be provided as a server. Referring to fig. 6, the electronic device 1900 includes a second processor 1922, which may be one or more in number, and a second memory 1932 for storing computer programs executable by the second processor 1922. The computer program stored in the second memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the second processor 1922 may be configured to execute the computer program to perform the data processing method described above.
Additionally, the electronic device 1900 may further include a power component 1926 and a second communication component 1950. The power component 1926 may be configured to perform power management of the electronic device 1900, and the second communication component 1950 may be configured to enable communication, e.g., wired or wireless communication, of the electronic device 1900. The electronic device 1900 may also include a second input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the second memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, and so on.
In another exemplary embodiment, there is also provided a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the data processing method described above. For example, the non-transitory computer readable storage medium may be the second memory 1932 described above including program instructions that are executable by the second processor 1922 of the electronic device 1900 to perform the data processing method described above.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned data processing method when executed by the programmable apparatus.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments; various simple modifications may be made to the technical solution of the present disclosure within its technical idea, and these simple modifications all fall within the protection scope of the present disclosure.
It should be noted that, in the foregoing embodiments, various features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various combinations that are possible in the present disclosure are not described again.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.
Claims (14)
1. A method of data processing, the method comprising:
acquiring environmental data at different moments within a first preset time length;
predicting a plurality of aerosol concentrations corresponding to the environmental data at the different times through a pollutant prediction model;
determining a marginal contribution value of the environmental data to the aerosol concentration at a first time according to the aerosol concentration at the first time and an average value of the plurality of aerosol concentrations, comprising: subtracting the average value of the aerosol concentrations at a plurality of different moments from the marginal contribution value of the aerosol concentration of a single combination related to the environmental data, and multiplying the result by the weight of the single combination to obtain the numerical value of the single combination; and obtaining the marginal contribution value of the environmental data by summing the numerical values of a plurality of single combinations.
2. The data processing method of claim 1, wherein the environmental data comprises at least one of meteorological pollution data and differential data, the meteorological pollution data comprising meteorological data and pollutant data, the differential data comprising meteorological differential data and pollutant differential data;
the meteorological differential data is a difference value between the meteorological data at the previous moment and the meteorological data at the current moment, and the pollutant differential data is a difference value between the pollutant data at the previous moment and the pollutant data at the current moment.
3. The data processing method of claim 1, wherein the pollutant prediction model is trained by:
training a first model by taking environmental samples within a second preset time period as training data and taking a target aerosol concentration sample as a label, to obtain the pollutant prediction model.
4. A method of data processing according to claim 3, wherein the target aerosol concentration sample is determined by:
determining a ratio interval formed by a first target ratio and a second target ratio from a plurality of first ratios between the organic carbon concentration and the elemental carbon concentration, wherein the first target ratio is smaller than the second target ratio;
determining a plurality of candidate aerosol concentration samples according to a plurality of third ratios in the ratio interval, and elemental carbon concentrations and total organic carbon concentrations corresponding to the plurality of third ratios;
determining the target aerosol concentration sample from the plurality of candidate aerosol concentration samples.
5. The data processing method of claim 4, wherein determining a ratio interval formed by a first target ratio and a second target ratio from a plurality of first ratios between the organic carbon concentration and the elemental carbon concentration comprises:
determining a plurality of second ratios of the combined pollutant to the primary pollutant at different times;
determining a third target ratio and a fourth target ratio from the plurality of second ratios, wherein the third target ratio is smaller than the fourth target ratio;
and taking an interval formed between a first target ratio which is at the same time as the third target ratio and a second target ratio which is at the same time as the fourth target ratio as the ratio interval.
6. The data processing method of claim 4, wherein the determining the target aerosol concentration sample from the plurality of candidate aerosol concentration samples comprises:
determining a plurality of correlations between the plurality of candidate aerosol concentration samples and the elemental carbon concentration;
and determining a candidate aerosol concentration sample corresponding to the minimum correlation from the plurality of correlations as the target aerosol concentration sample.
7. The data processing method of claim 4, wherein determining a plurality of candidate aerosol concentration samples according to a plurality of third ratios in the ratio interval, and the elemental carbon concentrations and the total organic carbon concentrations corresponding to the plurality of third ratios comprises:
determining a plurality of organic carbon concentrations of primary emission according to a plurality of third ratios in the ratio interval and the elemental carbon concentrations corresponding to the plurality of third ratios;
determining a plurality of candidate aerosol concentration samples according to the organic carbon concentrations of the plurality of primary emissions and a plurality of primary and secondary generated total organic carbon concentrations.
8. The data processing method of claim 1, wherein after determining the marginal contribution of the environmental data to the aerosol concentration at the first time, the method comprises:
dividing the environmental data at different moments and the marginal contribution value of the environmental data to the aerosol concentration into different data sets.
9. The data processing method of claim 8, wherein the dividing the environmental data at different time points and the marginal contribution value of the environmental data to the aerosol concentration into different data sets comprises:
storing the environmental data at different moments into a first data set;
for the environmental data at any moment in the first data set, under the condition that the concentration of meteorological pollution data in the environmental data is smaller than a preset concentration, dividing the environmental data into a second data set, and under the condition that the concentration of meteorological pollution data in the environmental data is greater than the preset concentration, dividing the environmental data into a third data set;
for the environment data at any moment in the first data set, under the condition that the increment of the difference data in the environment data is smaller than a preset increment, the environment data is divided into a fourth data set, and under the condition that the increment of the difference data in the environment data is larger than the preset increment, the environment data is divided into a fifth data set.
10. The method of data processing according to claim 9, wherein, in a case where the marginal contribution value of the differential data in the first data set is less than the marginal contribution value of the meteorological pollution data, the method comprises, after determining the marginal contribution value of the environmental data to the aerosol concentration at the first time instant:
obtaining a first marginal contribution difference value of the target type data to the aerosol concentration within the first preset time period under a first data level according to the contribution average value of the target type data in the second data set and the contribution average value of the target type data in the first data set;
obtaining a second marginal contribution difference value of the target type data to the aerosol concentration within the first preset time period under a second data level according to the contribution average value of the target type data in the third data set and the contribution average value of the target type data in the first data set;
the second data level is greater than the first data level.
11. The method of data processing according to claim 9, wherein, in a case where the marginal contribution value of the differential data in the first data set is greater than the marginal contribution value of the meteorological pollution data, the method comprises, after determining the marginal contribution value of the environmental data to the aerosol concentration at the first time instant:
obtaining a third marginal contribution difference value of the target type data to the aerosol concentration within the first preset time period under a first differential level according to the contribution average value of the target type data in the fourth data set and the contribution average value of the target type data in the first data set;
obtaining a fourth marginal contribution difference value of the target type data to the aerosol concentration within the first preset time period under a second differential level according to the contribution average value of the target type data in the fifth data set and the contribution average value of the target type data in the first data set;
the second differential level is greater than the first differential level.
12. A data processing apparatus, characterized in that the apparatus comprises:
the acquisition module is configured to acquire environmental data at different moments within a first preset time length;
a prediction module configured to predict, by a pollutant prediction model, a plurality of aerosol concentrations corresponding to the environmental data at the different times;
a contribution determination module configured to determine a marginal contribution value of the environmental data to the aerosol concentration at a first time based on the aerosol concentration at the first time and an average value of the plurality of aerosol concentrations, comprising: subtracting the average value of the aerosol concentrations at the plurality of different moments from the marginal contribution value of the aerosol concentration of a single combination related to the environmental data, and multiplying the resulting difference by the weight of the single combination to obtain the numerical value of the single combination; and obtaining the marginal contribution value of the environmental data by summing the numerical values of a plurality of single combinations.
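The weighted summation described for the contribution determination module can be sketched in Python as follows. This is only an illustrative reading of the claim, assuming that the weight multiplies the difference between each combination's contribution value and the average predicted concentration. The function name, the `(value, weight)` tuple layout, and all numbers are hypothetical, and how the combinations and weights are derived is not shown here.

```python
def marginal_contribution(combinations, concentrations):
    """Sum over combinations of weight * (combination value - mean concentration).

    combinations:   iterable of (value, weight) pairs, one per single combination
    concentrations: predicted aerosol concentrations at the different moments
    """
    # Baseline: average of the predicted concentrations over the time window.
    baseline = sum(concentrations) / len(concentrations)
    # Each combination contributes its weighted deviation from the baseline.
    return sum(weight * (value - baseline) for value, weight in combinations)


# Hypothetical predictions at three moments and two weighted combinations.
predicted = [10.0, 20.0, 30.0]
combos = [(24.0, 0.5), (28.0, 0.5)]
result = marginal_contribution(combos, predicted)
```

With these sample inputs the baseline is 20.0, so the two combinations contribute 0.5 × 4.0 and 0.5 × 8.0 respectively.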
13. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the data processing method of any one of claims 1 to 11.
14. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the data processing method of any one of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211158663.7A CN115238596B (en) | 2022-09-22 | 2022-09-22 | Data processing method and device, readable storage medium and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211158663.7A CN115238596B (en) | 2022-09-22 | 2022-09-22 | Data processing method and device, readable storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115238596A (en) | 2022-10-25 |
CN115238596B (en) | 2023-01-31 |
Family
ID=83667185
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211158663.7A Active CN115238596B (en) | 2022-09-22 | 2022-09-22 | Data processing method and device, readable storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115238596B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116758342B (en) * | 2023-06-01 | 2023-12-15 | 中国地质科学院矿产资源研究所 | Atmospheric pollution grade assessment method and device based on rare earth mineral area |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10990885B1 (en) * | 2019-11-26 | 2021-04-27 | Capital One Services, Llc | Determining variable attribution between instances of discrete series models |
CN112734086A (en) * | 2020-12-24 | 2021-04-30 | 贝壳技术有限公司 | Method and device for updating neural network prediction model |
CN112784986A (en) * | 2021-02-08 | 2021-05-11 | 中国工商银行股份有限公司 | Feature interpretation method, device, equipment and medium for deep learning calculation result |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021158796A1 (en) * | 2020-02-05 | 2021-08-12 | Informed Data Systems Inc. D/B/A One Drop | Forecasting and explaining user health metrics |
Non-Patent Citations (1)
Title |
---|
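Relationship between urban industrialization and PM2.5 in China, with a discussion of the internal mechanism behind the formation of the EKC curve (中国城市工业化发展与PM2.5的关系:兼论EKC曲线形成的内在机制); Li Yanan et al. (李雅男 等); Environmental Science (《环境科学》); 2020-04-30; full text * |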
Also Published As
Publication number | Publication date |
---|---|
CN115238596A (en) | 2022-10-25 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111753426B (en) | Method and device for analyzing source of particulate pollution | |
CN116227749B (en) | Method and device for determining pollutant emission amount, storage medium and electronic equipment | |
CN115238596B (en) | Data processing method and device, readable storage medium and electronic equipment | |
CN112505254B (en) | Method and device for analyzing atmospheric pollution source, storage medium and terminal | |
CN113155939A (en) | Online volatile organic compound source analysis method, system, equipment and medium | |
CN112001520B (en) | Weather forecast method, weather forecast device, computer equipment and storage medium | |
Sun et al. | The drivers and health risks of unexpected surface ozone enhancements over the Sichuan Basin, China, in 2020 | |
CN114896783A (en) | Method and device for evaluating air quality improvement effect | |
CN115759365A (en) | Photovoltaic power generation power prediction method and related equipment | |
CN113888381B (en) | Pollutant Concentration Forecasting Method and Device | |
CN115271547A (en) | Ozone pollution source analysis method and device and electronic equipment | |
CN112801423B (en) | Method and device for identifying abnormity of air quality monitoring data and storage medium | |
CN115271258B (en) | Method and device for predicting ozone main control pollutants and electronic equipment | |
CN116862081A (en) | Operation and maintenance method and system for pollution treatment equipment | |
CN116307268A (en) | Carbon emission prediction method and system based on polluted site restoration process | |
Haq et al. | IoT based air quality and weather monitoring system with android application | |
CN117610895A (en) | Method and device for determining heavy point pollution source management and control time, electronic equipment and medium | |
CN116228501B (en) | Pollution discharge exceeding area industry determining method and device, storage medium and electronic equipment | |
CN110399658B (en) | Method, device, equipment and storage medium for calculating acceleration factor value of battery | |
CN112710623A (en) | Method and equipment for remotely sensing and monitoring diffusion range and concentration of toxic and harmful gas | |
CN116776073B (en) | Pollutant concentration evaluation method and device | |
CN115238245B (en) | Pollutant monitoring method and device, storage medium and electronic equipment | |
CN114974452B (en) | Method and device for determining control target of secondary conversion source | |
CN117310500A (en) | Battery state classification model construction method and battery state classification method | |
CN116644884A (en) | Method and device for linking regional environment tracking evaluation and pollution discharge permission |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |