Summary of the Invention
The invention aims to overcome the defects of the prior art, in which heterogeneous data cannot be stored and transmission bandwidth cannot meet demand, by providing a method for collecting and storing heterogeneous data across sewage treatment plants.
To achieve this objective, the technical solution of the invention is as follows:
A method for collecting and storing heterogeneous data across sewage treatment plants, comprising the following steps:
Defining data dictionaries: each sewage treatment plant defines its own data dictionary;
Retrieving data: a data access module obtains field data from multiple sewage treatment plants and stores it in a data buffer pool;
Converting data: the data buffer pool periodically writes the data from the data access module into a self-describing data file that contains the data dictionary;
Storing data files: the data buffer pool periodically parses the self-describing data file, and the data dictionary and the collected data are stored in a NoSQL database.
The step of defining data dictionaries comprises the following steps:
Each sewage treatment plant registers its treatment plant dictionary information in the data dictionary, including the plant name and location information, and a unique treatment plant ID is generated;
The data dictionary is defined in a CSV file and includes a process dictionary, an equipment dictionary, and a collection point dictionary. Each of these dictionaries consists of a header and instances. The header includes required fields and extension fields: required fields are dictionary information that every treatment plant must provide, and extension fields are information added according to actual conditions. Wherein:
The process dictionary defines the sewage treatment process of the plant. The required fields in its header are the ID and the name of the process; its instances are the nodes that make up the plant's treatment process;
The equipment dictionary defines the various pieces of equipment in the plant. The required fields in its header are the ID, the equipment name, and the process node to which the equipment belongs; its instances are the pieces of equipment in the plant;
The collection point dictionary defines the data collection points. The required fields in its header are the ID, the collection point description, the field under which the collection point is shown in reports, the calculation formula used when generating reports, the type of the collected data, the unit of the data, the minimum and maximum of the normal data range, the corresponding equipment information, and virtual point data; virtual point data is defined with the virtual_ prefix;
The data dictionary CSV file is associated with the treatment plant dictionary information, establishing the correspondence between each treatment plant name and that plant's data dictionary.
The step of retrieving data comprises the following steps:
The data access module exposes a data reporting interface to each sewage treatment plant as a web server;
Each sewage treatment plant builds its reported data, which includes the treatment plant ID and the collected data. The collected data includes the collection time, the collection point ID, and the collected value. The treatment plant ID and the collection point ID are taken from the plant's data dictionary;
The reported data is encoded in JSON and sent to the data buffer pool, in the following format:
factory_id: the treatment plant ID obtained by registering with the data dictionary module; data: an array. Each element of data has the following format: datetime: the time at which the signal point was collected; signal_id: the signal point ID defined in the data dictionary module; value: the signal point value.
The step of converting data comprises the following steps:
The data buffer pool receives the JSON-encoded reported data using a FIFO mechanism. Data that has been read is deleted from the buffer pool, passed to the storage module, and processed into a self-describing data file, which is named after the timestamp at the time of reading;
The data buffer pool generates self-describing data files periodically. A self-describing data file consists of a dictionary part and a data part. The treatment plant dictionary information is placed at the front of the dictionary part, and the treatment plant ID is prepended as a prefix to each process ID, equipment ID, and collection point ID. The header of the data part is collection point ID, collection time, collected value; the treatment plant ID is prepended as a prefix to the collection point ID, and the collected values are all the collected data held in the buffer pool during the processing cycle.
The step of storing data files comprises the following steps:
The processing cycle of the storage module is set to match the processing cycle of the data buffer pool;
The NoSQL database is designed: a data dictionary table and a data table, consistent with the self-describing data file, are defined in the NoSQL database. The storage structure of the NoSQL database is defined as column-oriented key-value pairs, and each row of data has the structure:
rowkey, column_name, column_value,
Wherein:
In the storage structure of the data dictionary table, rowkey is the file timestamp + the ID in the dictionary, column_name is one of the other attribute names in the dictionary, and column_value is the value of that attribute for the corresponding ID;
In the storage structure of the data table, rowkey is the signal point ID + the collection timestamp, and column_name is svalue or dictionary_version. When column_name is svalue, column_value is the collected value of the corresponding signal point ID; when column_name is dictionary_version, column_value is the timestamp of the corresponding rowkey in the data dictionary table;
The dictionary part and the data part are stored separately according to the definitions of the data dictionary table and the data table, and the data in the storage module is stored in the NoSQL database.
The method further includes determining whether a self-describing data file has not yet been processed, which comprises the following steps:
In the data dictionary table of the NoSQL database, the largest timestamp among the rowkeys is obtained, and the file names of all data files are traversed. If the timestamp in a file name is greater than the largest timestamp in the database, the file has not been processed;
The unprocessed files are handled one by one: each data file is parsed, and its dictionary part and data part are stored separately.
Beneficial Effects
Compared with the prior art, the method of the invention for collecting and storing heterogeneous data across sewage treatment plants reduces the bandwidth consumed by the plants' reported data, effectively integrates the heterogeneous data of multiple sewage treatment plants, and solves the problem of storing massive amounts of data. Configuring a data dictionary standardizes the definition of each plant's collection point information. Separating collection point attribute information from the data reduces the complexity of data reporting. Adding the data dictionary to the data file makes the file self-describing and keeps the data complete with respect to time, so that historical data stays associated with the attributes that applied at the time. The data scene can thus be faithfully restored, which increases the reliability of historical data analysis.
Detailed Description of the Embodiments
To provide a better understanding of the structural features of the invention and the effects it achieves, the preferred embodiments are described in detail below with reference to the accompanying drawing:
As shown in Fig. 1, the method of the invention for collecting and storing heterogeneous data across sewage treatment plants comprises the following steps:
Step 1: define data dictionaries. Each sewage treatment plant defines its own data dictionary. After a sewage treatment plant successfully registers its own information, a unique treatment plant ID is generated, and the data dictionary configured afterwards is associated with that plant's ID. The data dictionary explains and describes the collected data: it defines the attributes of each collection point's data, including the corresponding equipment information, process information, and sewage treatment plant information. The specific steps are as follows:
(1) Each sewage treatment plant registers its treatment plant dictionary information in the data dictionary, including the plant name (name) and location information (location), and a unique treatment plant ID (factory_id) is generated. The treatment plant dictionary information is the information of each corresponding sewage treatment plant, and the unique treatment plant ID identifies exactly one plant.
(2) The data dictionary is defined in a CSV file. Thanks to the data dictionary, the reported data only needs to carry the collection point ID. From that ID one can look up which plant, which process, and which piece of equipment a reported value belongs to, what the collection point represents, what the data format and unit are, what the normal data range is, and how the value shown in reports is calculated. Each reported value therefore implicitly carries a large amount of explanatory information, which makes it easy to interpret the data when generating reports and analyzing data. The data dictionary further includes a process dictionary, an equipment dictionary, and a collection point dictionary, each consisting of a header and instances. The header includes required fields and extension fields, each named with an English field name, and the first field is always the ID. Required fields are dictionary information that every treatment plant must provide; extension fields are information added according to actual conditions. When a sewage treatment plant needs to categorize certain collected data for data analysis or a process upgrade, it can define its own extension fields. An instance is one specific item of that dictionary within the plant, distinguished by its ID. The individual dictionaries are separated by separator lines.
Wherein:
The process dictionary defines the sewage treatment process of the plant. The required fields in its header are the ID process_id (unique within each plant) and the process name process_name (the name of the process node). Its instances are the nodes that make up the plant's treatment process, such as the coarse screen, the fine screen, and the aeration tank.
The equipment dictionary defines the various pieces of equipment in the plant. The required fields in its header are the ID device_id (unique within each plant), the equipment name device_name, and the process node to which the equipment belongs, process_id (may be empty, meaning the equipment does not belong to any process node). Its instances are the pieces of equipment in the plant, such as blowers and lift pumps.
The collection point dictionary defines the data collection points. The required fields in its header are the ID, the collection point description, the field under which the collection point is shown in reports, the calculation formula used when generating reports, the type of the collected data, the unit of the data, the minimum and maximum of the normal data range, the corresponding equipment information, and virtual point data; virtual point data is defined with the virtual_ prefix. The collection point dictionary has more required fields than the others, as follows:
signal_id: the ID, unique within each plant. A virtual signal is identified by the "virtual_" prefix followed by a number, e.g. "virtual_001". A manual signal is identified by the "manual_" prefix, e.g. "manual_001".
signal_desc: the description of the collection point.
signal_mark: the field under which the collection point is shown in reports.
data_type: the type of the collected data, e.g. integer, floating point, or 0/1 signal.
unit: the unit of the data.
calc_expr: the calculation formula used when generating reports.
min: the lower threshold of the collection point's alarm value, i.e. the minimum of the normal data range.
max: the upper threshold of the collection point's alarm value, i.e. the maximum of the normal data range.
device_id: the corresponding equipment information; may be empty.
Besides points that physically exist and report automatically, the collection point dictionary also contains virtual points for laboratory test and management data that are reported manually. These can be distinguished in the ID at definition time; for example, virtual points are defined with the virtual_ prefix.
The following is an example of a data dictionary CSV file:
## process dictionary part
process_id,process_name
proc_001,coarse screen
……
------------ (dictionary separator)
## equipment dictionary part
device_id,device_name,process_id
dev_001,left coarse screen,proc_001
……
------------ (dictionary separator)
## collection point dictionary part
signal_id,signal_desc,signal_mark,data_type,unit,calc_expr,min,max,device_id
virtual_001,influent pH,PH_IN,float,none,avg,5.0,11.0,dev_001
……
The pattern of the dictionary separator is only an example and can be defined arbitrarily.
(3) The data dictionary CSV file is associated with the treatment plant dictionary information, establishing the correspondence between each treatment plant name and that plant's data dictionary. The CSV file is associated with the treatment plant ID, so that the data dictionary of a given plant can be looked up by factory_id.
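As a rough illustration of how such a multi-part dictionary file could be read, the following Python sketch splits the text into its parts and turns each instance line into a mapping from header field to value. The function name parse_dictionary_csv, the "##" title convention, and the "------" separator prefix are assumptions taken from the example layout, not something the method prescribes.

```python
import csv

SEPARATOR_PREFIX = "------"  # assumed pattern; the separator may be defined arbitrarily

def parse_dictionary_csv(text):
    """Split a multi-part dictionary file into {part_name: [row_dict, ...]}."""
    sections = {}
    name, header = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(SEPARATOR_PREFIX):
            name, header = None, None     # a separator closes the current part
            continue
        if line.startswith("##"):
            name, header = line[2:].strip(), None
            sections[name] = []
            continue
        if name is None:
            continue                      # ignore stray lines outside any part
        fields = [f.strip() for f in next(csv.reader([line]))]
        if header is None:
            header = fields               # first line after the title is the header
        else:
            sections[name].append(dict(zip(header, fields)))
    return sections

SAMPLE = """\
## process dictionary part
process_id,process_name
proc_001,coarse screen
------------
## equipment dictionary part
device_id,device_name,process_id
dev_001,left coarse screen,proc_001
------------
## collection point dictionary part
signal_id,signal_desc,signal_mark,data_type,unit,calc_expr,min,max,device_id
virtual_001,influent pH,PH_IN,float,none,avg,5.0,11.0,dev_001
"""

dictionaries = parse_dictionary_csv(SAMPLE)
```

Because rows are keyed by header name, extension fields added by an individual plant would simply appear as extra keys without any change to the parser.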
Step 2: retrieve data. The data access module obtains field data from multiple sewage treatment plants and stores it in the data buffer pool. The specific steps are as follows:
(1) The data access module exposes a data reporting interface to each sewage treatment plant as a web server. Because there are multiple sewage treatment plants, the interface must be selected per plant, which is done using prior-art web server techniques. When plant A is selected, the data access module provides the data reporting interface to plant A, establishing data communication between plant A and the data access module; when plant B is selected, the data access module provides the data reporting interface to plant B, establishing data communication between plant B and the data access module.
(2) Each sewage treatment plant builds its reported data, which includes the treatment plant ID and the collected data. The collected data includes the collection time, the collection point ID, and the collected value. The treatment plant ID and the collection point ID are taken from the plant's data dictionary.
(3) The reported data is encoded in JSON and sent to the data buffer pool, in the following format:
factory_id: the treatment plant ID obtained by registering with the data dictionary module; data: an array.
Each field-collected value is a dictionary of the form {"datetime":"yyyy-mm-dd hh:mm:ss","signal_id":"001","value":"123.3"}, where
datetime: the time at which the signal point was collected; signal_id: the signal point ID defined in the data dictionary module; value: the signal point value.
An example is as follows:
[{"factory_id":"jnshw_01","data":[{"datetime":"yyyy-mm-dd hh:mm:ss","signal_id":"auto_001","value":"123.3"},{"datetime":"yyyy-mm-dd hh:mm:ss.zzz","signal_id":"auto_002","value":"123.3"},{"datetime":"yyyy-mm-dd hh:mm:ss.zzz","signal_id":"manual_001","value":"123.3"}]}]
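The reporting format can be sketched in Python as follows. This is a minimal illustration only: build_report and validate_report are hypothetical helper names, and checking signal_id against a set of known IDs is an assumed use of the data dictionary rather than a step the method spells out.

```python
import json

def build_report(factory_id, samples):
    """samples: iterable of (datetime_str, signal_id, value) tuples."""
    return json.dumps([{
        "factory_id": factory_id,
        "data": [
            {"datetime": dt, "signal_id": sid, "value": str(val)}
            for dt, sid, val in samples
        ],
    }])

def validate_report(payload, known_signal_ids):
    """Return a list of problems; an empty list means the payload looks well-formed."""
    problems = []
    for report in json.loads(payload):
        if "factory_id" not in report:
            problems.append("missing factory_id")
        for i, item in enumerate(report.get("data", [])):
            missing = {"datetime", "signal_id", "value"} - item.keys()
            if missing:
                problems.append(f"data[{i}] missing {sorted(missing)}")
            elif item["signal_id"] not in known_signal_ids:
                problems.append(f"data[{i}] unknown signal_id {item['signal_id']}")
    return problems

payload = build_report("jnshw_01", [
    ("2014-03-31 23:00:10", "auto_001", 123.3),
    ("2014-03-31 23:00:20", "manual_001", 7.1),
])
problems = validate_report(payload, {"auto_001", "manual_001"})
```

Each element carries only the time, the signal point ID, and the value, which is what keeps the reported payload small.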
Step 3: convert data. The data buffer pool periodically writes the data from the data access module into a self-describing data file that contains the data dictionary. The specific steps are as follows:
(1) The data buffer pool receives the JSON-encoded reported data using a FIFO (first in, first out) mechanism. Data that has been read is deleted from the buffer pool, passed to the storage module, and processed into a self-describing data file, which is named after the timestamp at the time of reading. The purpose of the self-describing data file is to pair the dictionary part with the data part, so that each data part is matched with the dictionary part that corresponds to it. Because each treatment plant defines its own data dictionary, the content of the data part cannot be interpreted correctly if the data dictionary definition does not match the data part. The file is named after the timestamp at the time of reading so that the next step can determine whether a data file has not yet been processed. The generated self-describing data file is named after the timestamp at which processing started, in a form such as 20140401102000.csv. An example follows:
## treatment plant dictionary part
factory_id,factory_name,factory_location
fac_001,First Treatment Plant,x City x District x Road No. x
……
------------ (dictionary separator)
## process dictionary part
process_id,process_name
fac_001-proc_001,coarse screen
……
------------ (dictionary separator)
## equipment dictionary part
device_id,device_name,process_id
fac_001-dev_001,left coarse screen,fac_001-proc_001
……
------------ (dictionary separator)
## collection point dictionary part
signal_id,signal_desc,signal_mark,data_type,unit,calc_expr,min,max,device_id
fac_001-virtual_001,influent pH,PH_IN,float,none,avg,5.0,11.0,fac_001-dev_001
……
------------ (dictionary separator)
## collected data part
signal_id,datetime,value
fac_001-virtual_001,2014-03-31 23:00:10,9.0
fac_001-virtual_001,2014-03-31 23:00:20,9.0
……
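A file of this shape could be produced along the following lines. The sketch is illustrative only: render_self_describing and prefix_ids are hypothetical names, the "-" joiner between plant ID and local ID follows the example above, and every part is assumed to contain at least one row.

```python
import csv
import io

ID_COLUMNS = {"process_id", "device_id", "signal_id"}  # columns that receive the plant prefix

def prefix_ids(factory_id, rows):
    """Prepend the plant ID to every non-empty ID column of each row."""
    return [
        {k: (f"{factory_id}-{v}" if k in ID_COLUMNS and v else v) for k, v in row.items()}
        for row in rows
    ]

def render_self_describing(factory, sections, samples, separator="------------"):
    """Return the file text: plant part, dictionary parts, then the collected data part."""
    fid = factory["factory_id"]
    parts = [("treatment plant dictionary part", [factory])]
    parts += [(name, prefix_ids(fid, rows)) for name, rows in sections]
    parts.append(("collected data part", prefix_ids(fid, samples)))
    out = io.StringIO()
    for i, (name, rows) in enumerate(parts):
        if i:
            out.write(separator + "\n")
        out.write(f"## {name}\n")
        # Assumes rows is non-empty; the first row supplies the header fields.
        writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    return out.getvalue()

text = render_self_describing(
    {"factory_id": "fac_001", "factory_name": "First Treatment Plant", "factory_location": "x City"},
    [("equipment dictionary part",
      [{"device_id": "dev_001", "device_name": "left coarse screen", "process_id": "proc_001"}])],
    [{"signal_id": "virtual_001", "datetime": "2014-03-31 23:00:10", "value": "9.0"}],
)
```

An empty process_id is left empty rather than prefixed, which matches the rule that equipment need not belong to any process node.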
(2) The data buffer pool generates self-describing data files periodically. Following the FIFO mechanism, the data the buffer pool received first is read first. The buffer pool sets its read cycle according to its size and the number of connected treatment plants, and generates a self-describing file each cycle. The cycle should not be too long, because an overly long cycle may cause earlier data to be overwritten. Data is typically processed hourly; if the data volume is small, the cycle can also be one day, and so on. A self-describing data file consists of a dictionary part and a data part. In the dictionary part, the treatment plant dictionary information is placed at the front, and the treatment plant ID is prepended as a prefix to each process ID, equipment ID, and collection point ID, forming globally unique IDs. In the data part, the header is collection point ID, collection time, collected value, and the treatment plant ID is prepended as a prefix to the collection point ID, forming a globally unique ID. The collected values are all the collected data held in the buffer pool during the processing cycle, i.e. all collected data received within one hour (or one day).
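The interplay between buffer size and read cycle can be illustrated with a small Python sketch. It assumes a bounded buffer that silently drops its oldest entry when full, which is one plausible reading of "earlier data may be overwritten"; the BufferPool class and file_name_for helper are hypothetical names.

```python
from collections import deque
from datetime import datetime

class BufferPool:
    """A bounded FIFO; when full, the oldest report is overwritten (dropped)."""

    def __init__(self, capacity):
        self._queue = deque(maxlen=capacity)

    def put(self, report):
        self._queue.append(report)      # deque(maxlen=...) discards from the left when full

    def drain(self):
        """Remove and return everything currently buffered, oldest first."""
        items = []
        while self._queue:
            items.append(self._queue.popleft())
        return items

def file_name_for(read_time):
    """Name the file after the read timestamp, e.g. 20140401102000.csv."""
    return read_time.strftime("%Y%m%d%H%M%S") + ".csv"

pool = BufferPool(capacity=3)
for n in range(5):                      # 5 reports into a pool of 3: reports 0 and 1 are lost
    pool.put({"seq": n})
batch = pool.drain()
name = file_name_for(datetime(2014, 4, 1, 10, 20, 0))
```

In this sketch the drain must run before more than capacity reports arrive, which is why the cycle is chosen from the buffer size and the number of connected plants.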
Step 4: store data files. The storage module periodically parses the self-describing data file and stores the data dictionary and the collected data in the NoSQL database.
After a self-describing data file is parsed, its contents are saved in the NoSQL database for querying. A parsed self-describing data file can be deleted directly to free storage space, or it can be saved to a backup file system. The NoSQL database is a column-oriented structured NoSQL database. It stores data in tables made up of rows and columns; a row consists of the row key rowkey, the attribute column_name, and the attribute value column_value, all stored as strings. The data structure is shown in Table 1 below:
rowkey | column_name | column_value
rowkey_001 | column_001 | value_101
rowkey_001 | column_002 | value_102
rowkey_002 | column_001 | value_201
rowkey_002 | column_003 | value_301
Table 1: Data structure of the NoSQL database
A row is identified by rowkey + column_name, so different rowkeys can have different column_name attributes (for example, rowkey_002 has no column_002 but has column_003), and attributes are easy to extend. Physically, rows are stored in lexicographic order of rowkey (rowkey_001 comes before rowkey_002). If the rowkey is composed of the main conditions used in queries, rows that are often read together are stored together, which gives efficient query performance. For example, if the rowkey is built as [factory_id]_[signal_id]_[time], the data of one treatment plant is stored together, the data of one signal point within that plant is stored together, and the data of one signal point is arranged in time order, so querying the data of a given signal point of a given plant over a given period is very efficient.
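The effect of lexicographic rowkey ordering can be demonstrated without a database. The Python sketch below stands in for the store with a sorted list and uses the standard bisect module for the range lookup; make_rowkey and range_scan are hypothetical helpers, and the 10-digit fixed-width time is an assumption made so that string order matches time order.

```python
import bisect

def make_rowkey(factory_id, signal_id, ts):
    """Build [factory_id]_[signal_id]_[time] with a fixed-width time component."""
    return f"{factory_id}_{signal_id}_{ts:010d}"

def range_scan(sorted_keys, start_key, stop_key):
    """Return keys k with start_key <= k < stop_key from a lexicographically sorted list."""
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_left(sorted_keys, stop_key)
    return sorted_keys[lo:hi]

keys = sorted([
    make_rowkey("fac_002", "virtual_001", 1397804400),
    make_rowkey("fac_001", "virtual_002", 1397804400),
    make_rowkey("fac_001", "virtual_001", 1397804420),
    make_rowkey("fac_001", "virtual_001", 1397804400),
    make_rowkey("fac_001", "virtual_001", 1397804410),
])
hits = range_scan(
    keys,
    make_rowkey("fac_001", "virtual_001", 1397804400),
    make_rowkey("fac_001", "virtual_001", 1397804420),
)
```

All rows for one signal point of one plant form a contiguous run of keys, so a time-window query touches only that run.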
The step specifically comprises the following:
(1) The processing cycle of the storage module is set to match the processing cycle of the data buffer pool. The data buffer pool's cycle is typically set to one hour; the storage module's cycle can also be longer than that, for example one day, because the sewage treatment industry usually needs the previous day's report on the following day.
(2) The NoSQL database is designed: a data dictionary table and a data table, consistent with the self-describing data file, are defined in the NoSQL database. The storage structure of the NoSQL database is defined as column-oriented key-value pairs, and each row of data has the structure:
rowkey, column_name, column_value,
Wherein:
In the storage structure of the data dictionary table, rowkey is the file timestamp + the ID in the dictionary, column_name is one of the other attribute names in the dictionary, and column_value is the value of that attribute for the corresponding ID. An example of the storage structure is as follows:
rowkey | column_name | column_value
1397808993fac_001 | factory_name | First Treatment Plant
1397808993fac_001 | factory_location | x City x District x Road No. x
1397808993fac_001-proc_001 | process_name | coarse screen
1397808993fac_001-dev_001 | device_name | left coarse screen
Table 2: Storage structure of the data dictionary table in the NoSQL database
In the storage structure of the data table, rowkey is the signal point ID + the collection timestamp, and column_name is svalue or dictionary_version. When column_name is svalue, column_value is the collected value of the corresponding signal point ID; when column_name is dictionary_version, column_value is the timestamp of the corresponding rowkey in the data dictionary table. An example of the storage structure is as follows:
rowkey | column_name | column_value
fac_001-virtual_001_1397804400 | svalue | 9.0
fac_001-virtual_001_1397804400 | dictionary_version | 1397808993
fac_001-virtual_001_1397804410 | svalue | 9.0
fac_001-virtual_001_1397804410 | dictionary_version | 1397808993
Table 3: Storage structure of the data table in the NoSQL database
Because rowkeys are ordered lexicographically, every rowkey must have the same character length. A maximum length can be fixed for the ID and for the timestamp; a value shorter than its maximum length can be padded with "0".
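A fixed-length rowkey for the data table, together with the two cells stored per sample, might be built as in the Python sketch below. The widths ID_WIDTH and TS_WIDTH are hypothetical, and the padding sides are an assumption: the text only says that short values are padded with "0", so here the ID is padded on the right and the timestamp on the left.

```python
ID_WIDTH = 24   # hypothetical maximum ID length
TS_WIDTH = 10   # hypothetical maximum timestamp length (seconds since the epoch)

def pad_rowkey(signal_id, ts):
    """Concatenate a zero-padded ID and a zero-padded timestamp into a fixed-length key."""
    if len(signal_id) > ID_WIDTH or len(str(ts)) > TS_WIDTH:
        raise ValueError("component longer than its fixed width")
    # Assumption: IDs are padded on the right, timestamps on the left.
    return signal_id.ljust(ID_WIDTH, "0") + str(ts).rjust(TS_WIDTH, "0")

def data_table_cells(signal_id, ts, value, dictionary_version):
    """Yield the two (rowkey, column_name, column_value) cells stored per sample."""
    key = pad_rowkey(signal_id, ts)
    yield (key, "svalue", str(value))
    yield (key, "dictionary_version", str(dictionary_version))

cells = list(data_table_cells("fac_001-virtual_001", 1397804400, 9.0, 1397808993))
```

Left-padding the timestamp keeps string comparison consistent with numeric comparison. Right-padding an ID with "0" can make two different IDs collide if IDs themselves may end in "0", so a real deployment would need to choose a padding character or side that rules this out.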
(3) The dictionary part and the data part are stored separately according to the definitions of the data dictionary table and the data table, and the data in the storage module is stored in the NoSQL database, completing the data storage.
To ensure the accuracy of data storage and to prevent self-describing data files from being left unprocessed, the method may further include determining whether a data file has not yet been processed.
(4) In the data dictionary table of the NoSQL database, the largest timestamp among the rowkeys is obtained, and the file names of all data files are traversed. If the timestamp in a file name is greater than the largest timestamp in the database, the file has not been processed. For example, if the file name with the largest timestamp currently in the database is rowkey009, then rowkey009 is the last self-describing file that was processed, and any data file whose name is greater than rowkey009 has not been processed. The file names of all data files are traversed to see whether any is greater than rowkey009, such as rowkey010 and rowkey011. If so, those two files have not been processed. The unprocessed files are then handled one by one: each data file is parsed, and its dictionary part and data part are stored separately. The file name with the largest timestamp in the database is then rowkey011.
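The check for unprocessed files reduces to a comparison of timestamps. The Python sketch below assumes that file names and dictionary-table rowkeys both begin with the same fixed-width 14-digit timestamp (as in 20140401102000.csv); latest_processed and unprocessed_files are hypothetical helper names, and the lists stand in for a real directory listing and database query.

```python
def latest_processed(dictionary_rowkeys, ts_width=14):
    """Largest file timestamp among dictionary-table rowkeys (the timestamp is the rowkey prefix)."""
    return max((key[:ts_width] for key in dictionary_rowkeys), default="")

def unprocessed_files(file_names, dictionary_rowkeys, ts_width=14):
    """File names whose timestamp is greater than the latest one in the database, oldest first."""
    latest = latest_processed(dictionary_rowkeys, ts_width)
    return sorted(
        name for name in file_names
        if name.endswith(".csv") and name[:ts_width] > latest
    )

rowkeys = ["20140401092000fac_001", "20140401102000fac_001-dev_001"]
files = [
    "20140401092000.csv",
    "20140401102000.csv",
    "20140401112000.csv",
    "20140401122000.csv",
]
pending = unprocessed_files(files, rowkeys)
```

Returning the pending files oldest first lets them be parsed and stored one by one in the order they were generated; with an empty database every file counts as unprocessed.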
The invention separates collection point data from collection point attribute information, simplifying what must be reported and allowing the heterogeneous data of many sewage treatment plants to be merged; collection point attributes can be changed dynamically without changing the storage logic or the read logic.
The basic principles, main features, and advantages of the invention have been shown and described above. Those skilled in the art should understand that the invention is not limited to the above embodiments; the embodiments and the description only illustrate the principles of the invention. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the scope of the claimed invention. The scope of protection of the invention is defined by the appended claims and their equivalents.