CN113268499B - Data acquisition method and device, data acquisition system and server - Google Patents
- Publication number: CN113268499B
- Application number: CN202110622833.1A
- Authority: CN (China)
- Prior art keywords: data, terminal, target, data set, field
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G: Physics
- G06: Computing; calculating or counting
- G06F: Electric digital data processing
- G06F16/00: Information retrieval; database structures therefor; file system structures therefor
- G06F16/20: Information retrieval; database structures therefor; file system structures therefor, of structured data, e.g. relational data
- G06F16/23: Updating
- G06F16/24: Querying
- G06F16/245: Query processing
- G06F16/25: Integrating or interfacing systems involving database management systems
Abstract
The embodiment of the invention relates to a data acquisition method, a data acquisition device, a data acquisition system and a server, wherein the method comprises the following steps: the server receives a first data set which is acquired and reported by the terminal through a first data access mode; performing integrity check on the first data set; and when the first data set is determined to not pass the integrity check, controlling the terminal to acquire and report a second data set through a second data access mode, wherein the second data access mode is different from the first data access mode. Therefore, the combined application of at least two data access modes can be realized, and compared with the method which only adopts a single data access mode, the method can enable collected data to be more comprehensive and better meet the data analysis requirement.
Description
Technical Field
The embodiment of the invention relates to the technical field of data processing, in particular to a data acquisition method, a data acquisition device, a data acquisition system and a server.
Background
A smart park integrates a large number of business systems, such as video monitoring systems, intelligent fire protection systems, visitor access systems, hotel property management systems (PMS), energy management systems, and intelligent lighting systems.
When performing operational analysis tasks such as passenger flow statistics, best-selling/slow-selling commodity analysis, and order information analysis, or when applying algorithms for prediction, recommendation, decision-making, and the like, a large amount of data from the business systems must be accessed into a unified data management platform, so that the platform can process and analyze the accessed data and persist it to support subsequent data fusion.
Disclosure of Invention
In view of the above, the embodiment of the invention provides a data acquisition method, a data acquisition device, a data acquisition system and a server.
In a first aspect, an embodiment of the present invention provides a data acquisition method, applied to a server, where at least one terminal is connected to the server, and the terminal supports at least two data access modes, where the method includes:
receiving a first data set acquired and reported by the terminal through a first data access mode;
performing an integrity check on the first data set;
and when it is determined that the first data set fails the integrity check, controlling the terminal to acquire and report a second data set through a second data access mode, wherein the second data access mode is different from the first data access mode.
Optionally, the performing integrity check on the first data set includes:
acquiring a data demand table;
parsing at least one target field to be acquired from the data demand table;
determining, for each target field, whether target data matching the target field exists in the first data set;
if, for every target field, target data matching the target field exists in the first data set, determining that the first data set passes the integrity check;
if, for any target field, no target data matching the target field exists in the first data set, determining that the first data set fails the integrity check.
Optionally, the determining whether the first data set has target data matching the target field includes:
determining the semantic vector of the target field, and determining the semantic vector of the field to which each piece of first data in the first data set belongs;
determining the distance between the semantic vector of the target field and the semantic vector of the field to which each piece of first data belongs;
if any of the distances is smaller than a set distance threshold, determining that target data matching the target field exists in the first data set;
and if none of the distances is smaller than the distance threshold, determining that no target data matching the target field exists in the first data set.
Optionally, the controlling the terminal to collect and report the second data set in a data access mode other than the first data access mode among at least two data access modes includes:
determining a data query statement, wherein the data query statement indicates that data of a set field is to be queried from the terminal, and the set field is a target field for which no matching target data exists in the first data set;
and sending the data query statement to the terminal, so that the terminal executes the data query statement and reports the queried second data set.
Optionally, the determining the data query statement includes:
acquiring metadata information of a database in the terminal, wherein the database comprises a plurality of data tables;
determining, according to the metadata information, a target data table containing the set field from the database;
and constructing a data query statement indicating that the set field is to be queried from the target data table.
Optionally, the metadata information at least includes table names of a plurality of the data tables;
the determining, according to the metadata information, a target data table containing the set field from the database includes:
determining the semantic vector corresponding to the set field and the semantic vector corresponding to each table name;
for each table name, determining the distance between the semantic vector corresponding to the table name and the semantic vector corresponding to the set field;
and selecting, from the data tables, a target data table whose corresponding distance satisfies a set condition.
Optionally, the first data access mode includes at least one of the following: a buried point access mode, a captured data change mode, and an interface access mode;
When the first data access mode includes the buried point access mode, the receiving the first data set collected and reported by the terminal through the first data access mode includes: receiving a first data set which is acquired and reported by the terminal when a preset buried point trigger event is detected, wherein the first data set comprises buried point data corresponding to the buried point trigger event;
When the first data access mode includes the captured data change mode, the receiving the first data set collected and reported by the terminal through the first data access mode includes: receiving a first data set acquired and reported by the terminal when the database is captured to be changed, wherein the first data set at least comprises changed data in the database;
When the first data access mode includes the interface access mode, the receiving the first data set collected and reported by the terminal through the first data access mode includes: receiving a first data set queried and reported by the terminal by calling a preset data query interface.
In a second aspect, an embodiment of the present invention provides a data acquisition system, where the data acquisition system includes a server and at least one terminal, at least one of the terminals accesses the server, and the terminal supports at least two data access modes;
the terminal collects and reports a first data set to the server through a first data access mode;
The server performs integrity check on the first data set;
and when the server determines that the first data set does not pass the integrity check, controlling the terminal to acquire and report a second data set in a second data access mode, wherein the second data access mode is different from the first data access mode.
Optionally, the data acquisition system further includes: a message transmission component;
And the terminal reports the first data set/the second data set to the server through the message transmission component.
Optionally, the terminal includes at least one of: a buried point component, a capturing component, and an interface component, wherein the interface component at least comprises a data query interface;
When the terminal comprises the buried point component, the terminal acquires and reports the first data set to the server when a preset buried point trigger event is detected through the buried point component, wherein the first data set comprises buried point data corresponding to the buried point trigger event;
When the terminal comprises the capturing component, the terminal acquires and reports the first data set to the server when the capturing component captures that the database is changed, wherein the first data set at least comprises changed data in the database;
when the terminal comprises the interface component, the terminal queries data by calling the data query interface in the interface component, and reports the queried data to the server as the first data set.
In a third aspect, an embodiment of the present invention provides a data acquisition device, applied to a server, where at least one terminal is connected to the server, and the terminal supports at least two data access modes, where the device includes:
The access module is used for receiving a first data set acquired and reported by the terminal through a first data access mode;
The verification module is used for carrying out integrity verification on the first data set;
The access module is further configured to control the terminal to collect and report a second data set through a second data access mode when it is determined that the first data set fails the integrity check, where the second data access mode is different from the first data access mode.
In a fourth aspect, an embodiment of the present invention provides a server, including: a processor and a memory, the processor being configured to execute a data acquisition program stored in the memory, to implement the data acquisition method according to any one of the first aspects.
In a fifth aspect, an embodiment of the present invention provides a storage medium storing one or more programs executable by one or more processors to implement the data acquisition method of any one of the first aspects.
According to the technical solution provided by the embodiments of the present invention, after the server receives the first data set acquired and reported by the terminal through the first data access mode, it performs an integrity check on the first data set. When it is determined that the first data set fails the integrity check, the server controls the terminal to acquire and report a second data set through a second data access mode, which is different from the first data access mode and is one of the at least two data access modes supported by the terminal. In this way, at least two data access modes can be applied in combination; compared with using only a single data access mode, the collected data is more comprehensive and better meets data analysis requirements.
Drawings
FIG. 1 is a schematic diagram of a system architecture of a data acquisition system according to an embodiment of the present invention;
FIG. 2 is a flowchart of an embodiment of a data acquisition method according to an embodiment of the present invention;
FIG. 3 is a block diagram of an embodiment of a data acquisition device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a schematic system architecture of a data acquisition system according to an embodiment of the present invention is provided. The data acquisition system 10 shown in fig. 1 includes: server 11, terminals 12 to 14. The terminals 12 to 14 may be hardware or software, and when the terminals are hardware, the terminals may be various electronic devices supporting data transmission, including but not limited to smart phones, tablet computers, desktop computers, laptop computers, and the like, and when the terminals are software, the terminals may be installed in the above-listed electronic devices, which may be implemented as a plurality of software or software modules, or may be implemented as a single software or software module.
Terminals 12-14 access server 11 and each terminal can support at least two data access modes. Here, the terminal accessing the server means that the terminal establishes a communication connection with the server, so that data transmission between the terminal and the server is enabled. Data access refers to loading data from a source (e.g., a terminal) to a destination (e.g., a server) through operations such as extraction, conversion, etc.
Taking the terminal 12 as an example, the terminal 12 may include at least one of the following: a buried point component, a capturing component, and an interface component, wherein the interface component at least comprises a data query interface. In the embodiment of the present invention, the terminal 12 may collect local data through any of the above components and report the collected data to the server 11, thereby accessing the data on the terminal 12 side to the server 11. How the data on the terminal 12 side is accessed to the server 11 is explained hereinafter with reference to the flow shown in fig. 2 and is not described in detail here.
As one embodiment, the data acquisition system 10 further includes a message transmission component 15. Optionally, the message transmission component 15 is Kafka. In the data acquisition system 10 illustrated in fig. 1, the terminals 12 to 14 may report data to the server 11 through the message transmission component 15 to implement data access.
It should be noted that the number of terminals shown in fig. 1 is merely illustrative, and in practice, data acquisition system 10 may include any number of terminals, and embodiments of the present invention are not limited in this regard.
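The reporting path through the message transmission component can be illustrated with a minimal sketch. It assumes an in-memory `queue.Queue` as a stand-in for the component (it is not Kafka and uses no Kafka API); the names `topic`, `report`, and `consume_all`, the terminal identifiers, and the JSON message layout are all hypothetical.

```python
import json
import queue
from typing import List

# In-memory stand-in for the message transmission component (assumption:
# a real deployment would use a message broker such as Kafka instead).
topic: "queue.Queue[str]" = queue.Queue()


def report(terminal_id: str, data_set: List[dict]) -> None:
    """Terminal side: serialize a data set and publish it to the component."""
    topic.put(json.dumps({"terminal": terminal_id, "data_set": data_set}))


def consume_all() -> List[dict]:
    """Server side: drain every message currently waiting in the component."""
    messages = []
    while True:
        try:
            messages.append(json.loads(topic.get_nowait()))
        except queue.Empty:
            return messages


# Two terminals report; the server then receives both data sets in order.
report("terminal-12", [{"event": "click", "user_id": "u1"}])
report("terminal-13", [{"order_id": 7}])
messages = consume_all()
```

Decoupling the terminals from the server in this way lets any number of terminals report without the server being involved in each send.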
Referring to fig. 2, a flowchart of an embodiment of a data acquisition method is provided in an embodiment of the present invention. As an embodiment, the method may be applied to the server 11 illustrated in fig. 1. As shown in fig. 2, the method may include the steps of:
Step 201: receive a first data set collected and reported by the terminal through a first data access mode.
As an embodiment, the first data access mode includes at least one of the following: the buried point access mode (event tracking through instrumentation points), the captured data change mode (change data capture, CDC), and the interface access mode. That is, the first data access mode may be any one of the three modes, a combination of any two of them, or a combination of all three.
When the first data access mode includes a buried point access mode, that is, when the terminal supports the buried point access mode, the terminal can collect buried point data and report the collected buried point data to the server when detecting a preset buried point trigger event. Thus, the server can receive the first data set which is collected and reported by the terminal when the terminal detects a preset buried point trigger event. Here, the first data set includes the above-described buried point data.
Specifically, in practice, the user (such as an operator, a developer, etc.) may set a buried point component on the terminal according to the requirement, so that the terminal may collect buried point data when detecting a buried point trigger event through the buried point component, and report the collected buried point data to the server.
Therefore, the buried point access mode is a requirement-driven data access mode: with it, the relevant information can be collected at the positions to be monitored according to the user's requirements, and the terminal-side data is accessed to the server through an event-trigger mechanism for subsequent analysis.
In one example, the terminal may collect data on various user behaviors at the terminal side (such as browsing, clicking, commenting, and liking) through the buried point access mode and report the collected behavior data to the server, so that the server can perform multidimensional analysis on the user's behavior data, reconstruct the user's usage scenarios, and discover the user's potential requirements.
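The event-trigger mechanism described above might be sketched as follows. This is only an illustration under stated assumptions: the class `BuriedPointComponent`, its `report` callback, the trigger names, and the record layout are hypothetical and not taken from the embodiment.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class BuriedPointComponent:
    """Hypothetical terminal-side component that reports data on preset trigger events."""
    report: Callable[[List[dict]], None]             # how a data set reaches the server
    triggers: Set[str] = field(default_factory=set)  # preset buried point trigger events

    def on_event(self, event_name: str, payload: dict) -> bool:
        """Collect and report buried point data only if the event is a preset trigger."""
        if event_name not in self.triggers:
            return False
        record = {"event": event_name, "timestamp": time.time(), **payload}
        self.report([record])  # the reported list plays the role of the first data set
        return True


received: List[dict] = []  # stands in for the server's inbox
component = BuriedPointComponent(report=received.extend, triggers={"click", "comment"})
component.on_event("click", {"user_id": "u1", "item_id": "sku-9"})  # reported
component.on_event("scroll", {"user_id": "u1"})                     # ignored: not a preset trigger
```

Only events registered in advance produce data, which reflects why new requirements on a running terminal need further development work.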
When the first data access mode includes a captured data change mode, that is, when the terminal supports the captured data change mode, the terminal can collect changed data when capturing that the local database is changed, and report the collected data to the server. Thus, the server can receive the first data set collected and reported by the terminal when the database is captured to be changed. Here, the first data set includes at least data changed in the database of the terminal.
Specifically, in the captured data change mode, a capturing component is built into the terminal side, and the database to be monitored is configured in the capturing component. The capturing component monitors the database at the terminal side and captures the changed data when it detects that the database has changed (for example, through insert, delete, or update operations). Here, the changed data may include both the data before the change and the data after the change.
The capturing component may be, for example, Canal, Debezium, Maxwell, or FlinkX. Canal and Maxwell support MySQL databases, for which the captured data change mode can be implemented simply by enabling the binlog of the MySQL database. Debezium and FlinkX support multiple types of databases; for a PostgreSQL database, for example, wal_level=logical must be set in the configuration file to enable logical replication so that the captured data change mode can be implemented.
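The shape of the captured data, with both the pre-change and post-change values, can be illustrated by a simplified sketch. It compares two snapshots of a table keyed by primary key; this snapshot diff is an assumption made for brevity, since capture tools such as those named above read the database log rather than comparing snapshots. The function name `capture_changes` and the event layout are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def capture_changes(before: Dict[int, Row], after: Dict[int, Row]) -> List[dict]:
    """Emit one change event per inserted, deleted, or updated row."""
    changes = []
    for key in sorted(before.keys() | after.keys()):
        old: Optional[Row] = before.get(key)
        new: Optional[Row] = after.get(key)
        if old == new:
            continue  # row unchanged, nothing to capture
        op = "insert" if old is None else "delete" if new is None else "update"
        # Each event carries the data before and after the change.
        changes.append({"op": op, "key": key, "before": old, "after": new})
    return changes


snapshot_old = {1: {"status": "pending", "amount": 20.0}, 2: {"status": "paid", "amount": 5.0}}
snapshot_new = {1: {"status": "paid", "amount": 20.0}, 3: {"status": "pending", "amount": 7.5}}
first_data_set = capture_changes(snapshot_old, snapshot_new)
```

Each event describes a single row-level change, so the server still has to relate several events to one another to recover the business event behind them.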
When the first data access mode comprises an interface access mode, namely, the terminal supports the interface access mode, the terminal can call a local preset data query interface to query data and report the queried data to the server. Thus, the server can receive the first data set which is queried and reported by the terminal through calling a preset data query interface.
Specifically, in practice, a user (such as an operator or a developer) may set up an interface component on the terminal as required, where the interface component at least includes a data query interface. The terminal may call the data query interface periodically (for example, every 5 minutes) to query data and report the queried data to the server as the first data set.
As can be seen from the above description, the buried point access mode, the captured data change mode, and the interface access mode use different data access mechanisms, and each has its own advantages and disadvantages.
The buried point data collected through the buried point access mode is well-defined and easy to analyze. However, because the requirements must be specified explicitly, the cooperation of developers is needed, so the buried point access mode can be hard to push forward when a large amount of terminal data needs buried points. Moreover, once the terminal is running stably, any new buried point requirement calls for secondary development of the terminal, which is inconvenient.
The data captured through the captured data change mode represents the data changes at the terminal side more completely and comprehensively. However, the captured data has a complex format and is scattered, so the server has to run a complex analysis process on the data reported by the terminal to work out the associations among multiple pieces of data before the actual event information can be reconstructed. Accessing terminal-side data to the server through the captured data change mode therefore increases the data processing difficulty on the server side.
The interface access mode is simple and easy to implement, but it cannot meet real-time requirements for data access and places high demands on the ability of the interface component to withstand load.
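The periodic call to the data query interface might look like the sketch below. The function `poll_and_report`, its parameters, and the injectable `sleep` (used so the example runs without waiting) are hypothetical; the 300-second default mirrors the 5-minute example in the text.

```python
import time
from typing import Callable, List


def poll_and_report(
    query_interface: Callable[[], List[dict]],
    report: Callable[[List[dict]], None],
    interval_seconds: float = 300.0,   # e.g. every 5 minutes
    rounds: int = 3,                   # bounded here so the sketch terminates
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call the data query interface periodically and report non-empty results."""
    for i in range(rounds):
        data = query_interface()
        if data:
            report(data)  # the reported data acts as the first data set
        if i < rounds - 1:
            sleep(interval_seconds)


# Usage with canned query results and a recording sleep instead of a real wait.
batches = iter([[{"order_id": 1}], [], [{"order_id": 2}]])
reported: List[List[dict]] = []
waits: List[float] = []
poll_and_report(lambda: next(batches), reported.append, rounds=3, sleep=waits.append)
```

Data that changes between two calls is only seen at the next call, which is why this mode cannot satisfy real-time requirements.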
In view of this, in a preferred implementation of the present invention, the first data access method may be set to a combination of any two of the above three, or a combination of the above three.
Step 202, performing integrity check on the first data set.
In the embodiment of the invention, the purpose of performing the integrity check on the first data set is to check whether the data in the first data set is complete. Here, the reference for completeness may be the user's requirement, for example a data demand table provided by the user, which may include a plurality of fields required by the user. That is, when the first data set includes data for all fields required by the user, the first data set is complete; when it includes data for only some of the required fields, the first data set is incomplete.
Specifically, as an alternative implementation, the integrity check of the first data set may be performed by: analyzing at least one target field to be acquired from the acquired data demand table, then determining whether data matched with the target field (hereinafter referred to as target data) exists in the first data set for each target field, and determining that the first data set passes the integrity check if the target data matched with the target field exists in the first data set for each target field; if there is no target data in the first data set that matches the target field for any (referring to one or several) target fields, it may be determined that the first data set fails the integrity check.
It should be noted that the target field may be parsed directly from the data demand table, that is, the data demand table itself includes the target field. The target field may also be obtained indirectly: the data demand table does not include the target field, but includes a field required by the user together with the calculation relationship between the fields used to obtain that field's data, in which case the target field is obtained by parsing that calculation relationship. For example, suppose the data demand table includes the field "total transaction amount in a set history period", and this total is a statistic computed from the data of three fields: transaction time, transaction status (transaction success or transaction failure), and transaction amount. The three fields transaction time, transaction status, and transaction amount can then be determined as the target fields.
Further, in step 202, whether target data matching the target field exists in the first data set may be determined as follows: determine the semantic vector of the target field, and determine the semantic vector of the field to which each piece of first data in the first data set belongs; determine the distance between the semantic vector of the target field and the semantic vector of the field to which each piece of first data belongs; if any of the distances is smaller than the set distance threshold, determine that target data matching the target field exists in the first data set; if none of the distances is smaller than the distance threshold, determine that no such target data exists in the first data set. It will be understood that a smaller distance between two semantic vectors means that the two vectors, and hence the corresponding fields, are more similar; therefore, if the semantic vector of the target field is similar to that of the field to which a piece of first data belongs, that first data can be considered to match the target field.
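The parsing of target fields and the resulting integrity check can be sketched as follows. The sketch assumes a hypothetical representation of the data demand table as a mapping from each required field to the fields it is calculated from (an empty list meaning the field is itself a target field), and it matches fields by exact name purely for brevity; the function names and field names are illustrative.

```python
from typing import Dict, List, Set


def parse_target_fields(demand_table: Dict[str, List[str]]) -> Set[str]:
    """Direct fields are target fields; derived fields contribute their source fields."""
    targets: Set[str] = set()
    for required_field, source_fields in demand_table.items():
        targets.update(source_fields if source_fields else [required_field])
    return targets


def missing_target_fields(first_data_set: List[dict], demand_table: Dict[str, List[str]]) -> List[str]:
    """Return target fields with no data in the first data set; empty means the check passes."""
    present = {name for record in first_data_set for name in record}
    return sorted(parse_target_fields(demand_table) - present)


demand_table = {
    "user_id": [],  # listed directly in the demand table
    "total_transaction_amount": ["transaction_time", "transaction_status", "transaction_amount"],
}
first_data_set = [
    {"user_id": "u1", "transaction_time": "2021-06-01T10:00:00", "transaction_amount": 12.5},
]
missing = missing_target_fields(first_data_set, demand_table)
passes_integrity_check = not missing
```

Returning the missing fields rather than a bare boolean keeps the information needed later to decide which fields to query from the terminal.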
As an optional implementation, the target field or the field to which the first data belongs may be input into a trained semantic representation model to obtain the corresponding semantic vector. Here, the semantic representation model may be obtained by weakly supervised training on a large number of fields, and may be, for example, a neural network model or a deep learning model.
Optionally, the distance may be a cosine distance, a Euclidean distance, a Manhattan distance, or the like.
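A minimal sketch of distance-based field matching follows. It uses the cosine distance and treats a smaller distance as a closer match. Because no trained semantic model is available here, `embed` is a toy stand-in that counts character trigrams; that embedding, the function names, and the 0.5 threshold are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Iterable


def embed(field_name: str) -> Counter:
    """Toy stand-in for a semantic model: a bag of character trigrams."""
    text = f"  {field_name.lower().replace('_', ' ')} "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity; 0 means identical direction, 1 means nothing shared."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def has_match(target_field: str, data_fields: Iterable[str], threshold: float = 0.5) -> bool:
    """True if any field in the data set is within the distance threshold of the target field."""
    target_vector = embed(target_field)
    return any(cosine_distance(target_vector, embed(name)) < threshold for name in data_fields)


matched = has_match("transaction_amount", ["user_id", "trans_amount"])
unmatched = has_match("transaction_amount", ["user_id"])
```

With a real semantic model, only `embed` would change; the distance computation and the threshold comparison stay the same.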
Step 203, when it is determined that the first data set fails to pass the integrity check, the control terminal collects and reports a second data set in a second data access mode, where the second data access mode is different from the first data access mode.
In the embodiment of the invention, if the first data set fails the integrity check, the first data set cannot meet the user requirement, so that in order to meet the user requirement, the embodiment of the invention provides that when the first data set fails the integrity check, the control terminal acquires and reports the second data set through a second data access mode different from the first data access mode in at least two data access modes.
As an embodiment, the second data access mode refers to a mode of performing a query using a data query statement. Based on this, in this step 203, when it is determined that the first data set fails the integrity check, a data query statement is determined, and the data query statement is sent to the terminal, so that the terminal executes the data query statement and reports the queried second data set. Here, the data query statement is used to indicate that data of at least one set field is queried from the terminal, and the set field refers to a target field in the first data set, for which there is no target data matching the set field.
It should be noted that, the preconditions of the above-described second data access manner are: the server has authority to access the terminal database, for example, the data query statement carries an account number and a password of the terminal database, so that the terminal receives the data query statement, authenticates the account number and the password carried by the data query statement, and executes the received data query statement after the authentication is passed. Through the processing, the safety of the terminal database can be ensured.
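The terminal-side handling described above (authenticate first, then execute the statement and report the result) might be sketched as follows. The sketch uses an in-memory SQLite database as a stand-in for the terminal database; the `Terminal` class, the request layout, the table, and the credentials are all hypothetical.

```python
import hmac
import sqlite3
from typing import List, Tuple


class Terminal:
    """Hypothetical terminal that executes a query statement only after authentication."""

    def __init__(self, conn: sqlite3.Connection, account: str, password: str) -> None:
        self._conn = conn
        self._account = account
        self._password = password

    def _authenticated(self, account: str, password: str) -> bool:
        # Constant-time comparison of the credentials carried by the request.
        return (hmac.compare_digest(account.encode(), self._account.encode())
                and hmac.compare_digest(password.encode(), self._password.encode()))

    def handle_query(self, request: dict) -> List[Tuple]:
        """Authenticate, then run the statement and return the second data set."""
        if not self._authenticated(request.get("account", ""), request.get("password", "")):
            raise PermissionError("authentication failed; statement not executed")
        return self._conn.execute(request["statement"]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, transaction_status TEXT)")
conn.executemany("INSERT INTO orders (transaction_status) VALUES (?)", [("success",), ("failure",)])
terminal = Terminal(conn, account="collector", password="s3cret")
second_data_set = terminal.handle_query({
    "account": "collector",
    "password": "s3cret",
    "statement": "SELECT transaction_status FROM orders ORDER BY order_id",
})
```

In a real deployment the credentials would more likely be checked by the database itself over an encrypted connection; carrying them in the request simply mirrors the example in the text.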
Specifically, as one embodiment, determining the data query statement includes: for each set field, acquiring metadata information of a database in the terminal and determining, from the database according to the metadata information, a target data table containing the set field; and then constructing a data query statement indicating that each set field is to be queried from each target data table. Further, the metadata information includes at least the table names of a plurality of data tables. On this basis, determining the target data table containing the set field from the database according to the metadata information includes: determining the semantic vector corresponding to the set field and the semantic vector corresponding to each table name; for each table name, determining the distance between the semantic vector corresponding to the table name and the semantic vector corresponding to the set field; and selecting, from the data tables, a target data table whose corresponding distance satisfies a set condition.
Optionally, the above set condition may be: the distance corresponding to the target data table is larger than the distance corresponding to the other data tables, or the distance corresponding to the target data table is smaller than a set distance threshold. Further, when the distance between the semantic vector corresponding to a table name and the semantic vector corresponding to the set field satisfies the set condition, the table name and the set field are highly associated, so the data table corresponding to that table name is determined as the target data table containing the set field.
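The table selection and statement construction described above can be sketched as follows. The sketch makes several assumptions that are not part of the embodiment: it uses the threshold variant of the set condition with cosine distance, it takes precomputed toy two-dimensional vectors in place of real semantic vectors, and the table names, field name and helper names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def select_target_tables(field_vec, table_vecs, threshold):
    """Return the table names whose distance to the set field is below the threshold."""
    return [
        name
        for name, vec in table_vecs.items()
        if cosine_distance(field_vec, vec) < threshold
    ]


def quote_identifier(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_query(set_field, table):
    """Construct a statement that queries one set field from one target table."""
    return f"SELECT {quote_identifier(set_field)} FROM {quote_identifier(table)}"


# Toy vectors standing in for semantic vectors of table names and of the set field.
table_vecs = {"orders": [0.9, 0.1], "door_events": [0.1, 0.9]}
field_vec = [1.0, 0.0]  # vector assumed for the set field "order_amount"

targets = select_target_tables(field_vec, table_vecs, threshold=0.2)
statements = [build_query("order_amount", table) for table in targets]
```

Quoting the identifiers keeps a field or table name taken from metadata from being interpreted as SQL syntax.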
As another embodiment, before the data query statement is determined, each setting field may first be output, a selected setting field (hereinafter referred to as a target setting field) is then determined from the setting fields, and finally a data query statement indicating that the target setting field is to be queried from the terminal is constructed using the above-described process of determining the data query statement. In other words, when the server determines that setting fields are missing from the first data set, it may output each setting field so that the user can select, according to the actual situation, the setting fields that need to be queried. Here, the user may select all of the setting fields or only some of them, which is not limited in the embodiment of the present invention. After the user makes the selection, the server determines the selected target setting field and then constructs a data query statement indicating that the target setting field is to be queried from the terminal.
Further, as an embodiment, before the data query statement is sent to the terminal, the data query statement may be output, and it is then sent to the terminal only when an instruction message instructing execution of the data query statement is received. This processing improves the user experience and also lets the user confirm the automatically constructed data query statement, which helps ensure the accuracy of the terminal-side data finally accessed by the server.
Further, by displaying the automatically constructed data query statement to the user for the user to refer to, the effect of assisting the user in adjusting the subsequent development work can be achieved.
In addition, after the server receives the first data set and/or the second data set, operations such as filtering, cleaning, flattening, data format conversion, extraction, loading and the like can be performed on the received data, and the processed data can be persisted into a file system or a database according to a specified data format.
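Some of the post-processing operations listed above (filtering, cleaning, flattening and persistence) can be sketched as follows. The record layout, the JSON Lines output format and the output file name are assumptions chosen for illustration; the embodiment does not prescribe them.

```python
import json
import os
import tempfile


def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries, e.g. {"a": {"b": 1}} becomes {"a.b": 1}."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def clean(records):
    """Flatten each record, drop None-valued fields, and filter out empty records."""
    cleaned = []
    for record in records:
        flat = {k: v for k, v in flatten(record).items() if v is not None}
        if flat:
            cleaned.append(flat)
    return cleaned


def persist_jsonl(records, path):
    """Persist records to a file as JSON Lines, one object per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


received = [{"order": {"id": 1, "amount": 9.5}, "note": None}, {"note": None}]
processed = clean(received)
out_path = os.path.join(tempfile.gettempdir(), "second_data_set.jsonl")
persist_jsonl(processed, out_path)
```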
According to the technical solution provided by the embodiment of the invention, after receiving the first data set collected and reported by the terminal through the first data access mode, the server performs an integrity check on the first data set. When it is determined that the first data set fails the integrity check, the server controls the terminal to collect and report a second data set through a second data access mode that, among the at least two data access modes supported by the terminal, is different from the first data access mode. The at least two data access modes can thus be applied in combination; compared with using only a single data access mode, the collected data can be more comprehensive and can better meet data analysis requirements.
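The overall server-side flow summarized above can be sketched as follows. The three callables are hypothetical stand-ins for the first data access mode, the integrity check and the second data access mode, and keeping both data sets together at the end is an assumption of this sketch rather than a requirement of the embodiment.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def acquire(
    receive_first: Callable[[], List[Record]],
    integrity_check: Callable[[List[Record]], List[str]],
    request_second: Callable[[List[str]], List[Record]],
) -> List[Record]:
    """Receive the first data set and fall back to a second mode if fields are missing."""
    first = receive_first()
    missing = integrity_check(first)  # an empty list means the check passed
    if not missing:
        return first
    second = request_second(missing)  # e.g. send a data query statement to the terminal
    return first + second  # assumption: the server keeps both data sets


# Hypothetical stand-ins for the two access modes and the check.
target_fields = ["order_id", "amount"]
first_mode = lambda: [{"order_id": 1}]
check = lambda data: [f for f in target_fields if not any(f in r for r in data)]
second_mode = lambda missing: [{f: 9.5 for f in missing}]

collected = acquire(first_mode, check, second_mode)
```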
In order to facilitate understanding of the embodiments of the present invention, the following illustrates an exemplary application scenario of the data acquisition method provided by the embodiments of the present invention:
In a smart park, a large number of business systems are integrated, such as video monitoring systems, smart fire-protection systems, access control and visitor systems, hotel PMS systems, energy management systems, and smart lighting systems. When performing operation analysis such as passenger flow statistics, hot-selling/slow-selling commodity analysis, and order information analysis, or when applying algorithms for prediction, recommendation, decision-making, and the like, the data of these business systems needs to be accessed into a unified data management platform and persisted to support subsequent data fusion.
In practice, however, different business systems are developed under different conditions, so a single data access mode cannot suit all of them. The scheme provided by the embodiment of the invention can provide multiple different data access modes for the same business system, as well as multiple different data access modes for different business systems, thereby meeting the data access requirements of multiple business systems. In this application scenario, the business system corresponds to the role of the terminal in fig. 1, and the data management platform corresponds to the role of the server in fig. 1.
Further, providing a plurality of different data access modes also makes the developers' work on data access for the business systems more convenient and improves data access efficiency overall.
Referring to fig. 3, a block diagram of an embodiment of a data acquisition device provided by an embodiment of the present invention is shown. As an embodiment, the device may be applied to the server illustrated in fig. 1. As shown in fig. 3, the device includes:
The access module 31 is configured to receive a first data set collected and reported by the terminal through a first data access manner;
a verification module 32 for performing an integrity verification on the first data set;
The access module 31 is further configured to control the terminal to collect and report a second data set through a second data access mode when it is determined that the first data set fails the integrity check, where the second data access mode is different from the first data access mode.
Optionally, the verification module 32 is specifically configured to:
acquiring a data demand table; analyzing at least one target field to be acquired from the data demand table; determining, for each of the target fields, whether there is target data in the first dataset that matches the target field; if target data matched with the target fields exist in the first data set aiming at each target field, determining that the first data set passes the integrity check; if target data matched with the target field does not exist in the first data set aiming at any target field, determining that the first data set does not pass the integrity check.
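The integrity check performed by the verification module can be sketched as follows. This is a minimal illustration that assumes the data demand table is a CSV text with a hypothetical `field` column and that matching defaults to exact field-name equality; the `matches` parameter is where a semantic-vector comparison could be substituted.

```python
import csv
import io


def parse_target_fields(demand_table_csv):
    """Read the target field names from the 'field' column of a CSV demand table."""
    reader = csv.DictReader(io.StringIO(demand_table_csv))
    return [row["field"].strip() for row in reader if row["field"].strip()]


def integrity_check(target_fields, first_data_set,
                    matches=lambda target, field: target == field):
    """Return (passed, set_fields); set_fields are target fields with no matching data."""
    present = {field for record in first_data_set for field in record}
    set_fields = [
        target
        for target in target_fields
        if not any(matches(target, field) for field in present)
    ]
    return (not set_fields, set_fields)


demand_table = "field,description\norder_id,order identifier\namount,order amount\n"
first_data_set = [{"order_id": 1, "created_at": "2021-06-03"}]

passed, set_fields = integrity_check(parse_target_fields(demand_table), first_data_set)
```

The returned `set_fields` list is what a subsequent data query statement would need to cover.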
Optionally, the verification module 32 determining whether there is target data in the first data set that matches the target field includes:
determining the semantic vector of the target field and the semantic vector of the field to which each piece of first data in the first data set belongs; determining the distance between the semantic vector of the target field and the semantic vector of the field to which each piece of first data belongs; if any distance is larger than a set distance threshold, determining that target data matching the target field exists in the first data set; and if no distance is larger than the distance threshold, determining that no target data matching the target field exists in the first data set.
Optionally, the access module 31 controls the terminal to collect and report the second data set through a second data access manner, including:
determining a data query statement, wherein the data query statement is used for indicating that data of a set field is to be queried from the terminal, and the set field is a target field for which no matching target data exists in the first data set; and sending the data query statement to the terminal so that the terminal executes the data query statement and reports the queried second data set.
Optionally, the determining, by the access module 31, a data query statement includes:
acquiring metadata information of a database in the terminal, wherein the database comprises a plurality of data tables; determining a target data table containing the setting field from the database according to the metadata information; and constructing a data query statement for indicating that the setting field is queried from the target data table.
Optionally, the metadata information at least includes table names of a plurality of the data tables; the access module 31 determines a target data table containing the setting field from the database according to the metadata information, including:
Respectively determining the set fields and semantic vectors corresponding to the table names; for each table name, determining the distance between the semantic vector corresponding to the table name and the semantic vector corresponding to the setting field; and selecting a target data table from the data tables, wherein the distance corresponding to the target data table meets a set condition.
Optionally, the first data access manner includes at least one of the following: a buried point access mode, a captured data changing mode and an interface access mode;
when the first data access manner includes the buried point access manner, the access module 31 is configured to receive a first data set collected and reported by the terminal when a preset buried point trigger event is detected, where the first data set includes buried point data corresponding to the buried point trigger event;
When the first data access manner includes the captured data change manner, the access module 31 is configured to receive a first data set collected and reported by the terminal when a change occurs in a captured database, where the first data set includes at least changed data in the database;
When the first data access manner includes the interface access manner, the access module 31 is configured to receive a first data set queried and reported by the terminal calling a preset data query interface.
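As an illustration of the captured data change mode, the following sketch derives the changed data by comparing two snapshots of a table keyed by primary key. This is a simplifying assumption made for the example: the embodiment does not specify how changes are captured, and production change data capture typically reads the database's change log instead. The table contents are hypothetical.

```python
def capture_changes(before, after):
    """Return the rows inserted or updated between two snapshots keyed by primary key."""
    return [row for key, row in after.items() if before.get(key) != row]


# Hypothetical snapshots of an "orders" table before and after a change.
before = {1: {"order_id": 1, "amount": 9.5}}
after = {
    1: {"order_id": 1, "amount": 12.0},  # updated row
    2: {"order_id": 2, "amount": 3.0},   # inserted row
}

first_data_set = capture_changes(before, after)
```

Deleted rows are not reported by this sketch; covering them would require also listing the keys present in `before` but absent from `after`.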
The embodiment of the present invention further provides a server, as shown in fig. 4, including a processor 401, a communication interface 402, a memory 403, and a communication bus 404, where the processor 401, the communication interface 402, and the memory 403 complete communication with each other through the communication bus 404,
A memory 403 for storing a computer program;
The processor 401, when executing the program stored in the memory 403, implements the following steps:
receiving a first data set collected and reported by the terminal through a first data access mode; performing an integrity check on the first data set; and when it is determined that the first data set fails the integrity check, controlling the terminal to collect and report a second data set through a second data access mode, wherein the second data access mode is different from the first data access mode.
The communication bus of the server may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one bold line is drawn in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the server and other devices.
The memory may include random access memory (RAM) or non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of the data acquisition method provided by the embodiments of the present application.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of function in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing description of the embodiments illustrates the general principles of the invention and is not intended to limit the scope of the invention to the particular embodiments; any modifications, equivalents, improvements, and the like that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (11)
1. A data acquisition method, characterized in that it is applied to a server, wherein at least one terminal is connected to the server and the terminal supports at least two data access modes, the method comprising:
receiving a first data set acquired and reported by the terminal through a first data access mode;
Performing integrity check on the first data set;
When the first data set is determined to not pass the integrity check, the terminal is controlled to acquire and report a second data set through a second data access mode, wherein the second data access mode is different from the first data access mode;
The performing integrity check on the first data set includes:
acquiring a data demand table;
Analyzing at least one target field to be acquired from the data demand table;
Determining, for each of the target fields, whether there is target data in the first dataset that matches the target field;
If target data matched with the target fields exist in the first data set aiming at each target field, determining that the first data set passes the integrity check;
If target data matched with the target field does not exist in the first data set aiming at any target field, determining that the first data set does not pass the integrity check;
the controlling the terminal to collect and report a second data set through other data access modes except the first data access mode in at least two data access modes includes:
Determining a data query statement, wherein the data query statement is used for indicating that data of a set field is to be queried from the terminal, and the set field is a target field for which no matching target data exists in the first data set;
and sending the data query statement to the terminal so that the terminal executes the data query statement and reports the queried second data set.
2. The method of claim 1, wherein the determining whether there is target data in the first dataset that matches the target field comprises:
Determining the semantic vector of the target field and determining the semantic vector of the field to which each first data in the first data set belongs;
Determining the distance between the semantic vector of the target field and the semantic vector of the field to which each piece of first data belongs;
if any distance is larger than a set distance threshold value, determining that target data matched with the target field exists in the first data set;
And if no distance is larger than the distance threshold, determining that no target data matching the target field exists in the first data set.
3. The method of claim 1, wherein the determining a data query statement comprises:
acquiring metadata information of a database in the terminal, wherein the database comprises a plurality of data tables;
Determining a target data table containing the setting field from the database according to the metadata information;
And constructing a data query statement for indicating that the setting field is queried from the target data table.
4. A method according to claim 3, wherein said metadata information comprises at least table names of a plurality of said data tables;
the determining, according to the metadata information, a target data table containing the setting field from the database includes:
respectively determining the set fields and semantic vectors corresponding to the table names;
For each table name, determining the distance between the semantic vector corresponding to the table name and the semantic vector corresponding to the setting field;
and selecting a target data table from the data tables, wherein the distance corresponding to the target data table meets a set condition.
5. The method of claim 1, wherein the first data access means comprises at least one of: a buried point access mode, a captured data changing mode and an interface access mode;
When the first data access mode includes the buried point access mode, the receiving the first data set collected and reported by the terminal through the first data access mode includes: receiving a first data set which is acquired and reported by the terminal when a preset buried point trigger event is detected, wherein the first data set comprises buried point data corresponding to the buried point trigger event;
When the first data access mode includes the captured data change mode, the receiving the first data set collected and reported by the terminal through the first data access mode includes: receiving a first data set acquired and reported by the terminal when the database is captured to be changed, wherein the first data set at least comprises changed data in the database;
When the first data access mode includes the interface access mode, the receiving the first data set collected and reported by the terminal through the first data access mode includes: and receiving a first data set which is queried and reported by the terminal calling a preset data query interface.
6. A data acquisition system, characterized by comprising a server and at least one terminal, wherein the at least one terminal is connected to the server, and the terminal supports at least two data access modes;
the terminal collects and reports a first data set to the server through a first data access mode;
The server performs integrity check on the first data set;
When the server determines that the first data set does not pass the integrity check, the server controls the terminal to acquire and report a second data set in a second data access mode, wherein the second data access mode is different from the first data access mode;
The server is specifically configured to: obtain a data demand table; parse at least one target field to be acquired from the data demand table; determine, for each of the target fields, whether there is target data in the first data set that matches the target field; if, for each target field, target data matching the target field exists in the first data set, determine that the first data set passes the integrity check; if, for any target field, no target data matching the target field exists in the first data set, determine that the first data set does not pass the integrity check; determine a data query statement, wherein the data query statement is used for indicating that data of a set field is to be queried from the terminal, and the set field is a target field for which no matching target data exists in the first data set; and send the data query statement to the terminal so that the terminal executes the data query statement and reports the queried second data set.
7. The data acquisition system of claim 6, wherein the data acquisition system further comprises: a message transmission component;
And the terminal reports the first data set/the second data set to the server through the message transmission component.
8. The data acquisition system of claim 6 wherein the terminal comprises at least one of: the system comprises a buried point component, a capturing component and an interface component, wherein the interface component at least comprises a data query interface;
When the terminal comprises the buried point component, the terminal acquires and reports the first data set to the server when a preset buried point trigger event is detected through the buried point component, wherein the first data set comprises buried point data corresponding to the buried point trigger event;
When the terminal comprises the capturing component, the terminal acquires and reports the first data set to the server when the capturing component captures that the database is changed, wherein the first data set at least comprises changed data in the database;
when the terminal comprises the interface component, the terminal queries data by calling the data query interface in the interface component, and reports the queried data to the server as the first data set.
9. A data acquisition device, characterized in that it is applied to a server, said server having at least one terminal, said terminal supporting at least two data access modes, said device comprising:
The access module is used for receiving a first data set acquired and reported by the terminal through a first data access mode;
The verification module is used for carrying out integrity verification on the first data set;
The access module is further configured to control the terminal to collect and report a second data set through a second data access mode when it is determined that the first data set fails the integrity check, where the second data access mode is different from the first data access mode;
The verification module is specifically used for acquiring a data demand table; analyzing at least one target field to be acquired from the data demand table; determining, for each of the target fields, whether there is target data in the first dataset that matches the target field; if target data matched with the target fields exist in the first data set aiming at each target field, determining that the first data set passes the integrity check; if target data matched with the target field does not exist in the first data set aiming at any target field, determining that the first data set does not pass the integrity check;
The access module is specifically configured to determine a data query statement, where the data query statement is configured to instruct data in a setting field to be queried from the terminal, and the setting field refers to a target field in the first dataset, where no target data matching the target field exists; and sending the data query statement to the terminal so that the terminal executes the data query statement and reports the queried second data set.
10. A server, characterized by comprising: a processor and a memory, the processor being configured to execute a data acquisition program stored in the memory to implement the data acquisition method of any one of claims 1 to 5.
11. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the steps of the method of any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110622833.1A CN113268499B (en) | 2021-06-03 | 2021-06-03 | Data acquisition method and device, data acquisition system and server |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113268499A CN113268499A (en) | 2021-08-17 |
CN113268499B (en) | 2024-08-13 |
Family
ID=77234400
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110622833.1A Active CN113268499B (en) | 2021-06-03 | 2021-06-03 | Data acquisition method and device, data acquisition system and server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113268499B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109150641A (en) * | 2017-06-15 | 2019-01-04 | 北京国双科技有限公司 | A kind of data acquisition, querying method, device, storage medium and processor |
CN111240936A (en) * | 2020-01-13 | 2020-06-05 | 北京点众科技股份有限公司 | Data integrity checking method and equipment |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109246035A (en) * | 2018-08-01 | 2019-01-18 | 平安科技(深圳)有限公司 | A kind of method and device of data transfer management |
CN111241850B (en) * | 2020-04-24 | 2020-07-17 | 支付宝(杭州)信息技术有限公司 | Method and device for providing business model |
Also Published As
Publication number | Publication date |
---|---|
CN113268499A (en) | 2021-08-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||