
CN107643956B - Method and apparatus for locating the origin of an anomaly in anomaly data - Google Patents

Method and apparatus for locating the origin of an anomaly in anomaly data

Info

Publication number
CN107643956B
Authority
CN
China
Prior art keywords
data
node
abnormal
origin
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710722887.9A
Other languages
Chinese (zh)
Other versions
CN107643956A (en
Inventor
钟媛媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201710722887.9A priority Critical patent/CN107643956B/en
Publication of CN107643956A publication Critical patent/CN107643956A/en
Application granted granted Critical
Publication of CN107643956B publication Critical patent/CN107643956B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and an apparatus for locating the anomaly origin of abnormal data, and relates to the technical field of computers. One embodiment of the method comprises: comparing the data of the leaf nodes with the data of the corresponding preprocessing layer nodes, and when a preprocessing layer node is inconsistent with its corresponding leaf node, determining that preprocessing layer node to be an anomaly origin and returning; when the abnormal data is not larger than its reference value, checking the integrity of each intermediate node other than the preprocessing layer nodes, and when an intermediate node is incomplete, determining that intermediate node to be an anomaly origin and returning; and checking whether the statistical caliber of each intermediate node other than the preprocessing layer nodes is consistent with the standard caliber of the abnormal data, and if the caliber of an intermediate node is inconsistent with it, determining that intermediate node to be an anomaly origin and returning. The embodiment can effectively avoid human error, lowers the requirements on the people who handle data anomalies, and is quick and efficient.

Description

Method and apparatus for locating the origin of an anomaly in anomaly data
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for positioning the origin of an abnormality of abnormal data.
Background
A data warehouse is a relational database that stores data in a specific schema to facilitate multidimensional analysis and multi-angle presentation, and it supports the decision analysis of enterprises or organizations.
At present, the only way to determine why data in a data warehouse is abnormal is a fully manual investigation. When a downstream consumer reports a problem with the data, engineers start from the front-end application of the data warehouse and investigate downward, layer by layer, to the underlying data source. They either fix each problem point as it is found and then rerun it, or first find all the problem points, fix them together, and then rerun them.
In the process of implementing the invention, the inventor found at least the following problems in the prior art. The existing method of determining the cause of a data anomaly in a data warehouse is entirely manual. If a non-standard script is encountered (for example, a script with very few comments), the judgment and processing work becomes harder and more costly; the labor cost is high, human error is likely, and the process is slow. The whole process also places high demands on the person doing the troubleshooting, who must know the source of the problem data, the underlying processing logic, and the business well. Otherwise much time is wasted and efficiency drops sharply, or the investigation even goes in the wrong direction and the work is wasted.
Therefore, a method and an apparatus for locating the anomaly origin of abnormal data are needed that are fast and efficient, effectively avoid human error, and lower the requirements on the people who handle data anomalies.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for locating an abnormal origin of abnormal data, which can effectively avoid human errors, reduce requirements on a data exception handler, and are fast and efficient.
To achieve the above object, according to one aspect of the embodiments of the present invention, there is provided a method of locating the anomaly origin of abnormal data, wherein the abnormal data corresponds to a logical relationship tree, the root node of the logical relationship tree is the abnormal data, the leaf nodes are data tables of a data source, and the intermediate nodes are intermediate data tables involved in generating the abnormal data,
the method comprises the following steps:
step one, comparing the data of the leaf nodes with the data of the corresponding preprocessing layer nodes, wherein a preprocessing layer node is an intermediate node generated by preprocessing the data source of the corresponding leaf node; when a preprocessing layer node is inconsistent with its corresponding leaf node, determining that preprocessing layer node to be an anomaly origin and returning, and otherwise executing step two;
step two, judging whether the abnormal data is larger than a corresponding reference value; if it is larger, executing step three; if it is not larger, checking the integrity of each intermediate node other than the preprocessing layer nodes, and when an intermediate node is incomplete, determining that intermediate node to be an anomaly origin and returning, and otherwise executing step three;
and step three, checking whether the caliber of each intermediate node other than the preprocessing layer nodes is consistent with the standard caliber of the abnormal data, and if the caliber of an intermediate node is inconsistent with the standard caliber of the abnormal data, determining that intermediate node to be an anomaly origin and returning.
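The three steps above can be read as a short-circuiting pipeline. The following is a minimal Python sketch under stated assumptions, not the patented implementation: the function name `locate_anomaly_origin` and the three checker callables are hypothetical stand-ins for the checks described in the steps.

```python
from typing import Callable, List


def locate_anomaly_origin(
    abnormal_value: float,
    reference_value: float,
    find_source_mismatches: Callable[[], List[str]],
    find_incomplete_tables: Callable[[], List[str]],
    find_caliber_mismatches: Callable[[], List[str]],
) -> List[str]:
    """Return the names of the tables identified as anomaly origins.

    The three callables are hypothetical; each returns the tables that
    fail the corresponding check, or an empty list if none fail.
    """
    # Step one: preprocessing layer tables that disagree with their source tables.
    origins = find_source_mismatches()
    if origins:
        return origins

    # Step two: the integrity check runs only when the abnormal value is
    # not larger than its reference value.
    if abnormal_value <= reference_value:
        origins = find_incomplete_tables()
        if origins:
            return origins

    # Step three: intermediate tables whose caliber differs from the standard caliber.
    return find_caliber_mismatches()
```

Each step returns as soon as it finds at least one origin, so later and more expensive checks run only when the earlier ones find nothing.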
Optionally, the logical relationship tree is obtained by pruning, from the original business logical relationship tree corresponding to the abnormal data, the parts that are irrelevant to locating the anomaly origin.
Further, the method for locating the origin of the anomaly data provided by the embodiment of the invention further comprises: outputting a list of the determined origins of the anomalies.
Further, comparing the data of the leaf node with the data of the corresponding preprocessing layer node comprises:
obtaining a direct mapping relation tree of the abnormal data and the preprocessing layer nodes involved in the generation process of the abnormal data based on the logical relation tree;
comparing data of a preprocessing layer node in the direct map tree with data of the leaf node corresponding thereto.
Further, checking whether the caliber of each intermediate node other than the preprocessing layer nodes is consistent with the standard caliber of the abnormal data comprises:
acquiring a caliber list of each intermediate node other than the preprocessing layer nodes according to the logical relationship tree;
and checking whether the caliber of each intermediate node in the caliber list is consistent with the standard caliber of the abnormal data.
To achieve the above object, according to another aspect of the embodiments of the present invention, there is also provided an apparatus for locating the anomaly origin of abnormal data, wherein the abnormal data corresponds to a logical relationship tree, the root node of the logical relationship tree is the abnormal data, the leaf nodes are data tables of a data source, and the intermediate nodes are intermediate data tables involved in generating the abnormal data,
the apparatus comprises:
a judgment module, configured to execute step one: comparing the data of the leaf nodes with the data of the corresponding preprocessing layer nodes, wherein a preprocessing layer node is an intermediate node generated by preprocessing the data source of the corresponding leaf node; when a preprocessing layer node is inconsistent with its corresponding leaf node, that preprocessing layer node is determined to be an anomaly origin and returned, and otherwise step two is executed;
an integrity checking module, configured to execute step two: judging whether the abnormal data is larger than a corresponding reference value; if it is larger, step three is executed; if it is not larger, the integrity of each intermediate node other than the preprocessing layer nodes is checked, and when an intermediate node is incomplete, that intermediate node is determined to be an anomaly origin and returned, and otherwise step three is executed;
and a caliber checking module, configured to execute step three: checking whether the caliber of each intermediate node other than the preprocessing layer nodes is consistent with the standard caliber of the abnormal data, and if the caliber of an intermediate node is inconsistent with the standard caliber of the abnormal data, determining that intermediate node to be an anomaly origin and returning it.
Further, the apparatus for locating the origin of the anomaly data provided by the embodiment of the present invention further includes: an output module to output a list of the determined origin of the anomaly.
Further, the judgment module is further configured to obtain, based on the logical relationship tree, a direct mapping relationship tree between the abnormal data and the preprocessing layer nodes involved in generating the abnormal data, and to compare the data of each preprocessing layer node in the direct mapping relationship tree with the data of its corresponding leaf node.
Further, the caliber checking module is further configured to obtain a caliber list of each intermediate node except the preprocessing layer node according to the logical relationship tree, and check whether the calibers of the intermediate nodes in the caliber list are consistent with the calibers of the abnormal data.
In order to achieve the above object, according to another aspect of the embodiments of the present invention, there is also provided an electronic device for determining a cause of a data abnormality, the electronic device including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of locating the anomaly origin of abnormal data provided by the embodiments of the invention.
To achieve the above object, according to another aspect of the embodiments of the present invention, there is also provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the method of locating an origin of an anomaly of anomaly data provided by the embodiments of the present invention.
According to the method and the apparatus for locating the anomaly origin of abnormal data provided by the embodiments of the invention, the original business logical relationship tree related to the abnormal data is pruned to obtain the corresponding logical relationship tree, which identifies the set of data tables containing the direct factors that cause the anomaly. The possible problems in the logical relationship tree, namely data source problems, incomplete data tables, and inconsistent calibers, are then checked in order of likelihood and ease of checking, starting with the problems that are most likely and easiest to check, so that the anomaly origin of the abnormal data is located. With the method provided by the invention, the people concerned can locate the anomaly origin quickly and on a self-service basis, and obtain the data table information related to the anomaly to support subsequent handling and repair, which shortens the waiting time of the requester and allows the processing progress to be reported in time. Compared with the existing approach, in which locating and handling a data anomaly is purely manual, the method effectively avoids human error and lowers the requirements on the people doing the troubleshooting, so that troubleshooting no longer has to be handled by developers alone.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a flow chart of a method of locating an anomaly origin of anomaly data provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of a logical relationship tree corresponding to abnormal index data F according to an embodiment of the present invention;
FIG. 3 is a schematic application flow diagram of a method for locating an anomaly origin of anomaly data according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a mapping relation tree in which an anomaly indicator F directly depends on a data cleaning layer in a data warehouse according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an apparatus for locating the origin of an anomaly in anomaly data provided by an embodiment of the present invention;
FIG. 6 is a schematic block diagram of a computer system suitable for use with the electronic device to implement an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The embodiment of the invention provides a method for positioning the abnormal origin of abnormal data, which can be applied to a data warehouse or other databases with similar structures to position the abnormal origin of the abnormal data. For example, when an abnormal index data generated by the business application based on the data warehouse is found, the method provided by the present invention can be used to locate the abnormal origin of the abnormal index data in the data warehouse, and determine the data table with the problem, thereby facilitating the subsequent repair work. Of course, the anomaly data that is used to locate the origin of an anomaly using the method of the present invention may also be intermediate data in a data warehouse that is involved in generating certain index data.
In the method of the present invention, the abnormal data corresponds to a logical relationship tree. The logical relationship tree is a hierarchical set of the data tables involved in generating the abnormal data, organized according to the logical relationships by which the data is generated. The root node of the logical relationship tree is the abnormal data, the leaf nodes are the data tables of the data source corresponding to the abnormal data, and the intermediate nodes are the intermediate data tables involved in generating the abnormal data. The logical relationship tree includes the field processing logic between the tables and the conditions of subqueries, and can be obtained from the logical relationship document of the business development related to the abnormal data.
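One way to picture the logical relationship tree is as a small recursive data structure. The sketch below is illustrative only: the class `TableNode`, its `layer` labels, and the table names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TableNode:
    name: str
    layer: str  # hypothetical label: "root", "intermediate", "preprocessing" or "source"
    # Upstream tables this node is generated from; empty for data source tables.
    children: List["TableNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaf_names(node: TableNode) -> List[str]:
    """Collect the data source tables (leaf nodes) below a node."""
    if node.is_leaf():
        return [node.name]
    names: List[str] = []
    for child in node.children:
        names.extend(leaf_names(child))
    return names


# A made-up three-level chain: abnormal data <- intermediate <- preprocessing <- source.
source = TableNode("src_orders", "source")
pre = TableNode("pre_orders", "preprocessing", [source])
root = TableNode("abnormal_metric", "root", [TableNode("mid_summary", "intermediate", [pre])])
```

The root holds the abnormal data, every leaf is a data source table, and everything in between is an intermediate table.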
In the invention, the logical relationship document of the business development contains the original business logical relationship tree corresponding to the abnormal data, and the logical relationship tree is obtained by pruning from it the parts that are irrelevant to locating the anomaly origin. The original business logical relationship tree is very large and complex; some branches cannot cause the anomaly, or are very unlikely to contain the problem, and can be regarded as irrelevant to locating the anomaly origin. Therefore, in the invention, the original business logical relationship tree can be pruned according to relevant experience, and the irrelevant parts removed, to obtain the logical relationship tree used for locating the anomaly origin in the subsequent steps.
When the original business logical relationship tree is pruned, the positions of its intermediate nodes can also be changed to some extent according to how likely each node is to contain the problem, so that the subsequent integrity and caliber consistency checks, which are based on the resulting logical relationship tree, examine the more likely nodes first and locate the anomaly origin more quickly. For example, the integrity and caliber consistency checks proceed layer by layer from the data downstream nodes of the logical relationship tree to the data upstream nodes. Nodes that are more likely to cause a problem can therefore be moved toward the data downstream while the original business logical relationship tree is pruned, so that the node containing the anomaly origin is reached quickly during checking.
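The pruning idea can be sketched as follows. This is a simplified stand-in, not the patent's procedure: it assumes each node carries a hypothetical experience-based `problem_probability`, drops branches below a threshold, and orders the remaining siblings so that likelier culprits are visited first. Sorting siblings is a simplification of repositioning nodes toward the data downstream.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    problem_probability: float  # hypothetical estimate in [0, 1], set from experience
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, threshold: float, is_root: bool = True) -> Optional[Node]:
    """Return a pruned copy of the tree, or None if the whole branch is dropped."""
    pruned_children = (prune(child, threshold, False) for child in node.children)
    kept = [child for child in pruned_children if child is not None]
    # Drop a non-root node only if it is unlikely to be at fault and
    # nothing below it was kept.
    if not is_root and not kept and node.problem_probability < threshold:
        return None
    # Visit likelier culprits first.
    kept.sort(key=lambda n: n.problem_probability, reverse=True)
    return Node(node.name, node.problem_probability, kept)


tree = Node("root", 1.0, [
    Node("a", 0.05, [Node("a1", 0.01)]),
    Node("b", 0.2),
    Node("c", 0.6),
])
pruned = prune(tree, threshold=0.1)
```

In this made-up tree, branch `a` is removed entirely and the remaining children are ordered `c`, then `b`.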
As shown in fig. 1, the method for locating the origin of an anomaly of anomaly data provided by the present invention includes the following steps one, two and three.
In step one, comparing the data of the leaf node with the data of the corresponding preprocessing layer node, wherein when a certain preprocessing layer node is inconsistent with the corresponding leaf node, determining that the preprocessing layer node is the origin of the exception and returning.
A preprocessing layer node is an intermediate node generated by preprocessing the data source of the corresponding leaf node. In the process of generating the abnormal data, the data source data is extracted, preprocessed, and loaded into the system that generates the abnormal data, where it is stored in a preprocessing layer data table. For a data warehouse, for example, the data source data is extracted, transformed, and loaded into a data table of the preprocessing layer of the data warehouse through ETL (Extract-Transform-Load).
Step one compares whether the data source data is consistent with the data of the corresponding preprocessing layer data table. The comparison covers information such as the values and the data volume related to the abnormal data in the data tables, where the data volume is the number of data records related to the abnormal data in the table within a specific statistical period. This comparison shows whether the anomaly is caused by a problem in the data source or in the preprocessing process. For example, the preprocessing layer extracts data either incrementally according to certain logic or in full; if the data source changes and the extraction logic of the preprocessing layer is not changed accordingly, the data of the preprocessing layer becomes inconsistent with the data of the production system, and the data generated by the subsequent logic becomes abnormal.
Therefore, when the data of a preprocessing layer data table is inconsistent with the corresponding data source data, that preprocessing layer data table is determined to be an anomaly origin; there may be more than one anomaly origin. In the invention, after the anomaly origin data tables of the preprocessing layer are determined, a list of the determined anomaly origins is output, that is, a list of the preprocessing layer data tables that are inconsistent with their corresponding data source data. The list can be sent to the responsible person, who can then repair the data source problem or the extraction, preprocessing, and loading processes at the bottom layer of the system accordingly.
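The step-one comparison can be sketched as a per-table check of record counts and values. The following is a minimal illustration under assumptions: the `TableSnapshot` type, the use of a single column total as the compared value, and the table names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TableSnapshot:
    row_count: int  # records related to the abnormal data in the statistical period
    total: float    # a value related to the abnormal data, e.g. a column sum (assumption)


def find_source_mismatches(
    source_of: Dict[str, str],
    snapshots: Dict[str, TableSnapshot],
    tolerance: float = 1e-9,
) -> List[str]:
    """List the preprocessing layer tables that disagree with their source table."""
    mismatched = []
    for pre_table, source_table in source_of.items():
        pre, src = snapshots[pre_table], snapshots[source_table]
        if pre.row_count != src.row_count or abs(pre.total - src.total) > tolerance:
            mismatched.append(pre_table)
    return mismatched


# Made-up data: pre_b lost two records during extraction.
source_of = {"pre_a": "src_a", "pre_b": "src_b"}
snapshots = {
    "src_a": TableSnapshot(100, 2500.0),
    "pre_a": TableSnapshot(100, 2500.0),
    "src_b": TableSnapshot(40, 900.0),
    "pre_b": TableSnapshot(38, 860.0),
}
origins = find_source_mismatches(source_of, snapshots)
```

The returned list plays the role of the anomaly origin list that is sent to the responsible person.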
In the present invention, comparing the data of the leaf nodes with the data of the corresponding preprocessing layer nodes in step one specifically includes the following. First, a direct mapping relationship tree between the abnormal data and the preprocessing layer nodes involved in generating the abnormal data is obtained based on the logical relationship tree. The purpose of step one is to verify whether the problem lies in the underlying data source or in the preprocessing process of the system, so step one only needs to find the preprocessing layer data tables corresponding to the problem data and compare them with the corresponding data source data. The direct mapping relationship tree omits the nodes of the other layers between the problem data and the preprocessing layer nodes, so the preprocessing layer nodes corresponding to the problem data can be found quickly and directly. Then, the data of each preprocessing layer node in the direct mapping relationship tree is compared with the data of its corresponding leaf node.
In the application scenario addressed by the method of the invention, abnormal data is likely to be caused by a problem in the underlying data source. The method therefore checks the underlying data source first when locating the anomaly origin: it finds the relevant preprocessing layer nodes directly and quickly through the direct mapping relationship tree, performs the corresponding comparison, and, once an anomaly origin is determined, returns to the caller of the method and ends the locating process. For most abnormal data, step one is enough to determine the anomaly origin quickly so that it can then be repaired.
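Deriving the direct mapping relationship tree amounts to walking the logical relationship tree from the root and recording only the preprocessing layer nodes. The sketch below assumes the tree is given as a hypothetical `upstream` dictionary from each table to the tables it is built from; the table names are made up.

```python
from typing import Dict, List, Set


def direct_mapping(
    root: str,
    upstream: Dict[str, List[str]],
    preprocessing: Set[str],
) -> Dict[str, List[str]]:
    """Map the root directly to the preprocessing layer tables it depends on."""
    found: List[str] = []
    seen: Set[str] = set()
    stack = [root]
    while stack:
        table = stack.pop()
        if table in seen:
            continue
        seen.add(table)
        if table in preprocessing:
            found.append(table)
            continue  # do not descend into the data source tables
        stack.extend(upstream.get(table, []))
    return {root: sorted(found)}


upstream = {
    "metric": ["summary"],
    "summary": ["model_a", "model_b"],
    "model_a": ["pre_a"],
    "model_b": ["pre_b", "pre_a"],
    "pre_a": ["src_a"],
    "pre_b": ["src_b"],
}
mapping = direct_mapping("metric", upstream, {"pre_a", "pre_b"})
```

The intermediate tables `summary`, `model_a`, and `model_b` do not appear in the result, which is what lets step one go straight to the preprocessing layer.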
When the data of the leaf node corresponding to the abnormal data is consistent with the data of the corresponding preprocessing layer node through the comparison in the first step, it is indicated that the data source at the bottom layer of the system and the extraction, preprocessing and loading processes of the data source have no problem, and meanwhile, the range of the origin of the abnormal data can be reduced to the intermediate node of other layer structures between the preprocessing layer and the abnormal data in the logical relationship tree.
In step two, it is judged whether the abnormal data is larger than a corresponding reference value, and when the abnormal data is not larger than the reference value, the integrity of each intermediate node other than the preprocessing layer nodes is checked; when an intermediate node is incomplete, that intermediate node is determined to be an anomaly origin and the method returns. In this step, the abnormal data is first compared with its reference value. The reference value is the standard value that the data would have under non-abnormal conditions, so the abnormal data is never equal to it. The reference value may be determined empirically or obtained from other related systems; for example, a value may be predicted from the normal daily, weekly, or monthly values of the data in the past and used as the reference value.
When the abnormal data is smaller than the reference value, the integrity of each intermediate node other than the preprocessing layer nodes is checked, that is, whether any intermediate node data table is missing data. The preprocessing layer nodes were already checked in step one, so they are not checked again in step two.
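As a small illustration of the reference value, the sketch below predicts it as the mean of past normal values and then applies the step-two branch condition. Using the mean is an assumption made here for illustration; the text only says the value may be predicted from past normal values or obtained elsewhere.

```python
from statistics import mean
from typing import Sequence


def reference_value(history: Sequence[float]) -> float:
    """Estimate the reference value from past normal values (mean is an assumption)."""
    if not history:
        raise ValueError("history must not be empty")
    return mean(history)


def needs_integrity_check(abnormal_value: float, reference: float) -> bool:
    """Step two checks integrity only when the value is not larger than the reference."""
    return abnormal_value <= reference


reference = reference_value([100.0, 110.0, 90.0])
```

A value below the reference sends the method to the integrity check; a value above it skips straight to the caliber check.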
In the present invention, checking the integrity of the intermediate node data table may include checking the integrity of partitions of the data table, and when a partition of an intermediate node data table is incomplete, determining that the intermediate node is an abnormal origin.
In step two, abnormal data that is smaller than its reference value indicates that the data in an intermediate data table used to generate it is incomplete, because the abnormal data falls below the reference value only when related data is missing. Therefore, when an intermediate node is incomplete, that intermediate node is determined to be an anomaly origin; there may be more than one. After the anomaly origin is determined, the method returns to its caller and the locating process ends. After the incomplete anomaly origin data tables are determined, a list of the determined anomaly origins is output, that is, a list of the data tables with missing data; more specifically, the output list can show each data table together with its missing partitions. The list can be fed back to the responsible person for subsequent historical backfill and repair operations.
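The partition integrity check can be sketched as a comparison between the partitions a table should have and the partitions it actually has. The example assumes daily partitions named by ISO date, which is an assumption and not stated in the text; the table names are made up.

```python
from datetime import date, timedelta
from typing import Dict, List, Set


def expected_partitions(start: date, end: date) -> List[str]:
    """Daily partition names from start to end, inclusive (daily partitioning is assumed)."""
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]


def find_incomplete_tables(
    present: Dict[str, Set[str]],
    start: date,
    end: date,
) -> Dict[str, List[str]]:
    """Map each incomplete table to the partitions it is missing."""
    expected = expected_partitions(start, end)
    report: Dict[str, List[str]] = {}
    for table, partitions in present.items():
        missing = [p for p in expected if p not in partitions]
        if missing:
            report[table] = missing
    return report


present = {
    "mid_a": {"2017-08-01", "2017-08-02", "2017-08-03"},
    "mid_b": {"2017-08-01", "2017-08-03"},
}
report = find_incomplete_tables(present, date(2017, 8, 1), date(2017, 8, 3))
```

The report lists each incomplete table with its missing partitions, which matches the form of the output list described above.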
If all the intermediate node data tables are found to be complete in step two, or if the abnormal data is larger than its reference value, the caliber consistency check of step three is carried out. In step three, it is checked whether the caliber of each intermediate node other than the preprocessing layer nodes is consistent with the standard caliber of the abnormal data; if the caliber of an intermediate node is inconsistent with the standard caliber of the abnormal data, that intermediate node is determined to be an anomaly origin and the method returns. The preprocessing layer nodes were already checked in step one, so they are not checked again in step three.
Checking whether the caliber of an intermediate node is consistent with the standard caliber of the abnormal data specifically includes the following. First, a caliber list of the intermediate nodes other than the preprocessing layer nodes is acquired according to the logical relationship tree; the list gives the caliber of the data related to the abnormal data in each data table at each layer of the logical relationship tree. Then, it is checked whether the caliber of each intermediate node in the caliber list is consistent with the standard caliber of the abnormal data. Here, caliber means statistical caliber. In the logical relationship tree, the caliber of the root node is the set of the calibers of all its descendant nodes; that is, the standard caliber of the abnormal data is the union of the calibers of the data table data involved in generating it under non-abnormal conditions.
In the invention, checking whether the caliber of each intermediate node other than the preprocessing layer nodes is consistent with the standard caliber of the abnormal data means checking whether the caliber of the data related to the abnormal data in each intermediate node data table belongs to the standard caliber set of the abnormal data. If it does, the caliber of that data table is determined to be consistent with the caliber of the abnormal data; otherwise it is determined to be inconsistent, and the data table is determined to be an anomaly origin. There may be more than one anomaly origin. After the anomaly origin is determined, the method returns to its caller and the locating process ends. After the anomaly origin data tables with inconsistent calibers are determined, a list of these data tables is output so that the person responsible for each table can repair it.
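Because the standard caliber is described as a set, the consistency check reduces to a membership test. The sketch below represents each caliber as a plain string, which is an assumption made for illustration; the caliber descriptions and table names are made up.

```python
from typing import Dict, List, Set


def find_caliber_mismatches(
    caliber_list: Dict[str, str],
    standard_calibers: Set[str],
) -> List[str]:
    """List the intermediate tables whose caliber is outside the standard caliber set."""
    return [
        table
        for table, caliber in caliber_list.items()
        if caliber not in standard_calibers
    ]


# Hypothetical calibers: one table counts by order time instead of settlement time.
standard_calibers = {"by settlement time, sold items", "by settlement time, returned items"}
caliber_list = {
    "mid_sales": "by settlement time, sold items",
    "mid_returns": "by order time, returned items",
}
origins = find_caliber_mismatches(caliber_list, standard_calibers)
```

Here only the table whose caliber falls outside the standard set is reported as an anomaly origin.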
Steps one, two, and three check the possible problems in the logical relationship tree, namely data source problems, incomplete data tables, and inconsistent calibers, in order of likelihood and ease of troubleshooting, starting with the problems that are most likely and easiest to check. In this way the usual causes of abnormal data are ruled out one by one, and the anomaly origin of the abnormal data is located.
The method for locating the origin of an anomaly in anomaly data provided by the present invention is described in more detail below in conjunction with a specific example.
In this example, the method for locating the anomaly origin of abnormal data provided by the invention is used to locate the anomaly origin of abnormal index data generated by a downstream business application of a data warehouse. FIG. 2 shows the logical relationship tree corresponding to the abnormal index data F, which is obtained by pruning the original business logical relationship tree in the logical relationship document from the actual development of the index data F. In the logical relationship tree, the root node is the index data F generated by the business application APP, the leaf nodes are the data tables C1, C2, C3, ..., C7 of the data source corresponding to the index data F, and the intermediate nodes are the data tables containing the direct factors that cause the anomaly in the index data F. The intermediate nodes include the index summary layer data tables ADM1 and ADM2, the general model layer data tables GDM1 and GDM2, the intermediate temporary layer data table TMP1, and the data cleaning layer data tables FDM1, FDM2, FDM3, ..., FDM7 involved in generating the abnormal data.
In the logical relationship tree, data from the data source is processed in the data warehouse through the data cleaning layer, the intermediate temporary layer, the general model layer, the index summary layer and the business application APP to generate index data F. The index summary layer mainly stores calculated indexes across various dimensions, on which further statistical analysis, mining or aggregation can be performed. The general model layer is a model, abstracted from the large and complex underlying data of a subject domain in the data warehouse in combination with the business, that describes the business situation of that subject domain. The intermediate temporary layer temporarily stores data during model processing. The data cleaning layer is the preprocessing layer referred to above in the embodiments of the invention; data from the data source is stored in the data cleaning layer after being extracted, cleaned, transformed and loaded.
As shown in Fig. 3, when locating the abnormal origin of abnormal index data F, information such as the abnormal point and the table name of its direct source is input first. As shown in Table 1, four parameters are input: the name of the abnormal index (e.g., a statistical index in the e-commerce field: offline sales cost); the name of the table containing the abnormal index (adm_s10_spwms_invt_stock_sum); how the abnormal index compares with the reference standard (i.e., the reference value), that is, whether it is higher or lower, high in this example; and the standard caliber of the abnormal index (by financial settlement time, the sum of the stock quotes of all stock items sold on the statistical date minus the sum of the stock quotes of stock items returned on the statistical date). The name of the abnormal index tells the system which index (which field in the database) has a problem. The table name tells the system which table the abnormal index is in, and from it the approximate range of data tables on which the abnormal index directly depends can be determined. The comparison with the reference standard helps decide whether partition integrity needs to be checked first in the subsequent positioning process. Based on these four input parameters, the system can track, locate and handle the problem in a self-service manner using the logical relationship tree of the abnormal index; once input is complete, the system first enters the accountability module.
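[Table 1 appears as an image in the original and is not reproduced: it lists the four input parameters, namely abnormal index name, table name of the abnormal index, comparison with the reference standard, and standard caliber]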
TABLE 1
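The four input parameters lend themselves to a small record type. The following Python sketch is only one possible representation; the class name `AnomalyReport`, its field names and the shortened caliber string are hypothetical, while the index name and table name are the example values given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnomalyReport:
    index_name: str        # name of the abnormal index
    table_name: str        # table that holds the abnormal index
    comparison: str        # "high" or "low" relative to the reference standard
    standard_caliber: str  # agreed definition of the index

    def __post_init__(self) -> None:
        if self.comparison not in ("high", "low"):
            raise ValueError("comparison must be 'high' or 'low'")

    def needs_partition_check(self) -> bool:
        """Partition integrity is checked first only for a low value."""
        return self.comparison == "low"


report = AnomalyReport(
    index_name="offline sales cost",
    table_name="adm_s10_spwms_invt_stock_sum",
    comparison="high",
    standard_caliber="by financial settlement time: sold minus returned",
)
```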
The accountability module compares each data-source data table (i.e., production system table) with the corresponding data table of the data cleaning layer in the data warehouse, mainly comparing the data volume, data values (monetary amounts) and other information related to abnormal index F. Based on the mapping relationship tree shown in Fig. 4, which captures the data cleaning layer tables on which abnormal index F directly depends, the module finds the data cleaning layer data tables FDM1, FDM2, FDM3 … FDM7 corresponding to the abnormal index and compares their data with the data of the corresponding production system data tables C1, C2, C3 … C7.
If the data cleaning layer is inconsistent with the production system, the system outputs the list of inconsistent data tables to the responsible persons and notifies them to handle the problem and report the result. In practice, problems rarely arise in the cleaning, transformation and loading of data-source data; they more often arise in the underlying data source (i.e., the production system). For example, the data source changes but the extraction logic of the data cleaning layer is not changed accordingly, so the data of the data cleaning layer becomes inconsistent with that of the production system. When the accountability module detects this situation, it outputs the list of affected data cleaning layer tables to the responsible persons so that the extraction can be repaired.
Through this comparison, the accountability module can directly and quickly determine whether the abnormal origin lies in the data source or in the data warehouse. If the data cleaning layer of the data warehouse is consistent with the production system, a data source problem is ruled out and the process moves on to the self-check inside the data warehouse, narrowing the problem to the data application layer, index summary layer, general model layer or intermediate temporary layer of the data warehouse.
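The comparison between production system tables and data cleaning layer tables might look like the following Python sketch. It assumes each table has already been summarized as a (row count, total amount) pair; the function name, the summary format and the tolerance are illustrative choices, not details given in the text.

```python
from typing import Dict, List, Tuple

Summary = Tuple[int, float]  # (row count, total amount), an assumed summary format


def find_inconsistent_tables(
    source: Dict[str, Summary],
    cleaned: Dict[str, Summary],
    mapping: Dict[str, str],
    tolerance: float = 1e-6,
) -> List[str]:
    """Return cleaning-layer tables whose summary differs from their source table."""
    inconsistent = []
    for cleaned_name, source_name in mapping.items():
        src_rows, src_amount = source[source_name]
        cln_rows, cln_amount = cleaned[cleaned_name]
        if src_rows != cln_rows or abs(src_amount - cln_amount) > tolerance:
            inconsistent.append(cleaned_name)
    return inconsistent


# Hypothetical figures for two table pairs.
source = {"C1": (100, 2500.0), "C2": (80, 900.0)}
cleaned = {"FDM1": (100, 2500.0), "FDM2": (79, 880.0)}
mapping = {"FDM1": "C1", "FDM2": "C2"}
suspects = find_inconsistent_tables(source, cleaned, mapping)
```

An empty result corresponds to ruling out the data source and moving on to the self-check inside the data warehouse.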
In the self-check inside the data warehouse, all tables in the logical relationship tree from abnormal index F down to the data cleaning layer (F to FDM in Fig. 2) are screened. First, based on how the abnormal index compares with the reference standard, it is determined whether partition integrity needs to be checked first.
If the abnormal index is lower than the reference standard, the partition integrity of all tables from abnormal index F down to the data cleaning layer is checked further. The data cleaning layer itself does not need to be checked during the self-check inside the data warehouse, because the process reaches the self-check only when the data cleaning layer has been found to have no problem.
The system checks the integrity of the data tables layer by layer in the logical relationship tree. Preferably, it starts from the business application APP at the downstream end of the data and works layer by layer toward the upstream end, until the general model layer GDM or the intermediate temporary layer TMP, which precede the data cleaning layer, has been checked. Because the logical relationship tree has fewer nodes downstream than upstream, this order locates the abnormal origin faster when the problem lies downstream, since the many upstream data tables need not be examined; otherwise, a downstream problem would be found only after a large number of upstream nodes had been checked.
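The downstream-to-upstream order can be sketched as a breadth-first walk from the root. In the Python sketch below the tree is an adjacency mapping from each table to the tables it depends on; the function name, the `excluded` set and the edges are assumptions for illustration only.

```python
from collections import deque
from typing import Dict, List, Set


def downstream_first(tree: Dict[str, List[str]], root: str, excluded: Set[str]) -> List[str]:
    """List tables to check, layer by layer, starting at the downstream root."""
    order: List[str] = []
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node in excluded:
            # Stop at the data cleaning layer; nothing above it is checked here.
            continue
        order.append(node)
        for upstream in tree.get(node, []):
            if upstream not in seen:
                seen.add(upstream)
                queue.append(upstream)
    return order


# Illustrative edges only (assumed, not taken from Fig. 2).
tree = {
    "F": ["ADM1", "ADM2"],
    "ADM1": ["GDM1"],
    "ADM2": ["GDM1", "TMP1"],
    "GDM1": ["FDM1"],
    "TMP1": ["FDM2"],
    "FDM1": ["C1"],
    "FDM2": ["C2"],
}
check_order = downstream_first(tree, "F", excluded={"FDM1", "FDM2", "C1", "C2"})
```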
When checking the partition integrity of a data table, the system mainly verifies the partitions corresponding to the statistical time range of the abnormal index. For example, if the abnormal index counts the order quantity for July 1, 2017, the check verifies whether the July 1 partition of each relevant data table exists; if it is missing, the data table is determined to be incomplete and the system outputs a list of incomplete tables. As another example, if the statistical time range of the abnormal index is May and partitions for May are found to be missing from a data table, then, as shown in Table 2, the output list contains the table name gdm_m10_afs_ser_sum and the dates of the missing partitions separated by commas. The list is fed back to the person responsible for the problem table, so that tasks can be launched automatically from the list to backfill and repair the historical data, and the result is reported.
Table name | Missing partitions
gdm_m10_afs_ser_sum | 2017-05-10,2017-05-13,2017-05-14
TABLE 2
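A partition check of this kind reduces to comparing the expected dates with the partitions that exist. The Python sketch below assumes daily partitions keyed by ISO date strings, which the text does not state; the function name `missing_partitions` is hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, List


def missing_partitions(existing: Iterable[str], start: date, end: date) -> List[str]:
    """Return the ISO dates in [start, end] that have no partition."""
    have = set(existing)
    missing = []
    day = start
    while day <= end:
        key = day.isoformat()
        if key not in have:
            missing.append(key)
        day += timedelta(days=1)
    return missing


# Partitions of a table for May 2017 with three days absent, as in Table 2.
existing = [date(2017, 5, d).isoformat() for d in range(1, 32) if d not in (10, 13, 14)]
gaps = missing_partitions(existing, date(2017, 5, 1), date(2017, 5, 31))
row = "gdm_m10_afs_ser_sum | " + ",".join(gaps)
```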
If the partitions of the data tables of all intermediate nodes in the logical relationship tree are complete, or if the abnormal index is higher than the reference standard, the system automatically goes on to check the calibers of the intermediate-node data tables.
As with the integrity check, the system checks the calibers of the data tables layer by layer in the logical relationship tree, preferably starting from the business application APP at the downstream end of the data and working toward the upstream end until the general model layer GDM or the intermediate temporary layer TMP, which precede the data cleaning layer, has been checked.
Based on the field processing logic between tables and the subquery conditions in the logical relationship tree, the system compiles and outputs a caliber list of the data of each intermediate node other than the data cleaning layer nodes. The caliber list is then compared with the input standard caliber of the abnormal index: the system checks whether the caliber of the relevant data of each table in the list belongs to the set of standard calibers (for example, the standard caliber in Table 1: calculated by financial settlement time, the sum of the warehouse quotes of spare-part goods sold on the statistical date minus the sum of the warehouse quotes of spare-part goods returned on the statistical date). If it does, the calibers of the two parties are consistent, the abnormal index F previously reported by the business has no problem, and the relevant personnel are informed. Otherwise, the calibers are judged to be inconsistent, and a list of tables with inconsistent calibers is output and imported into the system, which notifies the person responsible for each table to repair the scripts according to the list.
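If each caliber is represented as a set of normalized conditions, the comparison with the standard caliber becomes a subset test. The Python sketch below rests on that assumption; the condition strings, the function name and the example tables are hypothetical, and extracting conditions from field logic and subqueries is outside its scope.

```python
from typing import Dict, List, Set


def inconsistent_calibers(caliber_list: Dict[str, Set[str]], standard: Set[str]) -> List[str]:
    """Return tables whose conditions are not all contained in the standard caliber set."""
    return [table for table, caliber in caliber_list.items() if not caliber <= standard]


# Hypothetical normalized conditions.
standard = {"time_basis=financial_settlement", "amount=sold_minus_returned"}
caliber_list = {
    "ADM1": {"time_basis=financial_settlement", "amount=sold_minus_returned"},
    "GDM1": {"time_basis=order_time", "amount=sold_minus_returned"},
}
to_repair = inconsistent_calibers(caliber_list, standard)
```

An empty result corresponds to the case where the calibers agree and the reported index is considered to have no problem.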
In the method for locating the abnormal origin of abnormal data provided by the embodiment of the invention, the original business logical relationship tree related to the abnormal data is pruned to obtain the corresponding logical relationship tree, which gathers the set of data tables containing the direct factors that cause the abnormality. Possible problems in the logical relationship tree (a faulty data source, incomplete data tables, inconsistent calibers) are then checked in turn according to how likely they are and how easy they are to check, starting with those that are relatively likely and easy to check, so that the abnormal origin of the abnormal data is located. With the method provided by the invention, the personnel concerned can locate the abnormal origin quickly and in a self-service manner and obtain the data table information related to the abnormality for subsequent handling and repair, which shortens the waiting time of the requesting party and allows the business side to be informed of progress in time. Compared with existing approaches, in which locating and handling data abnormalities is purely manual, the method effectively avoids human error and lowers the requirements on the people doing the troubleshooting, so that troubleshooting is no longer limited to developers.
An embodiment of the present invention further provides an apparatus for locating the abnormal origin of abnormal data. As shown in Fig. 5, the apparatus 500 includes an accountability module 501, an integrity check module 502 and a caliber check module 503.
In the invention, the abnormal data corresponds to a logical relationship tree: a hierarchical set of the data tables involved in generating the abnormal data, organized according to the logical relationships by which the data is generated. The root node of the logical relationship tree is the abnormal data, the leaf nodes are the data tables of the data source corresponding to the abnormal data, and the intermediate nodes are the intermediate data tables involved in generating the abnormal data. The logical relationship tree includes the field processing logic between tables and the subquery conditions, and can be obtained from the logical relationship document of the business development related to the abnormal data.
In the invention, the logical relationship document of the business development contains the original business logical relationship tree corresponding to the abnormal data; the logical relationship tree is obtained by pruning from that original tree the parts that are irrelevant to locating the abnormal origin.
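Pruning can be sketched as keeping only the branches whose tables are known to feed the abnormal data. The text does not say how relevance is decided, so the Python sketch below simply takes a set of relevant table names as input; the function name `prune` and the example tree are hypothetical.

```python
from typing import Dict, List, Set


def prune(tree: Dict[str, List[str]], root: str, relevant: Set[str]) -> Dict[str, List[str]]:
    """Keep the root and every relevant table reachable through relevant tables."""
    pruned: Dict[str, List[str]] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node in pruned:
            continue  # already handled (a table may feed several downstream tables)
        kept = [child for child in tree.get(node, []) if child in relevant]
        pruned[node] = kept
        stack.extend(kept)
    return pruned


# Hypothetical original tree with one branch unrelated to the abnormal data.
original = {
    "F": ["ADM1", "ADM9"],
    "ADM1": ["GDM1"],
    "ADM9": ["GDM9"],
    "GDM1": [],
    "GDM9": [],
}
logical_tree = prune(original, "F", relevant={"ADM1", "GDM1"})
```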
The accountability module 501 is configured to compare the data of each leaf node with the data of the corresponding preprocessing layer node, where a preprocessing layer node is the intermediate node generated by preprocessing the data source of the corresponding leaf node; when a preprocessing layer node is inconsistent with its corresponding leaf node, that preprocessing layer node is determined to be an abnormal origin and the module returns. Through this comparison, the accountability module can directly and quickly determine whether the abnormal origin lies in the data source or in the data warehouse or database. If the data cleaning layer of the data warehouse is consistent with the production system, a data source problem is ruled out and the subsequent internal self-check is performed.
The integrity check module 502 is configured to determine whether the abnormal data is greater than a corresponding reference value, and check, when the abnormal data is not greater than the reference value, integrity of each intermediate node except the preprocessing layer node, where, when a certain intermediate node is incomplete, the intermediate node is determined to be an origin of the abnormality and returns.
The caliber check module 503 is configured to check whether the caliber of each intermediate node other than the preprocessing layer nodes is consistent with the standard caliber of the abnormal data and, if the caliber of an intermediate node is inconsistent with the caliber of the abnormal data, to determine that intermediate node to be an abnormal origin and return.
The device for locating the abnormal origin of the abnormal data further comprises an output module, and the output module is used for outputting the list of the determined abnormal origin.
The accountability module 501 is further configured to obtain, based on the logical relationship tree, a direct mapping relationship tree between the abnormal data and the preprocessing layer nodes involved in its generation, and to compare the data of each preprocessing layer node in the direct mapping relationship tree with the data of the corresponding leaf node.
The caliber checking module 503 is further configured to obtain a caliber list of each intermediate node except the preprocessing layer node according to the logical relationship tree, and check whether the calibers of the intermediate nodes in the caliber list are consistent with the calibers of the abnormal data.
The apparatus for locating the abnormal origin of abnormal data provided by the embodiment of the invention prunes the original business logical relationship tree related to the abnormal data to obtain the corresponding logical relationship tree, which gathers the set of data tables containing the direct factors that cause the abnormality. Possible problems in the logical relationship tree (a faulty data source, incomplete data tables, inconsistent calibers) are then checked in turn according to how likely they are and how easy they are to check, starting with those that are relatively likely and easy to check, so that the abnormal origin of the abnormal data is located. With the method provided by the invention, the personnel concerned can locate the abnormal origin quickly and in a self-service manner and obtain the data table information related to the abnormality for subsequent handling and repair, which shortens the waiting time of the requesting party and allows the business side to be informed of progress in time. Compared with existing approaches, in which locating and handling data abnormalities is purely manual, the method effectively avoids human error and lowers the requirements on the people doing the troubleshooting, so that troubleshooting is no longer limited to developers.
Referring now to FIG. 6, there is illustrated a schematic block diagram of a computer system X00 suitable for use in implementing an electronic device of an embodiment of the invention. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 6, the computer system X00 includes a Central Processing Unit (CPU) X01, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) X02 or a program loaded from a storage portion X08 into a Random Access Memory (RAM) X03. In the RAM X03, various programs and data necessary for the operation of the system X00 are also stored. The CPU X01, ROM X02, and RAM X03 are connected to each other via a bus X04. An input/output (I/O) interface X05 is also connected to bus X04.
The following components are connected to the I/O interface X05: an input portion X06 including a keyboard, a mouse, and the like; an output portion X07 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker and the like; a storage portion X08 including a hard disk and the like; and a communication section X09 including a network interface card such as a LAN card, a modem, or the like. The communication section X09 performs communication processing via a network such as the internet. A drive X10 is also connected to I/O interface X05 as required. A removable medium X11 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive X10 as needed, so that a computer program read out therefrom is mounted in the storage section X08 as needed.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication part X09, and/or installed from a removable medium X11. The above-described functions defined in the system of the present invention are executed when the computer program is executed by a Central Processing Unit (CPU) X01.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described as follows: a processor includes an accountability module, an integrity check module and a caliber check module. The names of these modules do not in some cases limit the modules themselves; for example, the accountability module may also be described as a "module for comparing data of a leaf node with data of a corresponding preprocessing layer node".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to comprise:
the abnormal data corresponds to a logical relation tree, the root node of the logical relation tree is the abnormal data, the leaf nodes are data tables of the data source, the intermediate nodes are intermediate data tables involved in the abnormal data generation process,
comparing the data of the leaf nodes with the data of the corresponding preprocessing layer nodes, wherein the preprocessing layer nodes are intermediate nodes generated after preprocessing of the data sources of the corresponding leaf nodes, and when a certain preprocessing layer node is inconsistent with the corresponding leaf node, the preprocessing layer node is determined to be an abnormal origin and returns;
judging whether the abnormal data is larger than a corresponding reference value or not, and when the abnormal data is not larger than the reference value, checking the integrity of each intermediate node except the preprocessing layer node, wherein when a certain intermediate node is incomplete, determining that the intermediate node is an abnormal origin and returning;
checking whether the caliber of each intermediate node except the preprocessing layer node is consistent with the standard caliber of the abnormal data, and if the caliber of a certain intermediate node is not consistent with the caliber of the abnormal data, determining that the intermediate node is an abnormal origin and returning.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. A method for locating the origin of an anomaly in anomalous data, said anomalous data corresponding to a logical relationship tree, the root node of said logical relationship tree being the anomalous data, leaf nodes being data tables of the data source, intermediate nodes being intermediate data tables involved in the generation of the anomalous data,
the method comprises the following steps:
step one, comparing the data of the leaf nodes with the data of the corresponding preprocessing layer nodes, wherein the preprocessing layer nodes are intermediate nodes generated after preprocessing of the data sources of the corresponding leaf nodes; when a certain preprocessing layer node is inconsistent with the corresponding leaf node, the preprocessing layer node is determined to be an abnormal origin and the method returns, and otherwise step two is executed; the preprocessing layer is a data cleaning layer, and the preprocessing is data cleaning;
step two, judging whether the abnormal data is larger than a corresponding reference value; when the abnormal data is not larger than the reference value, checking the integrity of each intermediate node except the preprocessing layer node, and otherwise executing step three; wherein, when the integrity of each intermediate node except the preprocessing layer node is checked, if a certain intermediate node is incomplete, the intermediate node is determined to be an abnormal origin and the method returns, and if all the intermediate nodes are complete, step three is executed;
and step three, checking whether the caliber of each intermediate node except the preprocessing layer node is consistent with the standard caliber of the abnormal data, and if the caliber of a certain intermediate node is not consistent with the caliber of the abnormal data, determining the intermediate node as an abnormal origin and returning.
2. The method of claim 1, wherein the logical relationship tree is obtained by pruning a portion of an original business logical relationship tree corresponding to the anomaly data that is not related to the origin of the positioning anomaly.
3. The method of claim 1, further comprising: outputting a list of the determined origins of the anomalies.
4. The method of claim 1, wherein comparing the data of the leaf node with the data of the corresponding preprocessing layer node comprises:
obtaining a direct mapping relation tree of the abnormal data and the preprocessing layer nodes involved in the generation process of the abnormal data based on the logical relation tree;
comparing data of a preprocessing layer node in the direct map tree with data of the leaf node corresponding thereto.
5. The method of claim 1, wherein said checking whether the caliber of each intermediate node other than the preprocessing layer node is consistent with the standard caliber of the abnormal data comprises:
acquiring a caliber list of each intermediate node except the preprocessing layer node according to the logical relationship tree;
and checking whether the calibers of the intermediate nodes in the caliber list are consistent with the caliber of the abnormal data.
6. An apparatus for locating an abnormal origin of abnormal data, the abnormal data corresponding to a logical relationship tree, a root node of the logical relationship tree being the abnormal data, leaf nodes being data tables of a data source, intermediate nodes being intermediate data tables involved in the generation of the abnormal data,
the device comprises:
an accountability module, configured to perform step one: comparing the data of the leaf nodes with the data of the corresponding preprocessing layer nodes, wherein the preprocessing layer nodes are intermediate nodes generated after the data sources of the corresponding leaf nodes are preprocessed; when a certain preprocessing layer node is inconsistent with the corresponding leaf node, the preprocessing layer node is determined to be an abnormal origin and the module returns, and otherwise step two is executed; the preprocessing layer is a data cleaning layer, and the preprocessing is data cleaning;
an integrity checking module, configured to perform step two: determining whether the abnormal data is greater than a corresponding reference value; if the abnormal data is not greater than the reference value, checking the integrity of each intermediate node except the preprocessing layer node, and otherwise executing step three; wherein, when the integrity of each intermediate node except the preprocessing layer node is checked, if a certain intermediate node is incomplete, the intermediate node is determined to be an abnormal origin and the module returns, and if all the intermediate nodes are complete, step three is executed;
and a caliber checking module, configured to perform step three: checking whether the caliber of each intermediate node except the preprocessing layer node is consistent with the standard caliber of the abnormal data, and if the caliber of a certain intermediate node is not consistent with the caliber of the abnormal data, determining that the intermediate node is an abnormal origin and returning.
7. The apparatus of claim 6, further comprising:
an output module to output a list of the determined origin of the anomaly.
8. The apparatus of claim 6, wherein the accountability module is further configured to obtain a direct-mapped relationship tree of the abnormal data and the preprocessing layer node involved in the generation of the abnormal data based on the logical relationship tree, and compare data of the preprocessing layer node in the direct-mapped relationship tree with data of the corresponding leaf node.
9. The apparatus of claim 6, wherein the caliber checking module is further configured to obtain a caliber list of each intermediate node except the preprocessing layer node according to the logical relationship tree, and check whether calibers of the intermediate nodes in the caliber list are consistent with calibers of the abnormal data.
10. An electronic device for determining a cause of a data abnormality, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
11. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201710722887.9A 2017-08-22 2017-08-22 Method and apparatus for locating the origin of an anomaly in anomaly data Active CN107643956B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710722887.9A CN107643956B (en) 2017-08-22 2017-08-22 Method and apparatus for locating the origin of an anomaly in anomaly data

Publications (2)

Publication Number Publication Date
CN107643956A CN107643956A (en) 2018-01-30
CN107643956B CN107643956B (en) 2020-09-01

Family

ID=61110186

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN201710722887.9A Active CN107643956B (en) 2017-08-22 2017-08-22 Method and apparatus for locating the origin of an anomaly in anomaly data

Country Status (1)

Country Link
CN (1) CN107643956B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108429636B (en) * 2018-02-01 2021-11-23 创新先进技术有限公司 Method and device for positioning abnormal system and electronic equipment
CN109254986A (en) * 2018-08-31 2019-01-22 阿里巴巴集团控股有限公司 A kind of determination method and device of abnormal data
CN109144884A (en) * 2018-09-29 2019-01-04 平安科技(深圳)有限公司 Program error localization method, device and computer readable storage medium
CN111367775B (en) * 2018-12-26 2023-11-14 北京嘀嘀无限科技发展有限公司 Problem node positioning method, computer device, and computer-readable storage medium
CN110471962B (en) * 2019-07-05 2023-11-03 中国平安人寿保险股份有限公司 Method and system for generating active data report
CN112668660B (en) * 2020-12-31 2024-07-12 新奥数能科技有限公司 Abnormal point detection method and device based on time sequence data

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102117306A (en) * 2010-01-04 2011-07-06 阿里巴巴集团控股有限公司 Method and system for monitoring ETL (extract-transform-load) data processing process
CN102650992A (en) * 2011-02-25 2012-08-29 国际商业机器公司 Method and device for generating binary XML (extensible markup language) data and locating nodes of the binary XML data
CN105302657A (en) * 2015-11-05 2016-02-03 网易宝有限公司 Abnormal condition analysis method and apparatus
CN105760383A (en) * 2014-12-16 2016-07-13 阿里巴巴集团控股有限公司 Method and device for detecting index alteration in ETL (extract-transform-load) task
CN105897922A (en) * 2016-05-30 2016-08-24 乐视控股(北京)有限公司 Data transmission method and device
CN106709024A (en) * 2016-12-28 2017-05-24 深圳市华傲数据技术有限公司 Data table source-tracing method and device based on consanguinity analysis
CN106951315A (en) * 2017-03-17 2017-07-14 北京搜狐新媒体信息技术有限公司 A kind of data task dispatching method and system based on ETL

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101261602A (en) * 2008-04-08 2008-09-10 杭州电子科技大学 Program correctness verification method based on syntax tree
US9298878B2 (en) * 2010-07-29 2016-03-29 Oracle International Corporation System and method for real-time transactional data obfuscation
JP5748636B2 (en) * 2011-10-26 2015-07-15 富士フイルム株式会社 Image processing apparatus and method, and program
WO2016093797A1 (en) * 2014-12-09 2016-06-16 Hitachi Data Systems Corporation A system and method for providing thin-provisioned block storage with multiple data protection classes
CN106802931B (en) * 2016-12-28 2020-06-09 深圳市华傲数据技术有限公司 Method and device for searching data table based on influence analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
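Research on dynamic outlier detection based on data warehouse; Wang Lizhen et al.; Journal of Computer Research and Development (《计算机研究与发展》); 20081231; full text *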

Also Published As

Publication number Publication date
CN107643956A (en) 2018-01-30

Similar Documents

Publication Publication Date Title
CN107643956B (en) Method and apparatus for locating the origin of an anomaly in anomaly data
JP7507602B2 (en) Data Quality Analysis
US9547702B2 (en) Validating code of an extract, transform and load (ETL) tool
CN106951369B (en) Management method and device for joint debugging test
CN113396395B (en) Method for effectively evaluating log mode
CA3155689A1 (en) Early-warning method for commodity inventory risk based on a statistical interquartile range, and system and computer-readable storage medium thereof
CN107908550B (en) Software defect statistical processing method and device
JP2016502166A (en) Profiling data with source tracking
CN111737335B (en) Product information integration processing method and device, computer equipment and storage medium
Helal et al. Online correlation for unlabeled process events: A flexible CEP-based approach
US9959329B2 (en) Unified master report generator
CN109947797B (en) Data inspection device and method
CN113901094B (en) Data processing method, device, equipment and storage medium
CN109271431A (en) Data pick-up method, apparatus, computer equipment and storage medium
US20230029262A1 (en) Related change analysis of multiple version control systems
US9947044B2 (en) Improper financial activity detection tool
CN114462859A (en) Workflow processing method and device, computer equipment and storage medium
US12008006B1 (en) Assessments based on data that changes retroactively
US20230342281A1 (en) Branching data monitoring watchpoints to enable continuous integration and continuous delivery of data
CN118113689A (en) Data quality analysis method and system
CN112328455A (en) System for realizing general service monitoring based on database in computer software system
CN117493215A (en) Program testing method, electronic device and storage medium
CN116401140A (en) Data processing method, device, equipment, readable medium and software product
CN117785939A (en) Data analysis method and device based on rule engine and computer equipment
CN118521036A (en) Analysis method for manufacturing cost in vehicle type development process

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant