CN118051649A - Intelligent matching method based on data specification - Google Patents
- Publication number
- CN118051649A (application number CN202410282874.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- target
- rule
- matching
- business system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The specification discloses an intelligent matching method based on data specifications, relating to the technical field of computers and information. The method comprises: determining the data specifications and business system data of an enterprise; establishing a data standard set based on the data specifications; establishing a synonym library and a rule base based on the data specifications, the data standard set and the business system data, wherein the synonym library stores keywords of the data specification text and the rule base is used for checking target business system data; and determining target rules in the rule base based on a target inspection task, then using a preset intelligent matching rule, the synonym library and the target rules to match and inspect, in sequence, the target business system data corresponding to the target inspection task, thereby obtaining an inspection report. The method addresses the problem that non-standard data in an enterprise's business systems currently causes abnormal system operation.
Description
Technical Field
The invention belongs to the technical field of computers and information, and particularly relates to an intelligent matching method based on data specifications.
Background
With the rapid development of information technology, computing capacity, data processing capacity and processing speed have improved greatly, and machine learning algorithms have evolved rapidly. Big data technology processes massive data, covering data acquisition, storage, governance, analysis and application. Improvements to this processing depend on AI analysis technology, which can analyze and mine data to extract useful information; in turn, big data provides large amounts of training data for artificial intelligence, improving its accuracy and efficiency.
As enterprises operate and grow, the volume of data in their business domains increases sharply, and non-standard data across business systems easily causes many problems. Enterprises therefore need to standardize data standards, data classification, data storage, data security and the like. In the field of big data governance, the HoloClean system [1], developed by Professor Ihab Ilyas (ACM/IEEE Fellow) of the University of Waterloo, Canada, is mainly applied in fields such as cities and medical care and addresses error detection and repair for relational data. However, the business systems of enterprises today still suffer from abnormal system operation caused by non-standard data.
Disclosure of Invention
The invention aims to provide an intelligent matching method based on data specifications, so as to solve the problem that non-standard data in an enterprise's business systems currently causes abnormal system operation.
In order to achieve the above purpose, the invention adopts the following technical scheme:
In one aspect, the present disclosure provides an intelligent matching method based on data specifications, including:
Step 102, determining the data specifications and business system data of an enterprise;
Step 104, establishing a data standard set based on the data specifications;
Step 106, establishing a synonym library and a rule base based on the data specifications, the data standard set and the business system data; the synonym library is used for storing keywords of the data specification text; the rule base is used for checking target business system data;
Step 108, determining target rules in the rule base based on a target inspection task, and then using a preset intelligent matching rule, the synonym library and the target rules to match and inspect, in sequence, the target business system data corresponding to the target inspection task, so as to obtain an inspection report.
In another aspect, the present disclosure provides an intelligent matching apparatus based on data specifications, including:
An enterprise data acquisition module, used for determining the data specifications and business system data of an enterprise;
A standard set establishing module, used for establishing a data standard set based on the data specifications;
A mapping rule establishing module, used for establishing a synonym library and a rule base based on the data specifications, the data standard set and the business system data; the synonym library is used for storing keywords of the data specification text; the rule base is used for checking target business system data;
A business data detection module, used for determining target rules in the rule base based on a target inspection task, and then using a preset intelligent matching rule, the synonym library and the target rules to match and inspect, in sequence, the target business system data corresponding to the target inspection task, so as to obtain an inspection report.
Based on the above technical scheme, the specification achieves the following technical effects:
By surveying and sorting out the data specifications an enterprise uses in its operations, the method extracts the key information of the data specifications, establishes a standard data set for the data specifications, and allows the standard data set to be connected and viewed intelligently. Meanwhile, business system data is collected through big data technology, mapping rules between the business data and the standard data set are established, and scheduled automatic matching tasks are created to check the compliance of the business system data and generate an inspection report. Non-compliant data is monitored in real time and the business systems are urged to rectify it, which guards against risk and addresses the problem that non-standard data in an enterprise's business systems currently causes abnormal system operation.
Drawings
Fig. 1 is a flow chart of an intelligent matching method based on data specifications according to an embodiment of the invention.
FIG. 2 is a schematic diagram of an enterprise data specification in accordance with an embodiment of the present invention.
Fig. 3 is a flow chart of an intelligent matching method based on data specifications according to an embodiment of the invention.
Fig. 4 is a schematic diagram of a mapping relationship between data of an enterprise business system and a standard specification according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of rule classification in a rule base according to an embodiment of the invention.
FIG. 6 is a flow chart of an intelligent matching method based on data specifications according to an embodiment of the invention.
FIG. 7 is a flow chart of matching business system data with a thesaurus in an embodiment of the present invention.
FIG. 8 is a schematic diagram of an intelligent matching apparatus based on data specifications according to an embodiment of the present invention.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The advantages and features of the present invention will become more apparent from the following description and the appended claims, taken in conjunction with the accompanying drawings and the detailed description. It should be noted that the drawings are in a very simplified form and are not drawn to precise scale; they serve only to describe the embodiments of the invention conveniently and clearly.
It should be noted that, to illustrate the invention clearly, several embodiments are described to show different implementations; these embodiments are examples and are not exhaustive. Furthermore, for brevity, content already described in an earlier embodiment is often omitted in a later one, so anything not mentioned in a later embodiment can be found in the earlier embodiment.
Example 1
Referring to FIG. 1, this embodiment provides an intelligent matching method based on data specifications. In this embodiment, the method includes:
Step 102, determining the data specifications and business system data of an enterprise;
In this embodiment, one implementation of step 102 is:
Sorting out a plurality of data specifications used by the enterprise in its operations;
Specifically, the data specifications sorted out from the enterprise's operations include code data specifications, model data specifications, contract data specifications, article data specifications, master data design specifications, accounting subject data specifications, institution data specifications, master data management specifications, supplier data specifications, customer data specifications, project data specifications, human resource data specifications and the like. All specification files are uploaded to the system and managed centrally there, so that every department can view them and they set requirements for business operations. The content of one enterprise data specification is shown in FIG. 2.
Connecting to a business system data source using big data technology to obtain the business system data.
In this embodiment, after the business system data is obtained by connecting to the business system data source using big data technology, the method further includes converting the business system data into fact data objects that a rule engine can recognize and storing them in a database.
Specifically, the method is applied to a business scenario concerning the business system data standards of a central enterprise. The business system data source is connected through a data service interface; the object to be matched against the data standard specifications, namely the business system data, is established; the business data is converted into fact data objects that the rule engine can recognize; and the fact data objects are stored in a business database. Because key business data is highly sensitive, the system does not edit the business data, does not display key business data on pages, and encrypts key business data such as passwords and ID card numbers. Since this key data is encrypted and secured, it cannot be inspected manually; to verify whether it is compliant and correct, a mapping and matching relation between the business system and the data standard set must be established. The method connects business data sources through big data technology and performs unified acquisition, management and analysis of business data, which overcomes the difficulty of analyzing, governing and managing data across systems, business departments and organizational levels in daily operations.
Based on the above, this embodiment surveys the business scenario and, according to the survey results on matters such as the supervision of the enterprise's key business system data, applies a key factor decomposition method to collect and sort out the enterprise's data specifications, extract their key information and construct a standard set of data specifications. It also examines the current state of the key business systems and, using data mining methods, connects to the business system data sources through big data technology to establish the business system data objects.
Step 104, establishing a data standard set based on the data specification;
In this embodiment, one implementation manner of step 104 is:
Establishing a plurality of standard data sources based on the data specifications; and connecting the standard data sources to obtain a data standard set.
Specifically, a standard data set is established based on the data specifications: standard data sources are created in a specification database, the system allows these standard data sources to be connected and viewed, and the interconnected standard data sources form the standard data set. For example, by connecting to the institution standard data source, one can view enterprise organization codes, organization names, organizations, enterprise social credit codes and so on. When an enterprise is very large and has many subordinate units or factories, organization names and codes cannot be memorized manually, so a synonym library needs to be established: keywords and high-frequency words are extracted from the data specification text into the synonym library, and the system intelligently connects them with the standard data set in the database.
Step 106, establishing a synonym library and a rule library based on the data specification, the data standard set and the business system data; the synonym library is used for storing keywords of the data specification text; the rule base is used for checking the target business system data;
In this embodiment, referring to FIG. 3, one implementation of step 106 is:
Step 202, extracting keywords of the text of the data specification and storing the keywords in the synonym library;
Specifically, features are extracted from the key text information of the data specifications, mainly using TF-IDF (term frequency-inverse document frequency). TF-IDF is a statistical method for evaluating how important a word is to a document, here a data specification. The importance of a word increases in proportion to the number of times it occurs in the data specification text and decreases with the frequency with which it occurs in other documents. A word that appears frequently in one data specification document but not in others is important for classifying that document. Conversely, a word that appears in many documents has little discriminating power, and IDF is used to reduce its weight.
Mathematical formulation: TF-IDF is proportional to the number of times a word occurs in a data specification document and inversely related to the number of documents in the whole data specification document library that contain the word:
TF-IDF = TF (term frequency) × IDF (inverse document frequency)
Term frequency: TF = (number of times the word appears in a data specification document) / (total number of words in that document)
Inverse document frequency: IDF = log(total number of documents in the data specification document library / (number of documents containing the word + 1))
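The formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function name `tf_idf`, the pre-tokenized input and the sample documents are assumptions. Note that with the "+ 1" in the denominator, a word that occurs in every document receives a negative score.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: score} dict per document.

    documents: list of token lists (tokenization is assumed to be done upstream).
    TF  = occurrences of the word in the document / total words in the document
    IDF = log(total documents / (documents containing the word + 1))
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each word once per document

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / (doc_freq[word] + 1))
            for word, count in counts.items()
        })
    return scores

# Hypothetical tokenized data specification documents.
docs = [
    ["organization", "code", "organization", "name"],
    ["supplier", "code", "contract"],
    ["customer", "code", "project"],
]
scores = tf_idf(docs)
```

Here "organization" scores highest in the first document because it is frequent there and absent elsewhere, while "code", which appears in every document, is weighted down.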
Step 204, refining the mapping relation between the data standard set and the business system data, and comparing the data in the data standard set that has a mapping relation with the keywords in the synonym library; if the comparison succeeds, retaining the keyword, and if the comparison fails, saving the data to the synonym library;
Specifically, based on the data specifications and the business system data, the mapping relation between the data standard set and the business system data is refined and established, and the data that has a mapping relation is maintained in the synonym library. For example, referring to FIG. 4, the mapping between the business system and the standard data set may be maintained in the synonym library.
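One possible reading of step 204 is sketched below. The data shapes are assumptions made for illustration only: the synonym library is modeled as a dictionary from a standard term to the set of business system names mapped to it, and the term and field names are hypothetical. Recording the business name when the comparison succeeds is also an assumption; the source only says the keyword is retained.

```python
def maintain_synonym_library(library, mappings):
    """Update the synonym library from (standard_term, business_term) mappings.

    library: dict mapping a standard term to a set of business system names.
    mappings: iterable of (standard_term, business_term) pairs.
    """
    for standard_term, business_term in mappings:
        if standard_term in library:
            # Comparison succeeded: the keyword is retained; the mapped
            # business name is recorded next to it (assumption).
            library[standard_term].add(business_term)
        else:
            # Comparison failed: the mapped data is saved to the library.
            library[standard_term] = {business_term}
    return library

# Keywords previously extracted from the data specification text (hypothetical).
library = {"organization code": set(), "organization name": set()}
mappings = [
    ("organization code", "org_cd"),
    ("social credit code", "credit_no"),
]
maintain_synonym_library(library, mappings)
```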
Step 206, establishing a plurality of rules based on the data specifications and storing the rules in the rule base.
In this embodiment, referring to FIG. 5, the rule categories include integrity, rationality, accuracy, normalization and consistency; the rule subclasses include referential integrity, null check, string length, custom check, range and data format.
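A few of these rule subclasses can be sketched as simple predicates. Everything named here is hypothetical: the `Rule` class, the column names, the limits, and in particular the pairing of each subclass with a category, since FIG. 5 is not reproduced in the text. The "range" rule corresponds to the range (scope) subclass.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Rule:
    column: str                    # business system field the rule applies to
    category: str                  # e.g. integrity, rationality, accuracy
    subclass: str                  # e.g. null check, string length, range
    check: Callable[[Any], bool]   # returns True when the value is compliant

def not_null(value):
    return value is not None and str(value).strip() != ""

def max_length(limit):
    return lambda value: value is not None and len(str(value)) <= limit

def in_range(low, high):
    return lambda value: isinstance(value, (int, float)) and low <= value <= high

def matches(pattern):
    compiled = re.compile(pattern)
    return lambda value: value is not None and compiled.fullmatch(str(value)) is not None

# Category assignments below are illustrative assumptions.
rules = [
    Rule("org_code", "integrity", "null check", not_null),
    Rule("org_name", "normalization", "string length", max_length(50)),
    Rule("amount", "rationality", "range", in_range(0, 1_000_000)),
    Rule("credit_code", "accuracy", "data format", matches(r"[0-9A-Z]{18}")),
]

def violations(record, rules):
    """Return the rules that the record fails."""
    return [rule for rule in rules if not rule.check(record.get(rule.column))]

record = {
    "org_code": "",
    "org_name": "HQ",
    "amount": -5,
    "credit_code": "91110000123456789X",
}
failed = violations(record, rules)
```

For this record the null check on `org_code` and the range check on `amount` fail, while the length and format checks pass.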
In this embodiment, the synonym library and the rule base are reusable to a degree: the mapping relation data of the rule base can be exported and imported via SQL, which makes it convenient for large enterprises to apply the method across regional units and factories.
Step 108, determining target rules in the rule base based on a target inspection task, and then using a preset intelligent matching rule, the synonym library and the target rules to match and inspect, in sequence, the target business system data corresponding to the target inspection task, so as to obtain an inspection report.
In this embodiment, referring to FIG. 6, one implementation of step 108 is:
Step 302, invoking a target rule engine based on the target inspection task; the target rule engine corresponds to the target rules in the rule base;
Specifically, an intelligent matching inspection task is established based on the mapping relation between the business data and the data standard set. Task scheduling invokes the target rule engines on a timer or in real time in response to trigger events. Each rule engine corresponds to a rule in the rule base, and the invoked target rule engine intelligently matches the fact data objects (the target business system data) against the target rules loaded in it.
Step 304, matching the target business system data corresponding to the target inspection task with the synonym library according to a preset intelligent matching rule; if the matching succeeds, inspecting the target business system data based on the target rules to obtain an inspection analysis result and determining whether the result is abnormal; if it is abnormal, generating corresponding non-compliant data;
Specifically, the target business system data is matched with the data of the synonym library according to the preset intelligent matching rule. If a fact data object matches the standard data set, it is inspected according to the target rules to generate an inspection analysis result. If there is no abnormality, the business data conforms to the standard specification; if the data is found to be non-compliant, a piece of non-compliant data is generated.
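The match-then-inspect flow of step 304 can be sketched as follows. The shapes are assumptions for illustration: a fact data object is a plain dictionary, the synonym library maps a business field name to a standard term, and the target rules are keyed by standard term. The match here is a plain dictionary lookup standing in for the preset intelligent matching rule.

```python
def inspect_fact(fact, synonym_library, target_rules):
    """Return the non-compliance records produced by one fact data object.

    fact: {"table": str, "row": int, "field": str, "value": object}
    synonym_library: business field name -> standard term (assumed shape)
    target_rules: standard term -> list of (rule_name, predicate) pairs
    """
    standard_term = synonym_library.get(fact["field"])
    if standard_term is None:
        # No match in the synonym library, so no rule check is run.
        return []
    records = []
    for rule_name, predicate in target_rules.get(standard_term, []):
        if not predicate(fact["value"]):
            # Abnormal result: emit one piece of non-compliant data.
            records.append({**fact, "standard_term": standard_term, "rule": rule_name})
    return records

# Hypothetical library, rules and fact data objects.
synonym_library = {"org_cd": "organization code"}
target_rules = {
    "organization code": [
        ("null check", lambda v: v not in (None, "")),
        ("string length", lambda v: len(str(v)) <= 10),
    ]
}
ok = inspect_fact({"table": "t_org", "row": 1, "field": "org_cd", "value": "A001"},
                  synonym_library, target_rules)
bad = inspect_fact({"table": "t_org", "row": 2, "field": "org_cd", "value": ""},
                   synonym_library, target_rules)
unmatched = inspect_fact({"table": "t_org", "row": 3, "field": "remark", "value": ""},
                         synonym_library, target_rules)
```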
In this embodiment, the preset intelligent matching rule is any one of a forward maximum matching method, a reverse maximum matching method and a bidirectional maximum matching method.
In this embodiment, referring to FIG. 7, matching the target business system data corresponding to the target inspection task with the synonym library according to the forward maximum matching method includes:
Obtaining a target character count based on the length of the longest word in the synonym library;
Taking the first characters of the sentence to be processed in the business system data, up to the target character count, as the matching field;
Searching the synonym library for the matching field; if the match succeeds, splitting off the matching field as a target word; if the match fails, removing the last character of the matching field and searching the synonym library again with the remaining characters as the new matching field, until a match succeeds or the length of the remaining string is 0.
For example, assuming the longest word in the synonym library is i characters long, the first i characters of the sentence being processed in the business system are taken as the matching field and looked up in the synonym library. If the field is found, it is split off as a word. If the synonym library does not contain the field, the last character of the matching field is removed and the remainder is looked up again, until a match succeeds or the length of the remaining string is 0. If every candidate fails to match, the first character of the sentence is discarded and the operation above is repeated. The basic idea of the reverse maximum matching method (Reverse Maximum Match Method, abbreviated RMM) is similar to that of the forward maximum matching method (MM), except that segmentation proceeds in the opposite direction. Here, R1 denotes a character string from the business system, R2 the output after word segmentation, and maxlen the maximum word length in the dictionary.
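The forward and reverse procedures described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the tiny lexicon and the sample sentence are assumptions, and characters that match no word are simply skipped, following the "discard the first character and repeat" step.

```python
def forward_max_match(sentence, lexicon):
    """Segment from left to right, always trying the longest candidate first."""
    max_len = max((len(word) for word in lexicon), default=1)
    words = []
    i = 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if candidate in lexicon:
                words.append(candidate)
                i += size
                break
        else:
            # No word starts at this position: drop the character and go on.
            i += 1
    return words

def reverse_max_match(sentence, lexicon):
    """Same idea, scanning from the end of the sentence toward the start."""
    max_len = max((len(word) for word in lexicon), default=1)
    words = []
    j = len(sentence)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = sentence[j - size:j]
            if candidate in lexicon:
                words.append(candidate)
                j -= size
                break
        else:
            # No word ends at this position: drop the character and go on.
            j -= 1
    words.reverse()
    return words

# Hypothetical lexicon; the two directions segment the same sentence differently.
lexicon = {"研究", "研究生", "生命", "命", "的", "起源"}
sentence = "研究生命的起源"
forward = forward_max_match(sentence, lexicon)   # ['研究生', '命', '的', '起源']
backward = reverse_max_match(sentence, lexicon)  # ['研究', '生命', '的', '起源']
```

A bidirectional method would run both directions and choose between the two results by some criterion, for example the one with fewer words; that selection criterion is not specified in the text.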
Based on the above, this embodiment mainly combines text feature extraction from the data specifications with rule-based matching, which improves both matching accuracy and domain adaptability. Word segmentation based on the data specifications is carried out mainly by maintaining a synonym library (or lexicon), so that each character string of a business system object is matched one by one against the words in the synonym library. According to how sentences are segmented, the methods can be divided into the forward maximum matching method, the reverse maximum matching method and the bidirectional maximum matching method; their drawback is that they need the support of large dictionaries. Text feature extraction from the data specifications uses the TF-IDF (term frequency-inverse document frequency) method, which is a statistical method: a large number of data specification texts are given first, and a statistical machine learning model is used to learn the word segmentation patterns (called training), so that the text is segmented and its key information extracted. Combining the two methods improves the accuracy of intelligent matching.
Step 306, obtaining an inspection report based on the pieces of non-compliant data.
Specifically, the data tables of the business system objects are traversed; if one data table contains several pieces of non-compliant data, several entries are generated, and together they form the inspection report.
In this embodiment, after step 108, the method further includes:
Displaying the non-compliant data information and the data analysis results in the inspection report, and feeding them back to the business systems.
Specifically, if all business system data is compliant, no erroneous data is displayed. If some business system data is non-compliant, the inspection report shows the system, data table, data field, number of rows of non-compliant data, the non-compliant data itself and similar details, and the data analysis results are shown in visual charts. The non-compliance inspection report is fed back to each business system, the business system makes corrections, and after the corrections the system automatically inspects again in a recurring cycle.
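As a rough sketch of how such a report might be assembled, the snippet below groups non-compliance records by system and data table. The record fields, the report layout and the sample data are assumptions; the source only lists what the report displays.

```python
from collections import defaultdict

def build_inspection_report(non_compliance):
    """Group non-compliance records by (system, table) and summarize them."""
    tables = defaultdict(list)
    for record in non_compliance:
        tables[(record["system"], record["table"])].append({
            "field": record["field"],
            "row": record["row"],
            "value": record["value"],
            "rule": record["rule"],
        })
    return {
        "compliant": not non_compliance,   # True when nothing was flagged
        "total": len(non_compliance),
        "tables": dict(tables),
    }

# Hypothetical non-compliance records produced by the inspection step.
records = [
    {"system": "ERP", "table": "t_org", "field": "org_code", "row": 3,
     "value": "", "rule": "null check"},
    {"system": "ERP", "table": "t_org", "field": "org_name", "row": 7,
     "value": "x" * 60, "rule": "string length"},
    {"system": "CRM", "table": "t_customer", "field": "credit_code", "row": 1,
     "value": "123", "rule": "data format"},
]
report = build_inspection_report(records)
```

The per-table lists give the number of non-compliant rows for each table, which is the kind of figure a visual chart could be built from.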
In summary, by surveying and sorting out the data specifications an enterprise uses in its operations, the method extracts the key information of the data specifications, establishes a standard data set for the data specifications, and allows the standard data set to be connected and viewed intelligently. Meanwhile, business system data is collected through big data technology, mapping rules between the business data and the standard data set are established, and scheduled automatic matching tasks are created to check the compliance of the business system data and generate an inspection report. Non-compliant data is monitored in real time and the business systems are urged to rectify it, which guards against risk and addresses the problem that non-standard data in an enterprise's business systems currently causes abnormal system operation.
Example 2
Referring to FIG. 8, this embodiment provides an intelligent matching apparatus based on data specifications, which includes:
An enterprise data acquisition module, used for determining the data specifications and business system data of an enterprise;
A standard set establishing module, used for establishing a data standard set based on the data specifications;
A mapping rule establishing module, used for establishing a synonym library and a rule base based on the data specifications, the data standard set and the business system data; the synonym library is used for storing keywords of the data specification text; the rule base is used for checking target business system data;
A business data detection module, used for determining target rules in the rule base based on a target inspection task, and then using a preset intelligent matching rule, the synonym library and the target rules to match and inspect, in sequence, the target business system data corresponding to the target inspection task, so as to obtain an inspection report.
Optionally, the enterprise data acquisition module includes:
A data specification acquisition unit, used for sorting out a plurality of data specifications used by the enterprise in its operations;
A business data acquisition unit, used for connecting to a business system data source using big data technology to obtain the business system data.
Optionally, the mapping rule establishing module includes:
A keyword extraction unit, used for extracting keywords from the text of the data specifications and storing them in the synonym library;
A synonym library maintenance unit, used for refining the mapping relation between the data standard set and the business system data, and comparing the data in the data standard set that has a mapping relation with the keywords in the synonym library; if the comparison succeeds, the keyword is retained, and if the comparison fails, the data is saved to the synonym library;
A rule base establishing unit, used for establishing a plurality of rules based on the data specifications and storing them in the rule base.
Optionally, the business data detection module includes:
A rule invoking unit, used for invoking a target rule engine based on the target inspection task; the target rule engine corresponds to the target rules in the rule base;
A data matching unit, used for matching the target business system data corresponding to the target inspection task with the synonym library according to a preset intelligent matching rule; if the matching succeeds, inspecting the target business system data based on the target rules to obtain an inspection analysis result and determining whether the result is abnormal; if it is abnormal, generating corresponding non-compliant data;
An inspection report generating unit, used for obtaining an inspection report based on the pieces of non-compliant data.
Optionally, the apparatus further includes:
A result feedback module, used for displaying the non-compliant data information and the data analysis results in the inspection report and feeding them back to the business systems.
In summary, the apparatus investigates and collates the data specifications of the enterprise in the course of its operation, extracts the key information of those specifications, establishes a standard data set for the data specifications, and can intelligently connect to and check that standard data set. Meanwhile, business system data is collected using big data technology, mapping rules between the business data and the standard data set are established, and an automatic matching timed task is created. This task performs compliance checks on the business system data, generates an inspection report, and monitors non-compliant data in real time, so that the business system is urged to make improvements and risks are prevented. This addresses the problem that non-standard data in an enterprise's business systems currently causes abnormal system operation.
Example 3
Referring to fig. 9, the present embodiment provides an electronic device that includes a processor, an internal bus, a network interface, a memory, and a nonvolatile memory, and that may also include hardware required by other services. The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs it, thereby implementing the intelligent matching method based on data specification at the logical level. Of course, this specification does not exclude other implementations, such as logic devices or a combination of hardware and software; that is, the execution subject of the following processing flow is not limited to logic units and may also be hardware or a logic device.
The network interface, the processor, and the memory may be interconnected by a bus system. The bus may be divided into an address bus, a data bus, a control bus, and the like.
The memory is used for storing programs. In particular, a program may include program code comprising computer operating instructions. The memory may include read-only memory and random access memory, and provides instructions and data to the processor.
The processor is configured to execute the program stored in the memory, and specifically to perform:
step 102, determining the data specifications and business system data of an enterprise;
step 104, establishing a data standard set based on the data specifications;
step 106, establishing a synonym library and a rule base based on the data specifications, the data standard set, and the business system data; the synonym library is used for storing keywords of the data specification text; the rule base is used for checking the target business system data;
and step 108, determining a target rule in the rule base based on a target inspection task, and then using a preset intelligent matching rule, the synonym library, and the target rule to match and then inspect, in sequence, the target business system data corresponding to the target inspection task, so as to obtain an inspection report.
The processor may be an integrated circuit chip having signal processing capabilities. In implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software.
Based on the same inventive concept, the embodiments of this specification also provide a computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the intelligent matching method based on data specification provided by the embodiments corresponding to figs. 1 to 7.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-readable storage media having computer-usable program code embodied therein.
In addition, since the device embodiments described above are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding parts of the description of the method embodiments. It should also be noted that, in the modules of the system of the present application, the components are logically divided according to the functions to be implemented, but the present application is not limited thereto, and the components may be re-divided or combined as needed.
In this specification, the embodiments are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results, and in some embodiments, multitasking and parallel processing may be possible or advantageous.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the application is to be included in the scope of the claims of the present application.
Claims (10)
1. An intelligent matching method based on data specification, characterized by comprising the following steps:
determining data specifications and business system data of an enterprise;
establishing a data standard set based on the data specifications;
establishing a synonym library and a rule base based on the data specifications, the data standard set, and the business system data; the synonym library is used for storing keywords of the data specification text; the rule base is used for checking the target business system data;
and determining a target rule in the rule base based on a target inspection task, and then using a preset intelligent matching rule, the synonym library, and the target rule to match and then inspect, in sequence, the target business system data corresponding to the target inspection task, so as to obtain an inspection report.
2. The method of claim 1, wherein the step of determining the data specifications and business system data of the enterprise comprises:
collating a plurality of data specifications of the enterprise in the course of its operation;
and connecting to a business system data source using big data technology to obtain the business system data.
3. The method of claim 2, further comprising, after connecting to the business system data source using big data technology, converting the business system data into fact data objects recognizable by a rule engine and storing them in a database.
4. The method of claim 1, wherein establishing the data standard set based on the data specifications comprises: establishing a plurality of standard data sources based on the data specifications; and connecting the standard data sources to obtain the data standard set.
5. The method of claim 1, wherein the step of establishing a synonym library and a rule base based on the data specifications, the data standard set, and the business system data comprises:
extracting keywords from the text of the data specification and storing them in the synonym library;
extracting the mapping relation between the data standard set and the business system data, and comparing the data in the data standard set that has the mapping relation against the keywords in the synonym library; if the comparison succeeds, retaining the keyword, and if the comparison fails, saving the data into the synonym library;
and establishing a plurality of rules based on the data specification and storing them in the rule base.
6. The method of claim 5, wherein the rule categories of the rules include integrity, rationality, accuracy, normalization, and consistency; and the rule subcategories include referential integrity, null check, string length, custom check, range, and data format.
7. The method of claim 6, wherein the step of determining the target rule in the rule base based on the target inspection task, and then using the preset intelligent matching rule, the synonym library, and the target rule to match and then inspect, in sequence, the target business system data corresponding to the target inspection task to obtain the inspection report comprises:
invoking a target rule engine based on the target inspection task, the target rule engine corresponding to the target rule in the rule base;
matching the target business system data corresponding to the target inspection task against the synonym library according to the preset intelligent matching rule; if the matching succeeds, inspecting the target business system data based on the target rule to obtain an inspection analysis result and determining whether the result is abnormal; and if the result is abnormal, generating corresponding non-compliance data;
and obtaining the inspection report based on a plurality of items of the non-compliance data.
8. The method of claim 7, wherein the preset intelligent matching rule is any one of a forward maximum matching method, a reverse maximum matching method, and a bidirectional maximum matching method.
9. The method of claim 8, wherein the step of matching the target business system data corresponding to the target inspection task against the synonym library according to the forward maximum matching method comprises:
obtaining a target character count based on the length of the longest word in the synonym library;
taking the leading characters of the sentence to be processed in the business system data, up to the target character count, as a matching field;
searching the synonym library for the matching field; if the match succeeds, segmenting off the matching field as a target word; if the match fails, removing the last character of the matching field and searching the synonym library again with the remaining characters as the new matching field, until the match succeeds or the length of the remaining character string is 0.
10. The method of claim 9, further comprising, after obtaining the inspection report, displaying the non-compliant data information and the data analysis results in the inspection report and feeding them back to the business system.
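For illustration, the forward maximum matching procedure recited in claim 9 can be sketched as below. The function name `forward_maximum_match` and the sample library are hypothetical. One detail is an assumption rather than part of the claim: when the matching field shrinks to length 0 without a hit, the sketch emits the first character as an unmatched single-character token and moves on, which is a common convention in maximum-matching segmenters.

```python
def forward_maximum_match(sentence: str, library: set[str]) -> list[str]:
    """Segment `sentence` by repeatedly taking the longest library word at the front."""
    if not library:
        return list(sentence)

    # Target character count: length of the longest word in the synonym library.
    max_len = max(len(word) for word in library)

    tokens = []
    pos = 0
    while pos < len(sentence):
        # Matching field: the leading characters, up to the target character count.
        field = sentence[pos:pos + max_len]
        # On failure, drop the last character and look the field up again.
        while field and field not in library:
            field = field[:-1]
        if not field:
            # Assumption: nothing matched, so emit one character and continue.
            field = sentence[pos]
        tokens.append(field)
        pos += len(field)
    return tokens


library = {"data", "database", "base", "rule"}
segmented = forward_maximum_match("databaserule", library)
with_unknown = forward_maximum_match("xdata", library)
```

With this library the longest candidate wins, so `databaserule` is cut into `database` and `rule` rather than `data`, `base`, and `rule`. The method is typically applied to text without word delimiters, such as Chinese; ASCII strings are used here only to keep the example easy to read.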
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410282874.4A CN118051649A (en) | 2024-03-13 | 2024-03-13 | Intelligent matching method based on data specification |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118051649A (en) | 2024-05-17 |
Family
ID=91048484
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410282874.4A Pending CN118051649A (en) | 2024-03-13 | 2024-03-13 | Intelligent matching method based on data specification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118051649A (en) |
2024-03-13: CN application CN202410282874.4A (CN118051649A), active, Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111553137B (en) | Report generation method and device, storage medium and computer equipment | |
WO2020233330A1 (en) | Batch testing method, apparatus, and computer-readable storage medium | |
CN113836131B (en) | Big data cleaning method and device, computer equipment and storage medium | |
US20160321405A1 (en) | Extracting clinical care pathways correlated with outcomes | |
US20070088743A1 (en) | Information processing device and information processing method | |
CN112000773B (en) | Search engine technology-based data association relation mining method and application | |
US20130238531A1 (en) | Automatic Combination and Mapping of Text-Mining Services | |
CN108304382B (en) | Quality analysis method and system based on text data mining in manufacturing process | |
CN110765101B (en) | Label generation method and device, computer readable storage medium and server | |
CN113919336A (en) | Article generation method and device based on deep learning and related equipment | |
US9594757B2 (en) | Document management system, document management method, and document management program | |
Jeon et al. | Making a graph database from unstructured text | |
Williams et al. | txttool: Utilities for text analysis in Stata | |
CN112579781B (en) | Text classification method, device, electronic equipment and medium | |
CN118051649A (en) | Intelligent matching method based on data specification | |
CN117077668A (en) | Risk image display method, apparatus, computer device, and readable storage medium | |
CN117131106A (en) | Scientific and technological data mining and decision-making auxiliary system | |
CN116860311A (en) | Script analysis method, script analysis device, computer equipment and storage medium | |
CN115203339A (en) | Multi-data source integration method and device, computer equipment and storage medium | |
CN115034187A (en) | Annotating method, annotating device, computer equipment and storage medium | |
CN113888265A (en) | Product recommendation method, device, equipment and computer-readable storage medium | |
Lu et al. | Modeling semantics between programming codes and annotations | |
CN114115831A (en) | Data processing method, device, equipment and storage medium | |
CN113344674A (en) | Product recommendation method, device, equipment and storage medium based on user purchasing power | |
CN113435701B (en) | Method and device for processing consumption quality information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||