
CN109086444B - Data standardization method and device and electronic equipment - Google Patents

Data standardization method and device and electronic equipment

Info

Publication number
CN109086444B
CN109086444B CN201810940191.8A CN201810940191A CN109086444B CN 109086444 B CN109086444 B CN 109086444B CN 201810940191 A CN201810940191 A CN 201810940191A CN 109086444 B CN109086444 B CN 109086444B
Authority
CN
China
Prior art keywords
data
message
field
processed
standardized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810940191.8A
Other languages
Chinese (zh)
Other versions
CN109086444A (en)
Inventor
陈红梅
王文剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin Yillion Bank Co ltd
Original Assignee
Jilin Yillion Bank Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin Yillion Bank Co ltd
Priority to CN201810940191.8A
Publication of CN109086444A
Application granted
Publication of CN109086444B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a data standardization method, a data standardization device and electronic equipment. After messages to be processed with different data structures are obtained, each message is first format-converted into an intermediate message, and the intermediate message then undergoes field analysis and standardization processing to obtain standardized data. Messages to be processed with different data structures can thus be standardized into a data structure with a unified format.

Description

Data standardization method and device and electronic equipment
Technical Field
The invention relates to the field of data processing, in particular to a data standardization method and device and electronic equipment.
Background
As loan products on the internet become more abundant, third-party credit investigation data is used ever more widely and deeply in the financial field. Most internet financial institutions introduce credit investigation data from third-party institutions as a basis for loan examination and approval, for example the Sesame Credit score data provided by Sesame Credit.
However, the credit investigation data of different third-party institutions comes in different data structures; there is no uniform data structure.
Disclosure of Invention
In view of the above, the present invention provides a data standardization method, apparatus and electronic device, so as to solve the problem that the credit investigation data of the third-party organization has different data structures and does not have a uniform data structure.
In order to solve the technical problems, the invention adopts the following technical scheme:
A data standardization method, comprising:
acquiring a message to be processed and a data source of the message to be processed;
carrying out format conversion on the message to be processed, and converting the message to be processed into an intermediate message with a preset format;
and carrying out field analysis and standardization processing on the intermediate message according to the field analysis rule of the message to be processed and the data source, to obtain standardized data.
Preferably, according to the field analysis rule of the message to be processed and the data source, performing field analysis and standardization processing on the intermediate message to obtain standardized data, including:
according to the field analysis rule and the data source of the message to be processed, carrying out field analysis and configuration on the intermediate message to obtain the content of a preset identification field;
adding the preset identification field and the content of the preset identification field into the intermediate message to obtain a target message;
according to the preset identification field and the content of the preset identification field, carrying out field analysis on the target message, and analyzing to obtain a corresponding relation between a field path name and a field value;
and carrying out name standardization processing on the field path names in the corresponding relation to obtain standardized data.
Preferably, after the name standardization processing is performed on the field path name in the corresponding relationship to obtain standardized data, the method further includes:
storing the standardized data and setting the data validity period of the standardized data;
acquiring query time for querying the standardized data;
if the query time is within the data validity period, returning a query result comprising the standardized data;
and if the query time is not within the data validity period, returning a query result representing query failure.
Preferably, after the content of the preset identification field and the preset identification field is added to the intermediate message to obtain the target message, the method further includes:
acquiring a plurality of different messages to be integrated;
screening out at least one message to be integrated, the content of which is the same as that of at least one preset identification field in the target message;
and integrating the screened messages to be integrated with the target messages.
Preferably, after performing field analysis on the intermediate packet according to the field analysis rule and the data source of the packet to be processed to obtain the content of the preset identification field, the method further includes:
and identifying the content of an error field in the intermediate message, and setting the content of the error field as preset data.
A data normalization apparatus, comprising:
the acquisition module is used for acquiring a message to be processed and a data source of the message to be processed;
the format conversion module is used for carrying out format conversion on the message to be processed and converting the message to be processed into an intermediate message with a preset format;
and the data processing module is used for carrying out field analysis and standardized processing on the intermediate message according to the field analysis rule of the message to be processed and the data source to obtain standardized data.
Preferably, the data processing module includes:
the data processing submodule is used for carrying out field analysis and configuration on the intermediate message according to the field analysis rule and the data source of the message to be processed to obtain the content of a preset identification field;
the data adding submodule is used for adding the preset identification field and the content of the preset identification field into the intermediate message to obtain a target message;
the analysis submodule is used for carrying out field analysis on the target message according to the preset identification field and the content of the preset identification field, and analyzing to obtain the corresponding relation between the field path name and the field value;
and the standardization processing submodule is used for carrying out name standardization processing on the field path names in the corresponding relation to obtain standardized data.
Preferably, the apparatus further comprises:
the data setting module is used for storing the standardized data and setting the data validity period of the standardized data, after the standardization processing submodule carries out name standardization processing on the field path names in the corresponding relation and the standardized data is obtained;
the query acquisition module is used for acquiring query time for querying the standardized data;
the first result feedback module is used for returning a query result comprising the standardized data if the query time is within the data validity period;
and the second result feedback module is used for returning the query result representing the query failure if the query time is not within the data validity period.
Preferably, the apparatus further comprises:
the message acquisition module is used for acquiring a plurality of different messages to be integrated, after the data adding submodule adds the preset identification field and the content of the preset identification field into the intermediate message to obtain a target message;
the message screening module is used for screening out a message to be integrated, the content of at least one field of which is the same as the content of at least one preset identification field in the target message;
and the message integration module is used for integrating the screened messages to be integrated with the target messages.
An electronic device, comprising: a memory and a processor;
wherein the memory is used for storing programs;
the processor calls a program and is used to:
acquiring a message to be processed and a data source of the message to be processed;
carrying out format conversion on the message to be processed, and converting the message to be processed into an intermediate message with a preset format;
and according to the field analysis rule of the message to be processed and the data source, carrying out field analysis and standardized processing on the intermediate message to obtain standardized data.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a data standardization method, a data standardization device and electronic equipment. After messages to be processed with different data structures are obtained, each message is first format-converted into an intermediate message, and the intermediate message then undergoes field analysis and standardization processing to obtain standardized data. Messages to be processed with different data structures can thus be standardized into a data structure with a unified format.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flow chart of a method for data normalization according to an embodiment of the present invention;
FIG. 2 is a flow chart of another method for data normalization according to an embodiment of the present invention;
FIG. 3 is a flow chart of a method for data normalization according to another embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a data normalization apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a data normalization method, and with reference to fig. 1, the data normalization method may include:
S11, acquiring a message to be processed and a data source of the message to be processed;
The message to be processed may be credit investigation data sent by a third-party organization, and the data source may be information such as the name and number of that organization. For example, if the message is Sesame Credit data sent by Sesame Credit, the data source is Sesame Credit.
S12, carrying out format conversion on the message to be processed, and converting it into an intermediate message with a preset format;
Specifically, the message to be processed is generally not in JSON (JavaScript Object Notation) format; for example, it may be in XML (Extensible Markup Language) format. Messages to be processed in different formats are all converted into intermediate messages with the preset format, which may be the JSON format.
And S13, according to the field analysis rule of the message to be processed and the data source, carrying out field analysis and standardization processing on the intermediate message to obtain standardized data.
The field analysis rule of the message to be processed may be an interface description document sent by a third-party organization for explaining the meaning of each field in the message to be processed.
The standardized data may be a relational data table containing the content of each field in the message to be processed.
In this embodiment, after the to-be-processed messages with different data structures are obtained, the format of the to-be-processed messages is converted to obtain the intermediate message, and then the intermediate message is subjected to field analysis and standardized processing to obtain standardized data, so that the to-be-processed messages with different data structures can be subjected to standardized processing to obtain a data structure with a uniform format.
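The format conversion of step S12 can be sketched roughly as follows. This is only an illustration in Python using the standard library, not the implementation described in the patent: the function names `element_to_obj` and `to_intermediate_message`, the sample message, and the decision to ignore XML attributes are all assumptions made for the example.

```python
import json
import xml.etree.ElementTree as ET


def element_to_obj(elem):
    """Recursively convert an XML element into plain Python data.

    Assumption for this sketch: XML attributes are ignored.
    """
    children = list(elem)
    if not children:
        # Leaf element: keep its text (empty string if there is none).
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_obj(child)
        if child.tag in result:
            # Repeated tags become a list, mirroring a JSON array.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def to_intermediate_message(raw, fmt):
    """Convert a message to be processed into a JSON intermediate message."""
    if fmt == "json":
        # Already JSON: parse and re-serialize so the output is uniform.
        return json.dumps(json.loads(raw), ensure_ascii=False)
    if fmt == "xml":
        root = ET.fromstring(raw)
        return json.dumps({root.tag: element_to_obj(root)}, ensure_ascii=False)
    raise ValueError(f"unsupported format: {fmt}")


# Hypothetical XML message from a third-party organization.
xml_message = "<response><score>85</score><errorCode>0</errorCode></response>"
intermediate = to_intermediate_message(xml_message, "xml")
```

Whatever the source format, the later steps then only need to handle one JSON shape.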
Alternatively, on the basis of the foregoing embodiment, referring to fig. 2, step S13 may include:
S21, according to the field analysis rule and the data source of the message to be processed, carrying out field analysis on the intermediate message to obtain the content of a preset identification field;
Specifically, the preset identification field may be a credit investigation channel number, a person name, an identity card number, a mobile phone number, a query time, an application order number, or the like.
The preset identification field can be a credit investigation identification index field, which includes two kinds of identification fields: one kind identifies the third-party organization, such as a data interface, a credit investigation channel number and an application order number; the other kind consists of common fields extracted in advance across different messages to be processed, such as a person name, an identity card number and a mobile phone number.
The content of the identification fields representing the third-party organization can be configured according to the data source of the message to be processed; for example, the content of the third party's data interface is set to DNBBJHV, and the content of the credit investigation channel number is set to 1231212.
Specifically, the field analysis rule of the message to be processed shows what each field in the message represents, from which the content of the common fields can be obtained. For example, if the common field is the identity card number and the field idcard represents the identity card number in the message to be processed, the content of idcard is used as the content of the common identity card number field.
It should be noted that the field names of the common fields may be the same as or different from the field names of the corresponding fields in the message to be processed.
Optionally, on the basis of this embodiment, after step S21, the method may further include:
identifying the content of an error field in the intermediate message, and setting the content of the error field to preset data.
Specifically, error field content may be field content equal to "NULL", which may be changed to 9999 so that the data can be recognized as erroneous when it is used later.
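Step S21 and the optional error-field correction can be sketched as below. The data-source key `"zmxy"`, the dictionaries `SOURCE_CONFIG` and `FIELD_RULES`, the common field names, and the sample message are hypothetical; only the example values DNBBJHV, 1231212, idcard, "NULL" and 9999 come from the text. A real field analysis rule would be derived from the third party's interface description document.

```python
import json

# Hypothetical per-source configuration (the values are the examples from the text).
SOURCE_CONFIG = {
    "zmxy": {"data_interface": "DNBBJHV", "credit_channel_no": "1231212"},
}

# Hypothetical field analysis rule: common field name -> key path in the message.
FIELD_RULES = {
    "zmxy": {"id_number": ("response", "idcard"), "person_name": ("response", "name")},
}

ERROR_PLACEHOLDER = 9999  # preset data that marks erroneous content


def lookup(message, path):
    """Follow a tuple of keys through nested dicts; return None if absent."""
    node = message
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def identification_fields(message, source):
    """Collect configured source fields plus common fields found in the message."""
    fields = dict(SOURCE_CONFIG[source])
    for common_name, path in FIELD_RULES[source].items():
        fields[common_name] = lookup(message, path)
    return fields


def replace_error_values(node):
    """Return a copy of the message with every "NULL" content replaced."""
    if isinstance(node, dict):
        return {k: replace_error_values(v) for k, v in node.items()}
    if isinstance(node, list):
        return [replace_error_values(v) for v in node]
    if node == "NULL":
        return ERROR_PLACEHOLDER
    return node


message = json.loads(
    '{"response": {"idcard": "ID-0001", "name": "Zhang San", "score": "NULL"}}'
)
ids = identification_fields(message, "zmxy")
cleaned = replace_error_values(message)
```

Here `ids` holds the content of the preset identification fields, and `cleaned` is the intermediate message with the erroneous score replaced by 9999.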
S22, adding the preset identification field and the content of the preset identification field into the intermediate message to obtain a target message;
Specifically, the preset identification field and the content of the preset identification field may be added at the very front of the content of the intermediate message.
S23, according to the preset identification field and the content of the preset identification field, carrying out field analysis on the target message, and analyzing to obtain the corresponding relation between the field path name and the field value;
Specifically, a real-time data stream processing technology is adopted: the target message is pushed to a Kafka server for real-time data standardization and structuring.
Using the Storm framework, field analysis is first carried out on the target message to obtain the corresponding relation between field path names and field values; then, for each field that corresponds to a preset identification field, the last component of its field path name is replaced with the preset identification field, giving the final corresponding relation between field path names and field values. The corresponding relation may be presented in the form of a field list, as shown in Table 1, and may be stored in HBase.
Table 1 Correspondence between field path name and field value
[Table 1 appears as images in the original publication.]
It should be noted that, for a field of array type, an additional field named "<field path>_length" may be added to record the array length. Meanwhile, to avoid an excessive number of fields caused by overly long arrays, only the first 10 entries of an array are stored in the corresponding relation.
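The field analysis of step S23 amounts to flattening a nested JSON message into pairs of field path name and field value. The sketch below shows that idea in plain Python only; it does not reproduce the Kafka, Storm or HBase parts of the pipeline. The underscore separator follows the style of names such as response_score, while the way array indexes are embedded in the path and the sample target message are assumptions.

```python
MAX_ARRAY_ITEMS = 10  # the text keeps only the first 10 entries of an array


def flatten(node, path=""):
    """Map every leaf of a nested JSON value to a field path name.

    Assumption for this sketch: the top-level value is a JSON object.
    """
    pairs = {}
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}_{key}" if path else key
            pairs.update(flatten(value, child))
    elif isinstance(node, list):
        # Record the full array length, then keep only the first entries.
        pairs[f"{path}_length"] = len(node)
        for index, value in enumerate(node[:MAX_ARRAY_ITEMS]):
            pairs.update(flatten(value, f"{path}_{index}"))
    else:
        pairs[path] = node
    return pairs


# Hypothetical target message: identification fields first, then the payload.
target = {
    "id_number": "ID-0001",
    "response": {"score": 85, "loans": [{"amount": i} for i in range(12)]},
}
pairs = flatten(target)
```

In this example the array has 12 entries, so `response_loans_length` records 12 while only the entries with indexes 0 to 9 are kept.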
And S24, carrying out name standardization processing on the field path names in the corresponding relation to obtain standardized data.
Specifically, Hive is used for mapping: the data in HBase is mapped into a relational data table, and the field path names are standardized during the mapping process.
The method of standardizing the fields is as follows:
The external credit investigation interfaces are named uniformly using two levels, a product abbreviation and an interface abbreviation. For example, the application fraud scoring interface under Sesame Credit corresponds to the abbreviations "zmxy" and "sqqz".
Name standardization processing is then carried out on the field path names in each interface. The processing rule is "product abbreviation_interface abbreviation_" + standardized field path name.
In addition, the field storage types of all data sources can be unified.
For example, the following fields are standardized:
Table 2 Field standardization scheme
Original field name      Standardized field name
response_score           zmxy_sqqz_fraudScore
response_errorMessage    zmxy_sqqz_errorMessage
response_errorCode       zmxy_sqqz_errorCode
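The naming rule "product abbreviation_interface abbreviation_" + standardized field path name can be sketched in Python as follows. The mapping `STANDARD_NAMES` simply restates Table 2; the function name and the fallback for unmapped fields (keeping the original path name behind the prefix) are assumptions of this sketch, and the Hive mapping step itself is not shown.

```python
# Original field path name -> standardized field path name (from Table 2).
STANDARD_NAMES = {
    "response_score": "fraudScore",
    "response_errorMessage": "errorMessage",
    "response_errorCode": "errorCode",
}


def standardize_names(pairs, product, interface):
    """Prefix every field path name with the product and interface abbreviations."""
    prefix = f"{product}_{interface}_"
    standardized = {}
    for path_name, value in pairs.items():
        # Assumption: fields without a mapping keep their original path name.
        standardized[prefix + STANDARD_NAMES.get(path_name, path_name)] = value
    return standardized


pairs = {"response_score": 85, "response_errorMessage": "", "response_errorCode": "0"}
result = standardize_names(pairs, "zmxy", "sqqz")
```

With the abbreviations "zmxy" and "sqqz", `result` has the keys zmxy_sqqz_fraudScore, zmxy_sqqz_errorMessage and zmxy_sqqz_errorCode.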
This embodiment ensures that the purchased third-party credit investigation data is retained validly and completely.
In addition, it provides a method for structuring third-party credit investigation JSON messages, implemented on a real-time data stream processing technology, so that data can be converted in real time from the semi-structured JSON form into a relational form, that is, a two-dimensional table.
Furthermore, it provides standardized naming and standardized storage for third-party credit investigation data from different sources, which reduces the cost of data analysis and modeling and improves the utilization efficiency of big data.
Optionally, on the basis of the previous embodiment, after the step S24, the method may further include:
S35, storing the standardized data and setting the data validity period of the standardized data;
Specifically, some data, such as names, identity card numbers and educational background, does not change within a short time, so querying it again within a short time returns the same result. The standardized data can therefore be put into a cache data table, with different validity periods set for different external credit investigation interfaces. Within the validity period, repeated queries of the same interface for the same client no longer query the external data source, which avoids repeated queries of the external credit investigation interface and saves query cost. The data validity period may be expressed as an expiration date.
S36, acquiring query time for querying the standardized data;
S37, if the query time is within the data validity period, returning a query result comprising the standardized data;
and S38, if the query time is not within the data validity period, returning a query result representing the failure of query.
Specifically, the query of the normalized data is processed according to the following logic:
The cache database is queried first. If no credit investigation data exists for the client, a query result representing query failure is returned and the external credit investigation service is queried directly.
If standardized data is cached for the client:
if the query time is within the data validity period, a query result comprising the standardized data is returned; if the query time is not within the data validity period, a query result representing query failure is returned.
In addition, the data validity period may be expressed as a validity duration, for example data that is valid for one year. In that case a creation time field "createtime" is needed; this field is added when the standardized data is put into the cache data table. The value of this field is added to the interface cache validity duration stored in the parameter table, and the sum is compared with the current time. If the sum is greater than or equal to the current time, the cached standardized data is returned. If the sum is less than the current time, the external credit investigation service is queried.
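The validity check based on "createtime" can be sketched as below. The in-memory dictionary standing in for the cache data table, the `VALIDITY` parameter table, the key layout and the interface name `zmxy_sqqz` are assumptions made for the example; returning `None` stands for a query result representing query failure.

```python
from datetime import datetime, timedelta

# Hypothetical parameter table: cache validity duration per external interface.
VALIDITY = {"zmxy_sqqz": timedelta(days=365)}


def query_cache(cache, customer_id, interface, now):
    """Return cached standardized data, or None to signal a failed query."""
    entry = cache.get((customer_id, interface))
    if entry is None:
        # Nothing cached: the caller falls back to the external service.
        return None
    expires_at = entry["createtime"] + VALIDITY[interface]
    if expires_at >= now:
        return entry["data"]
    # Cached data is past its validity period.
    return None


cache = {
    ("C001", "zmxy_sqqz"): {
        "createtime": datetime(2018, 1, 1),
        "data": {"zmxy_sqqz_fraudScore": 85},
    }
}
fresh = query_cache(cache, "C001", "zmxy_sqqz", datetime(2018, 6, 1))
stale = query_cache(cache, "C001", "zmxy_sqqz", datetime(2019, 6, 1))
missing = query_cache(cache, "C002", "zmxy_sqqz", datetime(2018, 6, 1))
```

In this example only `fresh` returns the cached data; `stale` and `missing` both come back as `None`, in which case the caller would query the external credit investigation service.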
In the embodiment, different validity periods are set for different external credit investigation interfaces, and subsequent repeated inquiry of the same interface of the same client in the validity period does not inquire an external data source any more, so that repeated inquiry of the external credit investigation interfaces is avoided, and in addition, a cache mechanism is added, so that the fund is saved for a financial institution using third-party credit investigation data.
It should be noted that, for steps S31 to S34 of this embodiment, please refer to the corresponding descriptions in the above embodiments, which are not repeated here.
Optionally, on the basis of the embodiment corresponding to FIG. 2 or FIG. 3, after the preset identification field and the content of the preset identification field are added to the intermediate message to obtain the target message, the method may further include:
1) acquiring a plurality of different messages to be integrated;
The messages to be integrated are all target messages to which preset identification fields and the content of the preset identification fields have been added.
2) Screening out at least one message to be integrated, the content of which is the same as that of at least one preset identification field in the target message;
3) and integrating the screened messages to be integrated with the target messages.
Specifically, the contents of the preset identification fields can be compared in turn, for example whether the person names are the same and whether the identity card numbers are the same. If the content of at least one preset identification field is the same, the message to be integrated and the target message are messages of the same user, and messages belonging to the same user may be integrated. For example, if the content of the target message is educational background and the content of the message to be integrated is growth experience, the two messages can be integrated into one message.
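The screening and integration described above can be sketched as follows. The field names in `ID_FIELDS`, the sample messages, and the merge policy (values already present in the target message win) are assumptions of this sketch; the text only requires that at least one preset identification field has the same content.

```python
# Hypothetical names of the common preset identification fields.
ID_FIELDS = ("person_name", "id_number", "mobile_number")


def same_user(target, candidate):
    """True if at least one preset identification field has equal content."""
    return any(
        field in target and field in candidate and target[field] == candidate[field]
        for field in ID_FIELDS
    )


def integrate(target, candidates):
    """Merge every matching message to be integrated into the target message."""
    merged = dict(target)
    for candidate in candidates:
        if same_user(target, candidate):
            for key, value in candidate.items():
                # Assumption: fields already in the target message are kept.
                merged.setdefault(key, value)
    return merged


target = {"id_number": "ID-0001", "education": "bachelor"}
candidates = [
    {"id_number": "ID-0001", "growth_experience": "example text"},
    {"id_number": "ID-0002", "growth_experience": "other user"},
]
merged = integrate(target, candidates)
```

Only the first candidate shares an identity card number with the target message, so only its fields are merged in. A fuller version would also need to skip identification fields whose content is the error placeholder, so that two unrelated messages are not matched on it.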
In the embodiment, the data among different credit investigation institutions is effectively integrated by uniformly adding credit investigation identification index fields to heterogeneous messages from different credit investigation data sources, so that uniform analysis and modeling of the credit investigation data of each third party become possible.
Optionally, on the basis of the above embodiment of the data normalization method, another embodiment of the present invention provides a data normalization apparatus, and with reference to fig. 4, the data normalization apparatus may include:
an obtaining module 101, configured to obtain a message to be processed and a data source of the message to be processed;
a format conversion module 102, configured to perform format conversion on the message to be processed and convert the message to be processed into an intermediate message with a preset format;
and a data processing module 103, configured to perform field analysis and standardization processing on the intermediate message according to the field analysis rule of the message to be processed and the data source, so as to obtain standardized data.
In this embodiment, after the to-be-processed messages with different data structures are obtained, the format of the to-be-processed messages is converted to obtain the intermediate message, and then the intermediate message is subjected to field analysis and standardized processing to obtain standardized data, so that the to-be-processed messages with different data structures can be subjected to standardized processing to obtain a data structure with a uniform format.
It should be noted that, for the working process of each module in this embodiment, please refer to the corresponding description in the above embodiments, which is not described herein again.
Optionally, on the basis of the foregoing embodiment, the data processing module includes:
the data processing submodule is used for carrying out field analysis and configuration on the intermediate message according to the field analysis rule and the data source of the message to be processed to obtain the content of a preset identification field;
the data adding submodule is used for adding the preset identification field and the content of the preset identification field into the intermediate message to obtain a target message;
the analysis submodule is used for carrying out field analysis on the target message according to the preset identification field and the content of the preset identification field, and analyzing to obtain the corresponding relation between the field path name and the field value;
and the standardization processing submodule is used for carrying out name standardization processing on the field path names in the corresponding relation to obtain standardized data.
Further, the apparatus also includes:
the data correction submodule is used for identifying the content of an error field in the intermediate message and setting the content of the error field to preset data, after the data processing submodule carries out field analysis and configuration on the intermediate message according to the field analysis rule and the data source of the message to be processed and the content of the preset identification field is obtained.
This embodiment ensures that the purchased third-party credit investigation data is retained validly and completely.
In addition, it provides a method for structuring third-party credit investigation JSON messages, implemented on a real-time data stream processing technology, so that data can be converted in real time from the semi-structured JSON form into a relational form, that is, a two-dimensional table.
Furthermore, it provides standardized naming and standardized storage for third-party credit investigation data from different sources, which reduces the cost of data analysis and modeling and improves the utilization efficiency of big data.
It should be noted that, for the working processes of each module and sub-module in this embodiment, please refer to the corresponding description in the above embodiments, which is not described herein again.
Optionally, on the basis of the above embodiment, the apparatus further includes:
the data setting module is used for storing the standardized data and setting the data validity period of the standardized data, after the standardization processing submodule carries out name standardization processing on the field path names in the corresponding relation and the standardized data is obtained;
the query acquisition module is used for acquiring query time for querying the standardized data;
the first result feedback module is used for returning a query result comprising the standardized data if the query time is within the data validity period;
and the second result feedback module is used for returning the query result representing the query failure if the query time is not within the data validity period.
In the embodiment, different validity periods are set for different external credit investigation interfaces, and subsequent repeated inquiry of the same interface of the same client in the validity period does not inquire an external data source any more, so that repeated inquiry of the external credit investigation interfaces is avoided, and in addition, a cache mechanism is added, so that the fund is saved for a financial institution using third-party credit investigation data.
It should be noted that, for the working processes of each module and sub-module in this embodiment, please refer to the corresponding description in the above embodiments, which is not described herein again.
Optionally, on the basis of the above embodiment that includes the data adding submodule, the apparatus further includes:
the message acquisition module is used for acquiring a plurality of different messages to be integrated, after the data adding submodule adds the preset identification field and the content of the preset identification field into the intermediate message to obtain a target message;
the message screening module is used for screening out a message to be integrated, the content of at least one field of which is the same as the content of at least one preset identification field in the target message;
and the message integration module is used for integrating the screened messages to be integrated with the target messages.
In the embodiment, the data among different credit investigation institutions is effectively integrated by uniformly adding credit investigation identification index fields to heterogeneous messages from different credit investigation data sources, so that uniform analysis and modeling of the credit investigation data of each third party become possible.
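As a rough illustration of the screening and integration steps, the following Python sketch treats each message as a dictionary. The field names `id_number` and `phone`, the functions `screen_messages` and `integrate`, and the shape of the integrated result are assumptions made for this example; the patent does not specify them. The sketch also assumes that a message matches when it shares the value of at least one identically named identification field with the target message.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def screen_messages(target: Message, candidates: List[Message],
                    id_fields: List[str]) -> List[Message]:
    """Keep candidates sharing at least one identification field value with the target."""
    matched = []
    for message in candidates:
        if any(
            field in target and field in message and message[field] == target[field]
            for field in id_fields
        ):
            matched.append(message)
    return matched


def integrate(target: Message, matched: List[Message],
              id_fields: List[str]) -> Dict[str, Any]:
    """Group the target message and the screened messages under the shared identification."""
    identification = {f: target[f] for f in id_fields if f in target}
    return {"identification": identification, "messages": [target, *matched]}


# Hypothetical messages from different credit investigation data sources.
target_message = {"id_number": "A1", "phone": "555", "credit_score": 700}
to_integrate = [
    {"id_number": "A1", "overdue_count": 0},
    {"id_number": "B2", "phone": "999"},
    {"phone": "555", "loan_count": 2},
]
screened = screen_messages(target_message, to_integrate, ["id_number", "phone"])
integrated = integrate(target_message, screened, ["id_number", "phone"])
```

Here the first and third candidate messages are kept because each shares one identification value with the target message, while the second is screened out.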
It should be noted that, for the working processes of each module and sub-module in this embodiment, please refer to the corresponding description in the above embodiments, which is not described herein again.
Optionally, on the basis of the embodiments of the data normalization method and apparatus, another embodiment of the present invention provides an electronic device, which may include: a memory and a processor;
wherein the memory is used for storing programs;
the processor calls the program and is used for:
acquiring a message to be processed and a data source of the message to be processed;
carrying out format conversion on the message to be processed, and converting the message to be processed into an intermediate message with a preset format;
and according to the field analysis rule of the message to be processed and the data source, carrying out field analysis and standardized processing on the intermediate message to obtain standardized data.
In this embodiment, after the to-be-processed messages with different data structures are obtained, the format of the to-be-processed messages is converted to obtain the intermediate message, and then the intermediate message is subjected to field analysis and standardized processing to obtain standardized data, so that the to-be-processed messages with different data structures can be subjected to standardized processing to obtain a data structure with a uniform format.
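The three steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the intermediate format is taken to be a nested dictionary, input messages are assumed to be XML or JSON text, and the rule table `NAME_RULES`, the source names, and all function names are hypothetical. Repeated XML tags and attributes are ignored to keep the sketch short.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict

# Hypothetical name standardization rules: per data source,
# map a field path name to a standardized field name.
NAME_RULES: Dict[str, Dict[str, str]] = {
    "source_a": {"report.person.idNo": "id_number", "report.score": "credit_score"},
    "source_b": {"data.cert_no": "id_number", "data.points": "credit_score"},
}


def xml_to_dict(element: ET.Element) -> Any:
    """Convert an XML element into nested dictionaries (leaf text as strings)."""
    children = list(element)
    if not children:
        return element.text or ""
    return {child.tag: xml_to_dict(child) for child in children}


def to_intermediate(raw: str) -> Dict[str, Any]:
    """Format conversion: turn an XML or JSON message into a nested dictionary."""
    text = raw.strip()
    if text.startswith("<"):
        root = ET.fromstring(text)
        return {root.tag: xml_to_dict(root)}
    return json.loads(text)


def flatten(node: Any, prefix: str = "") -> Dict[str, Any]:
    """Field parsing: map each dotted field path name to its field value."""
    if isinstance(node, dict):
        pairs: Dict[str, Any] = {}
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            pairs.update(flatten(value, path))
        return pairs
    return {prefix: node}


def standardize(raw: str, data_source: str) -> Dict[str, Any]:
    """Run format conversion, field parsing and name standardization."""
    path_to_value = flatten(to_intermediate(raw))
    rules = NAME_RULES[data_source]
    # Paths without a rule keep their original path name.
    return {rules.get(path, path): value for path, value in path_to_value.items()}


xml_message = "<report><person><idNo>A1</idNo></person><score>700</score></report>"
json_message = '{"data": {"cert_no": "A1", "points": "700"}}'
from_xml = standardize(xml_message, "source_a")
from_json = standardize(json_message, "source_b")
```

With these example inputs, two messages with different structures end up with the same standardized field names, which is the uniform data structure the embodiment aims for.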
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of data normalization, comprising:
acquiring a message to be processed and a data source of the message to be processed;
carrying out format conversion on the message to be processed, and converting the message to be processed into an intermediate message with a preset format;
according to the field analysis rule of the message to be processed and the data source, carrying out field analysis and standardized processing on the intermediate message to obtain standardized data, which comprises: processing the intermediate message according to the field analysis rule and the data source of the message to be processed to obtain the corresponding relation between the field path name and the field value, and carrying out name standardization processing on the field path name in the corresponding relation to obtain the standardized data.
2. The data standardization method of claim 1, wherein the step of processing the intermediate message according to the field analysis rule and the data source of the message to be processed to obtain the corresponding relation between the field path name and the field value comprises:
according to the field analysis rule and the data source of the message to be processed, carrying out field analysis and configuration on the intermediate message to obtain the content of a preset identification field;
adding the preset identification field and the content of the preset identification field into the intermediate message to obtain a target message;
and according to the preset identification field and the content of the preset identification field, carrying out field analysis on the target message, and analyzing to obtain the corresponding relation between the field path name and the field value.
3. The data normalization method according to claim 1, wherein after the name normalization processing is performed on the field path names in the correspondence relationship to obtain normalized data, the method further comprises:
storing the standardized data and setting the data validity period of the standardized data;
acquiring query time for querying the standardized data;
if the query time is within the data validity period, returning a query result comprising the standardized data;
and if the query time is not within the data validity period, returning a query result representing query failure.
4. The data normalization method of claim 2, wherein after adding the preset identification field and the content of the preset identification field to the intermediate message to obtain the target message, the method further comprises:
acquiring a plurality of different messages to be integrated;
screening out a message to be integrated, the content of at least one field of which is the same as the content of at least one preset identification field in the target message;
and integrating the screened messages to be integrated with the target messages.
5. The data standardization method of claim 2, wherein after performing field analysis on the intermediate message according to the field analysis rule and the data source of the message to be processed to obtain the content of the preset identification field, the method further comprises:
and identifying the content of an error field in the intermediate message, and setting the content of the error field as preset data.
6. A data normalization apparatus, comprising:
the acquisition module is used for acquiring a message to be processed and a data source of the message to be processed;
the format conversion module is used for carrying out format conversion on the message to be processed and converting the message to be processed into an intermediate message with a preset format;
the data processing module is used for carrying out field analysis and standardized processing on the intermediate message according to the field analysis rule of the message to be processed and the data source to obtain standardized data, which comprises: processing the intermediate message according to the field analysis rule and the data source of the message to be processed to obtain the corresponding relation between the field path name and the field value, and carrying out name standardization processing on the field path name in the corresponding relation to obtain the standardized data.
7. The data normalization apparatus of claim 6, wherein the data processing module comprises:
the data processing submodule is used for carrying out field analysis and configuration on the intermediate message according to the field analysis rule and the data source of the message to be processed to obtain the content of a preset identification field;
the data adding submodule is used for adding the preset identification field and the content of the preset identification field into the intermediate message to obtain a target message;
the analysis submodule is used for carrying out field analysis on the target message according to the preset identification field and the content of the preset identification field, and analyzing to obtain the corresponding relation between the field path name and the field value;
and the standardization processing submodule is used for carrying out name standardization processing on the field path names in the corresponding relation to obtain standardized data.
8. The data normalization apparatus of claim 7, further comprising:
the data setting module is used for storing the standardized data and setting the data validity period of the standardized data, after the standardization processing submodule carries out name standardization processing on the field path names in the corresponding relation to obtain the standardized data;
the query acquisition module is used for acquiring query time for querying the standardized data;
the first result feedback module is used for returning a query result comprising the standardized data if the query time is within the data validity period;
and the second result feedback module is used for returning the query result representing the query failure if the query time is not within the data validity period.
9. The data normalization apparatus of claim 7, further comprising:
the message acquisition module is used for acquiring a plurality of different messages to be integrated, after the data adding submodule adds the preset identification field and the content of the preset identification field into the intermediate message to obtain a target message;
the message screening module is used for screening out a message to be integrated, the content of at least one field of which is the same as the content of at least one preset identification field in the target message;
and the message integration module is used for integrating the screened messages to be integrated with the target messages.
10. An electronic device, comprising: a memory and a processor;
wherein the memory is used for storing programs;
the processor calls the program and is used for:
acquiring a message to be processed and a data source of the message to be processed;
carrying out format conversion on the message to be processed, and converting the message to be processed into an intermediate message with a preset format;
according to the field analysis rule of the message to be processed and the data source, carrying out field analysis and standardized processing on the intermediate message to obtain standardized data, which comprises: processing the intermediate message according to the field analysis rule and the data source of the message to be processed to obtain the corresponding relation between the field path name and the field value, and carrying out name standardization processing on the field path name in the corresponding relation to obtain the standardized data.
CN201810940191.8A 2018-08-17 2018-08-17 Data standardization method and device and electronic equipment Active CN109086444B (en)

Publications (2)

Publication Number Publication Date
CN109086444A CN109086444A (en) 2018-12-25
CN109086444B true CN109086444B (en) 2020-12-29

Family

ID=64793784

Country Status (1)

Country Link
CN (1) CN109086444B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101110783A (en) * 2007-09-03 2008-01-23 中国工商银行股份有限公司 Method for matching bank message
US7478112B2 (en) * 2004-12-16 2009-01-13 International Business Machines Corporation Method and apparatus for initializing data propagation execution for large database replication
CN101464874A (en) * 2007-12-17 2009-06-24 金宝电子(上海)有限公司 Method for representing electronic dictionary catalog data by XML
CN101577718A (en) * 2009-06-23 2009-11-11 用友软件股份有限公司 Multi-ebanking adaptive system
CN101625694A (en) * 2009-08-17 2010-01-13 中国科学院地理科学与资源研究所 Method and system for storing various standard geological metadata
CN103020189A (en) * 2012-12-03 2013-04-03 深圳中兴网信科技有限公司 Data processing device and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8019793B2 (en) * 2003-02-14 2011-09-13 Accenture Global Services Limited Methodology infrastructure and delivery vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant