CN113705188B

CN113705188B - Intelligent evaluation method for customs import and export commodity specification declaration

Info

Publication number: CN113705188B
Application number: CN202110956040.3A
Authority: CN
Inventors: 张强; 张鹏; 车超; 周东生
Original assignee: Dalian University
Current assignee: Dalian University
Priority date: 2021-08-19
Filing date: 2021-08-19
Publication date: 2023-06-06
Anticipated expiration: 2041-08-19
Also published as: CN113705188A

Abstract

The invention discloses an intelligent evaluation method for customs import and export commodity specification declaration, comprising the following steps. Step 1: preprocess the customs import and export commodity declaration text, extract the key elements in the "commodity specification and model" column, and take the element names, the corresponding element words, and their contents under the commodity's chapter number as the content to be judged in the import and export commodity declaration specification text. Step 2: segment the declaration specification text into words and remove punctuation marks and stop words. Step 3: use a Word2vec model to learn semantic knowledge from the segmented text without supervision, represent the semantic information of each word as a word vector, and obtain a word vector matrix for each text. Step 4: feed the word vector matrices into a specification declaration intelligent evaluation model for training; select and load the model with the best classification performance, and feed the commodity declaration text to be checked into it to judge whether the declared information is standard. The method markedly improves evaluation accuracy.

Description

Intelligent evaluation method for customs import and export commodity specification declaration

Technical Field

The invention relates to the technical field of natural language processing, and in particular to an intelligent evaluation method for customs import and export commodity specification declaration based on a deep learning model.

Background

Specification declaration (standardized declaration) means filling in each declaration element of a commodity according to its specific requirements when completing the commodity fields of a customs import and export declaration form. It serves to adapt to trade development and customs supervision requirements, standardize the declaration behavior of import and export enterprises, improve the quality of declaration data, speed up clearance, and facilitate trade. Specification declaration is an important part of customs tax collection and administration and an important means of building a new type of collector-taxpayer relationship and improving enterprises' tax compliance. It is also the foundation for ensuring the quality of tax administration, the inspection and supervision of import and export goods, internal law enforcement supervision, and lower-level checks. Correct declarations are therefore of great significance to the efficiency of customs work and to the execution of national policies.

At present, customs relies mainly on business specialists to judge whether commodity declaration texts are standard. Manual judgment is time-consuming and labor-intensive, and customs handles a huge volume of import and export commodities every day, so only the declaration texts of a very small fraction of commodities can be sampled for inspection each year. The process is inefficient and lacks coverage.

Disclosure of Invention

To address these problems in the prior art, the invention converts the intelligent assessment of customs import and export commodity specification declaration into a text classification problem in natural language processing and, taking into account the characteristics of customs declaration texts, provides an end-to-end deep learning model that automatically assesses whether a declaration text is standard.

To achieve the above purpose, the technical solution of the application is as follows: an intelligent evaluation method for customs import and export commodity specification declaration, comprising the following steps:

Step 1: a commodity declaration text consists of a series of elements that reflect the objective condition of the commodity, such as the commodity code, the commodity specification and model, and the actual tariff rate. The enterprise fills in the declaration element information corresponding to the element names in the "commodity specification and model" column, and the first two digits of the commodity code indicate the chapter, i.e. the category, of the commodity. The customs import and export commodity declaration text is preprocessed: the key elements in the "commodity specification and model" column are extracted, and the element names, the corresponding element words, and their contents under the commodity's chapter number are taken as the content to be judged in the import and export commodity declaration specification text;
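A minimal sketch of the step 1 preprocessing follows. It assumes, hypothetically, that the declared values in the "commodity specification and model" column are separated by `|` in the same order as the element names for that commodity; `DeclarationItem` and `preprocess` are illustrative names, not part of the patent. Only the rule that the first two digits of the commodity code give the chapter comes from the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DeclarationItem:
    """One unit to be judged: chapter, element name and declared content."""
    chapter: str
    element_name: str
    content: str


def preprocess(commodity_code: str, element_names: list[str],
               spec_column: str) -> list[DeclarationItem]:
    # The first two digits of the commodity code give its chapter (category).
    chapter = commodity_code[:2]
    # Assumed format: '|'-separated values, in the same order as element_names.
    values = [v.strip() for v in spec_column.split("|")]
    items = []
    for name, value in zip(element_names, values):
        if value:  # skip elements left empty by the declaring enterprise
            items.append(DeclarationItem(chapter, name, value))
    return items


items = preprocess("3917400000", ["brand type", "use", "material"],
                   "0|connecting water pipes|")
for item in items:
    print(item.chapter, item.element_name, item.content)
```

Keeping the chapter on every item lets later steps judge each element against the requirements of its own chapter.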

Step 2: the import and export commodity declaration specification text is segmented into words with the Jieba segmentation library for Python, and punctuation marks and stop words are removed;
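The cleaning part of step 2 might look like the sketch below. It assumes Jieba has already produced a token list (for example with `jieba.lcut(text)`, which uses Jieba's default accurate mode) and that a stop-word set is available. Treating every Unicode punctuation or symbol character as removable is an assumption, since the text does not define its punctuation list, and the token list is made up for illustration.

```python
import unicodedata


def is_punct_or_symbol(token: str) -> bool:
    # Unicode categories P* (punctuation) and S* (symbols) cover ASCII marks,
    # full-width Chinese marks such as '，' and separators such as '|'.
    return all(unicodedata.category(ch)[0] in "PS" for ch in token)


def clean_tokens(tokens, stopwords):
    """Remove blanks, punctuation/symbol tokens and stop words from Jieba output."""
    cleaned = []
    for tok in tokens:
        tok = tok.strip()
        if not tok or is_punct_or_symbol(tok) or tok in stopwords:
            continue
        cleaned.append(tok)
    return cleaned


# In practice the tokens would come from jieba.lcut(text); here they are given inline.
tokens = ["螺纹", "连接器", "的", "|", "39", "，", " "]
print(clean_tokens(tokens, stopwords={"的"}))
```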

Step 3: a Word2vec model learns semantic knowledge from the segmented text without supervision and represents the semantic information of each word as a word vector. Because declaration texts are short (75% of them are about 20 words long), the text length is fixed at 20 when training with the Word2vec model: longer texts are truncated and shorter ones are padded. The vector dimension is set to 300, giving a 20 × 300 word vector matrix for each text.
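The sketch below shows only the truncate-and-pad step that turns one segmented text into a 20 × 300 matrix. It assumes the word vectors have already been trained, for instance with gensim's `Word2Vec(sentences, vector_size=300)`, whose `.wv` attribute maps words to vectors. Using zero vectors for padding and for unknown words is an assumption; the text does not specify the padding value.

```python
MAX_LEN = 20     # fixed text length from the description
EMBED_DIM = 300  # word vector dimension from the description


def to_matrix(tokens, word_vectors, max_len=MAX_LEN, dim=EMBED_DIM):
    """Build a max_len x dim word vector matrix for one segmented text.

    word_vectors is any mapping supporting `in` and `[]` (e.g. a dict, or
    gensim KeyedVectors); unknown words map to zero vectors (an assumption).
    """
    matrix = []
    for tok in tokens[:max_len]:  # truncate texts longer than max_len
        if tok in word_vectors:
            matrix.append([float(x) for x in word_vectors[tok]])
        else:
            matrix.append([0.0] * dim)
    # Pad shorter texts with fresh zero rows up to max_len.
    matrix.extend([0.0] * dim for _ in range(max_len - len(matrix)))
    return matrix


toy_vectors = {"connector": [1.0] * EMBED_DIM, "pipe": [2.0] * EMBED_DIM}
m = to_matrix(["connector", "unknown", "pipe"], toy_vectors)
print(len(m), len(m[0]))  # 20 300
```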

Step 4: the word vector matrices are fed into the specification declaration intelligent evaluation model for training, with a learning rate of 0.001, a batch size of 64, 500 iterations, and the Adam optimizer; accuracy and F1 score serve as the evaluation metrics, and the trained models and their evaluation metrics are saved. The model with the best classification performance is then selected and loaded, and the commodity declaration text to be checked is fed into it to judge whether the declared information is standard.
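The training configuration and the two evaluation metrics of step 4 can be summarized as follows. The hyperparameter values are those stated above; computing F1 as a binary score for a single "standard" class (label 1) is an assumption, since the text does not say whether binary or macro-averaged F1 is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Training hyperparameters stated in the description."""
    learning_rate: float = 0.001
    batch_size: int = 64
    iterations: int = 500
    optimizer: str = "Adam"


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and binary F1 score, with `positive` marking the standard class."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("empty evaluation set")
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return correct / len(pairs), f1


config = TrainConfig()
acc, f1 = accuracy_and_f1([1, 1, 0, 0], [1, 0, 0, 1])
print(config.batch_size, acc, f1)  # 64 0.5 0.5
```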

Further, step 4 is implemented as follows:

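Step 41: the word vector matrix is fed into a bidirectional long short-term memory network (Bidirectional Long Short-Term Memory, BiLSTM) with an attention mechanism to extract the relationships among the contexts of the commodity text. The forward LSTM reads the feature sequence from $L_{C1}$ to $L_{C300}$, and the backward LSTM reads it from $L_{C300}$ to $L_{C1}$. The outputs of the BiLSTM are expressed as follows:

$$\overrightarrow{h}_n = \overrightarrow{\mathrm{LSTM}}(L_{Cn}), \qquad \overleftarrow{h}_n = \overleftarrow{\mathrm{LSTM}}(L_{Cn})$$

Concatenating the forward hidden state $\overrightarrow{h}_n$ and the backward hidden state $\overleftarrow{h}_n$ gives the annotation $h_n$ of a given feature text $L_{Cn}$:

$$h_n = \left[\overrightarrow{h}_n ; \overleftarrow{h}_n\right]$$

The attention mechanism focuses on the features of keywords to reduce the influence of non-keywords on the context text, and can be regarded as a fully connected layer. The annotation $h_n$ of the feature text $L_{Cn}$ is passed through a one-layer perceptron to obtain its hidden representation $u_n$, where $w$ and $b$ are the weights and biases of the neurons and $\tanh(\cdot)$ is the activation function:

$$u_n = \tanh(w h_n + b)$$

From $u_n$ and the word context vector $u_w$, the normalized weight $\alpha_n$ of each word is obtained, where $M$ is the number of words in the feature and $\exp(\cdot)$ is the exponential function:

$$\alpha_n = \frac{\exp(u_n^{\top} u_w)}{\sum_{j=1}^{M} \exp(u_j^{\top} u_w)}$$

Finally, the annotations are weighted by $\alpha_n$ and summed to give the feature $H_C$:

$$H_C = \sum_{n=1}^{M} \alpha_n h_n$$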
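The attention pooling of step 41 can be sketched in plain Python as below. The BiLSTM itself is omitted: `hidden_states` is assumed to already hold the concatenated annotations $h_n$, and the weights `w`, bias `b` and word context vector `u_w` would be learned in a real model, which would normally be built with a deep learning framework. Subtracting the largest score before exponentiating is a standard numerical-stability step that leaves the weights $\alpha_n$ unchanged.

```python
import math


def attention_pool(hidden_states, w, b, u_w):
    """Return (H_C, alphas) for a list of M annotation vectors h_n."""
    # u_n = tanh(w h_n + b): a one-layer perceptron applied to each annotation.
    us = [[math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
           for row, b_i in zip(w, b)]
          for h in hidden_states]
    # Similarity of each u_n with the word context vector u_w.
    scores = [sum(u_k * c_k for u_k, c_k in zip(u, u_w)) for u in us]
    # Softmax over the M words (max subtracted for numerical stability).
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # H_C = sum_n alpha_n * h_n.
    dim = len(hidden_states[0])
    h_c = [sum(a * h[k] for a, h in zip(alphas, hidden_states))
           for k in range(dim)]
    return h_c, alphas


h = [[1.0, 0.0], [0.0, 1.0]]  # two toy annotations h_n
w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights
h_c, alphas = attention_pool(h, w, [0.0, 0.0], u_w=[1.0, -1.0])
print([round(a, 3) for a in alphas])
```

With these toy values the first word aligns with `u_w` and therefore receives the larger weight.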

Step 42: the word vector matrix is fed into an Inception module, which extracts discrete word relationships using convolution kernels of different sizes. The module uses the BatchNorm algorithm, which considerably speeds up learning, alleviates the vanishing-gradient problem to some extent, accelerates convergence, and also improves classification performance. BatchNorm takes the mean and variance of a mini-batch as estimates of the mean and variance of the whole data set and introduces the learnable parameters $\gamma$ and $\beta$ to learn and recover the feature distribution that the original network needs to learn. Here $m$ is the batch size, i.e. the number of samples in each mini-batch, and $x_i$ is the $i$-th training sample in the mini-batch.

First, the mini-batch mean $\mu_B$ and variance $\sigma_B^2$ are computed:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2$$

The data are then normalized with equation (8), where the small constant $\epsilon$ prevents an invalid computation when the variance is 0:

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \qquad (8)$$

Normalization maps the data into a unified range, reduces their dispersion, and lowers the learning difficulty of the network while preserving the distribution of the original data to some extent. After normalization, the linear transformation of equation (9) is applied to preserve the network's nonlinear expressive power: the normalized $\hat{x}_i$, which has zero mean and unit variance, is scaled and shifted, i.e. each element is multiplied by $\gamma$ and then $\beta$ is added. This realizes an equivalent transformation and retains the distribution information of the original input features:

$$y_i = \gamma \hat{x}_i + \beta \qquad (9)$$

During training, BatchNorm normalizes the activation values using the statistics of the training examples in each mini-batch.
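A plain-Python sketch of the two ingredients of step 42 follows: convolution kernels of different heights (here spanning 1 and 2 words) slide over the word vector matrix and are max-pooled into one feature each, and BatchNorm then normalizes each feature over the mini-batch using the mini-batch mean and variance with equations (8) and (9). The kernel sizes, the max pooling, and applying BatchNorm to the pooled features (rather than inside each convolution branch, as Inception networks usually do) are simplifying assumptions; the text does not give the module's exact layout.

```python
import math


def conv_max(matrix, kernel):
    """Slide a k x dim kernel down the word rows and max-pool the responses."""
    k = len(kernel)
    responses = [
        sum(w * x for k_row, row in zip(kernel, matrix[s:s + k])
            for w, x in zip(k_row, row))
        for s in range(len(matrix) - k + 1)
    ]
    return max(responses)


def inception_features(matrix, kernels):
    # One pooled feature per kernel; different heights see different word spans.
    return [conv_max(matrix, kernel) for kernel in kernels]


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize every feature column over a mini-batch of m samples."""
    m, dim = len(batch), len(batch[0])
    out = [[0.0] * dim for _ in range(m)]
    for j in range(dim):
        col = [sample[j] for sample in batch]
        mu = sum(col) / m                            # mini-batch mean
        var = sum((x - mu) ** 2 for x in col) / m    # mini-batch variance
        for i, x in enumerate(col):
            x_hat = (x - mu) / math.sqrt(var + eps)  # equation (8)
            out[i][j] = gamma[j] * x_hat + beta[j]   # equation (9)
    return out


kernels = [[[1.0, -1.0]],             # spans 1 word
           [[1.0, 0.0], [0.0, 1.0]]]  # spans 2 words
text_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_b = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
feats = [inception_features(t, kernels) for t in (text_a, text_b)]
print(feats)
print(batch_norm(feats, gamma=[1.0, 1.0], beta=[0.0, 0.0]))
```

The $\epsilon$ term matters when a feature is constant across the batch: the variance is then 0, and the normalized value is simply 0 instead of a division by zero.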

Step 43: the relationships among the commodity text contexts and the discrete word relationships are fed into a fusion classification module for training, and the trained model and its evaluation metrics are saved;
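Step 43 does not spell out how the two feature sets are fused. A common choice, assumed here, is to concatenate $H_C$ with the Inception features and apply a linear layer followed by softmax over two classes (non-standard, standard); the function name, class labels and toy weights are hypothetical.

```python
import math


def fuse_and_classify(h_c, inception_feats, weights, bias):
    """Concatenate both feature vectors, then apply a linear layer and softmax.

    weights has one row per class; bias has one entry per class.
    Returns (predicted class index, class probabilities).
    """
    fused = list(h_c) + list(inception_feats)  # feature-level fusion
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("weight rows must match the fused feature length")
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)), probs


# Toy example: class 0 = non-standard, class 1 = standard (assumed labels).
label, probs = fuse_and_classify([1.0, 0.0], [0.5],
                                 weights=[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
                                 bias=[0.0, 0.0])
print(label)  # 1
```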

Step 44: the model with the best classification performance is selected and loaded, and the commodity declaration text is fed into it to judge whether the declared information is standard.
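Step 44 keeps the saved model with the best classification performance. Which metric decides "best" is not stated; the sketch below assumes F1 with accuracy as a tie-breaker, and the checkpoint records, file names and scores are hypothetical.

```python
def select_best(checkpoints):
    """Pick the saved checkpoint with the highest F1, breaking ties by accuracy."""
    if not checkpoints:
        raise ValueError("no saved checkpoints to choose from")
    return max(checkpoints, key=lambda c: (c["f1"], c["accuracy"]))


# Hypothetical records written during training.
saved = [
    {"path": "epoch_100.pt", "accuracy": 0.90, "f1": 0.80},
    {"path": "epoch_200.pt", "accuracy": 0.88, "f1": 0.85},
    {"path": "epoch_300.pt", "accuracy": 0.91, "f1": 0.85},
]
print(select_best(saved)["path"])  # epoch_300.pt
```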

With the above technical solution, the invention achieves the following technical effect: using a deep learning model together with customs-specific corpus resources and the characteristics of customs texts, it automatically judges whether the filled-in content is standard against a normalized language library.

Drawings

FIG. 1 is a flow chart of a method for intelligent assessment of customs import and export commodity specification declaration;

FIG. 2 is a framework diagram of a specification declaration intelligent assessment model.

Detailed Description

The embodiment is implemented on the premise of the technical solution of the invention and gives a detailed implementation and a specific operating process, but the protection scope of the invention is not limited to the following embodiment.

Example 1

Referring to Fig. 1, and based on the characteristics of customs texts, the application provides a method for intelligent assessment of customs import and export commodity specification declaration: the customs import and export commodity declaration text is first preprocessed, the text data are then segmented into words, word vectors are trained with a Word2vec model, and the result is finally fed into a deep learning model for classification. The method addresses the specification declaration judgment task in an intelligent evaluation system for customs commodity specification declaration, and its accuracy is markedly higher than that of other current mainstream methods.

The invention is described in detail below with reference to the embodiment and drawings, so that a person of ordinary skill in the art can practice it with reference to this description.

In this embodiment, PyCharm is used as the development platform and Python as the development language. The method is applied to a corpus of 30,520 sentences of real customs data. The specific process is as follows:

Step 1: the customs import and export commodity text is preprocessed to obtain the big word names, chapters, and element codes.

Step 2: the long text obtained in step 1 is segmented with Jieba for Python in accurate mode, producing a new text document with punctuation marks and stop words removed, specifically:

Step 21: the text is segmented with Jieba, for example:

data: "spin-on connector Instructions for use |39|0000"

Word segmentation data: "Instructions for use of screw-on connector 390000"

Step 3: word vectors are trained on the segmented text with the Word2vec model, specifically:

Step 31: the Word2vec model converts each short text into a word vector matrix of length 20 and dimension 300;

Step 4: the word vectors obtained in step 3 are fed into the model for classification to obtain the specification declaration result, specifically:

Step 41: the generated word vectors are fed into the BiLSTM+Attention module to extract the element relationships between the short-text contexts;

Step 42: the generated word vectors are fed into the Inception module, which extracts discrete word relationships using convolution kernels of different sizes;

Step 43: the features extracted by the two modules are fed into the fusion classification module, which performs classification to obtain the final result.

Following the above steps, the classification performance of the invention is compared with that of a logistic regression (Logistic Regression, LR) model, a support vector machine (Support Vector Machines, SVM) model, a convolutional neural network (Convolutional Neural Networks, CNN) model, a TextCNN model, and a BERT model. As shown in Table 1, the proposed method clearly outperforms the other methods in both classification accuracy and F1 score.

Table 1: Comparison of the classification performance of different models on customs import and export commodity declarations

The foregoing descriptions of specific exemplary embodiments of the present invention are presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain the specific principles of the invention and its practical application to thereby enable one skilled in the art to make and utilize the invention in various exemplary embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims

1. An intelligent evaluation method for customs import and export commodity specification declaration, characterized by comprising the following steps:

step 1: preprocessing the customs import and export commodity declaration text, extracting the key elements in the "commodity specification and model" column, and taking the element names, the corresponding element words, and their contents under the commodity's chapter number as the content to be judged in the import and export commodity declaration specification text;

step 3: learning semantic knowledge from the segmented text without supervision using a Word2vec model, and representing the semantic information of the words as word vectors to obtain a word vector matrix for each text;

step 4: feeding the word vector matrix into a specification declaration intelligent evaluation model for training, and saving the trained model and evaluation metrics; selecting and loading the model with the best classification performance, and feeding the commodity declaration text to be checked into the model to judge whether the declared information is standard;

step 4 is specifically implemented as follows:

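step 41, feeding the word vector matrix into a bidirectional long short-term memory network BiLSTM with an attention mechanism, and extracting the relationships among the contexts of the commodity text; the forward LSTM of the BiLSTM reads the feature sequence from $L_{C1}$ to $L_{C300}$, and the backward LSTM reads it from $L_{C300}$ to $L_{C1}$; the outputs of the BiLSTM are expressed as follows:

$$\overrightarrow{h}_n = \overrightarrow{\mathrm{LSTM}}(L_{Cn}), \qquad \overleftarrow{h}_n = \overleftarrow{\mathrm{LSTM}}(L_{Cn})$$

obtaining the annotation $h_n$ of a given feature text $L_{Cn}$ from the forward hidden state $\overrightarrow{h}_n$ and the backward hidden state $\overleftarrow{h}_n$:

$$h_n = \left[\overrightarrow{h}_n ; \overleftarrow{h}_n\right]$$

passing the annotation $h_n$ of the feature text $L_{Cn}$ through a one-layer perceptron to obtain its hidden representation $u_n$, where $w$ and $b$ are the weights and biases and $\tanh(\cdot)$ is the activation function:

$$u_n = \tanh(w h_n + b)$$

using $u_n$ and the word context vector $u_w$, obtaining the normalized weight $\alpha_n$ of each word, where $M$ is the number of words in the feature and $\exp(\cdot)$ is the exponential function:

$$\alpha_n = \frac{\exp(u_n^{\top} u_w)}{\sum_{j=1}^{M} \exp(u_j^{\top} u_w)}$$

then, weighting the annotations by $\alpha_n$ to obtain the feature $H_C$:

$$H_C = \sum_{n=1}^{M} \alpha_n h_n$$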

step 42, feeding the word vector matrix into an Inception module, extracting discrete word relationships using convolution kernels of different sizes, taking the mean and variance of a mini-batch as estimates of the mean and variance of the whole data set, and introducing the learnable parameters $\gamma$ and $\beta$ to learn and recover the feature distribution that the original network needs to learn, where $m$ is the batch size, i.e. the number of samples in each mini-batch, and $x_i$ is the $i$-th training sample in the mini-batch:

first calculating the mean $\mu_B$ and the variance $\sigma_B^2$:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2$$

then normalizing with equation (8), where $\epsilon$ prevents an invalid computation when the variance is 0:

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \qquad (8)$$

after normalization, performing the linear transformation of equation (9), i.e. scaling and shifting $\hat{x}_i$, which has zero mean and unit variance, by multiplying each element by $\gamma$ and then adding $\beta$, so as to realize an equivalent transformation and retain the distribution information of the original input features:

$$y_i = \gamma \hat{x}_i + \beta \qquad (9)$$
;

2. The method for intelligent assessment of customs import and export commodity specification declaration according to claim 1, wherein the segmented text is short text data, 75% of which is only about 20 words long; therefore, when training with the Word2vec model, the length is set to 20, longer texts are truncated and shorter ones are padded, and the dimension is set to 300.

3. The method for intelligent assessment of customs import and export commodity specification declaration according to claim 1, wherein, when training the specification declaration intelligent assessment model, the learning rate is set to 0.001, the batch size to 64, the number of iterations to 500, and the optimizer is Adam, with accuracy and F1 score used as the evaluation metrics.