CN113705188B - Intelligent evaluation method for customs import and export commodity specification declaration - Google Patents

Info

Publication number
CN113705188B
CN113705188B CN202110956040.3A CN202110956040A CN113705188B CN 113705188 B CN113705188 B CN 113705188B CN 202110956040 A CN202110956040 A CN 202110956040A CN 113705188 B CN113705188 B CN 113705188B
Authority
CN
China
Prior art keywords
text
commodity
declaration
model
import
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110956040.3A
Other languages
Chinese (zh)
Other versions
CN113705188A (en)
Inventor
张强
张鹏
车超
周东生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University
Original Assignee
Dalian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-08-19
Filing date
2021-08-19
Publication date
2023-06-06
Application filed by Dalian University
Priority to CN202110956040.3A
Publication of CN113705188A
Application granted
Publication of CN113705188B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Tourism & Hospitality (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Computing Systems (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an intelligent evaluation method for the standardized declaration of customs import and export commodities, comprising the following steps. Step 1: preprocess the customs import and export commodity declaration text, extract the key elements in the "commodity specification and model" column, and take the element names, the corresponding element words, and the content under the commodity's chapter number as the content to be judged. Step 2: segment the declaration text into words and remove punctuation marks and stop words. Step 3: use a Word2vec model to learn the semantics of the segmented text in an unsupervised manner and represent the semantic information of each word as a word vector, obtaining a word vector matrix for each text. Step 4: feed the word vector matrix into the standardized-declaration intelligent evaluation model for training; then select and load the model with the best classification performance, and feed the commodity declaration text to be checked into it to judge whether the declared information conforms to the standard. The method markedly improves evaluation accuracy.

Description

Intelligent evaluation method for customs import and export commodity specification declaration
Technical Field
The invention relates to the technical field of natural language processing, and in particular to an intelligent evaluation method, based on a deep learning model, for the standardized declaration of customs import and export commodities.
Background
Standardized declaration (the "specification declaration" of the title) means filling in the commodity content of a customs import and export declaration form according to the specific requirements for each declaration element of the commodity. It serves to adapt to trade development and customs supervision requirements, standardize the declaration behavior of import and export enterprises, improve the quality of declaration data, speed up clearance, and facilitate trade. Standardized declaration of customs import and export commodities is one of the important parts of customs taxpayer management and an important way to build a novel quotation relationship and improve enterprises' tax compliance. It is also the foundation for ensuring the quality of tax administration, the inspection and supervision of import and export goods, internal law-enforcement supervision, and low-level checks, and the correctness of its results matters greatly for customs efficiency and the implementation of national policies.
Currently, customs relies mainly on business specialists to judge whether commodity declaration texts conform to the standard. Because manual judgment is time-consuming and labor-intensive and the daily volume of import and export commodities is huge, only a very small fraction of declaration texts can be sampled for inspection each year; the process is inefficient and lacks comprehensiveness.
Disclosure of Invention
To address the problems in the prior art, the application converts the intelligent evaluation of customs import and export commodity standardized declaration into a text classification problem in natural language processing and, taking the characteristics of customs declaration text into account, provides an end-to-end deep learning model that automatically evaluates whether a declaration text conforms to the standard.
To achieve the above purpose, the technical scheme of the application is as follows: an intelligent evaluation method for customs import and export commodity specification declaration, comprising the following steps:
Step 1: the commodity declaration text is composed of a series of elements that reflect the objective condition of the commodity, such as the customs number, the commodity specification and model, and the actual tariff rate. The enterprise fills in the corresponding declaration element information according to the element names in the "commodity specification and model" column; the first two digits of the commodity number indicate the chapter, that is, the category, of the commodity. Preprocess the customs import and export commodity declaration text, extract the key elements in the "commodity specification and model" column, and take the element names, the corresponding element words, and the content under the commodity's chapter number as the content to be judged;
Step 2: segment the declaration text into words with the Jieba word segmenter in Python, and remove punctuation marks and stop words;
Step 3: use a Word2vec model to learn the semantics of the segmented text in an unsupervised manner and represent the semantic information of each word as a word vector. Because declaration text is short (75% of the declared data is about 20 words long), the sequence length is set to 20 when training with the Word2vec model: longer texts are truncated and shorter ones are padded. The vector dimension is set to 300, which gives a word vector matrix for each text;
Step 4: feed the word vector matrix into the standardized-declaration intelligent evaluation model for training, with the learning rate set to 0.001, the batch size set to 64, and the number of iterations set to 500, using the Adam optimizer and taking accuracy and the F1 value as evaluation indexes; save the trained models and their evaluation indexes. Then select and load the model with the best classification performance, and feed the commodity declaration text to be checked into it to judge whether the declared information conforms to the standard.
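The token clean-up of step 2 and the fixed-length handling of step 3 can be sketched as follows. This is only an illustration under stated assumptions: the stop-word set, the padding symbol `<PAD>`, and the function names are hypothetical and do not come from the patent, and the token list is written by hand where a real pipeline would obtain it from a segmenter such as `jieba.lcut(text)`.

```python
import string

# Hypothetical stop-word set; a real system would load a full stop-word file.
STOP_WORDS = {"for", "of", "the"}
# ASCII punctuation plus a few common full-width marks.
PUNCTUATION = set(string.punctuation) | set(",。、;:!?()“”")
PAD_TOKEN = "<PAD>"  # assumed padding symbol, not specified in the source
MAX_LEN = 20         # sequence length stated in step 3


def clean_tokens(tokens):
    """Drop empty tokens, punctuation-only tokens, and stop words."""
    kept = []
    for tok in tokens:
        tok = tok.strip()
        if not tok:
            continue
        if all(ch in PUNCTUATION for ch in tok):
            continue
        if tok.lower() in STOP_WORDS:
            continue
        kept.append(tok)
    return kept


def pad_or_truncate(tokens, max_len=MAX_LEN, pad=PAD_TOKEN):
    """Truncate sequences longer than max_len and right-pad shorter ones."""
    clipped = tokens[:max_len]
    return clipped + [pad] * (max_len - len(clipped))


# Hand-written token list standing in for segmenter output.
tokens = ["spin-on", "connector", "instructions", "for", "use", "|", "39", "|", "0000"]
fixed = pad_or_truncate(clean_tokens(tokens))
```

Each of the 20 positions would then be mapped to a 300-dimensional Word2vec vector, giving the 20 x 300 word vector matrix that step 4 consumes.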
Further, step 4 is specifically implemented as follows:
Step 41: the word vector matrix is fed into a bidirectional long short-term memory network (Bidirectional Long Short-Term Memory, BiLSTM) with an attention mechanism, and the relations among commodity text contexts are extracted. The BiLSTM reads the feature sequence forward from $L_{C1}$ to $L_{C300}$ and backward from $L_{C300}$ to $L_{C1}$. In general, the output of the BiLSTM is expressed as follows:
$$\overrightarrow{h_n} = \overrightarrow{\mathrm{LSTM}}(L_{Cn}), \quad n \in [1, 300] \tag{1}$$
$$\overleftarrow{h_n} = \overleftarrow{\mathrm{LSTM}}(L_{Cn}), \quad n \in [300, 1] \tag{2}$$
The annotation $h_n = [\overrightarrow{h_n}, \overleftarrow{h_n}]$ of a given feature text $L_{Cn}$ is obtained from the forward hidden state $\overrightarrow{h_n}$ and the backward hidden state $\overleftarrow{h_n}$. The attention mechanism focuses on the features of keywords to reduce the impact of non-keywords on the context text, and can be regarded as a fully connected layer. The feature text $L_{Cn}$ passes through a one-layer perceptron to obtain the feature $u_n$ of each word, where $w$ and $b$ are the weight and bias of the neurons and $\tanh(\cdot)$ is the activation function:
$$u_n = \tanh(w h_n + b) \tag{3}$$
From the feature $u_n$ and the word context vector $u_w$, the normalized weight $\alpha_n$ of each word is obtained, where $M$ is the number of words in the feature and $\exp(\cdot)$ is the exponential function:
$$\alpha_n = \frac{\exp(u_n^{\top} u_w)}{\sum_{k=1}^{M} \exp(u_k^{\top} u_w)} \tag{4}$$
Thereafter, the feature $H_C$ is obtained as the sum of the annotations weighted by $\alpha_n$:
$$H_C = \sum_{n=1}^{M} \alpha_n h_n \tag{5}$$
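The attention step described above (a one-layer perceptron with tanh, a normalized weight per word, and a weighted sum giving the feature H C) can be illustrated in plain Python. This is a sketch under assumptions rather than the patent's implementation: the function name, the argument layout, and the use of a dot product with the word context vector as the score are choices made here for illustration, and the max-subtraction inside the softmax is a standard numerical-stability trick that the source does not mention.

```python
import math


def attention_pool(hidden_states, w, b, context):
    """Attention pooling over a sequence of hidden-state vectors.

    hidden_states: list of M vectors, each a list of d floats
    w: d x d weight matrix (list of rows); b: bias vector of length d
    context: word context vector of length d
    Returns (weights, pooled); the weights sum to 1.
    """
    scores = []
    for h in hidden_states:
        # One perceptron layer with tanh activation.
        u = [math.tanh(sum(w_row[j] * h[j] for j in range(len(h))) + b_i)
             for w_row, b_i in zip(w, b)]
        # Score: dot product of the perceptron feature and the context vector.
        scores.append(sum(u_i * c_i for u_i, c_i in zip(u, context)))

    # Softmax normalization (subtracting the max keeps exp() stable).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the hidden states.
    dim = len(hidden_states[0])
    pooled = [sum(a * h[k] for a, h in zip(weights, hidden_states))
              for k in range(dim)]
    return weights, pooled


# Toy example: three 2-dimensional hidden states, identity weights, zero bias.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, pooled = attention_pool(states, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, 0.0])
```

With this context vector, the hidden states whose first component is non-zero receive the larger weights, which mirrors how keywords are meant to dominate the pooled feature.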
Step 42: the word vector matrix is fed into an Inception module, which extracts discrete word relations using convolution kernels of different sizes. The BatchNorm algorithm is used here; it greatly increases the learning speed of the model, alleviates the vanishing-gradient problem to some extent, greatly accelerates convergence, and also improves the classification performance. The mean and variance of one batch are taken as estimates of the mean and variance of the whole data set, and learnable parameters $\gamma$ and $\beta$ are introduced to learn and recover the feature distribution that the original network is to learn, where $m$ is the batch size, that is, the number of samples in each batch, and $x_i$ is the $i$-th training sample in the mini-batch:
$$\mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i \tag{6}$$
$$\sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2 \tag{7}$$
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \tag{8}$$
$$y_i = \gamma \hat{x}_i + \beta \tag{9}$$
First the mean $\mu_B$ and the variance $\sigma_B^2$ are calculated. The data are then normalized by formula (8), where $\epsilon$ prevents an invalid calculation when the variance is 0. Normalization maps the data to a unified interval, which reduces the spread of the data and the learning difficulty of the network while preserving the distribution of the original data to a certain extent. After normalization, a linear transformation, formula (9), is applied: to preserve the nonlinear expressive capacity of the network, a scale-and-shift operation is performed on the transformed $\hat{x}_i$, which has mean 0 and variance 1, that is, each element is multiplied by $\gamma$ and then $\beta$ is added, so that an equivalent transformation is realized and the distribution information of the original input features is retained. During training, BatchNorm adjusts the activation values according to the multiple training examples in the mini-batch.
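The batch normalization procedure described in this step (batch mean, batch variance, normalization as in formula (8), then the scale-and-shift of formula (9)) can be written out for a single feature as a minimal sketch. The function name and the default value of `eps` are assumptions made for illustration; the source only states that a small constant guards against a zero variance.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift it.

    batch: list of m floats (one value per sample in the mini-batch)
    gamma, beta: learnable scale and shift parameters
    eps: small constant that avoids division by zero when the variance is 0
    """
    m = len(batch)
    mean = sum(batch) / m                                # batch mean
    var = sum((x - mean) ** 2 for x in batch) / m        # batch variance
    normalized = [(x - mean) / math.sqrt(var + eps) for x in batch]
    return [gamma * x_hat + beta for x_hat in normalized]


out = batch_norm([1.0, 2.0, 3.0, 4.0])
constant = batch_norm([5.0, 5.0, 5.0])  # zero variance, still well defined
```

With `gamma=1` and `beta=0` the output has mean approximately 0 and variance approximately 1; other values of the two parameters let the network recover a different scale and offset if that suits the downstream layer better.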
Step 43: feed the relations among commodity text contexts and the discrete word relations into a fusion classification module for training, and save the trained models and their evaluation indexes;
Step 44: select and load the model with the best classification performance, and feed the commodity declaration text into it to judge whether the declared information conforms to the standard.
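Selecting the saved model with the best classification effect can be sketched as a simple ranking over stored evaluation indexes. The ranking rule used here (F1 value first, accuracy as a tie-breaker), the `Checkpoint` record, the file names, and the metric values are all hypothetical; the source states only that accuracy and the F1 value are saved with each trained model and that the best model is loaded. The configuration dictionary repeats the training settings given in step 4.

```python
from dataclasses import dataclass

# Training settings stated in step 4 of the method.
TRAIN_CONFIG = {
    "learning_rate": 0.001,
    "batch_size": 64,
    "iterations": 500,
    "optimizer": "Adam",
}


@dataclass
class Checkpoint:
    path: str        # hypothetical file name of a saved model
    accuracy: float  # evaluation index saved with the model
    f1: float        # evaluation index saved with the model


def select_best(checkpoints):
    """Return the checkpoint with the highest F1, breaking ties by accuracy."""
    if not checkpoints:
        raise ValueError("no checkpoints to choose from")
    return max(checkpoints, key=lambda c: (c.f1, c.accuracy))


# Illustrative records only; the numbers are made up for the example.
saved = [
    Checkpoint("model_iter_100.pt", accuracy=0.90, f1=0.88),
    Checkpoint("model_iter_300.pt", accuracy=0.93, f1=0.92),
    Checkpoint("model_iter_500.pt", accuracy=0.94, f1=0.92),
]
best = select_best(saved)
```

Ranking on a tuple keeps the rule explicit and easy to change, for example to prefer accuracy over F1 if that matches the deployment goal better.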
By adopting the above technical scheme, the invention achieves the following technical effect: it adopts a deep learning model, uses the specialized corpus resources of customs, takes the characteristics of customs texts into account, and automatically judges, according to a standardized language library, whether the filled-in content conforms to the standard.
Drawings
FIG. 1 is a flow chart of a method for intelligent assessment of customs import and export commodity specification declaration;
FIG. 2 is a framework diagram of a specification declaration intelligent assessment model.
Detailed Description
The embodiment of the invention is implemented on the premise of the technical scheme of the invention, and a detailed implementation mode and a specific operation process are provided, but the protection scope of the invention is not limited to the following embodiment.
Example 1
Referring to FIG. 1, based on the characteristics of customs texts, the application provides an intelligent evaluation method for customs import and export commodity specification declaration: first, the customs import and export commodity declaration text is preprocessed; then the text data is segmented into words and word vectors are trained with a Word2vec model; finally, the result is fed into a deep learning model for classification. The method effectively solves the standardized-declaration problem in a customs commodity declaration intelligent evaluation system, and its accuracy is markedly higher than that of other current mainstream methods.
The invention is described in detail below with reference to the example and the drawings, so that one of ordinary skill in the art can practice it by following this description.
In this embodiment, PyCharm is used as the development platform and Python as the development language. The method is run on a corpus of 30,520 sentences of real customs data. The specific process is as follows:
Step 1: preprocess the customs import and export commodity text to obtain big word names, chapters and element codes.
Step 2: accurately segment the long text obtained in step 1 with the Jieba word segmenter in Python, and generate a new text document with punctuation marks and stop words removed, specifically:
Step 21: segment the text with Jieba, for example:
Data: "spin-on connector Instructions for use |39|0000"
Segmented data: "Instructions for use of screw-on connector 390000"
Step 3: train word vectors for the segmented text with the Word2vec model, specifically:
Step 31: use the Word2vec model to unify each short text into word vectors of length 20 and dimension 300;
Step 4: feed the word vectors obtained in step 3 into the model for classification to obtain the standardized-declaration result, specifically:
Step 41: feed the generated word vectors into the BiLSTM+Attention module and extract the element relations among short-text contexts;
Step 42: feed the generated word vectors into the Inception module and extract discrete word relations using convolution kernels of different sizes;
Step 43: feed the features extracted by the two modules into the fusion classification module and perform classification to obtain the final result.
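The fusion of the two feature sets in step 43 can be illustrated with a very small sketch. The source does not say how the fusion classification module combines the features, so concatenation followed by a single logistic output is purely an assumption made here, as are the function name, the weights, the 0.5 decision threshold, and the reading of the positive label as "conforms to the standard".

```python
import math


def fuse_and_classify(context_features, discrete_features, weights, bias):
    """Concatenate two feature vectors and apply one logistic output unit.

    context_features: features from the BiLSTM-with-attention branch
    discrete_features: features from the multi-kernel convolution branch
    weights, bias: parameters of the (assumed) final linear layer
    Returns (probability, label) with label True when probability >= 0.5.
    """
    fused = list(context_features) + list(discrete_features)
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, prob >= 0.5


# Toy feature vectors and hand-picked parameters, for illustration only.
prob, label = fuse_and_classify([1.0, 0.0], [0.5], [2.0, -1.0, 1.0], bias=-0.5)
```

In a trained model the weights and bias would be learned jointly with both branches; the sketch only shows the data flow from two feature vectors to one binary decision.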
Following the above steps, the classification performance of the invention is compared with that of a logistic regression (Logistic Regression, LR) model, a support vector machine (Support Vector Machine, SVM) model, a convolutional neural network (Convolutional Neural Network, CNN) model, a TextCNN model, and a BERT model. As can be seen from Table 1, the proposed method is clearly superior to the other methods in both classification accuracy and F1 value.
Table 1. Comparison of the classification performance of different models on customs import and export commodity declarations
[Table 1 appears as an image in the original publication.]
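The two evaluation indexes used for the comparison, accuracy and the F1 value, can be computed from predicted and true labels as sketched below. Treating label 1 as the positive class is an assumption made for illustration; the source does not state which class is taken as positive, and the labels in the example are invented.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    F1 is the harmonic mean of precision and recall for the positive class;
    it is reported as 0.0 when precision and recall are both undefined or zero.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    tp = fp = fn = correct = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            correct += 1
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return correct / len(y_true), f1


# Invented labels for illustration: 1 = positive class, 0 = negative class.
acc, f1 = accuracy_and_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Reporting both numbers is useful when the two classes are imbalanced, because accuracy alone can look high even if most positive examples are missed.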
The foregoing descriptions of specific exemplary embodiments of the present invention are presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain the specific principles of the invention and its practical application to thereby enable one skilled in the art to make and utilize the invention in various exemplary embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims (3)

1. The method for intelligently evaluating the customs import and export commodity specification declaration is characterized by comprising the following steps of:
step 1: preprocessing a customs import and export commodity declaration text, extracting key elements in a column of commodity specification and model, and taking element names, corresponding element words and contents under the belonged commodity chapter numbers as judging contents of the import and export commodity declaration specification text;
step 2: performing word segmentation processing on the import and export commodity declaration specification text by utilizing the Jieba word segmentation in python, and removing punctuation marks and stop words;
step 3: learning semantic knowledge of the segmented text in an unsupervised mode by using a Word2vec model, and representing semantic information of the words in a Word vector mode to obtain a Word vector matrix of each text;
step 4: the word vector matrix is sent into a normative declaration intelligent evaluation model for training, and the trained model and evaluation indexes are saved; selecting and loading a model with the best classification effect, and sending commodity reporting text to be checked into the model to judge whether reporting information is standard or not;
the specific implementation mode of the step 4 is as follows:
step 41, sending the word vector matrix into a bidirectional long short-term memory network BiLSTM with an attention mechanism, and extracting the relations among commodity text contexts; the BiLSTM reads the feature sequence forward from $L_{C1}$ to $L_{C300}$ and backward from $L_{C300}$ to $L_{C1}$; the output of the BiLSTM is expressed as follows:
$$\overrightarrow{h_n} = \overrightarrow{\mathrm{LSTM}}(L_{Cn}), \quad n \in [1, 300] \tag{1}$$
$$\overleftarrow{h_n} = \overleftarrow{\mathrm{LSTM}}(L_{Cn}), \quad n \in [300, 1] \tag{2}$$
the annotation $h_n = [\overrightarrow{h_n}, \overleftarrow{h_n}]$ of a given feature text $L_{Cn}$ is obtained from the forward hidden state $\overrightarrow{h_n}$ and the backward hidden state $\overleftarrow{h_n}$; the feature text $L_{Cn}$ passes through a one-layer perceptron to obtain the feature $u_n$ of each word, where $w$ and $b$ are the weight and bias of the neurons and $\tanh(\cdot)$ is the activation function:
$$u_n = \tanh(w h_n + b) \tag{3}$$
from $u_n$ and the word context vector $u_w$, the normalized weight $\alpha_n$ of each word is obtained, where $M$ is the number of words in the feature and $\exp(\cdot)$ is the exponential function:
$$\alpha_n = \frac{\exp(u_n^{\top} u_w)}{\sum_{k=1}^{M} \exp(u_k^{\top} u_w)} \tag{4}$$
thereafter, the feature $H_C$ is obtained as the sum of the annotations weighted by $\alpha_n$:
$$H_C = \sum_{n=1}^{M} \alpha_n h_n \tag{5}$$
step 42, sending the word vector matrix into an Inception module, and extracting discrete word relations using convolution kernels of different sizes; taking the mean and variance of one batch as estimates of the mean and variance of the whole data set, and introducing learnable parameters $\gamma$ and $\beta$ to learn and recover the feature distribution that the original network is to learn, where $m$ is the batch size, that is, the number of samples in each batch, and $x_i$ is the $i$-th training sample in the mini-batch:
$$\mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i \tag{6}$$
$$\sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2 \tag{7}$$
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \tag{8}$$
$$y_i = \gamma \hat{x}_i + \beta \tag{9}$$
first calculating the mean $\mu_B$ and the variance $\sigma_B^2$, then normalizing by formula (8), where $\epsilon$ prevents an invalid calculation when the variance is 0; after normalization, performing a linear transformation, formula (9), that is, performing a scale-and-shift operation on the transformed $\hat{x}_i$, which has mean 0 and variance 1, by multiplying each element by $\gamma$ and then adding $\beta$, so as to realize an equivalent transformation and retain the distribution information of the original input features;
step 43, sending the relation between commodity text contexts and the word discrete relation into a fusion classification module for training, and storing the trained model and evaluation index;
step 44, selecting and loading the model with the best classification effect, and sending the commodity declaration text into the model to judge whether declaration information is standard or not.
2. The method for intelligent assessment of customs import and export commodity specification declaration according to claim 1, wherein the text after Word segmentation is short text data, 75% of the data has a length of only about 20 words, so that when training is performed by using a Word2vec model, the length is set to 20, the excess is truncated, the insufficient filling is performed, and the dimension is set to 300.
3. The method for intelligent assessment of customs import and export commodity specification declaration according to claim 1, wherein when training the specification declaration intelligent assessment model, the parameter learning rate is set to 0.001, the batch is set to 64, the iteration number is set to 500, and the optimizer uses Adam, and the accuracy and F1 value are used as evaluation indexes.
CN202110956040.3A 2021-08-19 2021-08-19 Intelligent evaluation method for customs import and export commodity specification declaration Active CN113705188B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110956040.3A CN113705188B (en) 2021-08-19 2021-08-19 Intelligent evaluation method for customs import and export commodity specification declaration

Publications (2)

Publication Number Publication Date
CN113705188A CN113705188A (en) 2021-11-26
CN113705188B true CN113705188B (en) 2023-06-06

Family

ID=78653849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110956040.3A Active CN113705188B (en) 2021-08-19 2021-08-19 Intelligent evaluation method for customs import and export commodity specification declaration

Country Status (1)

Country Link
CN (1) CN113705188B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116187342A (en) * 2023-03-03 2023-05-30 北京青萌数海科技有限公司 Method and system for extracting commodity label
CN116308689B (en) * 2023-05-26 2023-07-21 厦门触网科技有限公司 Bid insurance application processing device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111985204A (en) * 2020-07-29 2020-11-24 Dalian University Customs import and export commodity tax number prediction method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Chinese short-text keyword extraction model based on the attention mechanism; 杨丹浩; 吴岳辛; 范春晓; 计算机科学 (Computer Science), No. 01; full text *

Also Published As

Publication number Publication date
CN113705188A (en) 2021-11-26

Similar Documents

Publication Publication Date Title
WO2021027533A1 (en) Text semantic recognition method and apparatus, computer device, and storage medium
CN110427623A (en) Semi-structured document Knowledge Extraction Method, device, electronic equipment and storage medium
US20030236662A1 (en) Sequential conditional generalized iterative scaling
CN112434535B (en) Element extraction method, device, equipment and storage medium based on multiple models
CN109344234A (en) Machine reads understanding method, device, computer equipment and storage medium
CN113705188B (en) Intelligent evaluation method for customs import and export commodity specification declaration
CN116245107B (en) Electric power audit text entity identification method, device, equipment and storage medium
Fu et al. A sentiment-aware trading volume prediction model for P2P market using LSTM
CN114358014A (en) Work order intelligent diagnosis method, device, equipment and medium based on natural language
CN113591971B (en) User individual behavior prediction method based on DPI time sequence word embedded vector
CN111709225B (en) Event causal relationship discriminating method, device and computer readable storage medium
CN112417852A (en) Method and device for judging importance of code segment
CN113869054B (en) Deep learning-based power field project feature recognition method
KR102506778B1 (en) Method and apparatus for analyzing risk of contract
CN116522912B (en) Training method, device, medium and equipment for package design language model
CN113220885A (en) Text processing method and system
CN114691836B (en) Text emotion tendentiousness analysis method, device, equipment and medium
CN113821571B (en) Food safety relation extraction method based on BERT and improved PCNN
CN113222471B (en) Asset wind control method and device based on new media data
Zhu English lexical analysis system of machine translation based on simple recurrent neural network
Xia et al. Analysis and prediction of telecom customer churn based on machine learning
CN114020901A (en) Financial public opinion analysis method combining topic mining and emotion analysis
Handayani et al. Sentiment Analysis of Bank BNI User Comments Using the Support Vector Machine Method
Xu et al. [Retracted] The Dissemination and Evaluation of Campus Ideological and Political Public Opinion Based on Internet of Things Monitoring
CN118070775B (en) Performance evaluation method and device of abstract generation model and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant