CN113705188B - Intelligent evaluation method for customs import and export commodity specification declaration
- Publication number
- CN113705188B
- Authority
- CN
- China
- Prior art keywords
- text
- commodity
- declaration
- model
- import
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/186—Templates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses an intelligent evaluation method for customs import and export commodity specification declaration, comprising the following steps. Step 1: preprocess the customs import and export commodity declaration text, extract the key elements in the "commodity specification and model" column, and take the element names, the corresponding element words, and the content under the commodity's chapter number as the content to be judged for the declaration specification text. Step 2: perform word segmentation on the declaration specification text and remove punctuation marks and stop words. Step 3: learn the semantics of the segmented text in an unsupervised manner with a Word2vec model, represent the semantic information of each word as a word vector, and obtain a word vector matrix for each text. Step 4: feed the word vector matrix into the specification declaration intelligent evaluation model for training; select and load the model with the best classification performance, and feed the commodity declaration text to be checked into that model to judge whether the declaration information is standard. The evaluation accuracy is remarkably improved.
Description
Technical Field
The invention relates to the technical field of natural language processing, and in particular to an intelligent evaluation method for customs import and export commodity specification declaration based on a deep learning model.
Background
Standard declaration means filling in the declaration elements of a commodity according to their specific requirements when completing the commodity content of the customs import and export commodity declaration form. It serves to adapt to trade development and customs supervision requirements, to standardize the declaration behavior of import and export enterprises, to improve the quality of declaration data, to speed up clearance, and to facilitate trade. Standard declaration of customs import and export commodities is an important part of customs management of local taxpayers, an important means of building a new type of relationship with enterprises and improving their tax compliance, and a foundation for ensuring the quality of tax collection, for inspection and supervision of imported and exported goods, and for internal law-enforcement supervision and checks. Whether declarations are correct therefore has an important bearing on customs efficiency and on the implementation of national policies.
Currently, customs relies mainly on business specialists to judge whether commodity declaration texts are standard. Because manual judgment is time-consuming and labor-intensive, and the daily volume of imported and exported commodities is huge, only a very small share of declaration texts can be sampled for inspection each year; the process is inefficient and lacks coverage.
Disclosure of Invention
To address these problems in the prior art, the application recasts intelligent evaluation of customs import and export commodity specification declaration as a text classification problem in natural language processing and, taking into account the characteristics of customs declaration text, provides an end-to-end deep learning model that automatically evaluates whether declaration text is standard.
To achieve this purpose, the technical scheme of the application is an intelligent evaluation method for customs import and export commodity specification declaration, comprising the following steps:
step 1: the commodity declaration text consists of a series of elements that reflect the objective condition of a commodity, such as the customs commodity number, the commodity specification and model, and the actual tariff rate. The enterprise fills in the corresponding declaration element information according to the element names in the "commodity specification and model" column, and the first two digits of the commodity number indicate the commodity's chapter, i.e. its category. Preprocess the customs import and export commodity declaration text, extract the key elements in the "commodity specification and model" column, and take the element names, the corresponding element words, and the content under the commodity's chapter number as the content to be judged for the declaration specification text;
step 2: perform word segmentation on the import and export commodity declaration specification text using the Jieba word segmentation library in Python, and remove punctuation marks and stop words;
step 3: learn the semantics of the segmented text in an unsupervised manner with a Word2vec model, and represent the semantic information of each word as a word vector. Because the declaration text is short (75% of the declaration data is about 20 words long), the sequence length is set to 20 when training with the Word2vec model: longer texts are truncated and shorter ones are padded. The vector dimension is set to 300, giving a word vector matrix for each text;
step 4: feed the word vector matrix into the specification declaration intelligent evaluation model for training, with the learning rate set to 0.001, the batch size set to 64, the number of iterations set to 500, and Adam as the optimizer; accuracy and the F1 value are used as evaluation indexes, and the trained model and its evaluation indexes are saved. Select and load the model with the best classification performance, and feed the commodity declaration text to be checked into it to judge whether the declaration information is standard.
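Steps 2 and 3 can be illustrated with a minimal sketch. It assumes the text has already been segmented into tokens (for example by Jieba) and that word vectors are available as a plain dictionary; the stop-word list, the `<PAD>` token, the zero vector for unknown words, and all function names are hypothetical and not taken from the patent.

```python
import string

MAX_LEN = 20      # sequence length stated in step 3
EMBED_DIM = 300   # word-vector dimension stated in step 3

# Hypothetical stop-word list; the patent does not say which list it uses.
STOP_WORDS = {"for", "of", "the"}
PUNCTUATION = set(string.punctuation)


def clean_tokens(tokens):
    """Drop punctuation marks, stop words, and empty tokens (step 2)."""
    return [t for t in tokens
            if t.strip() and t not in STOP_WORDS and t not in PUNCTUATION]


def pad_or_truncate(tokens, max_len=MAX_LEN, pad_token="<PAD>"):
    """Cut sequences longer than max_len and pad shorter ones (step 3)."""
    return tokens[:max_len] + [pad_token] * max(0, max_len - len(tokens))


def to_matrix(tokens, embeddings, dim=EMBED_DIM):
    """Look up each token; padding and unknown words map to a zero vector
    (an assumption made for this sketch)."""
    zero = [0.0] * dim
    return [embeddings.get(t, zero) for t in tokens]


raw = ["spin-on", "connector", "Instructions", "for", "use", "|", "39", "|", "0000"]
tokens = clean_tokens(raw)
padded = pad_or_truncate(tokens)
matrix = to_matrix(padded, {"connector": [0.1] * EMBED_DIM})
print(tokens)
print(len(matrix), len(matrix[0]))
```

In a real pipeline the dictionary would be replaced by the vectors learned by the Word2vec model, and the resulting 20 x 300 matrix is what step 4 consumes.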
Further, step 4 is implemented as follows:
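step 41: feed the word vector matrix into a bidirectional long short-term memory network (Bidirectional Long Short-Term Memory, BiLSTM) with an attention mechanism, and extract the relations between the contexts of the commodity text. The forward LSTM reads the feature sequence from $L_{c1}$ to $L_{c300}$, and the backward LSTM reads it from $L_{c300}$ to $L_{c1}$. In general, the output of the BiLSTM is expressed as follows:

$$\overrightarrow{h_{cn}} = \overrightarrow{\mathrm{LSTM}}(L_{cn}), \quad n \in [1, 300]$$

$$\overleftarrow{h_{cn}} = \overleftarrow{\mathrm{LSTM}}(L_{cn}), \quad n \in [300, 1]$$

$$h_{cn} = \left[\overrightarrow{h_{cn}}, \overleftarrow{h_{cn}}\right]$$

The annotation $h_{cn}$ of a given feature text $L_{cn}$ is obtained from the forward hidden state $\overrightarrow{h_{cn}}$ and the backward hidden state $\overleftarrow{h_{cn}}$. The attention mechanism focuses on the features of keywords so as to reduce the influence of non-keywords on the context text, and it can be regarded as a fully connected layer. For each word, the annotation of the feature text $L_{cn}$ is passed through a one-layer perceptron to obtain the representation $u_{cn}$, where $w$ and $b$ are the weights and biases of the neurons and $\tanh(\cdot)$ is the activation function:

$$u_{cn} = \tanh(w\,h_{cn} + b)$$

The normalized word weight $\alpha_{cn}$ is obtained from $u_{cn}$ and the word context vector $u_w$, where $M$ is the number of words in the feature and $\exp(\cdot)$ is the exponential function:

$$\alpha_{cn} = \frac{\exp\left(u_{cn}^{\top} u_w\right)}{\sum_{k=1}^{M} \exp\left(u_{ck}^{\top} u_w\right)}$$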
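The attention weighting in step 41 can be sketched in a few lines of plain Python. The sketch assumes the common formulation in which each hidden state is passed through a one-layer perceptron with tanh, scored against a context vector, and normalized with a softmax over the M words; the toy numbers, the function name, and the vector sizes are illustrative assumptions, not values from the patent.

```python
import math


def attention_weights(hidden_states, w, b, u_w):
    """Return one normalized weight per word.

    hidden_states: list of M vectors (the concatenated BiLSTM outputs)
    w, b:          weight matrix and bias of the one-layer perceptron
    u_w:           word-level context vector
    """
    scores = []
    for h in hidden_states:
        # u = tanh(w h + b), computed row by row
        u = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
             for row, b_i in zip(w, b)]
        # Similarity between u and the context vector
        scores.append(sum(u_i * c_i for u_i, c_i in zip(u, u_w)))
    # Softmax over the M words; subtracting the max keeps exp() stable
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy values (assumptions): 3 words, hidden size 2, attention size 2
h = [[0.5, -0.2], [1.0, 0.3], [-0.4, 0.8]]
w = [[0.6, -0.1], [0.2, 0.9]]
b = [0.0, 0.1]
u_w = [1.0, 0.5]
alpha = attention_weights(h, w, b, u_w)
print([round(a, 3) for a in alpha])
```

The weights sum to 1, so words whose representation aligns with the context vector contribute more to the sentence feature and non-keywords contribute less.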
step 42: feed the word vector matrix into an Inception module and extract discrete word relations using convolution kernels of different sizes. The module uses the BatchNorm algorithm, which greatly speeds up model learning, alleviates the vanishing-gradient problem to some extent, accelerates convergence, and improves the classification performance. The mean and variance of one batch are taken as estimates of the mean and variance of the whole data set, and learnable parameters $\gamma$ and $\beta$ are introduced to learn and restore the feature distribution that the original network is to learn, where $m$ is the batch size (the number of samples in each batch) and $x_i$ is the $i$-th training sample in the mini-batch:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i$$

$$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x_i - \mu_B\right)^2$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \tag{8}$$

$$y_i = \gamma\,\hat{x}_i + \beta \tag{9}$$

First the mean $\mu_B$ and the variance $\sigma_B^2$ are computed, and then the data are normalized by equation (8), where $\epsilon$ prevents an invalid computation when the variance is 0. Normalization maps the data to a unified interval, reduces the dispersion of the data, lowers the learning difficulty of the network, and preserves the distribution of the original data to a certain extent. After normalization, a linear transformation is applied, namely equation (9): to retain the network's nonlinear expressive capacity, a scale-and-shift operation is performed on the normalized $\hat{x}$, which has mean 0 and variance 1, i.e. each element is multiplied by $\gamma$ and then $\beta$ is added, so that an equivalent transformation is realized and the distribution information of the original input features is retained. During training, BatchNorm adjusts the activation values according to the training examples in each mini-batch.
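The normalization followed by scale and shift can be checked numerically with a short sketch. It handles a single scalar feature over one mini-batch; the function name and the default values of gamma, beta, and eps are assumptions chosen for illustration.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift."""
    m = len(batch)
    mean = sum(batch) / m                           # batch mean
    var = sum((x - mean) ** 2 for x in batch) / m   # batch variance
    # eps keeps the division valid when the variance is 0
    x_hat = [(x - mean) / math.sqrt(var + eps) for x in batch]
    # Learnable scale (gamma) and shift (beta)
    return [gamma * x + beta for x in x_hat]


y = batch_norm([1.0, 2.0, 3.0, 4.0])
constant = batch_norm([5.0, 5.0, 5.0])   # zero variance, still well defined
print([round(v, 3) for v in y])
print(constant)
```

With gamma = 1 and beta = 0 the output has mean 0 and variance close to 1; other values of gamma and beta let the network recover a different scale and offset if that helps the loss.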
step 43: feed the context relations of the commodity text and the discrete word relations into a fusion classification module for training, and save the trained model and its evaluation indexes;
step 44: select and load the model with the best classification performance, and feed the commodity declaration text into it to judge whether the declaration information is standard.
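Steps 43 and 44 save each trained model with its evaluation indexes and later load the best one. A minimal sketch of that selection follows; the checkpoint records, file names, and metric values are placeholders invented for illustration (they are not results from the patent), and ranking by F1 with accuracy as a tie-breaker is an assumption, since the text only says "best classification effect".

```python
def select_best(checkpoints):
    """Pick the checkpoint with the highest F1, breaking ties by accuracy."""
    return max(checkpoints, key=lambda c: (c["f1"], c["accuracy"]))


# Placeholder records; a real run would write one per saved model.
checkpoints = [
    {"path": "epoch_100.pt", "accuracy": 0.90, "f1": 0.88},
    {"path": "epoch_300.pt", "accuracy": 0.93, "f1": 0.92},
    {"path": "epoch_500.pt", "accuracy": 0.94, "f1": 0.91},
]
best = select_best(checkpoints)
print(best["path"])
```

The selected path would then be passed to whatever loading routine the training framework provides before classifying new declaration text.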
By adopting this technical scheme, the invention achieves the following technical effect: using a deep learning model, the specialized corpus resources of customs, and the characteristics of customs text, it automatically judges whether the filled-in content is standard against a standardized language library.
Drawings
FIG. 1 is a flow chart of a method for intelligent assessment of customs import and export commodity specification declaration;
FIG. 2 is a framework diagram of a specification declaration intelligent assessment model.
Detailed Description
This embodiment is implemented on the premise of the technical scheme of the invention, and a detailed implementation and a specific operating process are given, but the scope of protection of the invention is not limited to the following embodiment.
Example 1
Referring to FIG. 1, based on the characteristics of customs text, the application provides a method for intelligent evaluation of customs import and export commodity specification declaration: first, the customs import and export commodity declaration text is preprocessed; then the text data is segmented into words and word vectors are trained with a Word2vec model; finally, the result is fed into a deep learning model for classification. The method effectively solves the problem of evaluating specification declaration in a customs commodity specification declaration intelligent evaluation system, and its accuracy is remarkably improved compared with other current mainstream methods.
The present invention will be described in detail below with reference to examples and drawings so as to enable one of ordinary skill in the art to practice the same, with reference to the present description.
In this embodiment, PyCharm is used as the development platform and Python as the development language. The method is run on a corpus of 30520 sentences of real customs data. The specific process is as follows:
step 1: preprocess the customs import and export commodity text to obtain big word names, chapters and element codes.
step 2: accurately segment the long text obtained in step 1 using the Jieba word segmentation library in Python, and generate a new text document with punctuation marks and stop words removed, specifically:
step 21: segment the text with Jieba, for example:
data: "spin-on connector Instructions for use |39|0000"
Word segmentation data: "Instructions for use of screw-on connector 390000"
step 3: train word vectors for the segmented text with the Word2vec model, specifically:
step 31: use the Word2vec model to convert each short text into a word vector matrix of length 20 and dimension 300;
step 4: feed the word vectors obtained in step 3 into the model for classification to obtain the specification declaration result, specifically:
step 41: feed the generated word vectors into the BiLSTM+Attention module and extract the element relations between short-text contexts;
step 42: feed the generated word vectors into the Inception module and extract discrete word relations using convolution kernels of different sizes;
step 43: feed the features extracted by the two modules into the fusion classification module and perform classification to obtain the final result.
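The idea in step 42 of using convolution kernels of different sizes can be illustrated with a simplified sketch. It convolves a one-dimensional sequence of scalars with kernels of width 1, 2, and 3 and keeps the maximum response of each, whereas the actual module operates on the 20 x 300 word vector matrix with many learned filters; the kernel values, the max pooling, and the function names are assumptions made for illustration.

```python
def conv1d_max(sequence, kernel):
    """Slide the kernel over the sequence (no padding) and keep the max response."""
    k = len(kernel)
    responses = [sum(kernel[j] * sequence[i + j] for j in range(k))
                 for i in range(len(sequence) - k + 1)]
    return max(responses)


def multi_kernel_features(sequence, kernels):
    """One pooled feature per kernel; different widths see different n-grams."""
    return [conv1d_max(sequence, kernel) for kernel in kernels]


sequence = [0.1, 0.4, -0.2, 0.9, 0.3, -0.5]
kernels = [[1.0], [0.5, 0.5], [0.25, 0.5, 0.25]]   # widths 1, 2, 3
features = multi_kernel_features(sequence, kernels)
print([round(f, 3) for f in features])
```

Each kernel width responds to patterns spanning a different number of adjacent words, and the pooled features are what a fusion step would combine with the BiLSTM branch.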
Following the above steps, the classification performance of the invention is compared with that of a logistic regression (Logistic Regression, LR) model, a support vector machine (Support Vector Machines, SVM) model, a convolutional neural network (Convolutional Neural Networks, CNN) model, a TextCNN model, and a BERT model. As can be seen from Table 1, the method proposed by the invention is significantly superior to the other methods in classification accuracy and F1 value.
Table 1 comparison of different models for customs import and export commodity classification effect
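The two evaluation indexes used in the comparison, accuracy and the F1 value, can be computed as in the following sketch. It assumes binary labels in which 1 marks one class (for example a standard declaration) and 0 the other; the label encoding, the function name, and the toy labels are assumptions.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Toy labels for illustration only
acc, f1 = accuracy_and_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(round(acc, 3), round(f1, 3))
```

F1 is the harmonic mean of precision and recall, so it penalizes a model that reaches high accuracy simply by favoring the majority class.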
The foregoing descriptions of specific exemplary embodiments of the present invention are presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain the specific principles of the invention and its practical application to thereby enable one skilled in the art to make and utilize the invention in various exemplary embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.
Claims (3)
1. A method for intelligent evaluation of customs import and export commodity specification declaration, characterized by comprising the following steps:
step 1: preprocessing a customs import and export commodity declaration text, extracting the key elements in the "commodity specification and model" column, and taking the element names, the corresponding element words, and the content under the commodity's chapter number as the content to be judged for the import and export commodity declaration specification text;
step 2: performing word segmentation on the import and export commodity declaration specification text using the Jieba word segmentation library in Python, and removing punctuation marks and stop words;
step 3: learning the semantics of the segmented text in an unsupervised manner with a Word2vec model, and representing the semantic information of each word as a word vector to obtain a word vector matrix for each text;
step 4: feeding the word vector matrix into a specification declaration intelligent evaluation model for training, and saving the trained model and its evaluation indexes; selecting and loading the model with the best classification performance, and feeding the commodity declaration text to be checked into the model to judge whether the declaration information is standard;
wherein step 4 is implemented as follows:
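step 41: feeding the word vector matrix into a bidirectional long short-term memory network (BiLSTM) with an attention mechanism, and extracting the relations between the contexts of the commodity text; the forward LSTM reads the feature sequence from $L_{c1}$ to $L_{c300}$, and the backward LSTM reads it from $L_{c300}$ to $L_{c1}$; the output of the BiLSTM is expressed as follows:

$$\overrightarrow{h_{cn}} = \overrightarrow{\mathrm{LSTM}}(L_{cn}), \quad n \in [1, 300]$$

$$\overleftarrow{h_{cn}} = \overleftarrow{\mathrm{LSTM}}(L_{cn}), \quad n \in [300, 1]$$

$$h_{cn} = \left[\overrightarrow{h_{cn}}, \overleftarrow{h_{cn}}\right]$$

the annotation $h_{cn}$ of a given feature text $L_{cn}$ is obtained from the forward hidden state $\overrightarrow{h_{cn}}$ and the backward hidden state $\overleftarrow{h_{cn}}$; for each word, the annotation of the feature text $L_{cn}$ is passed through a one-layer perceptron to obtain the representation $u_{cn}$, where $w$ and $b$ are the weights and biases of the neurons and $\tanh(\cdot)$ is the activation function:

$$u_{cn} = \tanh(w\,h_{cn} + b)$$

the normalized word weight $\alpha_{cn}$ is obtained from $u_{cn}$ and the word context vector $u_w$, where $M$ is the number of words in the feature and $\exp(\cdot)$ is the exponential function:

$$\alpha_{cn} = \frac{\exp\left(u_{cn}^{\top} u_w\right)}{\sum_{k=1}^{M} \exp\left(u_{ck}^{\top} u_w\right)}$$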
step 42: feeding the word vector matrix into an Inception module and extracting discrete word relations using convolution kernels of different sizes; taking the mean and variance of one batch as estimates of the mean and variance of the whole data set, and introducing learnable parameters $\gamma$ and $\beta$ to learn and restore the feature distribution that the original network is to learn, where $m$ is the batch size, i.e. the number of samples in each batch, and $x_i$ is the $i$-th training sample in the mini-batch:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i$$

$$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x_i - \mu_B\right)^2$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \tag{8}$$

$$y_i = \gamma\,\hat{x}_i + \beta \tag{9}$$

first the mean $\mu_B$ and the variance $\sigma_B^2$ are computed, and then the data are normalized by equation (8), where $\epsilon$ prevents an invalid computation when the variance is 0; after normalization, a linear transformation is applied, namely equation (9), in which a scale-and-shift operation is performed on the normalized $\hat{x}$, which has mean 0 and variance 1, i.e. each element is multiplied by $\gamma$ and then $\beta$ is added, so that an equivalent transformation is realized and the distribution information of the original input features is retained;
step 43: feeding the context relations of the commodity text and the discrete word relations into a fusion classification module for training, and saving the trained model and its evaluation indexes;
step 44: selecting and loading the model with the best classification performance, and feeding the commodity declaration text into the model to judge whether the declaration information is standard.
2. The method for intelligent evaluation of customs import and export commodity specification declaration according to claim 1, wherein the segmented text is short text data in which 75% of the data is only about 20 words long, so that when training with the Word2vec model the length is set to 20, longer texts are truncated, shorter texts are padded, and the dimension is set to 300.
3. The method for intelligent evaluation of customs import and export commodity specification declaration according to claim 1, wherein, when training the specification declaration intelligent evaluation model, the learning rate is set to 0.001, the batch size is set to 64, the number of iterations is set to 500, the optimizer is Adam, and accuracy and the F1 value are used as evaluation indexes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110956040.3A CN113705188B (en) | 2021-08-19 | 2021-08-19 | Intelligent evaluation method for customs import and export commodity specification declaration |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110956040.3A CN113705188B (en) | 2021-08-19 | 2021-08-19 | Intelligent evaluation method for customs import and export commodity specification declaration |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113705188A CN113705188A (en) | 2021-11-26 |
CN113705188B true CN113705188B (en) | 2023-06-06 |
Family
ID=78653849
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110956040.3A Active CN113705188B (en) | 2021-08-19 | 2021-08-19 | Intelligent evaluation method for customs import and export commodity specification declaration |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113705188B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116187342A (en) * | 2023-03-03 | 2023-05-30 | 北京青萌数海科技有限公司 | Method and system for extracting commodity label |
CN116308689B (en) * | 2023-05-26 | 2023-07-21 | 厦门触网科技有限公司 | Bid insurance application processing device |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111985204A (en) * | 2020-07-29 | 2020-11-24 | 大连大学 | Customs import and export commodity tax number prediction method |
-
2021
- 2021-08-19 CN CN202110956040.3A patent/CN113705188B/en active Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111985204A (en) * | 2020-07-29 | 2020-11-24 | 大连大学 | Customs import and export commodity tax number prediction method |
Non-Patent Citations (1)
Title |
---|
一种基于注意力机制的中文短文本关键词提取模型;杨丹浩;吴岳辛;范春晓;;计算机科学(第01期);全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN113705188A (en) | 2021-11-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021027533A1 (en) | Text semantic recognition method and apparatus, computer device, and storage medium | |
CN110427623A (en) | Semi-structured document Knowledge Extraction Method, device, electronic equipment and storage medium | |
US20030236662A1 (en) | Sequential conditional generalized iterative scaling | |
CN112434535B (en) | Element extraction method, device, equipment and storage medium based on multiple models | |
CN109344234A (en) | Machine reads understanding method, device, computer equipment and storage medium | |
CN113705188B (en) | Intelligent evaluation method for customs import and export commodity specification declaration | |
CN116245107B (en) | Electric power audit text entity identification method, device, equipment and storage medium | |
Fu et al. | A sentiment-aware trading volume prediction model for P2P market using LSTM | |
CN114358014A (en) | Work order intelligent diagnosis method, device, equipment and medium based on natural language | |
CN113591971B (en) | User individual behavior prediction method based on DPI time sequence word embedded vector | |
CN111709225B (en) | Event causal relationship discriminating method, device and computer readable storage medium | |
CN112417852A (en) | Method and device for judging importance of code segment | |
CN113869054B (en) | Deep learning-based power field project feature recognition method | |
KR102506778B1 (en) | Method and apparatus for analyzing risk of contract | |
CN116522912B (en) | Training method, device, medium and equipment for package design language model | |
CN113220885A (en) | Text processing method and system | |
CN114691836B (en) | Text emotion tendentiousness analysis method, device, equipment and medium | |
CN113821571B (en) | Food safety relation extraction method based on BERT and improved PCNN | |
CN113222471B (en) | Asset wind control method and device based on new media data | |
Zhu | English lexical analysis system of machine translation based on simple recurrent neural network | |
Xia et al. | Analysis and prediction of telecom customer churn based on machine learning | |
CN114020901A (en) | Financial public opinion analysis method combining topic mining and emotion analysis | |
Handayani et al. | Sentiment Analysis of Bank BNI User Comments Using the Support Vector Machine Method | |
Xu et al. | [Retracted] The Dissemination and Evaluation of Campus Ideological and Political Public Opinion Based on Internet of Things Monitoring | |
CN118070775B (en) | Performance evaluation method and device of abstract generation model and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||