
CN111026870A - ICT system fault analysis method integrating text classification and image recognition - Google Patents


Info

Publication number
CN111026870A
Authority
CN
China
Prior art keywords
classification
fault
data
image
preprocessing
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911264526.XA
Other languages
Chinese (zh)
Inventor
俞学豪
孙瑨一
郑蓉蓉
李国栋
赵子岩
王晨辉
韩笑
冯显时
李雅西
袁洲
高金京
陈亮
王玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
North China Electric Power University
Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
North China Electric Power University
Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Application filed by State Grid Corp of China SGCC, State Grid Information and Telecommunication Co Ltd, North China Electric Power University, Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Priority to CN201911264526.XA
Publication of CN111026870A
Current legal status: Pending

Classifications

    • G06F16/353 Clustering; Classification into predefined classes
    • G06F16/367 Ontology
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/24 Classification techniques
    • G06Q10/20 Administration of product repair or maintenance
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Tourism & Hospitality (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Water Supply & Treatment (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Computational Linguistics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an ICT system fault analysis method that integrates text classification and image recognition, in the technical field of fault analysis through neural network learning. The method collects fault data recorded by customer service and preprocesses it manually; the data then pass through two parallel processes, text classification and image recognition, whose results are finally combined by a classifier. A fault analysis model is then established to analyze faults through fault judgment, and after each model update the process returns to the manual preprocessing of the initial data. The method achieves a reasonable allocation of ICT system resources and relieves the heavy pressure that the growing number of ICT systems places on customer-service operation and maintenance. It also addresses the problems caused by the State Grid ICT customer service currently relying only on individual knowledge and experience, namely that knowledge cannot be shared and internal resources cannot be coordinated efficiently or operated in an orderly way, and it raises the level of intelligence of ICT operation and maintenance.

Description

ICT system fault analysis method integrating text classification and image recognition
Technical Field
The invention belongs to the technical field of fault analysis through neural network learning, and particularly relates to an ICT system fault analysis method integrating text classification and image recognition.
Background
With the increasing complexity of power grid systems, faults can no longer be analyzed and judged solely on the basis of an individual worker's knowledge and personal experience. System fault analysis has therefore become an important research direction for computer-aided decision systems. Current fault analysis and judgment techniques follow two main approaches: building a fault information database, and performing fault analysis with machine learning.
A fault information database is generally built as a knowledge graph that stores the fault information. The concept of the knowledge graph was first proposed by Google in 2012 to upgrade the traditional search model. A knowledge graph describes the entities and concepts that exist in the real world and the strong relationships between them, where a relationship describes the association between two entities. When a knowledge graph is used for fault analysis, the logic built in during its construction makes the analysis strongly logical and easy for customers to use. However, it performs poorly for faults and causes that have no apparent logical relationship. Moreover, accumulating new fault information from experience requires updating the knowledge graph, which in turn requires specialists to update the fault analysis system, a heavy workload.
Machine-learning-based fault analysis mainly trains a deep learning network model on historical fault records (the training set) and then classifies new fault information (the test set) by text classification. As a machine learning approach, it can readily discover latent logical relationships between faults and causes that are hard for people to find, and new fault records "build up experience" for the system as additional training data. The method nevertheless has several drawbacks. First, it requires a large amount of historical data, since more data generally yields higher training accuracy. Second, the quality of the historical data strongly affects the final classification, and that quality depends on how well the person recording each fault understood it at the time. Finally, the fault classes must be defined in advance on the basis of experience in handling faults manually, and choosing neither too many nor too few classes is itself a major difficulty.
The invention concerns an improved, neural-network-based fault analysis method aimed at a specific application field. In practice, fault records of the State Grid ICT (Information and Communication Technology) systems contain not only text but also a large number of fault images, such as screenshots from system anomaly detection. The invention therefore adds image recognition as an auxiliary input to fault analysis.
Disclosure of Invention
The invention aims to provide an ICT system fault analysis method integrating text classification and image recognition, characterized by comprising the following steps:
Step one: collect the fault data recorded by customer service and preprocess it manually. The data then pass through two parallel processes, text classification and image recognition, whose results are finally combined by a classifier. A fault analysis model is then established to analyze faults through fault judgment, and after each model update the process returns to the manual preprocessing of the initial data;
Step two: train a model in each of the two parallel processes. The main steps of text classification are text preprocessing, tf-idf calculation, label digitization, data extraction and classifier classification; the main steps of image recognition are image preprocessing, neural network model construction, parameter optimization and classifier classification;
Step three: classification judgment. Compare the text classification result and the image classification result obtained in step two with the actual result, and determine a text classification weight and an image classification weight such that the integrated result agrees with the actual classification;
Step four: fault analysis. Input a newly generated fault report as the test set; the trained model judges the fault class from the report, and operation and maintenance personnel overhaul the system according to that class;
Step five: model update. After the operation and maintenance personnel finish the repair, add the fault report to the historical fault report set, update the training data, and retrain the model on the new training set at regular intervals, so that the model keeps accumulating experience for the operation and maintenance personnel.
For collecting the customer-service fault data: in the actual ICT systems, the fault reports recorded by customer service are all unstructured data stored in Word documents. Because Python offers richer functions for manipulating Excel, the scattered Word document contents are stored in an Excel table. Next, the report contents are manually simplified during preprocessing. Finally, each data item is labeled with its class by experienced staff.
The data preprocessing in step one generates a picture list and a label list, pairs each picture name with its label, and then reads the data and builds an iterator; this works because each picture name contains the picture's class information. It consists mainly of two functions: get_files and get_batch.
Unlike the preprocessing in step one, the text preprocessing in step two is done by computer with the jieba Chinese text processing toolkit; its main tasks are word segmentation and stop-word removal.
Like the text preprocessing, the image preprocessing in step two is done by computer. It mainly crops the pictures embedded in the documents and resizes them to a suitable size.
For neural network model construction in step two, the convolutional neural network of the system consists of two convolution + pooling layers, two fully connected layers and a final softmax layer. It is implemented mainly by calling the relevant TensorFlow functions.
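As a sketch of how feature-map shapes evolve through this stack, the code below uses the layer parameters listed in the detailed description (16 kernels of 3×3 in each convolution with 'SAME' padding, 3×3 max pooling with stride 2, two 128-neuron fully connected layers, two output classes). The 208×208×3 input size is a hypothetical choice, and 'SAME' padding for the pooling layers is an assumption the text does not state.

```python
import math


def conv_same(shape, filters):
    """3x3 convolution, stride 1, padding='SAME': height and width are unchanged."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool(shape, stride=2):
    """3x3 max pooling, stride 2; 'SAME' padding assumed, so size = ceil(n / stride)."""
    height, width, channels = shape
    return (math.ceil(height / stride), math.ceil(width / stride), channels)


def trace_shapes(input_shape=(208, 208, 3), n_classes=2):
    """Return (layer name, output shape) pairs for the CNN described in the text."""
    trace = [("input", input_shape)]
    shape = conv_same(input_shape, 16)
    trace.append(("conv1", shape))
    shape = max_pool(shape)
    trace.append(("pool1", shape))
    shape = conv_same(shape, 16)
    trace.append(("conv2", shape))
    shape = max_pool(shape)
    trace.append(("pool2", shape))
    trace.append(("flatten", (math.prod(shape),)))  # reshape before the first FC layer
    trace.append(("fc3", (128,)))
    trace.append(("fc4", (128,)))
    trace.append(("softmax", (n_classes,)))
    return trace


for name, shape in trace_shapes():
    print(f"{name:8s}{shape}")
```

With 'VALID' pooling, each pooling step would instead give floor((n - 3) / 2) + 1, which changes the flattened size fed to the first fully connected layer.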
In the classification judgment of step three, the probability obtained by text classification and the probability obtained by image recognition are each multiplied by their weights and summed to obtain a combined probability value:

$$P = w_1 P_1 + w_2 P_2$$

where P_1 is the probability obtained by text classification, P_2 is the probability obtained by image recognition, w_1 and w_2 are the corresponding text and image weights, and P is the final combined probability. Based on repeated tests, the text weight w_1 is 0.81 and the image weight w_2 is 0.19.
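The weighted fusion can be sketched as follows, assuming that each branch outputs a probability for every fault class and that the class with the highest combined probability is reported. The class names and branch outputs are hypothetical; the default weights are the text weight 0.81 and image weight 0.19 stated above.

```python
def fuse(p_text, p_image, w_text=0.81, w_image=0.19):
    """Combine per-class probabilities as P = w_text * P_text + w_image * P_image."""
    if p_text.keys() != p_image.keys():
        raise ValueError("both branches must score the same set of classes")
    combined = {c: w_text * p_text[c] + w_image * p_image[c] for c in p_text}
    best = max(combined, key=combined.get)  # class with the highest combined probability
    return best, combined


# Hypothetical branch outputs for one fault report
p_text = {"NetworkError": 0.6, "DiskFull": 0.4}
p_image = {"NetworkError": 0.1, "DiskFull": 0.9}
label, scores = fuse(p_text, p_image)
print(label, scores)
```

Here the text branch's preference wins (about 0.505 against 0.495) even though the image branch favours the other class, which reflects the larger text weight.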
Beneficial effects: by building a knowledge graph for the ICT field, the invention raises the level of specialization and consistency of ICT customer service and makes fuller use of the potential value of the massive ICT operation and maintenance data. Through knowledge processing and big-data analysis, it meets the latent association needs of ICT system users, tracks hot events, assists in the analysis and judgment of ICT system faults, and creates value-added service capability for ICT services. It achieves a reasonable allocation of ICT system resources and relieves the heavy pressure that the growing number of ICT systems places on customer-service operation and maintenance. It also solves the problems that knowledge cannot be shared and internal resources cannot cooperate efficiently or operate in an orderly way because the current State Grid ICT customer service relies only on individual knowledge and experience, and it raises the level of intelligence of ICT operation and maintenance.
Drawings
Fig. 1 is a flow chart of ICT system fault analysis.
Fig. 2 is a flow chart of text classification.
Fig. 3 is a flowchart of image recognition.
Detailed Description
The invention provides an ICT system fault analysis method integrating text classification and image recognition, which comprises: collecting the fault data recorded by customer service and preprocessing it manually; passing the data through two parallel processes, text classification and image recognition, whose results are finally combined by a classifier; and then establishing a fault analysis model that analyzes faults through fault judgment, returning to the manual preprocessing of the initial data after each model update. The invention is described below with reference to the accompanying drawings.
Fig. 1 shows a flow chart of fault analysis of an ICT system. The method comprises the following steps:
Step one: manual data processing. The fault reports recorded by customer service in the actual ICT systems are all unstructured data stored in Word documents. This step mainly converts them into structured data, simplifies the text and labels each data item with its class.
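The conversion from scattered reports to one row per record can be sketched with Python's standard csv module. This is a stand-in under stated assumptions: the text stores the contents in an Excel table, whereas here a CSV file (which Excel can open) is produced; the field names, the "Field: value" line format and the sample report are hypothetical, and extracting text from the Word files themselves is not shown.

```python
import csv
import io

FIELDS = ["system", "description", "label"]  # hypothetical column names


def parse_report(text):
    """Turn a 'Field: value' report into a dict with one entry per known field."""
    record = {field: "" for field in FIELDS}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in record:
            record[key] = value.strip()
    return record


def reports_to_csv(reports):
    """Write one row per report and return the CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for text in reports:
        writer.writerow(parse_report(text))
    return buf.getvalue()


# Hypothetical, already simplified report; the label is assigned by experienced staff
sample = "System: OA\nDescription: login page times out\nLabel: NetworkError"
print(reports_to_csv([sample]))
```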
The system generates a picture list and a label list, pairs each picture name with its label, and then reads the data and builds an iterator; this works because, during data preprocessing, each picture name contains the picture's class information.
The process consists mainly of two functions: get_files and get_batch.
The get_files function takes a given folder path and returns the shuffled lists of pictures and labels.
os.listdir() returns the file names in the folder as a list. The name array holds the parsed file name, and name[0] is the class label of the picture, so the class of each image can be determined and the image added to the array for that class. For example, for a file named NetworkError0001.jpg, name[0] = NetworkError indicates a network fault, so the name and path of NetworkError0001.jpg are stored in NetworkError and the value 1 is appended to label_NetworkError as its flag (each class uses a different flag value).
The classified file data are then shuffled, to eliminate any effect that a fixed data-set order would have on the training result. The picture paths of all classes are concatenated with np.hstack(), and the labels likewise; the two are combined into a two-dimensional array holding one (path, label) pair per row, the array is shuffled row-wise with np.random.shuffle, and finally the first column of the shuffled array is taken as the picture array and the second column as the label array.
Picture path                           Label
D:/Python/data/NetworkError0001.jpg    1
Table 1. Random arrangement of the temporary storage array
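A minimal pure-Python sketch of the get_files logic follows, assuming file names of the form <ClassName><digits>.jpg as in the NetworkError0001.jpg example. The DiskFull class and the class-to-flag mapping are hypothetical, and Python's random module stands in for np.hstack and np.random.shuffle; the fixed seed only makes the shuffle reproducible.

```python
import random
import re

NAME_PATTERN = re.compile(r"^([A-Za-z]+)\d+\.jpg$")


def get_files(file_dir, file_names, class_flags, seed=0):
    """Return shuffled (paths, labels); each label is read from the file-name prefix."""
    pairs = []
    for name in file_names:
        match = NAME_PATTERN.match(name)
        if match is None or match.group(1) not in class_flags:
            continue  # skip files that do not follow the naming convention
        pairs.append((f"{file_dir}/{name}", class_flags[match.group(1)]))
    random.Random(seed).shuffle(pairs)  # shuffle (path, label) pairs so they stay aligned
    paths = [path for path, _ in pairs]
    labels = [lbl for _, lbl in pairs]
    return paths, labels


# In practice file_names would come from os.listdir(file_dir).
flags = {"NetworkError": 1, "DiskFull": 2}  # hypothetical class-to-flag mapping
paths, labels = get_files(
    "D:/Python/data", ["NetworkError0001.jpg", "DiskFull0001.jpg", "notes.txt"], flags
)
print(list(zip(paths, labels)))
```

Shuffling the pairs rather than the two lists separately plays the same role as stacking paths and labels into one array before shuffling: the picture and its label can never be separated.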
The get_batch function generates batches of equal size.
First, the Python list produced in the previous step is converted into a format that TensorFlow can recognize, using tf.cast() for type conversion. Next, an input queue is created with slice_input_producer(), passing image and label together in a list as its argument. The pictures are then decoded according to their format; since the training data here are JPEG files, the decode_jpeg() decoder is used. Finally, the picture size is unified with tf.image.resize_image_with_crop_or_pad(image, image_H, image_W), whose parameters are the picture to be processed, the target height and the target width.
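The crop-or-pad rule can be sketched in pure Python for a single-channel image stored as a list of rows. To our understanding this mirrors the behaviour of tf.image.resize_image_with_crop_or_pad (a central crop where the image is larger than the target, centred zero padding where it is smaller), but the function below is an illustration, not the TensorFlow implementation.

```python
def crop_or_pad(image, target_h, target_w, fill=0):
    """Centrally crop or pad a 2-D image (list of rows) to target_h x target_w."""
    h, w = len(image), len(image[0])
    # Crop offsets apply when the image is larger than the target.
    crop_top, crop_left = max((h - target_h) // 2, 0), max((w - target_w) // 2, 0)
    # Pad offsets apply when the image is smaller than the target.
    pad_top, pad_left = max((target_h - h) // 2, 0), max((target_w - w) // 2, 0)
    out = [[fill] * target_w for _ in range(target_h)]
    for r in range(min(h, target_h)):
        for c in range(min(w, target_w)):
            out[pad_top + r][pad_left + c] = image[crop_top + r][crop_left + c]
    return out


img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(crop_or_pad(img, 2, 2))  # → [[6, 7], [10, 11]]
```

Each dimension is handled independently, so one axis can be cropped while the other is padded.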
Step two: two parallel processes, text classification and image recognition.
The main steps of text classification are text preprocessing, tf-idf calculation, label digitization, data extraction and classifier classification, as shown in Fig. 2.
Unlike the preprocessing in step one, text preprocessing is done by computer with the jieba Chinese text processing toolkit; its main tasks are word segmentation and stop-word removal. Word segmentation splits a complete, coherent sentence into words; full mode is selected by setting the parameter cut_all=True. Stop words are words that do not affect the meaning of a sentence, such as Chinese function words meaning "in" or "and".
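Stop-word removal after segmentation can be sketched as below. The token list stands in for the output of a call such as jieba.lcut(text, cut_all=True), which is not made here so that the sketch runs without jieba; the tokens and the four-word stop list are hypothetical (a real system would load a full Chinese stop-word list).

```python
STOP_WORDS = {"的", "了", "在", "和"}  # hypothetical subset of a Chinese stop-word list


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop stop words and whitespace-only tokens, keeping the original order."""
    return [t for t in tokens if t.strip() and t not in stop_words]


# Assumed segmenter output: 'login', 'page', 'of', 'network', ' ', 'timeout'
tokens = ["登录", "页面", "的", "网络", " ", "超时"]
print(remove_stop_words(tokens))  # → ['登录', '页面', '网络', '超时']
```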
TF-IDF stands for term frequency-inverse document frequency. Its main idea is that if a word or phrase appears frequently in one document (high TF) but rarely in other documents, it is considered representative of that document and to have good category-discriminating ability, and is therefore suitable for classification. TF-IDF is the product TF × IDF, where TF is the term frequency and IDF is the inverse document frequency. TF is the frequency with which a term appears in document d. The idea of IDF is that the fewer documents contain term t (that is, the smaller n is), the larger the IDF and the better t discriminates between categories. This has a known weakness: if m documents of class C contain t and k documents of other classes contain it, then n = m + k documents contain t in total; when m is large, n is also large, so the IDF is small, suggesting that t discriminates poorly. In practice, however, a term that appears frequently in the documents of one class represents that class well, and such terms should be given higher weight and selected as feature words to distinguish that class from others. Within a given document, the term frequency is the number of times a word appears (its term count) normalized by document length, which prevents a bias toward long documents (the same word tends to occur more often in a long document than in a short one, regardless of its importance). For a word in a particular document, its importance may be expressed as:
$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
where the numerator is the number of occurrences of the word in the document and the denominator is the total number of occurrences of all words in that document. The inverse document frequency (IDF) measures the general importance of a word. The IDF of a particular term is obtained by dividing the total number of documents by the number of documents that contain the term and taking the base-10 logarithm of the quotient:
$$\mathrm{idf}_i = \log_{10} \frac{|D|}{|\{\, j : t_i \in d_j \,\}|}$$
where the numerator is the total number of documents in the corpus and the denominator is the number of documents containing the word. Finally, TF-IDF is the product of TF and IDF. A high term frequency within a particular document combined with a low document frequency across the collection yields a high TF-IDF weight, so keywords can be selected on this basis.
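The two formulas can be sketched in pure Python, assuming documents are already segmented into token lists. The sketch follows the text literally (raw count over total count for TF, base-10 logarithm of the document count over the document frequency for IDF); library implementations such as scikit-learn's TfidfVectorizer use a smoothed natural-log IDF by default, so their numbers differ. The toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf} dict per document (docs: list of token lists)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term at most once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())
        weights.append({
            term: (count / total) * math.log10(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [["network", "timeout", "timeout"], ["disk", "full"], ["network", "disk"]]
w = tf_idf(docs)
print(w[0])
```

In the first document "timeout" outweighs "network": it occurs twice there and in no other document, whereas "network" also appears in the third document.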
Label digitization is done by calling pd.Categorical() from the pandas toolkit, which converts the text labels into a Categorical object stored internally as integer codes. (In this description, the contents of "()" are a function's input parameters; where a function needs no parameters, only the function name with empty parentheses is given.)
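Label digitization can be sketched without pandas as below. To our understanding, pd.Categorical(labels) by default takes the sorted unique values as its categories and exposes the integer codes through its .codes attribute; the function reproduces that mapping for plain string labels. The label values are hypothetical.

```python
def encode_labels(labels):
    """Map text labels to integer codes, using the sorted unique values as categories."""
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    return categories, [index[label] for label in labels]


categories, codes = encode_labels(["NetworkError", "DiskFull", "NetworkError"])
print(categories, codes)  # → ['DiskFull', 'NetworkError'] [1, 0, 1]
```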
Data extraction mainly shuffles the order of the training-set data, which avoids errors caused by a fixed data order across repeated training runs.
Classifier classification is done mainly with the logistic regression algorithm from scikit-learn (sklearn). Logistic regression is normally used for binary prediction. To apply it to multi-class classification, one class is labeled 1 and all other classes 0, and logistic regression is trained to obtain a corresponding parameter vector θ. This is repeated k times (k being the number of classes), taking each class in turn as the positive class, which yields k parameter vectors θ. Applying each of them to the input x gives k predictions, and the class whose θ gives the largest predicted value is selected as the predicted class.
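The one-vs-rest scheme can be sketched in pure Python with batch gradient descent. This illustrates the idea only and is not scikit-learn's solver; in scikit-learn the same scheme is available by wrapping LogisticRegression in sklearn.multiclass.OneVsRestClassifier. The learning rate, epoch count and toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(theta, x):
    """theta[0] is the bias; theta[1:] are the feature weights."""
    return sigmoid(theta[0] + sum(t * v for t, v in zip(theta[1:], x)))


def train_binary(xs, ys, lr=1.0, epochs=2000):
    """Fit one binary logistic regression (labels 0/1) by batch gradient descent."""
    theta = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, y in zip(xs, ys):
            err = predict_proba(theta, x) - y  # derivative of the log loss w.r.t. the logit
            grad[0] += err
            for j, v in enumerate(x, start=1):
                grad[j] += err * v
        theta = [t - lr * g / len(xs) for t, g in zip(theta, grad)]
    return theta


def train_ovr(xs, labels):
    """Train one theta per class, with that class as 1 and all others as 0."""
    return {c: train_binary(xs, [1 if y == c else 0 for y in labels])
            for c in sorted(set(labels))}


def predict_ovr(models, x):
    """Pick the class whose model gives the largest predicted probability."""
    return max(models, key=lambda c: predict_proba(models[c], x))


# Hypothetical 2-D features for three fault classes
xs = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [0.9, 0.0], [0.0, 1.0], [0.0, 0.9]]
ys = ["A", "A", "B", "B", "C", "C"]
models = train_ovr(xs, ys)
print([predict_ovr(models, x) for x in xs])
```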
The main steps of image recognition are image preprocessing, neural network model construction, parameter optimization and classifier classification, as shown in Fig. 3.
Like text preprocessing, image preprocessing is done by computer. It mainly crops the pictures embedded in the documents and resizes them to a suitable size. Two important functions are used: tf.train.slice_input_producer([image, label]) and image_batch, label_batch = tf.train.batch([image, label], batch_size=batch_size, num_threads=64, capacity=capacity). The length and width of the processed images are determined by resize_w and resize_h.
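The batching step can be sketched as a plain Python generator. It is only an analogue of tf.train.batch, which in TensorFlow 1.x assembles batches from a queue on background threads; here the data are held in memory, the batch size is a hypothetical value, and a trailing partial batch is dropped so that every batch has the same size.

```python
def batches(images, labels, batch_size):
    """Yield (image_batch, label_batch) pairs of exactly batch_size items each."""
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    for start in range(0, len(images) - batch_size + 1, batch_size):
        yield images[start:start + batch_size], labels[start:start + batch_size]


out = list(batches(["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"], [1, 2, 1, 2, 1], batch_size=2))
print(out)  # two full batches; "e.jpg" is left over and dropped
```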
The convolutional neural network of the system consists of two convolution + pooling layers, two fully connected layers and a final softmax layer, implemented mainly by calling the relevant TensorFlow functions:
a. Convolutional layer 1: 16 convolution kernels of size 3×3 (3 input channels), padding='SAME' so that the output of the convolution after padding has the same size as the input; activation function relu().
b. Pooling layer 1: 3×3 max pooling with stride 2, followed by an lrn() operation; local response normalization helps training.
c. Convolutional layer 2: 16 convolution kernels of size 3×3 (16 input channels), padding='SAME' so that the output has the same size as the input; activation function relu().
d. Pooling layer 2: 3×3 max pooling with stride 2, followed by an lrn() operation.
e. Fully connected layer 3: 128 neurons; the output of the previous pooling layer is reshaped (flattened) into a vector; activation function relu().
f. Fully connected layer 4: 128 neurons; activation function relu().
g. Softmax regression layer: a linear transformation of the previous fully connected layer's output computes a score for each class; there are 2 classes here, so this layer outputs two scores.
h. Loss calculation
Input parameters: logits, the output computed by the network; labels, the true values, here 0 or 1.
Return value: loss, the loss value.
i. Loss optimization
Input parameters: loss, the loss value; learning_rate, the learning rate.
Return value: run_op, which is passed to sess.run() to execute a training step.
j. Evaluation (accuracy calculation)
Input parameters: logits, the values computed by the network; labels, the true values, here 0 or 1.
Return value: accuracy, the average accuracy of the current step, i.e. the fraction of pictures in the current batch that are classified correctly.
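Steps h and j can be sketched in pure Python for the two-class case, assuming integer labels 0 or 1. The text does not name the loss function; softmax cross-entropy is assumed here (the per-example quantity that, for instance, tf.nn.sparse_softmax_cross_entropy_with_logits computes, averaged over the batch), and accuracy is the fraction of examples whose highest-scoring class equals the label. The logits are made-up numbers.

```python
import math


def softmax(logits):
    """Softmax with the maximum subtracted first for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_loss(logits_batch, labels):
    """Mean softmax cross-entropy over the batch."""
    return sum(-math.log(softmax(z)[y]) for z, y in zip(logits_batch, labels)) / len(labels)


def batch_accuracy(logits_batch, labels):
    """Fraction of examples whose arg-max class equals the label."""
    hits = sum(max(range(len(z)), key=z.__getitem__) == y for z, y in zip(logits_batch, labels))
    return hits / len(labels)


logits = [[2.0, 0.0], [0.0, 2.0], [1.0, 3.0]]
labels = [0, 1, 0]
print(cross_entropy_loss(logits, labels), batch_accuracy(logits, labels))
```

The third example is misclassified, so accuracy is 2/3, and it also contributes most of the loss.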
In this LeNet-style network, training accuracy is tuned mainly by adjusting the following three parameters:
batch_size, the batch size: its upper limit is the total number of samples in the training set, and when the data volume is small, batch_size can be set to the full data set (full-batch learning);
learning_rate, the learning rate: an important hyperparameter in supervised learning and deep learning that determines whether, and how quickly, the objective function converges to a local minimum. A suitable learning rate lets the objective function converge to a local minimum in a reasonable time; if it is too small, the model converges too slowly or not at all, and if it is too large, the model may oscillate;
max_step, the maximum number of training steps: if it is too small, the model cannot fully learn the training set; if it is too large, the result may overfit.
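The effect of the learning rate can be illustrated with plain gradient descent on the one-dimensional objective f(x) = x², whose gradient is 2x, so each step multiplies x by (1 - 2·lr). The objective and the four rates are hypothetical choices for illustration, not values used by the system.

```python
def descend(lr, x0=1.0, steps=10):
    """Run gradient descent on f(x) = x**2 and return all iterates."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * 2 * xs[-1])  # gradient of x**2 is 2x
    return xs


for lr in (0.01, 0.4, 0.9, 1.1):
    print(lr, [round(x, 4) for x in descend(lr)])
# 0.01 creeps toward 0, 0.4 converges quickly, 0.9 flips sign every step, 1.1 diverges
```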
Overall, the algorithm runs a main loop from the first step to max_step. In each iteration the training loss is accumulated and a training step is executed with sess.run() on the neural network defined above, yielding the trained model.
Step three: classification judgment. The probability obtained by text classification and the probability obtained by image recognition are each multiplied by their weights and summed to obtain a combined probability value:

$$P = w_1 P_1 + w_2 P_2$$

where P_1 is the probability obtained by text classification, P_2 is the probability obtained by image recognition, P is the final combined probability, and w_1 and w_2 are the text and image weights respectively. Based on multiple experiments, the text weight w_1 is 0.81 and the image weight w_2 is 0.19.
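The text says the weights were fixed according to multiple experiments but does not describe the procedure. One plausible procedure, sketched here purely as an assumption, is a grid search over the text weight in steps of 0.01 (with the image weight equal to one minus the text weight) that keeps the weight giving the highest accuracy of the fused prediction on labelled historical reports. The two validation samples are hypothetical.

```python
def fused_label(p_text, p_image, w_text):
    """Arg-max class of w_text * P_text + (1 - w_text) * P_image."""
    return max(p_text, key=lambda c: w_text * p_text[c] + (1 - w_text) * p_image[c])


def search_text_weight(samples, step=0.01):
    """Grid-search w_text; samples are (p_text, p_image, true_label) triples."""
    best_w, best_acc = 0.0, -1.0
    for i in range(round(1 / step) + 1):
        w = i * step
        acc = sum(fused_label(pt, pi, w) == y for pt, pi, y in samples) / len(samples)
        if acc > best_acc:  # strict '>' keeps the smallest weight among ties
            best_w, best_acc = w, acc
    return best_w, best_acc


samples = [
    ({"NetworkError": 0.9, "DiskFull": 0.1}, {"NetworkError": 0.2, "DiskFull": 0.8}, "NetworkError"),
    ({"NetworkError": 0.45, "DiskFull": 0.55}, {"NetworkError": 0.9, "DiskFull": 0.1}, "NetworkError"),
]
best_w, best_acc = search_text_weight(samples)
print(round(best_w, 2), best_acc)
```

On this toy data every text weight from 0.43 to 0.88 classifies both samples correctly and the search returns the smallest of them; the 0.81 in the text comes from the authors' own tests, and with real data one might prefer the middle of the best-scoring range.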
Step four: fault analysis system construction. When a new fault occurs, its data are input as the test set, the system judges the fault with the trained model, operation and maintenance personnel repair the fault according to that judgment, and the repair is recorded in the fault report.
Step five: the fault reports newly generated in step four are added to the historical fault records, and the model is periodically retrained with the updated historical fault set as the training set. This keeps the model up to date and lets it keep accumulating experience.

Claims (7)

1. An ICT system fault analysis method integrating text classification and image recognition is characterized by comprising the following steps:
step one: collecting the fault data recorded by customer service and manually preprocessing the data, wherein the data pass through two parallel processes, text classification and image recognition, whose results are finally classified by a classifier; then establishing a fault analysis model for fault analysis through fault judgment, and returning to the manual preprocessing of the initial data after the model is updated;
step two: carrying out model training in each of the two parallel processes of text classification and image recognition, wherein the main steps of text classification are text preprocessing, tf-idf calculation, label digitization, data extraction and classifier classification, and the main steps of image recognition are image preprocessing, neural network model construction, parameter optimization and classifier classification;
step three: classification judgment, namely comparing the text classification result and the image classification result obtained in step two with the actual result, and determining a text classification weight and an image classification weight such that the integrated result is consistent with the actual classification;
step four: fault analysis, namely inputting a newly generated fault report as a test set, the trained model judging the fault class from the fault report, and operation and maintenance personnel overhauling the system according to the fault class;
step five: model update, namely, after the operation and maintenance personnel finish the repair, putting the fault report into the historical fault report set, updating the training set data, and retraining the model on the new training set at regular intervals, so that the model keeps accumulating experience for the operation and maintenance personnel.
2. The ICT system fault analysis method integrating text classification and image recognition according to claim 1, wherein the fault data are collected from customer service records, and the fault data reports in actual ICT systems are unstructured data stored in Word documents. Because Python offers richer functions for operating on Excel, the scattered Word document contents are first stored in an Excel table; next, the report contents are manually condensed during preprocessing; finally, each data item is labeled with its classification by an experienced worker.
3. The ICT system fault analysis method integrating text classification and image recognition according to claim 1, wherein, because each picture name contains the picture's classification information, the data preprocessing in step one generates a picture list and a label list, associates each picture name with its label, and reads the data into an iterator; this is implemented mainly by two functions, get_files and get_batch.
4. The ICT system fault analysis method integrating text classification and image recognition according to claim 1, characterized in that the text preprocessing in step two differs from the preprocessing in step one: it is performed mainly by computer using the jieba Chinese text processing toolkit, and its main tasks are word segmentation and stop-word removal.
5. The ICT system fault analysis method integrating text classification and image recognition according to claim 1, characterized in that the image preprocessing in step two, like the text preprocessing, is performed by computer; it mainly crops the pictures out of the documents and resizes them to a suitable size.
6. The ICT system fault analysis method integrating text classification and image recognition according to claim 1, characterized in that, for the image preprocessing and neural network model construction in step two, the convolutional neural network of the system consists of two convolution + pooling layers, two fully connected layers and a final softmax classification layer, implemented mainly by calling the relevant TensorFlow functions.
7. The ICT system fault analysis method integrating text classification and image recognition according to claim 1, wherein the classification judgment of step three multiplies the probability obtained by text classification and the probability obtained by image recognition by their respective weights and sums the products to obtain an integrated probability value:
$$P = P_1 w_1 + P_2 w_2$$
where $P_1$ is the probability obtained by text classification, $P_2$ is the probability obtained by image recognition, $w_1$ and $w_2$ are the text weight and the image weight respectively, and $P$ is the final integrated probability. Based on multiple tests, the text weight is $w_1 = 0.81$ and the image weight is $w_2 = 0.19$.
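The text branch of step two (claims 1 and 4) can be sketched in plain Python. This is an illustration, not the patent's code: tokens are assumed to be already segmented (the patent uses jieba for this), the stop-word list is hypothetical, the idf uses the smoothed form log((1 + n) / (1 + df)) + 1 that scikit-learn's TfidfVectorizer applies by default, and because the patent does not name its classifier, a nearest-centroid classifier over cosine similarity is swapped in. Its normalized similarities merely stand in for the text probability P1.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

# Hypothetical stop-word list; claim 4 removes stop words after jieba segmentation.
STOP_WORDS = {"the", "a", "is", "of", "and", "to"}


def preprocess(tokens: Sequence[str]) -> List[str]:
    """Lower-case already-segmented tokens and drop stop words."""
    return [t.lower() for t in tokens if t.lower() not in STOP_WORDS]


def fit_idf(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed idf: log((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(doc: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """L2-normalized sparse tf-idf vector; terms unseen in training are ignored."""
    counts = Counter(t for t in doc if t in idf)
    total = sum(counts.values()) or 1
    vec = {t: (c / total) * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


class NearestCentroidText:
    """Nearest-centroid classifier over tf-idf vectors (an assumed choice)."""

    def fit(self, token_docs, labels):
        docs = [preprocess(d) for d in token_docs]
        self.idf = fit_idf(docs)
        # Label digitization: map each fault category to an integer id.
        self.classes = sorted(set(labels))
        self.label_to_id = {c: i for i, c in enumerate(self.classes)}
        sums = [defaultdict(float) for _ in self.classes]
        for doc, label in zip(docs, labels):
            for t, v in tfidf(doc, self.idf).items():
                sums[self.label_to_id[label]][t] += v
        # Normalize each class centroid so dot products are cosine similarities.
        self.centroids = []
        for s in sums:
            norm = math.sqrt(sum(v * v for v in s.values())) or 1.0
            self.centroids.append({t: v / norm for t, v in s.items()})
        return self

    def predict_proba(self, tokens):
        """Cosine similarity to each centroid, rescaled to sum to one."""
        vec = tfidf(preprocess(tokens), self.idf)
        sims = [sum(v * c.get(t, 0.0) for t, v in vec.items()) for c in self.centroids]
        total = sum(sims)
        if total == 0:
            return [1.0 / len(sims)] * len(sims)  # nothing recognized: uniform
        return [s / total for s in sims]

    def predict(self, tokens):
        probs = self.predict_proba(tokens)
        return self.classes[max(range(len(probs)), key=probs.__getitem__)]
```

For example, after fitting on a few "network" and "database" reports, the tokens ["the", "link", "is", "down"] reduce to ["link", "down"] and are assigned to whichever class centroid they overlap with.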
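Claim 3 names two helpers, get_files and get_batch, but gives no code. A minimal standard-library sketch follows, assuming the hypothetical file-naming convention `<class>_<id>.<ext>` (for example disk_001.png) so that the label can be read from the picture name. Decoding and resizing the images, which a TensorFlow input pipeline would do per batch, is deliberately left out.

```python
import os
import random
from typing import Iterator, List, Optional, Tuple

IMAGE_EXTS = (".png", ".jpg", ".jpeg")


def get_files(file_dir: str) -> Tuple[List[str], List[int], List[str]]:
    """Build a picture list and a matching label list from file names.

    Assumes the hypothetical naming convention "<class>_<id>.<ext>".
    Returns (paths, integer labels, sorted class names).
    """
    paths, names = [], []
    for fname in sorted(os.listdir(file_dir)):
        if not fname.lower().endswith(IMAGE_EXTS):
            continue  # skip non-image files
        paths.append(os.path.join(file_dir, fname))
        names.append(fname.split("_", 1)[0])  # class name taken from the prefix
    classes = sorted(set(names))
    index = {c: i for i, c in enumerate(classes)}
    return paths, [index[n] for n in names], classes


def get_batch(paths: List[str], labels: List[int], batch_size: int,
              shuffle: bool = True, seed: Optional[int] = None
              ) -> Iterator[Tuple[List[str], List[int]]]:
    """Yield (paths, labels) batches; image decoding is left to the caller."""
    order = list(range(len(paths)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [paths[i] for i in idx], [labels[i] for i in idx]
```

Keeping the two functions separate mirrors the claim: get_files fixes the picture/label association once, while get_batch can be re-run each epoch with a different shuffle.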
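Claim 6 fixes only the layer sequence: two convolution + pooling layers, two fully connected layers and a softmax output. The sketch below walks that sequence and reports each layer's output shape and parameter count. The 64x64x3 input, 3x3 kernels with "same" padding, 2x2 max pooling, 16 and 32 filters, 128-unit dense layers and 5 fault classes are all assumptions, not values from the patent.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, Tuple[int, ...], int]  # (layer name, output shape, parameters)


def conv_pool_dense_summary(height: int, width: int, channels: int,
                            conv_filters: Sequence[int] = (16, 32),
                            kernel: int = 3, pool: int = 2,
                            dense_units: Sequence[int] = (128, 128),
                            num_classes: int = 5) -> List[Row]:
    """Shapes and parameter counts for conv+pool x2, dense x2, softmax."""
    rows: List[Row] = []
    h, w, c = height, width, channels
    for i, f in enumerate(conv_filters, 1):
        # 'same' padding with stride 1 keeps the spatial size; weights + biases.
        params = kernel * kernel * c * f + f
        c = f
        rows.append((f"conv{i}", (h, w, c), params))
        # Max pooling with window = stride = pool halves each spatial side.
        h, w = h // pool, w // pool
        rows.append((f"pool{i}", (h, w, c), 0))
    n = h * w * c
    rows.append(("flatten", (n,), 0))
    for i, units in enumerate(dense_units, 1):
        rows.append((f"fc{i}", (units,), n * units + units))
        n = units
    # Final dense layer whose outputs are passed through softmax.
    rows.append(("softmax", (num_classes,), n * num_classes + num_classes))
    return rows


if __name__ == "__main__":
    for name, shape, params in conv_pool_dense_summary(64, 64, 3):
        print(f"{name:8s} {str(shape):16s} {params}")
```

In TensorFlow the same stack would typically be written with tf.keras.layers.Conv2D(..., padding="same"), MaxPooling2D, Flatten and Dense layers, the last using activation="softmax"; the counts above follow the weights-plus-bias formula those layers use. Under these assumptions almost all parameters sit in the first fully connected layer, which is worth knowing when the training set is small.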
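Claim 7's weighted fusion, applied per fault class, takes only a few lines of Python. This sketch assumes that both branches output probability vectors over the same ordered list of fault classes, and that the predicted class is the one with the largest integrated probability P; the claim does not state the decision rule. The weights 0.81 and 0.19 are the values reported in the claim.

```python
from typing import List, Sequence, Tuple

W_TEXT = 0.81   # w1, text weight reported in claim 7
W_IMAGE = 0.19  # w2, image weight reported in claim 7


def fuse(p_text: Sequence[float], p_image: Sequence[float],
         w_text: float = W_TEXT, w_image: float = W_IMAGE
         ) -> Tuple[int, List[float]]:
    """Return (predicted class index, per-class P = P1*w1 + P2*w2)."""
    if len(p_text) != len(p_image):
        raise ValueError("text and image branches must score the same classes")
    fused = [a * w_text + b * w_image for a, b in zip(p_text, p_image)]
    # Assumed decision rule: pick the class with the largest integrated P.
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused
```

For example, fuse([0.6, 0.3, 0.1], [0.1, 0.8, 0.1]) gives integrated probabilities of about 0.505, 0.395 and 0.1. The text branch wins even though the image branch strongly prefers class 1, which shows how heavily a 0.81 text weight dominates.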
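Step three chooses the weights so that the integrated result agrees with the actual classification, and claim 7 says the values came from multiple tests. One plausible way to run such tests is a grid search on labeled validation data. The sketch assumes w2 = 1 - w1 (consistent with 0.81 + 0.19 = 1), a step of 0.01 and accuracy as the agreement measure; the patent specifies none of these.

```python
from typing import Sequence, Tuple

Probs = Sequence[float]


def fused_argmax(p_text: Probs, p_image: Probs, w_text: float) -> int:
    """Class index maximizing w_text * P1 + (1 - w_text) * P2."""
    fused = [w_text * a + (1.0 - w_text) * b for a, b in zip(p_text, p_image)]
    return max(range(len(fused)), key=fused.__getitem__)


def accuracy(w_text: float, text_probs: Sequence[Probs],
             image_probs: Sequence[Probs], labels: Sequence[int]) -> float:
    """Fraction of validation samples whose fused prediction matches the label."""
    hits = sum(
        fused_argmax(pt, pi, w_text) == y
        for pt, pi, y in zip(text_probs, image_probs, labels)
    )
    return hits / len(labels)


def search_text_weight(text_probs: Sequence[Probs], image_probs: Sequence[Probs],
                       labels: Sequence[int], steps: int = 100
                       ) -> Tuple[float, float, float]:
    """Grid-search w1 on [0, 1]; return (w1, w2, accuracy) with w2 = 1 - w1."""
    best_w, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        acc = accuracy(w, text_probs, image_probs, labels)
        if acc > best_acc:  # strict '>' keeps the smallest weight on ties
            best_w, best_acc = w, acc
    return best_w, 1.0 - best_w, best_acc
```

Ties go to the smallest text weight that reaches the best accuracy. On a small validation set many weights can tie, so a finer criterion such as average log-loss of the fused probabilities may separate them better.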

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911264526.XA CN111026870A 2019-12-11 2019-12-11 ICT system fault analysis method integrating text classification and image recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911264526.XA CN111026870A 2019-12-11 2019-12-11 ICT system fault analysis method integrating text classification and image recognition

Publications (1)

Publication Number Publication Date
CN111026870A 2020-04-17

Family

ID=70205799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911264526.XA Pending CN111026870A 2019-12-11 2019-12-11 ICT system fault analysis method integrating text classification and image recognition

Country Status (1)

Country Link
CN (1) CN111026870A

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111667278A * 2020-04-27 2020-09-15 Beijing State Grid Xintong Accenture Information Technology Co., Ltd. ICT system fault analysis recommendation method and system based on user portrait
CN112131096A * 2020-05-07 2020-12-25 Beijing State Grid Xintong Accenture Information Technology Co., Ltd. Automatic generation method and device for ICT system fault analysis and auxiliary study and judgment test cases
CN112214603A * 2020-10-26 2021-01-12 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image-text resource classification method, device, terminal and storage medium
CN112232339A * 2020-10-15 2021-01-15 Civil Aviation University of China Flight display equipment fault detection method and monitoring device based on convolutional neural network
CN113554065A * 2021-06-30 2021-10-26 Glodon Company Limited Three-dimensional building model component classification method and device
CN116468458A * 2023-03-15 2023-07-21 Shenzhen Youqian Information Technology Co., Ltd. Accurate marketing white list extraction method based on artificial intelligence and neural network

Citations (5)

Publication number Priority date Publication date Assignee Title
CN105809293A * 2016-03-29 2016-07-27 State Grid Qinghai Electric Power Company Multi-model combined prediction method for short-term power of wind farm
CN108647716A * 2018-05-09 2018-10-12 Beijing Institute of Technology Photovoltaic array fault diagnosis method based on composite information
CN108960073A * 2018-06-05 2018-12-07 Dalian University of Technology Cross-module state image steganalysis method towards Biomedical literature
CN109614488A * 2018-12-04 2019-04-12 Guangxi University Distribution network live line work condition distinguishing method based on text classification and image recognition
US20190147304A1 * 2017-11-14 2019-05-16 Adobe Inc. Font recognition by dynamically weighting multiple deep learning neural networks

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
CN105809293A * 2016-03-29 2016-07-27 State Grid Qinghai Electric Power Company Multi-model combined prediction method for short-term power of wind farm
US20190147304A1 * 2017-11-14 2019-05-16 Adobe Inc. Font recognition by dynamically weighting multiple deep learning neural networks
CN108647716A * 2018-05-09 2018-10-12 Beijing Institute of Technology Photovoltaic array fault diagnosis method based on composite information
WO2019214268A1 * 2018-05-09 2019-11-14 Beijing Institute of Technology Photovoltaic array fault diagnosis method based on composite information
CN108960073A * 2018-06-05 2018-12-07 Dalian University of Technology Cross-module state image steganalysis method towards Biomedical literature
CN109614488A * 2018-12-04 2019-04-12 Guangxi University Distribution network live line work condition distinguishing method based on text classification and image recognition

Non-Patent Citations (1)

Title
GUODONG LI et al.: "An ICT System Fault Analysis Technology Based on Text Classification and Image Recognition" *

Cited By (8)

Publication number Priority date Publication date Assignee Title
CN111667278A * 2020-04-27 2020-09-15 Beijing State Grid Xintong Accenture Information Technology Co., Ltd. ICT system fault analysis recommendation method and system based on user portrait
CN112131096A * 2020-05-07 2020-12-25 Beijing State Grid Xintong Accenture Information Technology Co., Ltd. Automatic generation method and device for ICT system fault analysis and auxiliary study and judgment test cases
CN112131096B * 2020-05-07 2024-05-24 Beijing State Grid Xintong Accenture Information Technology Co., Ltd. ICT system fault analysis and auxiliary research and judgment test case automatic generation method and device
CN112232339A * 2020-10-15 2021-01-15 Civil Aviation University of China Flight display equipment fault detection method and monitoring device based on convolutional neural network
CN112232339B * 2020-10-15 2023-04-07 Civil Aviation University of China Aviation display equipment fault detection method and monitoring device based on convolutional neural network
CN112214603A * 2020-10-26 2021-01-12 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image-text resource classification method, device, terminal and storage medium
CN113554065A * 2021-06-30 2021-10-26 Glodon Company Limited Three-dimensional building model component classification method and device
CN116468458A * 2023-03-15 2023-07-21 Shenzhen Youqian Information Technology Co., Ltd. Accurate marketing white list extraction method based on artificial intelligence and neural network

Similar Documents

Publication Publication Date Title
CN111026870A ICT system fault analysis method integrating text classification and image recognition
JP7090936B2 ESG-based corporate evaluation execution device and its operation method
CN112581006B Public opinion information screening and enterprise subject risk level monitoring public opinion system and method
CN110059181B Short text labeling method, system and device for large-scale classification system
CN105469096B Bag-of-features image search method based on hash binary coding
CN111445028A AI-driven transaction management system
CN108573031A Content-based complaint classification method and system
CN106447066A Big data feature extraction method and device
CN106445988A Intelligent big data processing method and system
CN110399339A File classification method, device, equipment and storage medium for a knowledge base management system
CN111680225B WeChat financial message analysis method and system based on machine learning
CN104834940A Medical image inspection disease classification method based on support vector machine (SVM)
CN112597283B Notification text information entity attribute extraction method, computer equipment and storage medium
KR20190110084A ESG-based enterprise assessment device and operating method thereof
CN111581193A Data processing method, device, computer system and storage medium
CN110222192A Corpus building method and device
CN111797267A Medical image retrieval method and system, electronic device and storage medium
CN113177643A Automatic modeling system based on big data
CN114491034B Text classification method and intelligent device
Gao et al. An improved XGBoost based on weighted column subsampling for object classification
CN116629716A Intelligent interaction system work efficiency analysis method
CN117573876A Service data classification and grading method and device
CN110737700A Purchase, sales and inventory user classification method and system based on Bayesian algorithm
CN107577690B Recommendation method and recommendation device for mass information data
CN113408546B Single-sample target detection method based on mutual global context attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200417