
CN113971757A - Image classification method, computer terminal and storage medium - Google Patents

Image classification method, computer terminal and storage medium Download PDF

Info

Publication number
CN113971757A
CN113971757A (application CN202111131995.1A)
Authority
CN
China
Prior art keywords
image
sample
target
preset
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111131995.1A
Other languages
Chinese (zh)
Inventor
李威
陈伟涛
王志斌
李昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Damo Institute Hangzhou Technology Co Ltd
Original Assignee
Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Damo Institute Hangzhou Technology Co Ltd filed Critical Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority to CN202111131995.1A
Publication of CN113971757A
Legal status: Pending

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an image classification method, a computer terminal, and a storage medium. The method includes: acquiring a target image; and processing the target image with an image classification model to obtain a classification result corresponding to the target image, where the image classification model is obtained by fine-tuning a pre-trained model with an image sample and a target sample, the target sample is the preset sample in a preset sample set that corresponds to the image sample, and the pre-trained model is trained on the preset sample set. The method and device solve the technical problem in the related art that a small data sample size yields an inaccurate image classification model and hence low image classification accuracy.

Description

Image classification method, computer terminal and storage medium
Technical Field
The application relates to the field of model training, in particular to an image classification method, a computer terminal and a storage medium.
Background
Predicting the progress of a construction project is an important task in urban planning. Construction projects are generally divided into three categories by progress: not started, under construction, and completed. Compared with field surveys, predicting construction progress from aerial images is easier to carry out, lower in labor cost, and more timely. Specifically, an image classification model extracts and classifies the features of a designated area in an aerial image, yielding the construction progress of that area. However, training an image classification model requires a large amount of labeled data; a small dataset causes severe overfitting in a deep neural network, leading to poor generalization and making it difficult to train an accurate image classification model.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the present application provide an image classification method, a computer terminal, and a storage medium, which at least solve the technical problem in the related art that a small data sample size yields an inaccurate image classification model and hence low image classification accuracy.
According to one aspect of the embodiments of the present application, an image classification method is provided, including: acquiring a target image; and processing the target image with an image classification model to obtain a classification result corresponding to the target image, where the image classification model is obtained by fine-tuning a pre-trained model with an image sample and a target sample, the target sample is the preset sample in a preset sample set that corresponds to the image sample, and the pre-trained model is trained on the preset sample set.
According to another aspect of the embodiments of the present application, an image classification method is also provided, including: acquiring a building image; and processing the building image with an image classification model to obtain a classification result for the building contained in the building image, where the image classification model is obtained by fine-tuning a pre-trained model with an image sample and a target sample, the target sample is the pattern-spot classification sample in a pattern-spot classification sample set that corresponds to the image sample, and the pre-trained model is trained on the pattern-spot classification sample set.
According to another aspect of the embodiments of the present application, an image classification method is also provided, including: a cloud server receives a target image uploaded by a client; the cloud server processes the target image with an image classification model to obtain a classification result corresponding to the target image, where the image classification model is obtained by fine-tuning a pre-trained model with an image sample and a target sample, the target sample is the preset sample in a preset sample set that corresponds to the image sample, and the pre-trained model is trained on the preset sample set; and the cloud server feeds the classification result back to the client.
According to another aspect of the embodiments of the present application, an image classification apparatus is also provided, including: a first acquisition module configured to acquire a target image; and a first processing module configured to process the target image with an image classification model to obtain a classification result corresponding to the target image, where the image classification model is obtained by fine-tuning a pre-trained model with an image sample and a target sample, the target sample is the preset sample in a preset sample set that corresponds to the image sample, and the pre-trained model is trained on the preset sample set.
According to another aspect of the embodiments of the present application, an image classification apparatus is also provided, including: a third acquisition module configured to acquire a building image; and a third processing module configured to process the building image with an image classification model to obtain a classification result for the building contained in the building image, where the image classification model is obtained by fine-tuning a pre-trained model with an image sample and a target sample, the target sample is the pattern-spot classification sample in a pattern-spot classification sample set that corresponds to the image sample, and the pre-trained model is trained on the pattern-spot classification sample set.
According to another aspect of the embodiments of the present application, an image classification apparatus is also provided, including: a first uploading module configured to receive a target image uploaded by a client; a fourth processing module configured to process the target image with an image classification model to obtain a classification result corresponding to the target image, where the image classification model is obtained by fine-tuning a pre-trained model with an image sample and a target sample, the target sample is the preset sample in a preset sample set that corresponds to the image sample, and the pre-trained model is trained on the preset sample set; and a first feedback module configured to feed the classification result back to the client.
According to another aspect of the embodiments of the present application, there is also provided a computer-readable storage medium, which includes a stored program, wherein when the program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the image classification method in any one of the above embodiments.
According to another aspect of the embodiments of the present application, there is also provided a computer terminal, including: a processor and a memory, the processor being configured to execute a program stored in the memory, wherein the program is configured to perform the image classification method in any of the above embodiments when executed.
In the embodiments of the present application, after the target image is acquired, it is processed with the image classification model to obtain the corresponding classification result, thereby improving the accuracy of the image classification model. Notably, because the target sample is retrieved from the preset sample set based on the target image and is highly similar to it, the target sample can serve as extended data for training the image classification model, making the training data volume large enough. The trained image classification model is therefore more accurate and the classification accuracy for the target image improves, solving the technical problem in the related art that a small data sample size yields an inaccurate image classification model and hence low image classification accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a block diagram of a hardware structure of a computer terminal (or mobile device) for an image classification method according to an embodiment of the present application.
Fig. 2 is a flowchart of an image classification method according to embodiment 1 of the present application;
FIG. 3 is a schematic illustration of an interactive interface according to an embodiment of the present application;
FIG. 4 is a schematic illustration of an alternative interactive interface according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for fine-tuning a residual network according to an embodiment of the present application;
fig. 6 is a flowchart of another image classification method according to embodiment 1 of the present application;
fig. 7 is a flowchart of an image classification method according to embodiment 2 of the present application;
fig. 8 is a flowchart of an image classification method according to embodiment 3 of the present application;
FIG. 9 is a flowchart of an image classification method according to embodiment 4 of the present application;
fig. 10 is a schematic view of an image classification apparatus according to embodiment 5 of the present application;
fig. 11 is a schematic view of an image classification apparatus according to embodiment 6 of the present application;
fig. 12 is a schematic view of an image classification apparatus according to embodiment 7 of the present application;
fig. 13 is a schematic view of an image classification apparatus according to embodiment 8 of the present application;
fig. 14 is a flowchart of an image classification method according to embodiment 9 of the present application;
FIG. 15 is a flowchart of an image classification method according to embodiment 10 of the present application; and
Fig. 16 is a block diagram of a computer terminal according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some terms or terms appearing in the description of the embodiments of the present application are applicable to the following explanations:
pattern spot: it may refer to a single land parcel, and a single land parcel divided by a land property boundary line or a linear feature.
Covering a plate: the method can convert different gray color values into different transparencies and apply the transparencies to the image layer where the gray color values are located, so that the transparencies of different parts of the image layer are correspondingly changed, wherein the white color of the mask can represent a selected area, and the black color of the mask can represent a non-selected area.
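As a concrete illustration (not part of the patent), mapping mask gray values to per-pixel transparency can be sketched in NumPy as follows; the array shapes and the linear gray-to-alpha mapping are assumptions:

```python
import numpy as np

def apply_mask(layer_rgb: np.ndarray, mask_gray: np.ndarray) -> np.ndarray:
    """Apply a grayscale mask as per-pixel transparency (alpha).

    White (255) in the mask marks fully selected pixels, black (0)
    fully unselected ones; intermediate grays become partial alpha.
    """
    # Normalize gray values to [0, 1] and use them as the alpha channel.
    alpha = (mask_gray.astype(np.float32) / 255.0)[..., np.newaxis]
    rgba = np.concatenate([layer_rgb.astype(np.float32), alpha * 255.0],
                          axis=-1)
    return rgba.astype(np.uint8)

layer = np.full((2, 2, 3), 128, dtype=np.uint8)         # uniform gray layer
mask = np.array([[255, 0], [128, 64]], dtype=np.uint8)  # white = selected
out = apply_mask(layer, mask)  # RGBA image; alpha follows the mask
```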
With the development of satellites and airborne sensors, remote sensing and aerial image and video data have become increasingly easy to acquire and are widely used in urban planning and other applications, so analyzing the information in remote sensing and aerial images is increasingly important. However, labeling these data consumes substantial human resources, which greatly increases the cost of using remote sensing data. Moreover, when remote sensing data are applied to different scenarios, the labeling rules differ with the task requirements, making it extremely difficult to reuse existing labeled data.
Predicting the progress of a construction project is an important task in urban planning. Construction projects are generally divided into three categories by progress: not started, under construction, and completed. Compared with field surveys, predicting construction progress from aerial images is easier to carry out, lower in labor cost, and more timely. Specifically, an image classification model classifies the designated area in an aerial image, yielding the construction progress of that area.
A convolutional neural network (CNN), a typical deep learning model, is a feed-forward neural network that includes convolution computations and has a deep structure. With the continuous progress of deep learning, CNNs have made great breakthroughs in image classification, and classifying aerial images with CNNs has important research value. However, training a deep neural network requires a large amount of labeled data, and a small dataset causes severe overfitting and poor generalization. Few-shot learning (FSL) was proposed to address this problem: it aims to achieve good generalization with only a small amount of data, easing the difficulty of collecting large-scale supervised data or manual labeling and making deep learning more practical in industrial settings.
The feature information a convolutional neural network extracts in the same scene is roughly the same, so the parameters of the shallow layers, which attend to low-level features such as contours, colors, and shapes, can be reused, while gradients are backpropagated and parameters updated only in the deep, task-specific layers. This parameter-update method is called fine-tuning (finetune). Fine-tuning is one of the most common means of few-shot learning: on one hand, a pre-trained model with good feature-extraction capability is obtained by training on a large dataset, and fine-tuning it on a similarly distributed scene with little data can greatly improve generalization; on the other hand, training on large-scale datasets is costly in time and resources, so fine-tuning has become an important link in applying algorithms. A common fine-tuning procedure comprises the following steps: (1) construct a large-scale dataset; (2) train a model on the large-scale dataset to obtain a pre-trained model; (3) construct a small-scale dataset whose distribution is similar to that of the large-scale dataset; (4) using the pre-trained model's parameters as initialization, freeze some low-level parameters and train (i.e., fine-tune) on the small-sample dataset with a smaller learning rate. However, the effect of fine-tuning between two datasets with widely differing distributions drops significantly.
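Step (4) of the procedure above can be sketched numerically. This is a minimal NumPy toy, two linear layers standing in for the shallow and deep parts of a network, not the patent's actual architecture; the shapes, random seed, and learning rate are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "pre-trained" model: a shallow feature layer W1 (generic features
# such as contours/colors/shapes) and a deep task-specific head W2.
W1 = rng.normal(size=(8, 4))   # shallow layer: frozen during fine-tuning
W2 = rng.normal(size=(4, 3))   # deep layer: updated during fine-tuning

# Small-sample dataset (step 3): few examples, similar distribution.
x = rng.normal(size=(5, 8))
y = np.eye(3)[rng.integers(0, 3, size=5)]   # one-hot labels, 3 classes

lr = 1e-2                       # step (4): a smaller learning rate
W1_before, W2_before = W1.copy(), W2.copy()
for _ in range(10):
    h = np.maximum(x @ W1, 0.0)              # ReLU features, frozen layer
    logits = h @ W2
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    grad_logits = (probs - y) / len(x)       # softmax cross-entropy grad
    W2 -= lr * h.T @ grad_logits             # update only the deep layer
    # W1 receives no update: its parameters stay frozen.
```

After this loop the shallow layer is unchanged while the head has moved, which is exactly the freeze-and-fine-tune behavior described above.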
Construction-progress classification differs widely in distribution from conventional surface-feature scene classification, and how to use large-scale conventional surface-feature scene classification data to assist few-shot construction-progress classification is worth researching; currently there is little work in this area.
To solve the above problems, the present application provides an image classification method that extends the training data used for model training by supplementing auxiliary data, so that the image classification model can be trained with a sufficiently large amount of data.
Example 1
In accordance with an embodiment of the present application, an image classification method embodiment is provided. It should be noted that the steps illustrated in the flowcharts of the accompanying drawings may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps may be performed in an order different from the one shown or described here.
The method provided by the embodiments of the present application can be executed on a mobile terminal, a computer terminal, or a similar computing device. Fig. 1 shows a hardware block diagram of a computer terminal (or mobile device) for implementing the image classification method. As shown in fig. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n; the processors 102 may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions. It may also include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and does not limit the structure of the electronic device. For example, the computer terminal 10 may include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the image classification method in the embodiment of the present application, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, so as to implement the image classification method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
Fig. 1 shows a block diagram of a hardware structure, which may be taken as an exemplary block diagram of the computer terminal 10 (or the mobile device) and also taken as an exemplary block diagram of the server, in an alternative embodiment, the server may be a locally deployed server or a cloud server, and is connected to one or more clients via a data network or electronically. The data network connection may be a local area network connection, a wide area network connection, an internet connection, or other type of data network connection.
In the above operating environment, the present application provides an image classification method as shown in fig. 2. Fig. 2 is a flowchart of an image classification method according to embodiment 1 of the present application. As shown in fig. 2, the method may include the steps of:
step S202, a target image is acquired.
The target image may be a remote sensing image, an aerial image captured by a drone, or a radar image, but is not limited thereto. In a construction-project scenario, the target image may be a building image such as an image of a site where construction has not started, is under way, or is completed; in an agriculture and forestry scenario, the target image may be an image of uncultivated farmland, farmland being cultivated, or farmland whose cultivation is complete; in a disaster scenario, the target image may be a city image before, during, or after a disaster.
In an alternative embodiment, the target image may be captured by a satellite or a drone, transmitted to a server through a network, processed by the server, and displayed to the user, as shown in fig. 3, where the target image may be displayed in an image capturing area; in another alternative embodiment, the target image may be captured by a satellite or a drone, and actively uploaded to a server by a user, and processed by the server, as shown in fig. 4, the user may complete the uploading of the target image to the server by clicking an "upload image" button in an interactive interface, or by dragging the target image directly into a dashed frame, and the image uploaded by the user may be displayed in an image capturing area. The server may be a server deployed locally or a server deployed in the cloud.
And step S204, processing the target image by using the image classification model to obtain a classification result corresponding to the target image.
The image classification model is obtained by fine-tuning a pre-trained model with image samples and target samples, where the target samples are the preset samples in a preset sample set that correspond to the image samples, and the pre-trained model is trained on the preset sample set.
The image classification model may be a convolutional neural network (CNN), a typical deep learning model: a feed-forward neural network that includes convolution computations and has a deep structure.
In an alternative embodiment, the preset sample set may be extracted from various types of previously processed tasks. Optionally, remote sensing image processing includes many task types, for example change detection, surface-feature classification, and building extraction; the samples for these tasks have high universality and large label volumes. The preset sample set can therefore be created by extracting, from a task's dataset, the images, the mask information corresponding to the images, and the label information corresponding to the images, so as to construct a large-scale dataset.
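Schematically, assembling such a preset sample set from the images, masks, and labels of an existing task's dataset might look like the sketch below; the record schema, file names, and field names are hypothetical, not part of the patent:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PresetSample:
    """One entry in the large-scale preset sample set, built from data
    reused across existing remote-sensing tasks (change detection,
    surface-feature classification, building extraction, ...)."""
    image_path: str   # the source image
    mask_path: str    # mask singling out the pattern spot of interest
    label: str        # label under the source task's annotation rules

def build_preset_set(records: List[dict]) -> List[PresetSample]:
    # Each record is assumed to carry 'image', 'mask', and 'label' keys
    # exported from an existing task's dataset (hypothetical schema).
    return [PresetSample(r["image"], r["mask"], r["label"])
            for r in records]

samples = build_preset_set([
    {"image": "scene_001.tif", "mask": "scene_001_mask.png",
     "label": "building"},
])
```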
The preset sample set may be a large-scale set of fine-grained pattern-spot classification samples.
In another alternative embodiment, the features of the target image and of the samples in the preset sample set may be extracted with the image classification model; by comparing the similarity between the target image's features and each sample's features, the samples whose features are most similar can be selected from the preset sample set as the target samples. Further, the target image and the target samples can be mixed, and the feature-extraction module corrected with a smaller learning rate, yielding a more practical and robust image classification model.
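The similarity comparison described here can be sketched as nearest-neighbor retrieval over feature vectors; cosine similarity and the toy 3-dimensional features are assumed choices, since this paragraph does not fix a particular similarity measure:

```python
import numpy as np

def select_target_samples(target_feat: np.ndarray,
                          preset_feats: np.ndarray,
                          top_k: int = 2) -> np.ndarray:
    """Return indices of the top_k preset samples whose features are
    most similar (by cosine similarity) to the target image's features."""
    t = target_feat / np.linalg.norm(target_feat)
    p = preset_feats / np.linalg.norm(preset_feats, axis=1, keepdims=True)
    sims = p @ t                      # cosine similarity per preset sample
    return np.argsort(sims)[::-1][:top_k]

target = np.array([1.0, 0.0, 0.0])    # features of the target image
preset = np.array([
    [0.9, 0.1, 0.0],   # very similar to the target
    [0.0, 1.0, 0.0],   # dissimilar
    [0.7, 0.3, 0.1],   # fairly similar
])
idx = select_target_samples(target, preset)  # indices of target samples
```

The retrieved samples would then be mixed with the target image to fine-tune the feature extractor, as the paragraph above describes.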
In another optional embodiment, after the server processes the target image and the preset sample set with the image classification model and determines the target sample corresponding to the target image in the preset sample set, the target sample can be displayed directly to the user in the result feedback area, as shown in fig. 3. Alternatively, after the target sample is determined, it can be fed back over the network to the user's client and displayed there in the result feedback area, as shown in fig. 4. Further, after the target sample is displayed, its correctness can be verified through user feedback: if the user considers the target sample incorrect, a correct target sample can be selected from the preset sample set, fed back in the result feedback area, and uploaded to the server, as shown in figs. 3 and 4, so that the server can correct the image classification model according to the user feedback and the target image.
For example, taking the construction process of a building construction project as an example for explanation, after a satellite, an unmanned aerial vehicle or a radar acquires a construction image of an unmoved worker, a construction image of a moved worker or an established construction image, the construction images of each stage can be directly sent to a server to determine a target sample, or can be transmitted to a client for selection by a user, and the construction images of the target sample to be determined are uploaded to the server. After the server acquires the construction image, the server can utilize the pre-training model to process the construction image and the preset sample set, determine a target sample corresponding to the construction image, wherein the similarity between the target sample and the construction image is greater than a certain value, and meanwhile, the pre-training model can be adjusted by utilizing the construction image and the target sample to obtain an image classification model, so that the accuracy of the image classification model for extracting the characteristics of the construction image is improved. After the target sample is obtained, the server can directly display the target sample to a user for checking, or the server sends the target sample to the client for displaying to the user for checking, so that the user can see whether the target sample is a sample with high similarity to the construction image, and can optimize the image classification model through a feedback result of the user, thereby improving the performance of the server.
For another example, taking a farming process of an agriculture and forestry scene as an example for explanation, after a satellite, an unmanned aerial vehicle or a radar collects an uncultivated farmland image, a farming image under farming or a farming image after farming is completed, the farming images of each stage can be directly sent to a server to determine a target sample, or can be transmitted to a client, selected by a user, and uploaded to the server with the farming images of the target sample to be determined. After the server acquires the farming images, the farming images and the preset sample set can be processed by using the pre-training model to determine target samples corresponding to the farming images, the similarity between the target samples and the farming images is greater than a certain value, and meanwhile, the pre-training model can be adjusted by using the farming images and the target samples to obtain an image classification model so as to improve the accuracy of the image classification model for feature extraction of the farming images. After the target sample is obtained, the server can directly display the target sample to a user for checking, or the server sends the target sample to the client for displaying by the client for checking, so that the user can see whether the target sample is a sample with high similarity to a cultivated image, and the image classification model can be optimized through a feedback result of the user, thereby improving the performance of the server.
For example, after the satellite, the unmanned aerial vehicle, or the radar acquires the city image before the disaster, the city image in the disaster, or the city image after the disaster, the city image at each stage may be directly transmitted to the server to determine the target sample, or may be transmitted to the client, selected by the user, and the city image of the target sample to be determined is uploaded to the server. After the server acquires the urban image, the urban image and the preset sample set can be processed by using the pre-training model, the target sample corresponding to the urban image is determined, the similarity between the target sample and the urban image is greater than a certain value, and meanwhile, the pre-training model can be adjusted by using the urban image and the target sample to obtain an image classification model, so that the accuracy of the image classification model for extracting urban image features is improved. After the target sample is obtained, the server can directly display the target sample to a user for checking, or the server sends the target sample to the client for displaying to the user for checking, so that the user can see whether the target sample is a sample with high similarity to the urban image, and can optimize the image classification model through a feedback result of the user, thereby improving the performance of the server.
According to the scheme provided by the embodiment of the application, after the target image is obtained, the image classification model is used for processing the target image to obtain the classification result corresponding to the target image, and the purpose of improving the accuracy of the image classification model is achieved. It is easy to note that, because the target sample is obtained from the preset sample set based on the target image, and the similarity between the target sample and the target image is high, the target sample can be used as the extended data of the training image classification model, so that the data volume for training the image classification model is large enough, and therefore, the accuracy of the image classification model obtained by training is high, and the classification accuracy of the target image can be improved, thereby solving the technical problem that the accuracy of image classification is low due to the small data sample volume and the low accuracy of the image classification model in the related technology.
In the above embodiment of the present application, the method further includes: processing a target image and a plurality of preset samples contained in a preset sample set by using an image classification model to obtain a first characteristic of the target image and second characteristics of the plurality of preset samples; and determining the target sample based on the similarity of the first characteristic and the plurality of second characteristics.
Optionally, determining the target sample based on the similarity between the first feature and the plurality of second features includes: obtaining the similarity of the first characteristic and each second characteristic to obtain a plurality of similarities; and determining a preset sample corresponding to the second feature with the similarity greater than the preset similarity as a target sample.
In an alternative embodiment, in order to obtain a target sample most similar to the target image, the cosine similarity may be used as a metric, and the target sample may be determined by calculating the cosine similarity between the first feature of the target image and the second features of the plurality of preset samples. The cosine similarity can be calculated as follows:
d_AB = (A · B) / (‖A‖ × ‖B‖)
where d_AB may represent the cosine similarity between the first feature and the second feature, A may be the first feature, and B may be the second feature.
For example, an empty supplementary data set S = {} may be set. According to the cosine similarity obtained above, for each target image feature x_i, the 200 most similar samples S_i may be selected from the preset sample set, given the status label l_pi of the target image, and added to the supplementary data set, i.e., S = S + S_i. The pre-training model can then be corrected through S to improve the accuracy of the image classification model.
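As a non-authoritative sketch of this selection step (function and variable names are illustrative, not from the original), the top-k most similar auxiliary samples for each target-image feature could be picked as follows:

```python
import numpy as np

def cosine_similarity(a, b):
    # d_AB = (A . B) / (||A|| * ||B||)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_supplementary(target_feats, aux_feats, k=200):
    # For each target feature x_i, return the indices of the k auxiliary
    # samples y_j with the highest cosine similarity; the selected samples
    # would then inherit the status label of their target image.
    selected = []
    for x in target_feats:
        sims = np.array([cosine_similarity(x, y) for y in aux_feats])
        selected.append(np.argsort(sims)[::-1][:k])
    return selected
```

With k = 200 this matches the description above; the selected samples would then be added to the supplementary set S.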
In the above embodiment of the present application, the method further includes: and correcting the network parameters of the target layer of the pre-training model by using the target image and the target sample, wherein the network parameters of other layers except the target layer in the pre-training model are kept unchanged.
The target layer may be a fully connected layer for feature recognition, or may be a residual block.
In an optional embodiment, the target image and the target sample may be mixed to obtain mixed training data, the training data is used to train a target layer in the feature extraction network, and the network parameters of the target layer are updated, so that the purpose of correcting the network parameters of the target layer is achieved, and the accuracy of the image classification model is improved.
In the above embodiment of the present application, the method further includes: obtaining a preset sample set, wherein the preset sample set comprises: the method comprises the following steps that a plurality of preset samples are obtained, wherein each preset sample corresponds to a mask and a category label, the mask is used for representing a target area in the corresponding preset sample, and the category label is a category label of the target area; performing feature extraction on a plurality of preset samples and masks corresponding to the plurality of preset samples by using an initial classification model, and performing prediction processing on fusion features corresponding to the plurality of preset samples to obtain class identification results corresponding to the plurality of preset samples, wherein the fusion features are obtained by fusing third features of the corresponding preset samples and fourth features of the corresponding masks, and the class identification results are class identification results of a target area; obtaining a loss value of the initial classification model based on the class identification result and the class label corresponding to the plurality of preset samples; and adjusting the initial classification model based on the comparison result of the loss value and the preset loss value to obtain a pre-training model.
The mask can convert different gray color values into different transparencies and apply the transparencies to the layer where the mask is located, so that the transparencies of different parts of the layer are correspondingly changed, wherein the white color of the mask can represent a selected area, and the black color of the mask can represent a non-selected area. The mask described above may be a single-channel mask.
In an optional embodiment, an image corresponding to the target task in the preset sample may be selected through a mask, for example, if the target task is to obtain the construction progress, an image corresponding to the construction area in the preset sample may be selected through the mask, that is, information of the construction area is represented by the mask.
In another optional embodiment, a mask may be combined to identify a plurality of preset samples through an initial classification model, so as to obtain class identification results corresponding to the plurality of preset samples, a loss value of the initial classification model is obtained by comparing the class identification results with class labels, whether the initial classification model has high accuracy or not may be determined through the loss value, if the loss value is less than a certain value (i.e., the preset loss value), the initial classification model has high accuracy, and at this time, the initial classification model does not need to be trained continuously, so as to obtain a pre-training model; if the loss value is greater than a certain value, it indicates that the initial classification model has low accuracy, and at this time, the initial classification model needs to be trained continuously to adjust the network parameters of the initial classification model.
In the above embodiment of the present application, the method further includes: processing each preset sample by utilizing the first two residual blocks in the image classification model to obtain the sample characteristics of each preset sample; processing the mask corresponding to each preset sample by utilizing the convolution layer in the image classification model to obtain the mask characteristic of each preset sample; superposing the sample characteristics and the mask characteristics to obtain first image characteristics of each preset sample; processing the first image characteristics by utilizing the last two residual blocks in the image classification model to obtain second image characteristics of each preset sample; and processing the second image characteristics by using an output layer in the image classification model to obtain a category identification result of each preset sample.
In an optional embodiment, in order to fuse information corresponding to the target task into the image classification model, each preset sample may first be processed by the first two residual blocks in the image classification model to obtain the sample features of each preset sample, so that the mask features corresponding to the target task can be fused at a later stage. Then, the mask corresponding to each preset sample is processed by the convolution layer in the image classification model; specifically, the mask is adjusted according to the feature size of the preset sample to obtain the mask features of each preset sample. Finally, the sample features and the mask features can be superposed to obtain the first image features of each preset sample, where the first image features contain the target task information. The first image features are then processed by the last two residual blocks in the image classification model, a global pooling operation is performed, a fully connected layer maps the result into the class space, and a classifier (softmax) yields the probability corresponding to each class, i.e., the class identification result.
The target task may be to obtain construction area information, and the first image feature may include the construction area information.
For example, in order to fuse the construction area information into the feature extraction network, a mask information fusion branch can be added on the basis of a residual convolutional neural network (ResNeSt50), firstly, a nearest neighbor method is used for aligning the mask size with the output feature of a second residual block, then, a convolution operation is used for expanding the mask of a single channel to 512 dimensions to obtain a mask feature, and finally, the mask feature and the sample feature are added and input into a subsequent network.
As shown in fig. 5, improvements may be made on the basis of ResNeSt50. In order to integrate the construction area information into the network, a single-channel mask can be used to represent the construction area information. Nearest-neighbor interpolation is used to adjust the mask to the same size as the output feature of the second residual block, and a convolution layer with a kernel size of 3 x 3 then maps the single-channel mask to 512 channels. The obtained mask feature and the sample feature are added to obtain an image feature containing the area information, which is sent to the subsequent operations. The image feature obtained by the fourth residual block is then subjected to a global pooling operation, a fully connected layer maps it to the category space, and softmax yields the probability corresponding to each category, i.e., the category identification result.
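A minimal PyTorch sketch of the mask-fusion branch described above (a hypothetical standalone module, not the authors' code; the 512-channel width and feature-map alignment follow the text):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskFusion(nn.Module):
    # Expands a single-channel region mask to the backbone's channel width
    # and adds it element-wise to the output of the second residual block.
    def __init__(self, channels=512):
        super().__init__()
        self.expand = nn.Conv2d(1, channels, kernel_size=3, padding=1)

    def forward(self, feat, mask):
        # feat: (B, 512, H, W) feature map of the second residual block
        # mask: (B, 1, H0, W0) single-channel construction-area mask
        mask = F.interpolate(mask, size=feat.shape[2:], mode='nearest')
        return feat + self.expand(mask)
```

The fused output would be fed to the third and fourth residual blocks as in the original network.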
In the above embodiment of the present application, before the plurality of preset samples and the masks corresponding to the plurality of preset samples are processed by using the image classification model to obtain the category identification results corresponding to the plurality of preset samples, the method further includes: adjusting the size of each preset sample to a first preset size by using a bilinear interpolation algorithm; and adjusting the size of the mask corresponding to each preset sample to a second preset size by using a nearest-neighbor interpolation algorithm, wherein the second preset size is the same as the size of the sample feature of each preset sample.
The first preset size and the second preset size may be set by themselves, for example, the first preset size may be set to 600 × 600, and the second preset size may be set to 600 × 600.
In an alternative embodiment, bilinear interpolation may be used to adjust the size of the preset samples to 600 × 600, and correspondingly, the mask size is also adjusted to 600 × 600 using nearest-neighbor interpolation; the feature extraction network is then trained using the resized preset samples and the resized masks.
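As a sketch (assuming 4-D NCHW tensors as in PyTorch; names are illustrative), the two resizing operations could be implemented as:

```python
import torch
import torch.nn.functional as F

def resize_pair(image, mask, size=(600, 600)):
    # Bilinear interpolation for the image; nearest-neighbor for the mask,
    # which keeps a binary mask binary (no interpolated in-between values).
    image = F.interpolate(image, size=size, mode='bilinear', align_corners=False)
    mask = F.interpolate(mask, size=size, mode='nearest')
    return image, mask
```

Nearest-neighbor interpolation is the natural choice for the mask precisely because bilinear resampling would introduce fractional values between the selected and unselected regions.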
In another alternative embodiment, the preset samples may be divided into a training set and a test set according to a ratio of 4:1, where the training set is used to train the feature extraction network, and the test set is used to test the feature extraction network. Specifically, in the process of training the feature extraction network by using the training set, the parameters of the network can be adjusted by adopting a cross entropy loss function, so that the accuracy of classification of the parameters of the image classification model can be higher when the parameters of the image classification model are verified, and a better image classification model can be obtained.
For example, a preset sample may be first divided into a training set and a test set according to a ratio of 4:1, and model parameters are adjusted according to the performance of the image classification model on the test set, so as to obtain a better image classification model. Because the spot classification data and the construction project data belong to remote sensing data, the extraction capability of the image classification model obtained from the spot classification data on shallow features can be well generalized to the construction project data.
In the above embodiments of the present application, obtaining the preset sample set includes: determining a target size of a target image; and screening a preset sample set from the data set based on the target size, wherein the size of a plurality of preset samples contained in the preset sample set is the same as the target size.
In an optional embodiment, the target size of the target image is determined, the preset sample set is obtained by screening from the data set, the obtained preset sample set and the target image can keep similarity, the data volume of feature similarity in subsequent calculation can be reduced, and the image classification model is corrected through the preset sample set, so that the obtained image classification model has higher accuracy.
Because the data set contains a sufficient number of classes, the features extracted by the image classification model obtained from the data set contain information that distinguishes most of the classes in the field. A plurality of preset samples whose features have the same size as the features of the target image, i.e., the preset sample set, may be extracted from the data set using the convolution and global pooling layers of the image classification model, wherein the preset samples in the preset sample set are, like the target image, mapped to 2048-dimensional vectors.
Illustratively, the target images may be represented as P = {p_1, p_2, …, p_n}, where p_i = {I_pi, M_pi, l_pi} contains image data I_pi, mask data M_pi, and a category label l_pi. Features are extracted from the target images and the preset sample set using the image classification model F(·): x_i = F(p_i), y_j = F(a_j), where x_i and y_j are 2048-dimensional vectors.
In the above embodiment of the present application, the method further includes: outputting a target sample; receiving a feedback result of the target sample, wherein the feedback result is obtained by modifying the target sample; and correcting the image classification model by using the target image and the feedback result.
In an optional embodiment, since the selection result of the target sample may affect the accuracy of the final image classification model, in order to ensure that accuracy the server may directly display the target sample for the user to view, that is, display the target sample on the interactive interface; in another optional embodiment, the server may issue the target sample to the client via the network, and the client displays the target sample on the interactive interface for the user to view. Further, the target sample can be confirmed by the user: if the user confirms that the target sample is correct, the image classification model can be directly corrected based on the target sample and the target image; if the user confirms that it is incorrect, the user can adjust the target sample on the interactive interface to obtain a corresponding adjustment result and feed it back to the server, so that the image classification model can be corrected based on the adjustment result, the target sample, and the target image. The image classification model can thus be optimized according to the adjustment result, improving the performance of the server.
In the above embodiment of the present application, the method further includes: displaying a plurality of task types; receiving a target task type selected from a plurality of task types; displaying a pre-training model corresponding to the target task type; and receiving an image sample, and adjusting the pre-training model by adopting the image sample and the target sample to obtain an image classification model corresponding to the target task type.
The plurality of task types in the above steps are respectively for different application scenarios, such as, but not limited to, a terrain classification task, a building detection task, a building extraction task, a building change detection task, and the like.
As an optional embodiment, task types of multiple application scenarios may be displayed on an interactive interface, a user may operate on the interactive interface, one task type is selected from the multiple task types as a target task type, a server may construct a corresponding model based on a selection of the user, or a corresponding model is searched based on a selection of the user and is displayed to the user for viewing, an image sample provided by the user is further received, the image sample and the target sample are finally fused to obtain a final training sample, and a network parameter of a pre-training model is adjusted through the training sample to obtain an image classification model corresponding to the target task type.
By the scheme, the model building application under the specific scene is provided for the outside, a user can select the used model training mode, data under the corresponding scene is provided for training the model, and the trained model is output and fed back to the user.
A preferred embodiment of the present application is described in detail below with reference to fig. 6, and the method may be performed by a computer terminal or a server.
Step S601, introducing a fine-grained pattern spot classification sample with sufficient sample size as auxiliary data.
The fine-grained pattern classification sample can be a preset sample set.
Optionally, the size distribution of construction project pattern spots may be counted, pattern spot samples with similar sizes may be selected from large-scale ground feature classification, change detection and building extraction data sets, and synonyms between different data sets may be merged to obtain 34 categories, yielding auxiliary data A = {a_1, a_2, …, a_n}, where a_i = {I_ai, M_ai, l_ai} contains image data I_ai, mask data M_ai, and a category label l_ai.
And step S602, building a deep neural network capable of fusing construction area information and image information.
As shown in fig. 5, an improvement can be made on the basis of ResNeSt. In order to merge the construction region information into the image classification network, the construction region information can be represented by a single-channel mask. The mask is adjusted to the same size as the output feature of the second residual block using nearest-neighbor interpolation, the single-channel mask is then mapped to 512 channels by a convolution operation with a kernel size of 3 x 3, and the obtained mask feature and the image feature are added to obtain an image feature containing the region information, which is sent to the subsequent operations. A global pooling operation is performed on the image features obtained by the fourth residual block, a fully connected layer maps them to the category space, and softmax yields the probability corresponding to each category.
And step S603, pre-training the network by using the auxiliary data to obtain a pre-training model.
The pre-training model described above may be an image classification model.
Optionally, the deep neural network built in step S602 may be pre-trained using the fine-grained pattern spot classification samples in step S601. The image size is first adjusted to 600 × 600 using bilinear interpolation and, correspondingly, the mask size is also adjusted to 600 × 600 using nearest-neighbor interpolation, and the auxiliary data A are divided into training and test sets in a 4:1 ratio. The network is trained on the training set with a cross-entropy loss function, and the hyperparameters of the model are adjusted so that the classification accuracy of the model on the verification set is higher, thereby obtaining a pre-training model F(·).
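The pre-training loop can be sketched as follows (a hedged outline: the model is assumed to take an image and a mask and return class logits; function names and hyperparameters are illustrative):

```python
import torch
import torch.nn as nn

def pretrain(model, loader, epochs=1, lr=1e-3):
    # Minimal cross-entropy pre-training over the auxiliary
    # (pattern-spot classification) samples.
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, masks, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images, masks), labels)
            loss.backward()
            optimizer.step()
    return model
```

The hyperparameter search described above would wrap this loop, keeping the configuration with the highest verification-set accuracy.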
And step S604, extracting the characteristics of the construction project sample and the fine-grained pattern classification sample by using the pre-training model.
Optionally, the construction project data are expressed as P = {p_1, p_2, …, p_n}, where p_i = {I_pi, M_pi, l_pi} contains image data I_pi, mask data M_pi, and a category label l_pi. Features are extracted from the construction project data P and the pattern spot classification data A using the model F: x_i = F(p_i), y_j = F(a_j), where x_i and y_j are 2048-dimensional vectors.
And step S605, calculating cosine similarity between the characteristics of the construction project sample and the fine-grained pattern spot classification sample.
Optionally, for any pair {x_i, y_j}, the cosine similarity is calculated as follows:
d_ij = (x_i · y_j) / (‖x_i‖ × ‖y_j‖)
and step S606, selecting a fine-grained pattern spot classification sample most similar to the construction project sample as supplementary data.
Optionally, an empty supplementary data set S = {} may be set. Using the cosine similarity D calculated in step S605, for the sample feature x_i of each construction project, the 200 pattern spot classification samples S_i most similar to it are selected, given the construction status label l_pi, and added to the supplementary data set, i.e., S = S + S_i.
And step S607, correcting the pre-training model by using the construction project sample and the supplementary data.
Optionally, the supplementary data can maintain an approximate data distribution while expanding the construction project data, so the supplementary data and the original construction project data can be mixed. Using the pre-training model obtained in step S603, the first three residual blocks of the model are fixed, and the fourth residual block and the fully connected layer are fine-tuned with a smaller learning rate, finally obtaining a construction project progress prediction model with stronger practicability and robustness.
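A sketch of this freezing-and-fine-tuning step (the attribute names layer1–layer4 and fc follow the torchvision ResNet convention and are an assumption here; the original backbone is ResNeSt-based):

```python
import torch

def finetune_optimizer(model, lr=1e-4):
    # Freeze the first three residual stages of the pre-trained backbone;
    # only the fourth stage and the fully connected layer remain trainable,
    # and they are updated with a smaller learning rate.
    for name, param in model.named_parameters():
        param.requires_grad_(name.startswith(('layer4', 'fc')))
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.SGD(trainable, lr=lr)
```

Passing only the trainable parameters to the optimizer avoids wasting memory on momentum buffers for frozen weights.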
Through the above steps, supplementary data can be selected according to the feature distance of the samples using a fine-tuning method from small-sample learning, which alleviates the problem of differing data distributions. In addition, using the regional range information of the construction project, a convolutional neural network structure fusing region information and image information is provided, improving the classification accuracy on the task of predicting the progress of construction projects.
The effects of the present invention can be further illustrated by the following experiments:
The experimental conditions are as follows: the experiments use a real-scene data set, the PyTorch deep learning framework, and a GPU (Graphics Processing Unit) configured as an NVIDIA Tesla P100.
The experimental contents are as follows: the experiment utilizes 300 construction project pattern spot data and 31,436 fine-grained pattern spot classification data. In order to accurately evaluate the classification accuracy of the model in the case of a small sample, a five-fold cross validation method is used as an evaluation criterion. The method comprises the steps of randomly dividing 300 construction project pattern spot data into 5 equal parts, sequentially selecting one set as a verification set, using the other sets as training sets, and taking the average classification precision of five models as an evaluation standard.
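The five-fold protocol above can be sketched as follows (an illustrative pure-NumPy helper, not the authors' code):

```python
import numpy as np

def five_fold_indices(n_samples, n_folds=5, seed=0):
    # Randomly split sample indices into equal folds; each fold serves
    # once as the validation set, the remaining folds as training data.
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, n_folds)
    for k in range(n_folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, folds[k]
```

With 300 samples, each split yields 240 training and 60 validation samples; the evaluation metric is the average classification precision over the five resulting models.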
The comparison method comprises the following steps: to verify the superiority of the proposed method, the following four comparison methods are provided:
the method 1, training a classification model by directly using construction project pattern spot data;
the method 2 is characterized in that a model is pre-trained by using fine-grained pattern spot classification data, and only a construction project pattern spot data fine-tuning model is used;
the method 3, using the fine-grained pattern classification data to pre-train the model, directly using the data matched with the construction project category in the fine-grained pattern data as the supplementary data, and finely tuning the model;
method 4, remove mask convolution, training procedure consistent with the method presented herein.
The experimental results are as follows:
the average classification accuracy (meanOA) of the five-fold cross validation can be used as an evaluation index, and the specific results are shown in table 1 below:
TABLE 1
(The numerical results of Table 1 are provided as an image in the original publication.)
As can be seen from the table, the method can be well applied to the task of predicting the construction project state, and has higher classification precision compared with a comparison method.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are also included in the scope of the present invention.
Example 2
There is also provided, in accordance with an embodiment of the present application, a method of image classification, noting that the steps illustrated in the flowchart of the figure may be performed in a computer system, such as a set of computer-executable instructions, and that, while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
Fig. 7 is a flowchart of an image classification method according to embodiment 2 of the present application. As shown in fig. 7, the method may include the steps of:
step S702, a building image is acquired.
Step S704, the building image is processed by using the image classification model, so as to obtain a classification result of the building included in the building image.
The image classification model is obtained by adjusting a pre-training model through an image sample and a target sample, the target sample is a spot classification sample corresponding to the image sample in a spot classification sample set, and the pre-training model is obtained by training the spot classification sample set.
In the above embodiment of the present application, the method further includes: determining a marking mode of the building based on the classification result; marking a building according to a marking mode to obtain a first marking image; the first marker image is displayed.
The marking manner in the above steps may be, but is not limited to, colors, lines, etc. of different areas in the image.
In an optional embodiment, to make it easy for the user to check the classification results, buildings of different categories can be marked with different region styles and displayed for the user to view. For example, the region where a building of the non-construction category is located may be shown in green, and the region where a building of the completed-construction category is located may be shown in blue, but the marking is not limited thereto.
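The color-based marking just described might be sketched as follows; the category names, RGB values, and data shapes are illustrative assumptions, not taken from the application:

```python
# Hypothetical sketch: map building categories to marker colors for display.
# Category names and RGB values are illustrative assumptions.
CATEGORY_COLORS = {
    "not_started": (0, 255, 0),    # green for non-construction buildings
    "completed": (0, 0, 255),      # blue for completed-construction buildings
}
DEFAULT_COLOR = (255, 255, 0)      # fallback for unlisted categories

def mark_regions(regions, classification):
    """Attach a marker color to each building region based on its class.

    regions: {building_id: pixel_region}, classification: {building_id: category}.
    Returns {building_id: (pixel_region, color)} for rendering the marked image.
    """
    marked = {}
    for building_id, region in regions.items():
        category = classification.get(building_id)
        marked[building_id] = (region, CATEGORY_COLORS.get(category, DEFAULT_COLOR))
    return marked

marked = mark_regions({"b1": [(0, 0), (0, 1)], "b2": [(5, 5)]},
                      {"b1": "not_started", "b2": "completed"})
```

A renderer would then paint each region with its assigned color to produce the first marker image.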
In the above embodiment of the present application, the method further includes: marking the building based on the classification result to obtain a second marked image; the second marker image is displayed.
In an alternative embodiment, the classification results may be marked alongside the buildings in the building image in order to facilitate the user in determining the categories of the different buildings.
In the above embodiment of the present application, the method further includes: displaying the classification result; receiving feedback information corresponding to the classification result, wherein the feedback information is obtained by modifying the classification result; the image classification model is adjusted based on the feedback information.
In an optional embodiment, the classification results may be displayed in an interactive interface so that the user can view the classification result of each building. The user may then confirm the results; if the user finds that a classification result is wrong, it can be modified on the interactive interface and the feedback information uploaded to a server. The server optimizes the model according to the user's feedback, further improving the accuracy of the image classification model.
In the above embodiment of the present application, the method further includes: processing the building image and a plurality of pattern spot classification samples contained in the pattern spot classification sample set using the image classification model to obtain a first feature of the building image and second features of the plurality of pattern spot classification samples; and determining the target sample based on the similarities between the first feature and the plurality of second features.
Optionally, determining the target sample based on the similarities between the first feature and the plurality of second features includes: obtaining the similarity between the first feature and each second feature to obtain a plurality of similarities; and determining the pattern spot classification sample corresponding to a second feature whose similarity is greater than a preset similarity as the target sample.
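The similarity-based selection just described might be sketched as follows; the application does not specify the similarity measure, so cosine similarity and the 0.8 threshold are assumptions:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def select_target_samples(first_feature, sample_features, threshold=0.8):
    """Keep samples whose feature similarity to the image exceeds the preset similarity.

    sample_features: {sample_id: feature_vector}; threshold plays the role of
    the 'preset similarity' in the text (0.8 is an assumed value).
    """
    return [sid for sid, feat in sample_features.items()
            if cosine_similarity(first_feature, feat) > threshold]

targets = select_target_samples(
    [1.0, 0.0, 1.0],
    {"s1": [1.0, 0.1, 0.9], "s2": [-1.0, 0.5, 0.0]},
    threshold=0.8,
)
```

Here only "s1", whose feature points in nearly the same direction as the image feature, survives the threshold and would be used to fine-tune the pre-training model.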
Optionally, the method further includes: correcting the network parameters of a target layer of the pre-training model using the building image and the target sample, while keeping the network parameters of the other layers of the pre-training model unchanged.
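A minimal sketch of the layer-selective correction described above, with the model reduced to a dictionary of named parameter lists; the layer names, gradient values, and learning rate are illustrative assumptions:

```python
# Only the target layer's parameters are updated; all other layers are frozen.
def finetune_step(params, grads, target_layer, lr=0.5):
    """Apply one gradient step to the target layer only."""
    updated = {}
    for layer, weights in params.items():
        if layer == target_layer:
            updated[layer] = [w - lr * g for w, g in zip(weights, grads[layer])]
        else:
            updated[layer] = list(weights)  # frozen: copied unchanged
    return updated

params = {"backbone": [1.0, 2.0], "head": [1.0, 2.0]}
grads = {"backbone": [1.0, 1.0], "head": [2.0, -2.0]}
new_params = finetune_step(params, grads, target_layer="head", lr=0.5)
```

Freezing everything but one layer keeps the fine-tuning cheap and prevents the small image-sample set from disturbing the representations learned during pre-training.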
Optionally, the method further includes: acquiring a pattern spot classification sample set, where the pattern spot classification sample set comprises a plurality of preset samples, each preset sample corresponds to a mask and a category label, the mask represents a target area in the corresponding preset sample, and the category label is the category label of that target area; performing feature extraction on the plurality of preset samples and their corresponding masks using an initial classification model, and performing prediction processing on the fusion features corresponding to the plurality of preset samples to obtain category identification results for the plurality of preset samples, where each fusion feature is obtained by fusing the third feature of the corresponding preset sample with the fourth feature of the corresponding mask, and each category identification result is the category identification result of the target area; obtaining a loss value of the initial classification model based on the category identification results and category labels corresponding to the plurality of preset samples; and adjusting the initial classification model based on a comparison of the loss value with a preset loss value to obtain the pre-training model.
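The pre-training control flow above (fuse sample and mask features, predict, compare the loss against the preset loss value, adjust) might be sketched with a toy scalar model; the feature extractors, the element-wise-sum fusion rule, the squared-error loss, and the learning rate are all stand-in assumptions:

```python
# Toy sketch of the loss-threshold training loop: adjust the model until the
# loss value falls below the preset loss value.
def train_pretrained_model(samples, preset_loss=0.1, max_rounds=100):
    weight = 0.0  # single scalar "model parameter" for illustration
    loss = float("inf")
    for _ in range(max_rounds):
        loss = 0.0
        grad = 0.0
        for features, mask_features, label in samples:
            fused = [f + m for f, m in zip(features, mask_features)]  # fusion
            prediction = weight * sum(fused)        # class score of target area
            error = prediction - label
            loss += error ** 2
            grad += 2 * error * sum(fused)
        loss /= len(samples)
        if loss < preset_loss:                      # compare with preset loss
            break
        weight -= 0.01 * grad / len(samples)        # adjust the model
    return weight, loss

samples = [([1.0, 2.0], [0.5, 0.5], 4.0), ([2.0, 1.0], [1.0, 1.0], 5.0)]
weight, final_loss = train_pretrained_model(samples, preset_loss=0.1)
```

The loop terminates as soon as the loss beats the preset threshold, which is the comparison the text describes; a real implementation would of course use a neural network and a classification loss instead of this scalar regression.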
Optionally, the method further includes: outputting a target sample; receiving a feedback result of the target sample, wherein the feedback result is obtained by modifying the target sample; and correcting the image classification model by using the building image and the feedback result.
In the above embodiment of the present application, the method further includes: displaying a plurality of task types; receiving a target task type selected from a plurality of task types; displaying a pre-training model corresponding to the target task type; and receiving an image sample, and adjusting the pre-training model by adopting the image sample and the target sample to obtain an image classification model corresponding to the target task type.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 3
There is also provided, in accordance with an embodiment of the present application, an image classification method. It should be noted that the steps illustrated in the flowchart of the accompanying figure may be performed in a computer system (for example, as a set of computer-executable instructions), and that, although a logical order is shown in the flowchart, in some cases the steps illustrated or described may be performed in an order different from that described here.
Fig. 8 is a flowchart of an image classification method according to embodiment 3 of the present application. As shown in fig. 8, the method may include the steps of:
step S802, the cloud server receives the target image uploaded by the client.
Step S804, the cloud server processes the target image by using the image classification model to obtain a classification result corresponding to the target image.
The image classification model is obtained by adjusting a pre-training model with image samples and target samples, where the target samples are the preset samples corresponding to the image samples in a preset sample set, and the pre-training model is obtained by training on the preset sample set.
Step S806, the cloud server feeds back the classification result to the client.
Optionally, the method further includes: the cloud server processes the target image and a plurality of preset samples contained in a preset sample set by using an image classification model to obtain a first characteristic of the target image and second characteristics of the plurality of preset samples; the cloud server determines a target sample based on the similarity of the first feature and the plurality of second features.
Optionally, the determining, by the cloud server, the target sample based on the similarity between the first feature and the plurality of second features includes: the cloud server acquires the similarity of the first characteristic and each second characteristic to obtain a plurality of similarities; and the cloud server determines a preset sample corresponding to the second feature with the similarity greater than the preset similarity as a target sample.
Optionally, the method further includes: and the cloud server corrects the network parameters of the target layer of the pre-training model by using the target image and the target sample, wherein the network parameters of other layers except the target layer in the pre-training model are kept unchanged.
Optionally, the method further includes: the cloud server acquires a preset sample set, where the preset sample set comprises a plurality of preset samples, each preset sample corresponds to a mask and a category label, the mask represents a target area in the corresponding preset sample, and the category label is the category label of that target area; the cloud server performs feature extraction on the plurality of preset samples and their corresponding masks using an initial classification model, and performs prediction processing on the fusion features corresponding to the plurality of preset samples to obtain category identification results for the plurality of preset samples, where each fusion feature is obtained by fusing the third feature of the corresponding preset sample with the fourth feature of the corresponding mask, and each category identification result is the category identification result of the target area; the cloud server obtains a loss value of the initial classification model based on the category identification results and category labels corresponding to the plurality of preset samples; and the cloud server adjusts the initial classification model based on a comparison of the loss value with a preset loss value to obtain the pre-training model.
Optionally, the method further includes: the cloud server outputs a plurality of task types; the method comprises the steps that a cloud server receives a target task type selected from a plurality of task types; the cloud server outputs a pre-training model corresponding to the target task type; and the cloud server receives the image sample, and adjusts the pre-training model by adopting the image sample and the target sample to obtain an image classification model corresponding to the target task type.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 4
There is also provided, in accordance with an embodiment of the present application, an image classification method. It should be noted that the steps illustrated in the flowchart of the accompanying figure may be performed in a computer system (for example, as a set of computer-executable instructions), and that, although a logical order is shown in the flowchart, in some cases the steps illustrated or described may be performed in an order different from that described here.
Fig. 9 is a flowchart of an image classification method according to embodiment 4 of the present application. As shown in fig. 9, the method may include the steps of:
step S902, the cloud server receives the model training request uploaded by the client.
Step S904, the cloud server obtains a target image corresponding to the model training request.
Step S906, the cloud server processes the target image and the preset sample set by using the pre-training model, and determines a target sample corresponding to the target image in the preset sample set.
The pre-training model is obtained through training of a preset sample set.
Step S908, the cloud server adjusts the pre-training model by using the target image and the target sample, so as to obtain an image classification model.
In step S910, the cloud server feeds back the image classification model to the client.
Optionally, the processing, by the cloud server, the target image and the preset sample set by using the image classification model, and determining the target sample corresponding to the target image in the preset sample set includes: the cloud server processes the target image and a plurality of preset samples contained in a preset sample set by using an image classification model to obtain a first characteristic of the target image and second characteristics of the plurality of preset samples; the cloud server determines a target sample based on the similarity of the first feature and the plurality of second features.
Optionally, the determining, by the cloud server, the target sample based on the similarity between the first feature and the plurality of second features includes: the cloud server acquires the similarity of the first characteristic and each second characteristic to obtain a plurality of similarities; and the cloud server determines a preset sample corresponding to the second feature with the similarity greater than the preset similarity as a target sample.
Optionally, the method further includes: and the cloud server corrects the network parameters of the target layer of the pre-training model by using the target image and the target sample, wherein the network parameters of other layers except the target layer in the pre-training model are kept unchanged.
Optionally, the method further includes: the cloud server acquires a preset sample set, where the preset sample set comprises a plurality of preset samples, each preset sample corresponds to a mask and a category label, the mask represents a target area in the corresponding preset sample, and the category label is the category label of that target area; the cloud server performs feature extraction on the plurality of preset samples and their corresponding masks using an initial classification model, and performs prediction processing on the fusion features corresponding to the plurality of preset samples to obtain category identification results for the plurality of preset samples, where each fusion feature is obtained by fusing the third feature of the corresponding preset sample with the fourth feature of the corresponding mask, and each category identification result is the category identification result of the target area; the cloud server obtains a loss value of the initial classification model based on the category identification results and category labels corresponding to the plurality of preset samples; and the cloud server adjusts the initial classification model based on a comparison of the loss value with a preset loss value to obtain the pre-training model.
Optionally, the method further includes: the cloud server outputs a plurality of task types; the method comprises the steps that a cloud server receives a target task type selected from a plurality of task types; the cloud server outputs a pre-training model corresponding to the target task type; and the cloud server receives the image sample, and adjusts the pre-training model by adopting the image sample and the target sample to obtain an image classification model corresponding to the target task type.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 5
According to an embodiment of the present application, there is also provided an image classification apparatus for implementing the above image classification method, as shown in fig. 10, the apparatus 1000 includes: a first obtaining module 1002 and a first processing module 1004.
The first acquisition module is used for acquiring a target image; the first processing module is used for processing the target image by using the image classification model to obtain a classification result corresponding to the target image, wherein the image classification model is obtained by adjusting a pre-training model through an image sample and a target sample, the target sample is a preset sample corresponding to the image sample in a preset sample set, and the pre-training model is obtained by training through the preset sample set.
It should be noted here that the first acquiring module 1002 and the first processing module 1004 correspond to steps S202 to S204 in embodiment 1, and the two modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiment of the present application, the apparatus further includes: the device comprises a first processing unit and a first determining unit.
The first processing unit is used for processing a target image and a plurality of preset samples contained in a preset sample set by using an image classification model to obtain a first characteristic of the target image and second characteristics of the plurality of preset samples; the first determining unit is used for determining the target sample based on the similarity of the first characteristic and the plurality of second characteristics.
In the foregoing embodiment of the present application, the first determining unit is further configured to obtain a similarity between the first feature and each of the second features, obtain multiple similarities, and determine, as the target sample, the preset sample corresponding to the second feature with the similarity greater than the preset similarity.
In the above embodiment of the present application, the apparatus further includes: a first correcting unit.
The first correcting unit is used for correcting the network parameters of the target layer of the pre-training model by using the target image and the target sample, wherein the network parameters of other layers except the target layer in the pre-training model are kept unchanged.
In the above embodiment of the present application, the apparatus further includes: the device comprises a second acquisition module, a second processing module, a loss module and a judgment module.
The second obtaining module is configured to obtain a preset sample set, where the preset sample set comprises a plurality of preset samples, each preset sample corresponds to a mask and a category label, the mask represents a target area in the corresponding preset sample, and the category label is the category label of that target area. The second processing module is configured to perform feature extraction on the plurality of preset samples and their corresponding masks using the initial classification model, and to perform prediction processing on the fusion features corresponding to the plurality of preset samples to obtain category identification results for the plurality of preset samples, where each fusion feature is obtained by fusing the third feature of the corresponding preset sample with the fourth feature of the corresponding mask, and each category identification result is the category identification result of the target area. The loss module is configured to obtain a loss value of the initial classification model based on the category identification results and category labels corresponding to the plurality of preset samples. The judgment module is configured to adjust the initial classification model based on a comparison of the loss value with a preset loss value to obtain the pre-training model.
In the above embodiments of the present application, the second processing module includes: the device comprises a second processing unit, a third processing unit, a superposition unit, a fourth processing unit and a fifth processing unit.
The second processing unit is used for processing each preset sample by using the first two residual blocks in the image classification model to obtain the sample characteristics of each preset sample; the third processing unit is used for processing the mask corresponding to each preset sample by using the convolution layer in the image classification model to obtain the mask characteristic of each preset sample; the superposition unit is used for superposing the sample characteristics and the mask characteristics to obtain first image characteristics of each preset sample; the fourth processing unit is used for processing the first image characteristics by utilizing the last two residual blocks in the image classification model to obtain second image characteristics of each preset sample; the fifth processing unit is used for processing the second image characteristics by using an output layer in the image classification model to obtain a category identification result of each preset sample.
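The five-stage forward pass described by the units above can be sketched structurally; only the wiring (two residual blocks, a convolution branch on the mask, superposition, two more residual blocks, an output layer) follows the text, while the stage bodies are toy stand-ins:

```python
# Structural sketch of the forward pass; stage implementations are stand-ins.
def residual_block(x):
    # toy residual block: identity shortcut plus a simple transform
    return [v + 0.5 * v for v in x]

def mask_conv(mask):
    # toy convolution on the mask: here just a scaling
    return [2.0 * v for v in mask]

def output_layer(x):
    # toy classifier head: index of the largest feature value
    return max(range(len(x)), key=lambda i: x[i])

def forward(sample, mask):
    feat = residual_block(residual_block(sample))     # first two residual blocks
    mask_feat = mask_conv(mask)                       # convolution branch on the mask
    fused = [a + b for a, b in zip(feat, mask_feat)]  # superposition (element-wise sum)
    feat = residual_block(residual_block(fused))      # last two residual blocks
    return output_layer(feat)                         # category identification result

category = forward(sample=[0.2, 0.1], mask=[0.0, 1.0])
```

Injecting the mask features between the second and third residual blocks is what lets the network focus its later stages on the target area the mask marks out.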
In the above embodiment of the present application, the apparatus further includes: the device comprises a first adjusting unit and a second adjusting unit.
The first adjusting unit is configured to adjust the size of each preset sample to a first preset size using a bilinear interpolation algorithm; the second adjusting unit is configured to adjust the size of the mask corresponding to each preset sample to a second preset size using a nearest neighbor interpolation algorithm, where the second preset size is the same as the size of the sample feature of each preset sample.
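The two resizing rules above might look as follows in a minimal form. Masks hold discrete labels, so nearest-neighbor lookup preserves valid classes, while bilinear interpolation smoothly resamples continuous image values; the bilinear sketch assumes inputs of at least 2×2 and outputs of at least 2×2:

```python
# Minimal pure-Python sketches of the two resizing rules.
def resize_nearest(grid, out_h, out_w):
    """Nearest-neighbor resize; suitable for label masks."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[min(in_h - 1, int(r * in_h / out_h))]
                 [min(in_w - 1, int(c * in_w / out_w))]
             for c in range(out_w)]
            for r in range(out_h)]

def resize_bilinear(grid, out_h, out_w):
    """Bilinear resize; assumes grid is at least 2x2 and out_h, out_w >= 2."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for r in range(out_h):
        y = r * (in_h - 1) / (out_h - 1)
        y0 = min(int(y), in_h - 2)
        dy = y - y0
        row = []
        for c in range(out_w):
            x = c * (in_w - 1) / (out_w - 1)
            x0 = min(int(x), in_w - 2)
            dx = x - x0
            # weighted average of the four surrounding pixels
            row.append(grid[y0][x0] * (1 - dy) * (1 - dx)
                       + grid[y0][x0 + 1] * (1 - dy) * dx
                       + grid[y0 + 1][x0] * dy * (1 - dx)
                       + grid[y0 + 1][x0 + 1] * dy * dx)
        out.append(row)
    return out

mask_small = resize_nearest([[1, 1, 2, 2], [1, 1, 2, 2]], 1, 2)
img_big = resize_bilinear([[0.0, 1.0], [1.0, 2.0]], 3, 3)
```

Resizing the mask to match the sample-feature size is what makes the later superposition of mask features and sample features well defined.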
In the above embodiments of the present application, the second obtaining module includes: the second determining unit and the first screening unit.
The second determining unit is used for determining the target size of the target image; the first screening unit is used for screening a preset sample set from the data set based on the target size, wherein the size of a plurality of preset samples contained in the preset sample set is the same as the target size.
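The size-based screening performed by the first screening unit might be sketched as a simple filter; the sample records and their size field are illustrative assumptions:

```python
# Hypothetical sketch: screen the data set by the target image's size so the
# preset sample set contains only samples matching the target size.
def screen_by_size(dataset, target_size):
    """Keep only the samples whose size equals the target image size."""
    return [s for s in dataset if s["size"] == target_size]

preset = screen_by_size(
    [{"id": "a", "size": (256, 256)},
     {"id": "b", "size": (512, 512)},
     {"id": "c", "size": (256, 256)}],
    target_size=(256, 256),
)
```

Matching sizes up front avoids comparing features extracted at incompatible resolutions during the later similarity search.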
In the above embodiment of the present application, the apparatus further includes: the device comprises a first output module, a first receiving module and a first correcting module.
The first output module is used for outputting a target sample; the first receiving module is used for receiving a feedback result of the target sample, wherein the feedback result is obtained by modifying the target sample; the first correction module is used for correcting the image classification model by using the target image and the feedback result.
In the above embodiment of the present application, the apparatus further includes: the display device comprises a first display module, a second receiving module, a second display module and a second correction module.
The first display module is used for displaying a plurality of task types; the second receiving module is used for receiving a target task type selected from the plurality of task types; the second display module is used for displaying the pre-training model corresponding to the target task type; and the second correction module is used for receiving the image sample, and adjusting the pre-training model by adopting the image sample and the target sample to obtain an image classification model corresponding to the target task type.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 6
According to an embodiment of the present application, there is also provided an image classification apparatus for implementing the above-mentioned image classification method, as shown in fig. 11, the apparatus 1100 includes: a third obtaining module 1102 and a third processing module 1104.
The third acquisition module is used for acquiring a building image; and the third processing module is used for processing the building image by using the image classification model to obtain a classification result of the building contained in the building image.
It should be noted here that the third acquiring module 1102 and the third processing module 1104 correspond to steps S702 to S704 in embodiment 2, and the two modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in embodiment 2. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiment of the present application, the apparatus further includes: the device comprises a first determination module, a first marking module and a third display module.
The first determination module is used for determining the marking mode of the building based on the classification result; the first marking module is used for marking the building according to a marking mode to obtain a first marking image; the third display module is used for displaying the first mark image.
In the above embodiment of the present application, the apparatus further includes: the second marking module and the fourth display module.
The second marking module is used for marking the building based on the classification result to obtain a second marking image; and the fourth display module is used for displaying the second mark image.
In the above embodiment of the present application, the apparatus further includes: the device comprises a fifth display module, a third receiving module and a third correction module.
The fifth display module is used for displaying the classification result; the third receiving module is used for receiving feedback information corresponding to the classification result, wherein the feedback information is obtained by modifying the classification result; and the third correction module is used for adjusting the image classification model based on the feedback information.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 7
According to an embodiment of the present application, there is also provided an image classification apparatus for implementing the above-mentioned image classification method, as shown in fig. 12, the apparatus 1200 includes: a first upload module 1202, a fourth processing module 1204, and a first feedback module 1206.
The first uploading module is used for receiving a target image uploaded by a client; the fourth processing module is used for processing the target image with the image classification model to obtain a classification result corresponding to the target image, where the image classification model is obtained by adjusting a pre-training model with an image sample and a target sample, the target sample is the preset sample corresponding to the image sample in a preset sample set, and the pre-training model is obtained by training on the preset sample set; the first feedback module is used for feeding back the classification result to the client.
It should be noted that the first upload module 1202, the fourth processing module 1204, and the first feedback module 1206 correspond to steps S802 to S806 in embodiment 3, and the three modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in embodiment 3. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 8
According to an embodiment of the present application, there is also provided an image classification apparatus for implementing the above-mentioned image classification method, as shown in fig. 13, the apparatus 1300 includes: a second uploading module 1302, a fourth obtaining module 1304, a fifth processing module 1306, a fourth correcting module 1308, and a second feedback module 1310.
The second uploading module is used for the cloud server to receive the model training request uploaded by the client; the fourth acquisition module is used for acquiring a target image corresponding to the model training request; the fifth processing module is used for processing the target image and the preset sample set by using the pre-training model and determining a target sample corresponding to the target image in the preset sample set, wherein the pre-training model is obtained by training the preset sample set; the fourth correction module is used for adjusting the pre-training model by using the target image and the target sample to obtain an image classification model; the second feedback module is used for feeding back the image classification model to the client.
It should be noted that the second uploading module 1302, the fourth obtaining module 1304, the fifth processing module 1306, the fourth correcting module 1308, and the second feedback module 1310 correspond to steps S902 to S910 in embodiment 4, and the five modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in embodiment 4. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 9
There is also provided, in accordance with an embodiment of the present application, an image classification method. It should be noted that the steps illustrated in the flowchart of the accompanying figure may be performed in a computer system (for example, as a set of computer-executable instructions), and that, although a logical order is shown in the flowchart, in some cases the steps illustrated or described may be performed in an order different from that described here.
Fig. 14 is a flowchart of an image classification method according to embodiment 9 of the present application. As shown in fig. 14, the method may include the steps of:
step S1402, a natural resource image is acquired.
Step S1404, processing the natural resource image by using the ground feature classification model to obtain the ground feature type corresponding to the natural resource image.
The ground feature classification model is obtained by adjusting a pre-training model through image samples and target samples, the target samples are preset samples corresponding to the image samples in a preset sample set, and the pre-training model is obtained by training the preset sample set.
It should be noted that this embodiment of the present invention may be, but is not limited to being, applied to practical ground feature classification scenarios, in which the ground feature classification model analyzes the natural resource image to obtain the corresponding ground feature type, such as cultivated land, urban areas, water areas, sea areas, mineral products, forest vegetation, deserts, tropical rain forests, and the like.
For example, it can also be applied to the following technical fields: the meteorological field (e.g., cloud extraction, weather forecasting, weather early warning); the natural resource and ecological environment field (e.g., weather forecast, change detection, ecological red line change detection, multi-classification change detection, ground feature classification, greenhouse extraction, road network extraction, building change detection from satellite or unmanned aerial vehicle imagery); the water conservancy field (e.g., water area change detection, greenhouse extraction, water body extraction from optical or radar imagery, sheet forest extraction, cage culture extraction, sand pit extraction, river house extraction, barrage extraction, photovoltaic power plant extraction); the agriculture and forestry field (e.g., crop extraction for wheat, rice, potato, etc., unmanned aerial vehicle crop identification for corn, flue-cured tobacco, myotonia, etc., land parcel identification, growth monitoring and index calculation, agricultural assessment, pest monitoring, planting suggestion pushing); the secondary disaster field (e.g., disaster monitoring, travel disaster warning); the life service (take-out, logistics) field (e.g., travel route planning, travel suggestion pushing, personnel mobilization, price adjustment); and the city planning field (e.g., road network extraction from satellite or unmanned aerial vehicle imagery, building extraction, building change detection, fire protection).
In the above embodiment of the present invention, the method further includes: displaying the ground feature type; receiving feedback information corresponding to the ground feature type, wherein the feedback information is obtained by modifying the ground feature type; and adjusting the ground feature classification model based on the feedback information.
It should be noted that the preferred implementation described in this embodiment of the present application is the same as the scheme, application scenario, and implementation procedure provided in Example 1, but is not limited to the scheme provided in Example 1.
Example 10
There is also provided, in accordance with an embodiment of the present application, an image classification method. It should be noted that the steps illustrated in the flowchart of the figure may be executed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from that described herein.
Fig. 15 is a flowchart of an image classification method according to embodiment 10 of the present application. As shown in fig. 15, the method may include the steps of:
in step S1502, a building image is acquired.
Step S1504, the building image is processed by using the change detection model to obtain a detection result of whether the building image changes.
The change detection model is obtained by adjusting a pre-training model through an image sample and a target sample, the target sample is a preset sample corresponding to the image sample in a preset sample set, and the pre-training model is obtained by training through the preset sample set.
It should be noted that the embodiment of the present invention may be applied to, but is not limited to, a practical ground feature classification scenario, for example, a building ground feature change detection scenario, in which the change detection model detects whether a building ground feature has changed, so as to obtain a detection result indicating whether the building image has changed. For example, a change from a building to wasteland indicates that the building in the building image has been removed, while a change from wasteland to a building indicates that the building in the building image is a newly constructed building.
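The before/after comparison described above can be sketched as a minimal labelling rule. This is only an illustration, assuming the change detection model yields one ground feature label per acquisition date; the label strings are hypothetical, not part of the patent:

```python
def detect_building_change(label_before, label_after):
    """Compare two classification labels for the same location.

    "building" -> "wasteland" suggests the building was removed;
    the reverse suggests new construction; identical labels mean
    no detected change.
    """
    if label_before == label_after:
        return "unchanged"
    if label_before == "building" and label_after == "wasteland":
        return "building removed"
    if label_before == "wasteland" and label_after == "building":
        return "new building"
    return "other change"
```

In practice the two labels would come from running the classification model on co-registered images taken at different times.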
It should be noted that the preferred implementation described in this embodiment of the present application is the same as the scheme, application scenario, and implementation procedure provided in Example 1, but is not limited to the scheme provided in Example 1.
Example 11
The embodiment of the application can provide a computer terminal, and the computer terminal can be any one computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute program codes of the following steps in the image classification method: acquiring a target image; processing the target image by using the image classification model to obtain a classification result corresponding to the target image, wherein the image classification model is obtained by adjusting the pre-training model through the image sample and the target sample, the target sample is a preset sample corresponding to the image sample in the preset sample set, and the pre-training model is obtained by training through the preset sample set.
Alternatively, fig. 16 is a block diagram of a computer terminal according to an embodiment of the present application. As shown in fig. 16, the computer terminal A may include: one or more processors 1602 (only one of which is shown), and a memory 1604.
The memory may be configured to store software programs and modules, such as program instructions/modules corresponding to the image classification method and apparatus in the embodiments of the present application, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, so as to implement the image classification method. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located from the processor, and these remote memories may be connected to terminal A through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring a target image; processing the target image by using the image classification model to obtain a classification result corresponding to the target image, wherein the image classification model is obtained by adjusting the pre-training model through the image sample and the target sample, the target sample is a preset sample corresponding to the image sample in the preset sample set, and the pre-training model is obtained by training through the preset sample set.
Optionally, the processor may further execute the program code of the following steps: processing a target image and a plurality of preset samples contained in a preset sample set by using an image classification model to obtain a first characteristic of the target image and second characteristics of the plurality of preset samples; and determining the target sample based on the similarity of the first characteristic and the plurality of second characteristics.
Optionally, the processor may further execute the program code of the following steps: obtaining the similarity of the first characteristic and each second characteristic to obtain a plurality of similarities; and determining a preset sample corresponding to the second feature with the similarity greater than the preset similarity as a target sample.
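The two steps above — computing pairwise similarities, then keeping preset samples above a threshold — can be sketched in plain Python. This is a minimal illustration, assuming features are plain numeric vectors and using cosine similarity; the threshold value is illustrative, and the real model's feature extraction is not shown:

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two feature vectors over the product of their norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def select_target_samples(first_feature, second_features, preset_similarity=0.8):
    # Keep the indices of preset samples whose second feature is more
    # similar to the target image's first feature than the preset threshold.
    return [i for i, feat in enumerate(second_features)
            if cosine_similarity(first_feature, feat) > preset_similarity]
```

A preset sample whose feature points in nearly the same direction as the target image's feature is selected; orthogonal features score 0 and are rejected.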
Optionally, the processor may further execute the program code of the following steps: and correcting the network parameters of the target layer of the pre-training model by using the target image and the target sample, wherein the network parameters of other layers except the target layer in the pre-training model are kept unchanged.
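The layer-selective correction described above can be illustrated with a toy gradient step over per-layer parameter lists. The layer names and learning rate below are hypothetical; a real implementation would instead mark non-target layers as non-trainable in the training framework:

```python
def fine_tune_target_layer(params, grads, target_layer, lr=0.1):
    """Apply one SGD-style update to the target layer only; the
    network parameters of all other layers are kept unchanged."""
    updated = {}
    for layer, weights in params.items():
        if layer == target_layer:
            updated[layer] = [w - lr * g for w, g in zip(weights, grads[layer])]
        else:
            updated[layer] = list(weights)  # frozen: copied as-is
    return updated
```

Freezing everything but one layer keeps the pre-trained representation intact while letting a small amount of target data adapt the model.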
Optionally, the processor may further execute the program code of the following steps: obtaining a preset sample set, wherein the preset sample set comprises a plurality of preset samples, each preset sample corresponds to a mask and a category label, the mask is used for representing a target area in the corresponding preset sample, and the category label is the category label of the target area; performing feature extraction on the plurality of preset samples and the masks corresponding to the plurality of preset samples by using an initial classification model, and performing prediction processing on fusion features corresponding to the plurality of preset samples to obtain class identification results corresponding to the plurality of preset samples, wherein the fusion features are obtained by fusing third features of the corresponding preset samples and fourth features of the corresponding masks, and the class identification results are class identification results of the target area; obtaining a loss value of the initial classification model based on the class identification results and the class labels corresponding to the plurality of preset samples; and adjusting the initial classification model based on a comparison result of the loss value and a preset loss value to obtain the pre-training model.
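The loss comparison driving the pre-training loop can be sketched as follows. This assumes the model outputs a probability per class and uses cross-entropy as the loss; the threshold value is illustrative and the patent does not name a specific loss function:

```python
import math

def cross_entropy(probabilities, label_index):
    # Negative log-likelihood of the labelled class.
    return -math.log(probabilities[label_index])

def batch_loss(predictions, labels):
    # Average loss over the class identification results of all preset samples.
    losses = [cross_entropy(p, y) for p, y in zip(predictions, labels)]
    return sum(losses) / len(losses)

def keep_adjusting(loss, preset_loss=0.1):
    # The initial classification model keeps being adjusted until the
    # loss value falls below the preset loss value.
    return loss >= preset_loss
```

A perfect prediction (probability 1.0 on the true class) contributes zero loss, so training stops once predictions match the class labels closely enough.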
Optionally, the processor may further execute the program code of the following steps: processing each preset sample by utilizing the first two residual blocks in the image classification model to obtain the sample characteristics of each preset sample; processing the mask corresponding to each preset sample by utilizing the convolution layer in the image classification model to obtain the mask characteristic of each preset sample; superposing the sample characteristics and the mask characteristics to obtain first image characteristics of each preset sample; processing the first image characteristics by utilizing the last two residual blocks in the image classification model to obtain second image characteristics of each preset sample; and processing the second image characteristics by using an output layer in the image classification model to obtain a category identification result of each preset sample.
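The five-stage flow above can be sketched abstractly with each stage passed in as a function, element-wise addition standing in for the feature superposition. All stage functions here are placeholders, not the actual residual blocks or convolution layer:

```python
def masked_classification_pipeline(sample, mask, first_blocks, mask_conv,
                                   last_blocks, output_layer):
    sample_features = first_blocks(sample)   # first two residual blocks
    mask_features = mask_conv(mask)          # convolution branch for the mask
    # Superpose sample features and mask features element-wise
    # to obtain the first image features.
    first_image_features = [s + m for s, m in zip(sample_features, mask_features)]
    second_image_features = last_blocks(first_image_features)  # last two blocks
    return output_layer(second_image_features)  # category identification result
```

The key design point is that the mask branch is injected mid-network, so the later residual blocks see features already biased toward the target area.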
Optionally, the processor may further execute the program code of the following steps: adjusting the size of each preset sample to a first preset size by using a bilinear interpolation algorithm; and adjusting the size of the mask corresponding to each preset sample to a second preset size by using a nearest neighbor interpolation algorithm, wherein the second preset size is the same as the size of the sample feature of each preset sample.
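Nearest-neighbour resizing is the natural choice for the mask because it copies existing label values rather than averaging them, whereas bilinear interpolation would invent fractional values between class labels. A minimal sketch for a 2-D label grid:

```python
def nearest_neighbor_resize(grid, new_h, new_w):
    """Resize a 2-D label grid by copying the nearest source cell,
    so no new (fractional) label values are introduced."""
    h, w = len(grid), len(grid[0])
    return [[grid[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]
```

Upscaling a 2x2 mask to 4x4 simply replicates each label into a 2x2 block, preserving the discrete class boundaries.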
Optionally, the processor may further execute the program code of the following steps: determining a target size of a target image; and screening a preset sample set from the data set based on the target size, wherein the size of a plurality of preset samples contained in the preset sample set is the same as the target size.
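Screening the preset sample set by size reduces to a simple filter. This sketch assumes each sample record carries its size in a `"size"` field, which is an illustrative name:

```python
def screen_preset_samples(data_set, target_size):
    # Keep only the preset samples whose size matches the target image's size,
    # so the fine-tuning batch is shape-consistent with the target image.
    return [sample for sample in data_set if sample["size"] == target_size]
```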
Optionally, the processor may further execute the program code of the following steps: outputting a target sample; receiving a feedback result of the target sample, wherein the feedback result is obtained by modifying the target sample; and correcting the image classification model by using the target image and the feedback result.
Optionally, the processor may further execute the program code of the following steps: displaying a plurality of task types; receiving a target task type selected from a plurality of task types; displaying a pre-training model corresponding to the target task type; and receiving an image sample, and adjusting the pre-training model by adopting the image sample and the target sample to obtain an image classification model corresponding to the target task type.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring a building image; and processing the building image by using the image classification model to obtain a classification result of the building contained in the building image, wherein the image classification model is obtained by adjusting a pre-training model through an image sample and a target sample, the target sample is a spot classification sample corresponding to the image sample in a spot classification sample set, and the pre-training model is obtained by training the spot classification sample set.
Optionally, the processor may further execute the program code of the following steps: determining a marking mode of the building based on the classification result; marking a building according to a marking mode to obtain a first marking image; the first marker image is displayed.
Optionally, the processor may further execute the program code of the following steps: marking the building based on the classification result to obtain a second marked image; the second marker image is displayed.
Optionally, the processor may further execute the program code of the following steps: displaying the classification result; receiving feedback information corresponding to the classification result, wherein the feedback information is obtained by modifying the classification result; the image classification model is adjusted based on the feedback information.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: the cloud server receives a target image uploaded by a client; the cloud server processes the target image by using the image classification model to obtain a classification result corresponding to the target image, wherein the image classification model is obtained by adjusting the pre-training model through image samples and target samples, the target samples are preset samples corresponding to the image samples in a preset sample set, and the pre-training model is obtained by training through the preset sample set; and the cloud server feeds back the classification result to the client.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: the method comprises the steps that a cloud server receives a model training request uploaded by a client; the cloud server acquires a target image corresponding to the model training request; the cloud server processes the target image and the preset sample set by using the pre-training model, and determines a target sample corresponding to the target image in the preset sample set, wherein the pre-training model is obtained by training the preset sample set; the cloud server adjusts the pre-training model by using the target image and the target sample to obtain an image classification model; and the cloud server feeds the image classification model back to the client.
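The cloud-side handling of a model training request can be sketched as a plain function over injected helpers. All names here are hypothetical; in a real deployment this logic would sit behind an HTTP endpoint on the cloud server:

```python
def handle_model_training_request(request, preset_sample_set,
                                  pretrained_model, select_samples, fine_tune):
    # 1. Obtain the target image attached to the training request.
    target_image = request["target_image"]
    # 2. Use the pre-training model to pick the matching target samples
    #    out of the preset sample set.
    target_samples = select_samples(pretrained_model, target_image,
                                    preset_sample_set)
    # 3. Adjust the pre-training model with the image and target samples.
    image_classification_model = fine_tune(pretrained_model, target_image,
                                           target_samples)
    # 4. Feed the resulting image classification model back to the client.
    return {"client": request["client"], "model": image_classification_model}
```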
It can be understood by those skilled in the art that the structure shown in fig. 16 is only illustrative, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 16 does not limit the structure of the above electronic device. For example, the computer terminal A may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 16, or have a different configuration from that shown in fig. 16.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only Memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 12
Embodiments of the present application also provide a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program codes executed by the image classification method provided in the foregoing embodiment.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring a target image; processing the target image by using the image classification model to obtain a classification result corresponding to the target image, wherein the image classification model is obtained by adjusting the pre-training model through the image sample and the target sample, the target sample is a preset sample corresponding to the image sample in the preset sample set, and the pre-training model is obtained by training through the preset sample set.
Optionally, the storage medium is further configured to store program codes for performing the following steps: processing a target image and a plurality of preset samples contained in a preset sample set by using an image classification model to obtain a first characteristic of the target image and second characteristics of the plurality of preset samples; and determining the target sample based on the similarity of the first characteristic and the plurality of second characteristics.
Optionally, the storage medium is further configured to store program codes for performing the following steps: the similarity of the first characteristic and each second characteristic is obtained, and a plurality of similarities are obtained; and determining a preset sample corresponding to the second feature with the similarity greater than the preset similarity as a target sample.
Optionally, the storage medium is further configured to store program codes for performing the following steps: and correcting the network parameters of the target layer of the pre-training model by using the target image and the target sample, wherein the network parameters of other layers except the target layer in the pre-training model are kept unchanged.
Optionally, the storage medium is further configured to store program codes for performing the following steps: obtaining a preset sample set, wherein the preset sample set comprises a plurality of preset samples, each preset sample corresponds to a mask and a category label, the mask is used for representing a target area in the corresponding preset sample, and the category label is the category label of the target area; performing feature extraction on the plurality of preset samples and the masks corresponding to the plurality of preset samples by using an initial classification model, and performing prediction processing on fusion features corresponding to the plurality of preset samples to obtain class identification results corresponding to the plurality of preset samples, wherein the fusion features are obtained by fusing third features of the corresponding preset samples and fourth features of the corresponding masks, and the class identification results are class identification results of the target area; obtaining a loss value of the initial classification model based on the class identification results and the class labels corresponding to the plurality of preset samples; and adjusting the initial classification model based on a comparison result of the loss value and a preset loss value to obtain the pre-training model.
Optionally, the storage medium is further configured to store program codes for performing the following steps: processing each preset sample by utilizing the first two residual blocks in the image classification model to obtain the sample characteristics of each preset sample; processing the mask corresponding to each preset sample by utilizing the convolution layer in the image classification model to obtain the mask characteristic of each preset sample; superposing the sample characteristics and the mask characteristics to obtain first image characteristics of each preset sample; processing the first image characteristics by utilizing the last two residual blocks in the image classification model to obtain second image characteristics of each preset sample; and processing the second image characteristics by using an output layer in the image classification model to obtain a category identification result of each preset sample.
Optionally, the storage medium is further configured to store program codes for performing the following steps: adjusting the size of each preset sample to a first preset size by using a bilinear interpolation algorithm; and adjusting the size of the mask corresponding to each preset sample to a second preset size by using a nearest neighbor interpolation algorithm, wherein the second preset size is the same as the size of the sample feature of each preset sample.
Optionally, the storage medium is further configured to store program codes for performing the following steps: determining a target size of a target image; and screening a preset sample set from the data set based on the target size, wherein the size of a plurality of preset samples contained in the preset sample set is the same as the target size.
Optionally, the storage medium is further configured to store program codes for performing the following steps: outputting a target sample; receiving a feedback result of the target sample, wherein the feedback result is obtained by modifying the target sample; and correcting the image classification model by using the target image and the feedback result.
Optionally, the storage medium is further configured to store program codes for performing the following steps: displaying a plurality of task types; receiving a target task type selected from a plurality of task types; displaying a pre-training model corresponding to the target task type; and receiving an image sample, and adjusting the pre-training model by adopting the image sample and the target sample to obtain an image classification model corresponding to the target task type.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring a building image; and processing the building image by using the image classification model to obtain a classification result of the building contained in the building image, wherein the image classification model is obtained by adjusting a pre-training model through an image sample and a target sample, the target sample is a spot classification sample corresponding to the image sample in a spot classification sample set, and the pre-training model is obtained by training the spot classification sample set.
Optionally, the storage medium is further configured to store program codes for performing the following steps: determining a marking mode of the building based on the classification result; marking a building according to a marking mode to obtain a first marking image; the first marker image is displayed.
Optionally, the storage medium is further configured to store program codes for performing the following steps: marking the building based on the classification result to obtain a second marked image; the second marker image is displayed.
Optionally, the storage medium is further configured to store program codes for performing the following steps: displaying the classification result; receiving feedback information corresponding to the classification result, wherein the feedback information is obtained by modifying the classification result; the image classification model is adjusted based on the feedback information.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: the cloud server receives a target image uploaded by a client; the cloud server processes the target image by using the image classification model to obtain a classification result corresponding to the target image, wherein the image classification model is obtained by adjusting the pre-training model through image samples and target samples, the target samples are preset samples corresponding to the image samples in a preset sample set, and the pre-training model is obtained by training through the preset sample set; and the cloud server feeds back the classification result to the client.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: the method comprises the steps that a cloud server receives a model training request uploaded by a client; the cloud server acquires a target image corresponding to the model training request; the cloud server processes the target image and the preset sample set by using the pre-training model, and determines a target sample corresponding to the target image in the preset sample set, wherein the pre-training model is obtained by training the preset sample set; the cloud server adjusts the pre-training model by using the target image and the target sample to obtain an image classification model; and the cloud server feeds the image classification model back to the client.
It should be noted that the collection, storage, use, processing, transmission, provision, disclosure, and other handling of the images involved in the above embodiments of the present application all comply with the relevant laws and regulations and do not violate public order and good morals.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the essence of the technical solution of the present application, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (13)

1. An image classification method, comprising:
acquiring a target image;
processing the target image by using an image classification model to obtain a classification result corresponding to the target image, wherein the image classification model is obtained by adjusting a pre-training model through an image sample and a target sample, the target sample is a preset sample corresponding to the image sample in a preset sample set, and the pre-training model is obtained by training through the preset sample set.
2. The method of claim 1, further comprising:
processing the image sample and a plurality of preset samples contained in the preset sample set by using the image classification model to obtain a first characteristic of the image sample and a second characteristic of the plurality of preset samples;
determining the target sample based on a similarity of the first feature to a plurality of second features.
3. The method of claim 2, wherein determining the target sample based on the similarity of the first feature to a plurality of second features comprises:
obtaining the similarity of the first characteristic and each second characteristic to obtain a plurality of similarities;
and determining a preset sample corresponding to the second feature with the similarity greater than the preset similarity as the target sample.
4. The method of claim 1, further comprising:
and adjusting the network parameters of the target layer of the pre-training model by using the image sample and the target sample, wherein the network parameters of other layers except the target layer in the pre-training model are kept unchanged.
5. The method of claim 1, further comprising:
obtaining the preset sample set, wherein the preset sample set comprises a plurality of preset samples, each preset sample corresponds to a mask and a category label, the mask is used for representing a target area in the corresponding preset sample, and the category label is the category label of the target area;
performing feature extraction on the plurality of preset samples and the masks corresponding to the plurality of preset samples by using an initial classification model, and performing prediction processing on fusion features corresponding to the plurality of preset samples to obtain class identification results corresponding to the plurality of preset samples, wherein the fusion features are obtained by fusing third features of corresponding preset samples and fourth features of corresponding masks, and the class identification results are class identification results of the target area;
obtaining a loss value of the initial classification model based on the class identification results and the class labels corresponding to the plurality of preset samples;
and adjusting the initial classification model based on the comparison result of the loss value and a preset loss value to obtain the pre-training model.
6. The method according to any one of claims 1 to 5, further comprising:
displaying a plurality of task types;
receiving a target task type selected from the plurality of task types;
displaying the pre-training model corresponding to the target task type;
and receiving the image sample, and adjusting the pre-training model by adopting the image sample and the target sample to obtain the image classification model corresponding to the target task type.
7. An image classification method, comprising:
acquiring a building image;
and processing the building image by using an image classification model to obtain a classification result of the building contained in the building image, wherein the image classification model is obtained by adjusting a pre-training model through an image sample and a target sample, the target sample is a pattern spot classification sample corresponding to the image sample in a pattern spot classification sample set, and the pre-training model is obtained by training the pattern spot classification sample set.
8. The method of claim 7, further comprising:
determining a marking mode of the building based on the classification result;
marking the building according to the marking mode to obtain a first marking image;
displaying the first marker image.
9. The method of claim 8, further comprising:
marking the building based on the classification result to obtain a second marked image;
displaying the second marked image.
10. The method according to any one of claims 7 to 9, further comprising:
displaying the classification result;
receiving feedback information corresponding to the classification result, wherein the feedback information is obtained by modifying the classification result;
adjusting the image classification model based on the feedback information.
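Claim 10's feedback loop can be sketched as a buffer of user-modified results that later drives model adjustment. The class name, the rule that only modified results count as feedback, and the batch threshold are all assumptions made for illustration:

```python
# minimal feedback buffer; names and threshold are illustrative assumptions
class FeedbackAdjuster:
    def __init__(self):
        self.buffer = []                 # modified results awaiting adjustment

    def receive(self, image_id, predicted, corrected):
        # only a modified classification result counts as feedback information
        if corrected != predicted:
            self.buffer.append((image_id, corrected))

    def ready(self, batch_size=2):
        # adjust the image classification model once enough feedback arrives
        return len(self.buffer) >= batch_size

fb = FeedbackAdjuster()
fb.receive("img_1", "factory", "warehouse")  # user modified the result
fb.receive("img_2", "house", "house")        # unchanged, so not feedback
```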
11. An image classification method, comprising:
the cloud server receives a target image uploaded by a client;
the cloud server processes the target image by using an image classification model to obtain a classification result corresponding to the target image, wherein the image classification model is obtained by adjusting a pre-training model through an image sample and a target sample, the target sample is a preset sample corresponding to the image sample in a preset sample set, and the pre-training model is obtained by training the preset sample set;
and the cloud server feeds the classification result back to the client.
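The cloud flow of claim 11 is a simple request/response exchange: the client uploads a target image, the server runs the classification model, and the result is fed back. The stub classifier below is a placeholder with made-up logic, used only to make the round trip concrete:

```python
# stub classifier standing in for the fine-tuned image classification model
def classify(image_bytes):
    # made-up rule for illustration only; a real model would run inference here
    return "building" if len(image_bytes) % 2 == 0 else "road"

def cloud_server_handle(request):
    image = request["image"]            # target image uploaded by the client
    result = classify(image)            # process with the classification model
    return {"classification": result}   # result fed back to the client

response = cloud_server_handle({"image": b"\x00\x01\x02\x03"})
```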
12. A computer-readable storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the image classification method according to any one of claims 1 to 11.
13. A computer terminal, comprising: a processor and a memory, the processor being configured to execute a program stored in the memory, wherein the program when executed performs the image classification method of any of claims 1 to 9.
CN202111131995.1A 2021-09-26 2021-09-26 Image classification method, computer terminal and storage medium Pending CN113971757A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111131995.1A CN113971757A (en) 2021-09-26 2021-09-26 Image classification method, computer terminal and storage medium

Publications (1)

Publication Number Publication Date
CN113971757A (en) 2022-01-25

Family

ID=79586823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111131995.1A Pending CN113971757A (en) 2021-09-26 2021-09-26 Image classification method, computer terminal and storage medium

Country Status (1)

Country Link
CN (1) CN113971757A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200000824A (en) * 2018-06-25 2020-01-03 한국과학기술원 Method for recognizing facial expression based on deep-learning model using center-dispersion loss function
WO2021016087A1 (en) * 2019-07-19 2021-01-28 Arizona Board Of Regents On Behalf Of Arizona State University Systems for the generation of source models for transfer learning to application specific models
CN113344016A (en) * 2020-02-18 2021-09-03 深圳云天励飞技术有限公司 Deep migration learning method and device, electronic equipment and storage medium
CN111507419A (en) * 2020-04-22 2020-08-07 腾讯科技(深圳)有限公司 Training method and device of image classification model
CN111832615A (en) * 2020-06-04 2020-10-27 中国科学院空天信息创新研究院 Sample expansion method and system based on foreground and background feature fusion
CN112101162A (en) * 2020-09-04 2020-12-18 沈阳东软智能医疗科技研究院有限公司 Image recognition model generation method and device, storage medium and electronic equipment
CN113435522A (en) * 2021-06-30 2021-09-24 平安科技(深圳)有限公司 Image classification method, device, equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115346084A (en) * 2022-08-15 2022-11-15 腾讯科技(深圳)有限公司 Sample processing method, sample processing apparatus, electronic device, storage medium, and program product
CN115761529A (en) * 2023-01-09 2023-03-07 阿里巴巴(中国)有限公司 Image processing method and electronic device
CN115761529B (en) * 2023-01-09 2023-05-30 阿里巴巴(中国)有限公司 Image processing method and electronic device

Similar Documents

Publication Publication Date Title
US11321943B2 (en) Crop type classification in images
AU2019238711B2 (en) Method and apparatus for acquiring boundary of area to be operated, and operation route planning method
US11403846B2 (en) Crop boundary detection in images
Kalantar et al. Drone-based land-cover mapping using a fuzzy unordered rule induction algorithm integrated into object-based image analysis
Kawamura et al. Discriminating crops/weeds in an upland rice field from UAV images with the SLIC-RF algorithm
CN109886207B (en) Wide area monitoring system and method based on image style migration
Majdik et al. Air‐ground matching: Appearance‐based GPS‐denied urban localization of micro aerial vehicles
Rastogi et al. Automatic building footprint extraction from very high-resolution imagery using deep learning techniques
CN108351959A (en) Image Acquisition task system based on variation detection
CN112884764B (en) Method and device for extracting land block in image, electronic equipment and storage medium
CN110020635A (en) Growing area crops sophisticated category method and system based on unmanned plane image and satellite image
CN114140637B (en) Image classification method, storage medium and electronic device
Cai et al. Residual-capsule networks with threshold convolution for segmentation of wheat plantation rows in UAV images
CN113971757A (en) Image classification method, computer terminal and storage medium
CN114998712B (en) Image recognition method, storage medium, and electronic device
Song et al. Improving global land cover characterization through data fusion
CN115761529B (en) Image processing method and electronic device
CN114359599A (en) Image processing method, storage medium and computer terminal
Mathivanan et al. Utilizing satellite and UAV data for crop yield prediction and monitoring through deep learning
CN113496220A (en) Image processing method, system and computer readable storage medium
CN114511500A (en) Image processing method, storage medium, and computer terminal
CN113627292A (en) Remote sensing image identification method and device based on converged network
Pourazar et al. A deep 2D/3D Feature-Level fusion for classification of UAV multispectral imagery in urban areas
CN116503590A (en) Multispectral unmanned aerial vehicle remote sensing image crop segmentation method
Wijesingha Geometric quality assessment of multi-rotor unmanned aerial vehicle borne remote sensing products for precision agriculture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination