
CN109284687B - Scene recognition method and device based on indoor opportunity signal enhancement - Google Patents

Scene recognition method and device based on indoor opportunity signal enhancement

Info

Publication number
CN109284687B
Authority
CN
China
Prior art keywords
scene
positioning
module
neural network
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810972177.6A
Other languages
Chinese (zh)
Other versions
CN109284687A (en)
Inventor
呙维
吴然
陈艳华
朱欣焰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN201810972177.6A
Publication of CN109284687A
Application granted
Publication of CN109284687B
Status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a scene recognition method and a scene recognition device based on indoor opportunistic signal enhancement.

Description

Scene recognition method and device based on indoor opportunity signal enhancement
Technical Field
The invention relates to the technical field of indoor scene identification, in particular to a scene identification method and device based on indoor opportunity signal enhancement.
Background
The scene recognition problem is a very challenging subject in the field of computer vision and is applied in many areas, such as autonomous driving and robotics.
Existing scene recognition methods usually perform recognition and classification by analyzing image features; for example, work in traditional image processing and pattern recognition classifies and labels scene images, but this requires a large amount of manual work and the algorithms are complex. Deep learning methods have also been used on large-scale data sets; however, for an indoor environment the scenes in such data sets contain complex decoration and layout, so recognition accuracy remains low when indoor scene recognition is tackled with deep learning alone.
It follows that the scene recognition methods in the prior art suffer from the technical problem of low recognition accuracy.
Disclosure of Invention
The invention provides a scene recognition method and a scene recognition device based on indoor opportunity signal enhancement, which are used for solving or at least partially solving the technical problem of low recognition accuracy rate of the scene recognition method in the prior art.
The invention provides a scene identification method based on indoor opportunity signal enhancement, which comprises the following steps:
step S1: acquiring a typical scene image set aiming at a preset application scene;
step S2: acquiring positioning information and image information of mobile equipment on a main road of a scene to be researched by combining a scene base map corresponding to the scene to be researched, wherein the positioning information and the image information correspond to positioning points;
step S3: inputting the scene image set into a preset feature fusion neural network, and performing model fine tuning on a convolutional neural network module of the preset feature fusion neural network to obtain a convolutional neural network model migrated to the scene to be researched, wherein the preset feature fusion neural network comprises an image feature extraction module and a feature fusion decision module, the image feature extraction module is realized by the convolutional neural network module, and image information corresponding to the positioning point is converted into an image feature vector;
step S4: superposing the positioning points and the scene base map to obtain position characteristic vectors of the positioning points in the scene;
step S5: superposing an error circle which takes a positioning point as a center and takes a preset positioning error as a radius with the scene base map to obtain the intersection area of the error circle and each scene, and obtaining a relation characteristic vector of the positioning point and each scene according to the intersection area;
step S6: splicing a position feature vector of a positioning point in a scene and a relation feature vector of the positioning point and each scene into a positioning feature vector, splicing the positioning feature vector and an image feature vector corresponding to the positioning point, inputting the spliced feature vector into a feature fusion decision module, and training parameters of the feature fusion decision module;
step S7: fixing parameters of an image feature extraction module obtained after model fine tuning and a feature fusion decision module trained in the step to obtain a combined feature fusion neural network;
step S8: and inputting the image information corresponding to the positioning points and the relation characteristic vectors of the positioning points and each scene into the combined characteristic fusion neural network to obtain scene prediction characteristic vectors, and taking the scene type corresponding to the item with the highest probability in the prediction characteristic vectors as a scene recognition result.
Further, step S3 specifically includes:
step S3.1: inputting the scene image set into a preset feature fusion neural network, training the convolutional neural network module, updating full-connection layer parameters of the convolutional neural network module, reserving parameters of a convolutional layer, obtaining a convolutional neural network module after model fine tuning, and taking the convolutional neural network module as an image feature extraction module after model fine tuning;
step S3.2: and inputting the image information corresponding to the positioning points into the model fine-tuned convolutional neural network module to obtain an output tensor, wherein the output tensor is used as the image characteristic vector, and the image characteristic vector corresponds to the positioning points.
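Purely by way of illustration, the fine tuning of steps S3.1 and S3.2 could be sketched as follows. The sketch assumes a PyTorch/torchvision Inception V3 backbone; the function names, the weight enum, the softmax on the output and the number of scene categories are assumptions made for this example and are not taken from this disclosure.

```python
# Minimal fine-tuning sketch for steps S3.1-S3.2 (assumptions: PyTorch/torchvision,
# Inception V3 backbone, N_CATEGORY scene classes; all names are illustrative).
import torch
import torch.nn as nn
from torchvision import models

N_CATEGORY = 8  # assumed number of scene categories in the research scene

def build_image_feature_extractor(n_category: int) -> nn.Module:
    """Load a pretrained CNN, freeze the convolutional-layer parameters and
    replace the final fully connected layer so the output has shape [1, n_category]."""
    model = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)
    for p in model.parameters():
        p.requires_grad = False                       # retain convolutional parameters
    model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, n_category)
    model.fc = nn.Linear(model.fc.in_features, n_category)  # trainable head only
    return model

def image_feature_vector(model: nn.Module, image: torch.Tensor) -> torch.Tensor:
    """Step S3.2: convert the image of a positioning point (shape [1, 3, 299, 299])
    into a [1, n_category] image feature vector with the fine-tuned network."""
    model.eval()
    with torch.no_grad():
        return torch.softmax(model(image), dim=1)
```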
Further, step S4 specifically includes:
step S4.1: initializing the position feature vector and assigning it as
V_location = {0}^(N_category+1),
an all-zero vector with N_category+1 elements, wherein N_category is the number of scene categories and the first element is taken as the feature expression item indicating that the positioning point lies outside all scenes;
step S4.2: judging the relation between the positioning point and each scene, and if the positioning point falls into the k-th scene, assigning the (k+1)-th item of the position feature vector to 1, specifically:
V_location = (0, ..., 0, 1, 0, ..., 0), with the single 1 in the (k+1)-th position;
if the positioning point is not in any scene, assigning the first item of the position feature vector to 1, specifically:
V_location = (1, 0, ..., 0);
step S4.3: storing the position feature vector and associating it with the specific positioning point.
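For concreteness, a possible sketch of steps S4.1 to S4.3 is given below. It assumes the scene base map is available as shapely polygons tagged with category indices; this representation, and all names in the snippet, are implementation assumptions rather than requirements of the method.

```python
# Sketch of steps S4.1-S4.3 (assumption: scenes are shapely polygons with a
# category index k in [0, n_category); names are illustrative).
import numpy as np
from shapely.geometry import Point, Polygon

def position_feature_vector(point: Point, scenes: list[tuple[int, Polygon]],
                            n_category: int) -> np.ndarray:
    """Return V_location with n_category + 1 elements: element 0 means the
    positioning point lies outside all scenes, element k+1 means it lies in
    the scene of category k."""
    v = np.zeros(n_category + 1)          # step S4.1: all-zero initialization
    for k, polygon in scenes:             # step S4.2: point-in-scene test
        if polygon.contains(point):
            v[k + 1] = 1.0
            return v
    v[0] = 1.0                            # the point is outside every scene
    return v                              # step S4.3: caller stores it per point
```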
Further, step S5 specifically includes:
step S5.1: initializing the relation feature vector between the positioning point and each scene, wherein the vector has N_category elements, one per scene category, each corresponding to the significance of the relation between that scene and the positioning point, N_category being the total number of scene categories, and assigning it as
V_relation = {0}^(N_category);
step S5.2: overlaying each scene boundary with the error circle centered at the positioning point and having the preset positioning error R_noise as its radius, calculating the intersection area, traversing each scene, accumulating the area values by scene category, assigning them to the corresponding elements of the relation feature vector, and normalizing the feature vector to obtain the relation feature vector, in the form:
{ S_i / (S_1 + S_2 + ... + S_(N_category)) }
wherein S_i is the intersection area obtained by overlaying the scene of category i with the error circle and N_category is the total number of scene categories;
step S5.3: storing the relation feature vectors of the positioning point and each scene and associating them with the specific positioning point.
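Similarly, steps S5.1 to S5.3 could be sketched as follows, again assuming a shapely representation of the base map; the helper name and signature are assumptions for the example.

```python
# Sketch of steps S5.1-S5.3 (assumptions: shapely base map, r_noise is the
# preset positioning error R_noise; names are illustrative).
import numpy as np
from shapely.geometry import Point, Polygon

def relation_feature_vector(point: Point, scenes: list[tuple[int, Polygon]],
                            n_category: int, r_noise: float) -> np.ndarray:
    """Accumulate the intersection area of the error circle with each scene
    category and normalize to obtain V_relation (n_category elements)."""
    areas = np.zeros(n_category)                    # step S5.1
    circle = point.buffer(r_noise)                  # error circle of radius r_noise
    for k, polygon in scenes:                       # step S5.2
        areas[k] += circle.intersection(polygon).area
    total = areas.sum()
    return areas / total if total > 0 else areas    # normalized V_relation (step S5.3: store per point)
```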
Further, step S6 specifically includes:
step S6.1: splicing the position feature vector V_location of each positioning point and the relation feature vector V_relation between the positioning point and each scene into a positioning feature vector V_positioning, in the form:
V_positioning = [V_location, V_relation], with shape [1, 2*N_category+1];
step S6.2: splicing the image feature vector and the positioning feature vector into a feature vector V_fuse of shape [1, 3*N_category+1], in the form:
V_fuse = [V_image, V_positioning], wherein V_image denotes the image feature vector;
step S6.3: inputting the spliced feature vector V_fuse into the feature fusion decision module, whose feature fusion fully connected layer outputs a fused feature vector of shape [1, 3*N_category+1]; the fused feature vector is then fed through the final prediction fully connected layer, and the parameters of the feature fusion decision module are trained to obtain the trained feature fusion decision module.
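As an illustration of step S6, a minimal PyTorch sketch of the feature fusion decision module is given below. The disclosure does not specify an activation between the two fully connected layers, so a plain linear stack with a softmax output is shown; the class and attribute names are assumptions.

```python
# Sketch of the feature fusion decision module of step S6 (assumptions: PyTorch,
# no hidden activation specified, softmax output; names are illustrative).
import torch
import torch.nn as nn

class FeatureFusionDecision(nn.Module):
    def __init__(self, n_category: int):
        super().__init__()
        d = 3 * n_category + 1                       # length of V_fuse
        self.fusion_fc = nn.Linear(d, d)             # feature fusion FC layer, [1, 3*N+1]
        self.predict_fc = nn.Linear(d, n_category)   # final prediction FC layer, [1, N]

    def forward(self, v_image, v_location, v_relation):
        v_positioning = torch.cat([v_location, v_relation], dim=1)  # [1, 2*N+1]
        v_fuse = torch.cat([v_image, v_positioning], dim=1)         # [1, 3*N+1]
        return torch.softmax(self.predict_fc(self.fusion_fc(v_fuse)), dim=1)
```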
Further, step S8 specifically includes:
inputting the image information corresponding to the positioning point into the image feature extraction module obtained after model fine tuning, inputting the relation feature vector of the positioning point and each scene into the trained feature fusion decision module, outputting the probability value of the positioning point belonging to each scene, and taking the scene type corresponding to the highest probability value as the scene recognition result.
Based on the same inventive concept, the second aspect of the present invention provides a scene recognition apparatus based on indoor opportunistic signal enhancement, comprising:
the scene image set acquisition module is used for acquiring a typical scene image set aiming at a preset application scene;
the positioning information and image information acquisition module is used for acquiring positioning information and image information of the mobile device on a main road of a scene to be researched in combination with a scene base map corresponding to the scene to be researched, wherein the positioning information and the image information correspond to the positioning points;
the migration learning module is used for inputting the scene image set into a preset feature fusion neural network, performing model fine tuning on a convolutional neural network module of the preset feature fusion neural network, and obtaining a convolutional neural network model migrated to the scene to be researched, wherein the preset feature fusion neural network comprises an image feature extraction module and a feature fusion decision module, the image feature extraction module is realized by the convolutional neural network module, and image information corresponding to the positioning point is converted into an image feature vector;
the position feature vector calculation module is used for superposing the positioning points and the scene base map to obtain position feature vectors of the positioning points in the scene;
the relation characteristic vector calculation module is used for superposing an error circle which takes a positioning point as a center and takes a preset positioning error as a radius with the scene base map to obtain the intersection area of the error circle and each scene, and obtaining the relation characteristic vector of the positioning point and each scene according to the intersection area;
the feature fusion decision module training module is used for splicing a position feature vector of a positioning point in a scene and a relation feature vector of the positioning point and each scene into a positioning feature vector, splicing the positioning feature vector and an image feature vector corresponding to the positioning point, inputting the spliced feature vector into the feature fusion decision module, and training parameters of the feature fusion decision module;
the merging module is used for fixing the parameters of the image feature extraction module obtained after model fine tuning and the feature fusion decision module trained in the step to obtain a merged feature fusion neural network;
and the prediction module is used for inputting the image information corresponding to the positioning points and the relation characteristic vectors of the positioning points and each scene into the combined characteristic fusion neural network to obtain scene prediction characteristic vectors, and taking the scene type corresponding to the item with the highest probability in the prediction characteristic vectors as a scene recognition result.
Further, the migration learning module is specifically configured to:
inputting the scene image set into a preset feature fusion neural network, training the convolutional neural network module, updating full-connection layer parameters of the convolutional neural network module, reserving parameters of a convolutional layer, obtaining a convolutional neural network module after model fine tuning, and taking the convolutional neural network module as an image feature extraction module after model fine tuning;
and inputting the image information corresponding to the positioning points into the model fine-tuned convolutional neural network module to obtain an output tensor, wherein the output tensor is used as the image characteristic vector, and the image characteristic vector corresponds to the positioning points.
Based on the same inventive concept, a third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed, performs the method of the first aspect.
Based on the same inventive concept, a fourth aspect of the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of the first aspect when executing the program.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
in the method provided by the invention, a typical scene image set is collected and input into the preset feature fusion neural network, and model fine tuning is carried out on the convolutional neural network module of that network to obtain a convolutional neural network model transferred to the scene to be researched; the image information corresponding to each positioning point is converted into an image feature vector by the model fine-tuned convolutional neural network, that is, the convolutional neural network is used to obtain scene-level image features of the indoor positioning points. The positioning features of the positioning points are described through the positioning information of the positioning points, the scene base map and the positioning error, and these positioning features are used to expand and enhance the image features. A transfer learning method is adopted to combine the image features and the positioning features for scene recognition and prediction, so a higher scene recognition accuracy can be obtained, which solves the technical problem that scene recognition methods in the prior art have low recognition accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of a scene recognition method based on indoor opportunistic signal enhancement in an embodiment of the present invention;
FIG. 2 is a schematic diagram of the method shown in FIG. 1;
FIG. 3 is a block diagram of a merged feature fusion neural network in the method shown in FIG. 1;
FIG. 4 is a comparison diagram of scene recognition results of the prior art method and the present embodiment method;
FIG. 5 is a block diagram of a scene recognition device based on indoor signal enhancement in an embodiment of the present invention;
FIG. 6 is a block diagram of a computer-readable storage medium according to an embodiment of the present invention;
fig. 7 is a block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a scene identification method and device based on indoor opportunity signal enhancement, which are used for solving the technical problem that the scene identification method in the prior art is low in identification accuracy.
In order to achieve the technical effects, the general idea of the invention is as follows:
in an indoor scene, the semantic type of the current scene is described by acquiring image information and positioning information through a mobile device, and transfer learning based on images of the research scene is performed on a conventional convolutional neural network model, so that scene-level image features of the images acquired by the mobile device can be extracted by the convolutional neural network. Meanwhile, the positioning information of the positioning points contains rich position features and relation features with the surrounding scenes, and the positioning feature obtained by splicing the position feature and the relation feature can be used as another type of feature to expand and enhance the image features. Specifically, the positioning point is superimposed on the scene base map to obtain a position feature describing where the positioning point lies; an error circle is constructed from the positioning error and superimposed on the scene base map, and the overlap area with each type of scene is accumulated to evaluate the significance of the relation between the positioning point and the surrounding scenes; both the position feature and the relation feature are expressed in vector form. After the image feature vector and the positioning feature vector are connected through the feature fusion fully connected layer, the output feature vector is fed into the subsequent final prediction fully connected layer; an optimal fusion-and-decision fully connected layer is trained on the collected indoor positioning and image data, and the probability that the positioning point lies in each type of scene is finally output.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
The embodiment provides a scene recognition method based on indoor opportunistic signal enhancement, please refer to fig. 1, and the method includes:
step S1 is first executed: and acquiring a typical scene image set aiming at a preset application scene.
Specifically, the preset application scene may be selected according to the requirements of actual research or application, and a typical scene image set may be acquired by using an existing method.
Then, step S2 is executed: and acquiring positioning information and image information of the mobile equipment on a main road of the scene to be researched by combining a scene base map corresponding to the scene to be researched, wherein the positioning information and the image information correspond to the positioning points.
Specifically, the scene base map refers to an indoor base map of the application scene, and one scene base map contains a plurality of scenes. Indoor opportunistic signals are radio signals collected using radio, television, mobile communication and other types of equipment. The preset application scene in step S1 is the same scene as the scene to be researched in step S2, but the scene image set is acquired in step S1 while the positioning information and the image information are acquired in step S2, so they are different data. Real-time images and positioning information (namely opportunistic signals) are acquired through the mobile device. The positioning points correspond to the image information and the positioning information and are used for training the parameters of the subsequent feature fusion decision module. In a specific implementation process, in order to keep the data closer to real conditions, a certain amount of disturbance and randomness is required in the image data, and interference can be added in aspects such as shooting angle, focusing and exposure.
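Purely as an example of the disturbance mentioned above, image augmentation could be applied as in the following sketch; the specific transforms and parameter values are assumptions and are not prescribed by this disclosure.

```python
# Illustrative disturbance of collected images (assumption: torchvision transforms;
# jitter values and crop size are made-up example parameters).
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),                  # shooting-angle disturbance
    transforms.ColorJitter(brightness=0.3, contrast=0.3),   # exposure disturbance
    transforms.GaussianBlur(kernel_size=5),                 # imperfect focusing
    transforms.RandomResizedCrop(299, scale=(0.8, 1.0)),    # framing randomness (Inception input size)
    transforms.ToTensor(),
])
```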
Step S3 is executed next: the method comprises the steps of inputting a scene image set into a preset feature fusion neural network, carrying out model fine tuning on a convolutional neural network module of the preset feature fusion neural network, and obtaining a convolutional neural network model which is transferred to a scene to be researched, wherein the preset feature fusion neural network comprises an image feature extraction module and a feature fusion decision module, the image feature extraction module is realized by the convolutional neural network module, and image information corresponding to a positioning point is converted into an image feature vector.
Specifically, the preset feature fusion neural network is mainly divided into an image feature extraction module and a feature fusion decision module. In this embodiment a convolutional neural network is selected as the image feature extraction module; its output has the form [1, N_category], each element being the probability that the image belongs to the corresponding scene category, where N_category is the total number of scene categories in the research scene. The output tensor of the convolutional neural network (the image feature vector) is spliced with the positioning feature input tensor of shape [1, 2*N_category+1] (the positioning feature vector), and the two are jointly input into the feature fusion fully connected layer, whose output is a fused feature vector of shape [1, 3*N_category+1]. This fused feature vector is then input into the final prediction fully connected layer, which outputs a vector of shape [1, N_category] whose elements characterize the probability that the positioning point belongs to the corresponding scene.
In a specific implementation process, the image feature extraction module of the preset feature fusion neural network may use a known convolutional neural network model, such as the deep convolutional neural network models AlexNet, Inception V2, Inception V3, ResNet, and the like. Fig. 3 shows the structure of the preset feature fusion neural network, in which the image feature extraction module can be implemented with Inception V3 and specifically comprises general convolutional layers (convolution and pooling layers), Inception modules, an average pooling layer, and a fully connected layer. The feature fusion decision module comprises a feature fusion fully connected layer and a final prediction fully connected layer and is used to predict the scene.
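As a rough illustration of how the network of fig. 3 could be wired together, the following sketch combines a fine-tuned backbone with the fusion decision head; it reuses the helper sketches given earlier in this description and is an implementation assumption rather than the patented code itself.

```python
# Sketch of the combined feature fusion network of fig. 3 (assumption: it reuses
# build_image_feature_extractor and FeatureFusionDecision from the earlier sketches).
import torch
import torch.nn as nn

class CombinedFeatureFusionNet(nn.Module):
    def __init__(self, backbone: nn.Module, head: nn.Module):
        super().__init__()
        self.backbone = backbone   # fine-tuned Inception V3, output [1, N_category]
        self.head = head           # feature fusion FC + final prediction FC

    def forward(self, image, v_location, v_relation):
        v_image = torch.softmax(self.backbone(image), dim=1)
        return self.head(v_image, v_location, v_relation)   # per-scene probabilities
```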
In one embodiment, step S3 specifically includes:
step S3.1: inputting the scene image set into a preset feature fusion neural network, training a convolutional neural network module, updating full-connection layer parameters of the convolutional neural network module, keeping parameters of a convolutional layer, obtaining a convolutional neural network module after model fine tuning, and taking the convolutional neural network module as an image feature extraction module after model fine tuning;
step S3.2: and inputting the image information corresponding to the positioning points into the model fine tuning post-convolution neural network module to obtain an output tensor, wherein the output tensor is used as an image characteristic vector, and the image characteristic vector corresponds to the positioning points.
Specifically, there may be multiple full connection layers of the convolutional neural network module, and the typical scene image set acquired in step S1 may be divided into a training set and a verification set, which are used to train the convolutional neural network module to obtain the convolutional neural network model migrated to the research scene.
In a specific implementation process, the last fully connected layer refers to the fully connected layer connected after the activation function of the last convolutional layer in the convolutional neural network selected for fine tuning; it transforms the low-level features extracted by the convolutions into scene-level features. The output tensor size of this final fully connected layer is replaced with [1, N_category], so that the number of categories output by the convolutional neural network suits the research scene; this also completes the transfer learning.
Step S4 is executed again: and superposing the positioning points and the scene base map to obtain the position characteristic vectors of the positioning points in the scene.
In one embodiment, step S4 specifically includes:
step S4.1: initializing the position feature vector and assigning it as
V_location = {0}^(N_category+1),
an all-zero vector with N_category+1 elements, wherein N_category is the number of scene categories and the first element is taken as the feature expression item indicating that the positioning point lies outside all scenes;
step S4.2: judging the relation between the positioning point and each scene, and if the positioning point falls into the k-th scene, assigning the (k+1)-th item of the position feature vector to 1, specifically:
V_location = (0, ..., 0, 1, 0, ..., 0), with the single 1 in the (k+1)-th position;
if the positioning point is not in any scene, assigning the first item of the position feature vector to 1, specifically:
V_location = (1, 0, ..., 0);
step S4.3: storing the position feature vector and associating it with the specific positioning point.
Step S5 is executed next: superposing an error circle with a positioning point as a center and a preset positioning error as a radius with a scene base map to obtain the intersection area of the error circle and each scene, and obtaining the relation characteristic vector of the positioning point and each scene according to the intersection area
Specifically, the positioning characteristics of the positioning points are described through the positioning information of the positioning points, the scene base map and the positioning errors, then the positioning characteristic vectors are converted, and the image characteristics can be expanded and enhanced by using the positioning characteristics, so that the accuracy of prediction is improved.
In one embodiment, step S6 specifically includes:
step S6.1: splicing the position feature vector V_location of each positioning point and the relation feature vector V_relation between the positioning point and each scene into a positioning feature vector V_positioning, in the form:
V_positioning = [V_location, V_relation], with shape [1, 2*N_category+1];
step S6.2: splicing the image feature vector and the positioning feature vector into a feature vector V_fuse of shape [1, 3*N_category+1], in the form:
V_fuse = [V_image, V_positioning], wherein V_image denotes the image feature vector;
step S6.3: inputting the spliced feature vector V_fuse into the feature fusion decision module, whose feature fusion fully connected layer outputs a fused feature vector of shape [1, 3*N_category+1]; the fused feature vector is then fed through the final prediction fully connected layer, and the parameters of the feature fusion decision module are trained to obtain the trained feature fusion decision module.
Specifically, the image feature vector and the positioning feature vector are connected through the feature fusion fully connected layer to obtain the spliced feature vector V_fuse, which is then fed into the subsequent final prediction fully connected layer; in this way the feature fusion decision module can be trained.
Step S7 is executed next: fixing parameters of the image feature extraction module obtained after model fine tuning and the feature fusion decision module trained in the previous step to obtain a combined feature fusion neural network.
Finally, step S8 is executed: and inputting the image information corresponding to the positioning points and the relation characteristic vectors of the positioning points and each scene into the combined characteristic fusion neural network to obtain scene prediction characteristic vectors, and taking the scene type corresponding to the item with the highest probability in the prediction characteristic vectors as a scene recognition result.
Specifically, the combined feature fusion neural network obtained in step S7 is used for scene recognition: once the image information and the relation feature vector are input together, a prediction result can be obtained. Because the image feature extraction module in the combined network is the model fine-tuned convolutional neural network, and the feature fusion decision module has been trained on the feature vectors obtained by splicing image feature vectors and positioning feature vectors, the recognition strategy that combines image features and positioning features can make full use of both types of features for a joint decision, which greatly reduces the number of misclassified predictions and improves prediction accuracy.
Step S8 specifically includes:
inputting the image information corresponding to the positioning point into the image feature extraction module obtained after model fine tuning, inputting the relation feature vector of the positioning point and each scene into the trained feature fusion decision module, outputting the probability value of the positioning point belonging to each scene, and taking the scene type corresponding to the highest probability value as the scene recognition result.
To more clearly illustrate the implementation process of the scene recognition method of the present invention, a specific example is introduced below. Please refer to fig. 2, which is a schematic diagram of the scene recognition method provided by the present invention. The collected information includes positioning information and image information, and both correspond to positioning points, i.e. each positioning point has corresponding positioning information and image information. The positioning feature vector is obtained by superimposing the positioning point, the scene base map and the positioning error; the image information of the positioning point is input into the convolutional neural network to obtain the image feature vector; the positioning feature vector and the image feature vector are then spliced into a fused feature vector, which is input into the combined feature fusion neural network for feature fusion and prediction to obtain the prediction result.
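To make the flow of fig. 2 concrete, a small prediction sketch follows; here backbone and head refer to the fine-tuned image feature extractor and the trained feature fusion decision module from the earlier sketches, and the function name and tensor handling are assumptions.

```python
# Illustrative prediction flow for fig. 2 (assumptions as in the earlier sketches).
import torch

def predict_scene(backbone, head, image, v_location_np, v_relation_np):
    """Return the index of the most probable scene for one positioning point."""
    v_location = torch.from_numpy(v_location_np).float().unsqueeze(0)  # [1, N+1]
    v_relation = torch.from_numpy(v_relation_np).float().unsqueeze(0)  # [1, N]
    with torch.no_grad():
        v_image = torch.softmax(backbone(image), dim=1)                # [1, N]
        probs = head(v_image, v_location, v_relation)                  # [1, N]
    return int(probs.argmax(dim=1).item())                             # recognition result
```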
In specific implementation, the neural network parameter values are fixed and stored as a callable model, and the method flow can then be run automatically by invoking the model from a program. The scene recognition flow provided by the invention is described below by taking a railway station scene as an example:
step 1, selecting 8 types of scenes for acquiring a typical scene image set aiming at a railway station;
step 2, aiming at a railway station scene, combining a scene base map to collect mobile equipment positioning information and image information on a scene main road;
step 3, inputting the typical scene image set acquired in step 1 into the convolutional neural network of the preset feature fusion neural network, carrying out model fine tuning on the convolutional neural network to obtain the convolutional neural network transferred to the research scene, and processing the image information corresponding to the positioning points into image feature vectors through the model fine-tuned convolutional neural network.
The preset feature fusion neural network is mainly divided into an image feature extraction module and a feature fusion and decision module. An Inception V3 convolutional neural network is selected as the image feature extraction module; its output tensor is spliced with the positioning feature input tensor of shape [1, 2*8+1], the two are jointly input into the feature fusion fully connected layer, which outputs a fused feature vector of shape [1, 3*8+1]; the fused feature vector is input into the final decision fully connected layer, which outputs a prediction result of shape [1, 8] whose elements represent the probability of each scene for the positioning point. For the convolutional neural network transferred to the research scene, a trained Inception V3 model is selected, the output tensor size of its final fully connected layer is replaced with [1, 8], and the Inception V3 model is fine-tuned with the image set acquired in step 1, updating the parameters of the fully connected layer while keeping the original parameters of the convolutional layers. The images corresponding to the positioning points acquired in step 2 are then input into the fine-tuned convolutional neural network, and the output tensor of the network is stored and associated with the specific positioning points.
Step 4, overlapping the positioning points and the scene base map to obtain position characteristic vectors of the positioning points in the scene;
in specific implementation, the process of acquiring the position feature vector of the positioning point in the step 4 comprises the following substeps:
step 4.1, initializing the position feature vector; because the case in which the positioning point lies outside all scenes is included, the number of elements of the position feature vector is 9, i.e. the number of scene categories 8 plus 1, the first item is taken as the feature expression item indicating that the positioning point lies outside all scenes, and the feature vector is assigned the value {0}^9;
step 4.2, judging the relation between the positioning point and each scene, and if the positioning point falls into the k-th scene, assigning the (k+1)-th item of the position feature vector to 1, in the form:
V_location = (0, ..., 0, 1, 0, ..., 0), with the single 1 in the (k+1)-th position;
if the positioning point is not in any scene, assigning the first item of the position feature vector to 1, in the form:
V_location = (1, 0, ..., 0);
step 4.3, storing the position feature vector and associating it with the specific positioning point.
Step 5, overlapping an error circle with the positioning point as the center and the positioning error as the radius with a scene base map, solving the intersection area with various scenes, and obtaining the relation characteristic vector of the positioning point and each scene;
in specific implementation, the process of obtaining the relationship feature vector between the positioning point and each scene in step 5 includes the following substeps:
step 5.1, initializing the relation feature vector with each scene, wherein the feature vector has 8 elements corresponding to the significance of the relation between each scene and the positioning point, and is assigned the value {0}^8;
step 5.2, overlaying the scene boundaries with the error circle centered at the positioning point and having the preset positioning error of 5 m as its radius, calculating the intersection areas, traversing each scene, accumulating the area values by scene category, and assigning them to the corresponding elements of the relation feature vector; finally, normalizing the feature vector, in the form:
{ S_i / (S_1 + S_2 + ... + S_8) }
wherein S_i is the intersection area of the scene of category i with the error circle;
step 5.3, storing the relation feature vectors with each scene and associating them with the specific positioning points.
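As a purely numeric illustration of the normalization in step 5.2, with made-up intersection areas rather than measured data:

```python
# Hypothetical intersection areas S_1..S_8 (in square metres) and their normalization.
import numpy as np

areas = np.array([0.0, 12.4, 0.0, 3.1, 0.0, 0.0, 63.0, 0.0])
v_relation = areas / areas.sum()     # category 7 dominates with about 0.80
print(v_relation.round(2))
```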
Step 6, splicing the feature vectors obtained in the step 4 and the step 5 into positioning feature vectors, splicing the positioning feature vectors with the corresponding image feature vectors obtained in the step 3, inputting a feature fusion and final decision full-connection layer module, and training parameters of the module;
in specific implementation, the feature fusion and final decision module training process in step 6 includes the following substeps:
step 6.1, splicing the position feature vector V_location obtained for each positioning point in step 4 and the relation feature vector V_relation with each scene obtained in step 5 into a positioning feature vector V_positioning, in the form:
V_positioning = [V_location, V_relation], with shape [1, 2*8+1];
step 6.2, splicing the corresponding image feature vector obtained in step 3.3 and the positioning feature vector into a feature vector V_fuse of shape [1, 3*8+1], in the form:
V_fuse = [V_image, V_positioning], wherein V_image denotes the image feature vector;
step 6.3, inputting the feature vector V_fuse into the feature fusion and final decision module, wherein the feature fusion fully connected layer outputs a fused feature vector of shape [1, 3*8+1]; the fused feature vector is then fed through the final decision fully connected layer, which finally outputs the prediction probability vector over the scene categories; this output, together with the positioning point labels, is used to calculate the loss value and update the parameters of each layer of the module.
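A compact sketch of the parameter update in step 6.3 is given below; it assumes a cross-entropy loss over the 8 scene labels and an Adam optimizer, reuses the FeatureFusionDecision sketch from earlier in this description, and none of these choices are mandated by this disclosure.

```python
# Sketch of training the feature fusion and final decision module (step 6.3).
# Assumptions: PyTorch, cross-entropy loss, Adam optimizer; batch tensors are
# v_image [B, 8], v_location [B, 9], v_relation [B, 8], labels [B].
import torch
import torch.nn as nn

head = FeatureFusionDecision(n_category=8)        # from the earlier sketch
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(v_image, v_location, v_relation, labels):
    optimizer.zero_grad()
    v_fuse = torch.cat([v_image, v_location, v_relation], dim=1)  # [B, 3*8+1]
    logits = head.predict_fc(head.fusion_fc(v_fuse))              # raw scores, [B, 8]
    loss = criterion(logits, labels)   # loss value against the positioning point labels
    loss.backward()                    # backpropagate gradients
    optimizer.step()                   # update the parameters of each layer
    return loss.item()
```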
Step 7, fixing the parameters of the image feature extraction module finished by fine tuning in the step 3 and the feature fusion module finished by training in the step 6 to obtain a combined feature fusion neural network;
step 8, inputting the image information of the positioning points in the railway station scene and the positioning feature vectors generated by superposition with the base map into the combined feature fusion neural network to obtain scene prediction feature vectors, and taking the scene type corresponding to the highest-probability item as the scene recognition result.
In specific implementation, the process of identifying the scene where the positioning point is located by using the trained feature fusion neural network in step 8 includes the following substeps:
step 8.1, obtaining the positioning feature vector from the positioning information of the positioning point in the manner of step 4 and step 5;
step 8.2, inputting the image and the positioning feature vector of the positioning point into the model through the input layer of the image feature extraction module and the positioning feature input layer respectively, and outputting the probability values of the positioning point belonging to each scene.
Fig. 4 compares the scene recognition results of the prior-art method and the method of this embodiment. The left side is the scene recognition result obtained by the prior-art method, in which the black dots are points whose scene was predicted incorrectly: when scene recognition of the positioning-point images is performed only with a fine-tuned Inception V3 convolutional neural network, the real-time images from the mobile device carry large random disturbance and noise, the convolutional neural network has difficulty extracting salient scene features, and many points are mispredicted. The right side is the scene recognition result obtained by the method of the present invention: the recognition strategy that fuses image features and positioning features makes full use of both types of features for a joint decision, so the number of misprediction results is greatly reduced and the prediction accuracy is improved.
Based on the same inventive concept, the application also provides a device corresponding to the scene identification method based on indoor opportunity signal enhancement in the first embodiment, which is detailed in the second embodiment.
Example two
The present embodiment provides a scene recognition apparatus based on indoor opportunistic signal enhancement, please refer to fig. 5, the apparatus includes:
a scene image set acquisition module 501, configured to acquire a typical scene image set for a preset application scene;
a positioning information and image information acquisition module 502, configured to acquire positioning information and image information of a mobile device on a main road of a scene to be researched in combination with a scene base map corresponding to the scene to be researched, where the positioning information includes positioning points;
the migration learning module 503 is configured to input the scene image set into a preset feature fusion neural network, perform model fine tuning on a convolutional neural network module of the preset feature fusion neural network, and obtain a convolutional neural network model migrated to a scene to be studied, where the preset feature fusion neural network includes an image feature extraction module and a feature fusion decision module, the image feature extraction module is implemented by the convolutional neural network module, and converts image information corresponding to the location point into an image feature vector;
the position feature vector calculation module 504 is configured to superimpose the positioning point and the scene base map to obtain a position feature vector of the positioning point in the scene;
a relation feature vector calculation module 505, configured to superimpose an error circle with a positioning point as a center and a preset positioning error as a radius with a scene base map, to obtain an intersection area of the error circle and each scene, and obtain a relation feature vector between the positioning point and each scene according to the intersection area;
a feature fusion decision module training module 506, configured to splice the location feature vectors of the location points in the scenes and the relationship feature vectors of the location points and each scene into location feature vectors, and then splice the location feature vectors and the image feature vectors corresponding to the location points, and input the location feature vectors into the feature fusion decision module, and train parameters of the feature fusion decision module;
a merging module 507, configured to fix parameters of the image feature extraction module obtained after model fine tuning and the feature fusion decision module trained in the step, so as to obtain a merged feature fusion neural network;
the prediction module 508 is configured to input the image information corresponding to the location point and the relationship feature vector of the location point and each scene into the merged feature fusion neural network to obtain a scene prediction feature vector, and use a scene type corresponding to a highest probability item in the prediction feature vector as a scene recognition result.
In one embodiment, the migration learning module 503 is specifically configured to:
inputting the scene image set into a preset feature fusion neural network, training a convolutional neural network module, updating full-connection layer parameters of the convolutional neural network module, keeping parameters of a convolutional layer, obtaining a convolutional neural network module after model fine tuning, and taking the convolutional neural network module as an image feature extraction module after model fine tuning;
and inputting the image information corresponding to the positioning points into the model fine tuning post-convolution neural network module to obtain an output tensor, wherein the output tensor is used as an image characteristic vector, and the image characteristic vector corresponds to the positioning points.
In one embodiment, the location feature vector calculation module 504 is specifically configured to:
initialize the position feature vector and assign it as
V_location = {0}^(N_category+1),
an all-zero vector with N_category+1 elements, wherein N_category is the number of scene categories and the first element is taken as the feature expression item indicating that the positioning point lies outside all scenes;
judge the relation between the positioning point and each scene, and if the positioning point falls into the k-th scene, assign the (k+1)-th item of the position feature vector to 1, specifically:
V_location = (0, ..., 0, 1, 0, ..., 0), with the single 1 in the (k+1)-th position;
if the positioning point is not in any scene, assign the first item of the position feature vector to 1, specifically:
V_location = (1, 0, ..., 0);
and store the position feature vector and associate it with the specific positioning point.
In one embodiment, the relationship feature vector calculation module 505 is specifically configured to:
initialize the relation feature vector with each scene, wherein the vector has N_category elements, one per scene category, each corresponding to the significance of the relation between that scene and the positioning point, N_category being the total number of scene categories, and assign it as
V_relation = {0}^(N_category);
overlay each scene boundary with the error circle centered at the positioning point and having the preset positioning error R_noise as its radius, calculate the intersection area, traverse each scene, accumulate the area values by scene category, assign them to the corresponding elements of the relation feature vector, and normalize the feature vector to obtain the relation feature vector, in the form:
{ S_i / (S_1 + S_2 + ... + S_(N_category)) }
wherein S_i is the intersection area obtained by overlaying the scene of category i with the error circle and N_category is the total number of scene categories;
and store the relation feature vectors of the positioning point and each scene and associate them with the specific positioning points.
In one embodiment, the feature fusion decision module training module 506 is specifically configured to:
splice the position feature vector V_location of each positioning point and the relation feature vector V_relation between the positioning point and each scene into a positioning feature vector V_positioning, in the form:
V_positioning = [V_location, V_relation], with shape [1, 2*N_category+1];
splice the image feature vector and the positioning feature vector into a feature vector V_fuse of shape [1, 3*N_category+1], in the form:
V_fuse = [V_image, V_positioning], wherein V_image denotes the image feature vector;
and input the spliced feature vector V_fuse into the feature fusion decision module, whose feature fusion fully connected layer outputs a fused feature vector of shape [1, 3*N_category+1]; the fused feature vector is then fed through the final prediction fully connected layer, and the parameters of the feature fusion decision module are trained to obtain the trained feature fusion decision module.
In one embodiment, the prediction module 508 is specifically configured to:
input the image information corresponding to the positioning point into the image feature extraction module obtained after model fine tuning, input the relation feature vector of the positioning point and each scene into the trained feature fusion decision module, output the probability value of the positioning point belonging to each scene, and take the scene type corresponding to the highest probability value as the scene recognition result.
Since the apparatus described in the second embodiment of the present invention is an apparatus used for implementing the scene recognition method based on the indoor opportunistic signal enhancement in the first embodiment of the present invention, a person skilled in the art can understand the specific structure and deformation of the apparatus based on the method described in the first embodiment of the present invention, and thus, the detailed description thereof is omitted here. All the devices adopted in the method of the first embodiment of the present invention belong to the protection scope of the present invention.
EXAMPLE III
Based on the same inventive concept, the present application further provides a computer-readable storage medium 600, please refer to fig. 6, on which a computer program 611 is stored, which when executed implements the method in the first embodiment.
Since the computer-readable storage medium introduced in the third embodiment of the present invention is a computer-readable storage medium used for implementing the scene identification method based on indoor opportunistic signal enhancement in the first embodiment of the present invention, based on the method introduced in the first embodiment of the present invention, persons skilled in the art can understand the specific structure and deformation of the computer-readable storage medium, and thus details are not described herein. Any computer readable storage medium used in the method of the first embodiment of the present invention falls within the intended scope of the present invention.
Example four
Based on the same inventive concept, the present application further provides a computer device, please refer to fig. 7, which includes a memory 701, a processor 702, and a computer program 703 stored in the memory and executable on the processor, and the processor implements the method of the first embodiment when executing the program.
Since the computer device introduced in the fourth embodiment of the present invention is a device used for implementing the scene recognition method based on indoor opportunistic signal enhancement in the first embodiment of the present invention, based on the method introduced in the first embodiment of the present invention, a person skilled in the art can know the specific structure and deformation of the computer device, and thus, details are not described herein. All the computer devices adopted in the method of the first embodiment of the present invention are within the scope of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims (10)

1. A scene recognition method based on indoor opportunistic signal enhancement, characterized by comprising the following steps:
step S1: acquiring a typical scene image set aiming at a preset application scene;
step S2: acquiring positioning information and image information of mobile equipment on a main road of a scene to be researched by combining a scene base map corresponding to the scene to be researched, wherein the positioning information and the image information correspond to positioning points;
step S3: inputting the scene image set into a preset feature fusion neural network, and performing model fine tuning on a convolutional neural network module of the preset feature fusion neural network to obtain a convolutional neural network model migrated to the scene to be researched, wherein the preset feature fusion neural network comprises an image feature extraction module and a feature fusion decision module, the image feature extraction module is realized by the convolutional neural network module, and image information corresponding to the positioning point is converted into an image feature vector;
step S4: superposing the positioning points and the scene base map to obtain position characteristic vectors of the positioning points in the scene;
step S5: superposing an error circle which takes a positioning point as a center and takes a preset positioning error as a radius with the scene base map to obtain the intersection area of the error circle and each scene, and obtaining a relation characteristic vector of the positioning point and each scene according to the intersection area;
step S6: splicing a position feature vector of a positioning point in a scene and the relation feature vector between the positioning point and each scene into a positioning feature vector, splicing the positioning feature vector with the image feature vector corresponding to the positioning point, inputting the spliced feature vector into the feature fusion decision module, and training parameters of the feature fusion decision module;
step S7: fixing parameters of an image feature extraction module obtained after model fine tuning and a trained feature fusion decision module to obtain a combined feature fusion neural network;
step S8: and inputting the image information corresponding to the positioning points and the relation characteristic vectors of the positioning points and each scene into the combined characteristic fusion neural network to obtain scene prediction characteristic vectors, and taking the scene type corresponding to the item with the highest probability in the prediction characteristic vectors as a scene recognition result.
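By way of illustration only (not part of the claimed subject matter), the following sketch shows one way the combined feature fusion neural network of steps S7-S8 could be wired up in PyTorch. The ResNet-18 backbone, the ReLU between the two fully-connected layers, and all names such as FeatureFusionNet are assumptions made for this example, not elements required by the claim.

```python
# Hypothetical sketch of the combined feature fusion network (steps S7-S8).
# Assumes a ResNet-18 backbone and N_category scene classes; all names are illustrative.
import torch
import torch.nn as nn
import torchvision.models as models

class FeatureFusionNet(nn.Module):
    def __init__(self, n_category: int):
        super().__init__()
        backbone = models.resnet18(weights=None)            # image feature extraction module
        backbone.fc = nn.Linear(backbone.fc.in_features, n_category)
        self.image_extractor = backbone                      # outputs an N_category-dim image feature vector
        fused_dim = 3 * n_category + 1                       # [V_image, V_location, V_relation]
        self.fusion_fc = nn.Linear(fused_dim, fused_dim)     # feature fusion fully-connected layer
        self.predict_fc = nn.Linear(fused_dim, n_category)   # final prediction fully-connected layer

    def forward(self, image, v_positioning):
        v_image = self.image_extractor(image)                          # [B, N_category]
        v_fuse = torch.cat([v_image, v_positioning], dim=1)            # [B, 3*N_category + 1]
        logits = self.predict_fc(torch.relu(self.fusion_fc(v_fuse)))   # ReLU is an assumption
        return torch.softmax(logits, dim=1)                            # probability of each scene category

# Scene recognition result = category with the highest predicted probability, e.g.:
# probs = net(image_batch, positioning_batch); pred = probs.argmax(dim=1)
```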
2. The method according to claim 1, wherein step S3 specifically comprises:
step S3.1: inputting the scene image set into the preset feature fusion neural network, training the convolutional neural network module by updating the fully-connected layer parameters of the convolutional neural network module while retaining the parameters of the convolutional layers, obtaining a fine-tuned convolutional neural network module, and taking it as the image feature extraction module after model fine tuning;
step S3.2: and inputting the image information corresponding to the positioning points into the model fine-tuned convolutional neural network module to obtain an output tensor, wherein the output tensor is used as the image characteristic vector, and the image characteristic vector corresponds to the positioning points.
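As an illustrative sketch of the model fine tuning in claim 2 (an assumption-laden example rather than the patented implementation), the convolutional-layer parameters of a pretrained backbone are frozen and only the fully-connected layer is retrained on the typical scene image set; the choice of ResNet-18 with ImageNet weights, the Adam optimizer, and the hyperparameters are illustrative assumptions.

```python
# Hypothetical fine-tuning sketch for claim 2: keep the convolutional-layer parameters
# and update only the fully-connected layer on the typical scene image set.
import torch
import torch.nn as nn
import torchvision.models as models

def fine_tune_backbone(scene_loader, n_category, epochs=5, lr=1e-3):
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pretrained convolutional layers
    for p in model.parameters():
        p.requires_grad = False                                       # freeze convolutional-layer parameters
    model.fc = nn.Linear(model.fc.in_features, n_category)            # new fully-connected layer to train
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in scene_loader:                           # typical scene image set
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model  # image feature extraction module after model fine tuning
```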
3. The method according to claim 1, wherein step S4 specifically comprises:
step S4.1: initializing the position feature vector and assigning it as
$V_{location} = [\,0,\ 0,\ \ldots,\ 0\,]$,
wherein the position feature vector has $N_{category}+1$ elements, $N_{category}$ is the number of scene categories, and the initial item is the feature expression item indicating that the positioning point lies outside all scenes;
step S4.2: judging the relation between the positioning point and each scene; if the positioning point falls into the k-th scene, assigning the (k+1)-th item of the position feature vector to be 1, specifically:
$V_{location} = [\,0,\ \ldots,\ 0,\ \underbrace{1}_{(k+1)\text{-th item}},\ 0,\ \ldots,\ 0\,]$;
if the positioning point does not fall into any scene, assigning the first item of the position feature vector to be 1, specifically:
$V_{location} = [\,1,\ 0,\ \ldots,\ 0\,]$;
step S4.3: storing the position feature vector and associating it with its specific positioning point.
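The position feature vector of claim 3 is, in effect, a one-hot vector with an extra leading slot for "outside all scenes". A minimal sketch follows, assuming the scene boundaries are available as Shapely polygons indexed by a 1-based category index — a representation the claim does not prescribe.

```python
# Hypothetical sketch of the position feature vector (claim 3).
# scene_polygons: list of (category_index k in 1..N_category, shapely Polygon) pairs.
import numpy as np
from shapely.geometry import Point, Polygon

def position_feature_vector(x, y, scene_polygons, n_category):
    v_location = np.zeros(n_category + 1)        # step S4.1: initialize to all zeros
    point = Point(x, y)
    for k, polygon in scene_polygons:            # k is the 1-based scene category index
        if polygon.contains(point):              # positioning point falls into the k-th scene
            v_location[k] = 1.0                  # step S4.2: (k+1)-th item (index k) set to 1
            return v_location
    v_location[0] = 1.0                          # point lies outside all scenes: first item set to 1
    return v_location
```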
4. The method according to claim 1, wherein step S5 specifically comprises:
step S5.1: initializing the relation feature vector between the positioning point and each scene, wherein the feature vector has $N_{category}$ elements, each corresponding to the strength of the relation between one scene category and the positioning point, $N_{category}$ being the total number of scene categories, and assigning it as
$V_{relation} = [\,0,\ 0,\ \ldots,\ 0\,]$;
step S5.2: superposing each scene boundary with the error circle that takes the positioning point as center and the preset positioning error $R_{noise}$ as radius to calculate the intersection area, traversing every scene, accumulating the area values by scene category, assigning the accumulated values to the corresponding elements of the relation feature vector, and normalizing the vector to obtain the relation feature vector in the following form:
$V_{relation} = \left[\frac{S_1}{\sum_{i=1}^{N_{category}} S_i},\ \frac{S_2}{\sum_{i=1}^{N_{category}} S_i},\ \ldots,\ \frac{S_{N_{category}}}{\sum_{i=1}^{N_{category}} S_i}\right]$,
where $S_i$ is the intersection area obtained by superposing the scenes of category $i$ with the error circle, and $N_{category}$ is the total number of scene categories;
step S5.3: storing the relation feature vector between the positioning point and each scene and associating it with its specific positioning point.
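For claim 4, the sketch below overlays the error circle on each scene polygon, accumulates the intersection areas by category, and normalizes by the total intersection area; using Shapely's buffer operation and sum-normalization are implementation assumptions for this illustration.

```python
# Hypothetical sketch of the relation feature vector (claim 4).
# scene_polygons: list of (category_index i in 1..N_category, shapely Polygon) pairs.
import numpy as np
from shapely.geometry import Point

def relation_feature_vector(x, y, scene_polygons, n_category, r_noise):
    v_relation = np.zeros(n_category)                       # step S5.1: one element per scene category
    error_circle = Point(x, y).buffer(r_noise)              # error circle of radius R_noise around the point
    for i, polygon in scene_polygons:                       # step S5.2: traverse every scene
        v_relation[i - 1] += error_circle.intersection(polygon).area  # accumulate area by category
    total = v_relation.sum()
    if total > 0:                                           # normalize so the elements sum to 1
        v_relation /= total
    return v_relation                                       # step S5.3: store per positioning point
```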
5. The method according to claim 1, wherein step S6 specifically comprises:
step S6.1: splicing the position feature vector $V_{location}$ of each positioning point and the relation feature vector $V_{relation}$ between that positioning point and each scene into a positioning feature vector $V_{positioning}$ of the following form:
$V_{positioning} = [\,V_{location},\ V_{relation}\,]$;
step S6.2: splicing the image feature vector and the positioning feature vector into a feature vector $V_{fuse}$ of shape $[1,\ 3 \times N_{category}+1]$ of the following form:
$V_{fuse} = [\,V_{image},\ V_{positioning}\,] = [\,V_{image},\ V_{location},\ V_{relation}\,]$,
where $N_{category}$ is the total number of scene categories and $V_{image}$ is the image feature vector;
step S6.3: inputting the spliced feature vector $V_{fuse}$ into the feature fusion decision module, whose feature fusion fully-connected layer outputs a fusion feature vector of shape $[1,\ 3 \times N_{category}+1]$; the fusion feature vector is then passed through the final prediction fully-connected layer, and the parameters of the feature fusion decision module are trained to obtain the trained feature fusion decision module.
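To make the splicing and training of claim 5 concrete, the sketch below builds V_positioning and V_fuse and runs one training step of the feature fusion decision module; the ReLU between the fusion layer and the prediction layer, the cross-entropy loss, and the optimizer are assumptions of this illustration.

```python
# Hypothetical sketch of claim 5: splice the feature vectors and train the
# feature fusion decision module (fusion FC layer + final prediction FC layer).
import torch
import torch.nn as nn

def train_step(v_image, v_location, v_relation, label, fusion_fc, predict_fc, optimizer):
    # Step S6.1: V_positioning = [V_location, V_relation]
    v_positioning = torch.cat([v_location, v_relation], dim=1)
    # Step S6.2: V_fuse = [V_image, V_positioning], shape [1, 3*N_category + 1]
    v_fuse = torch.cat([v_image, v_positioning], dim=1)
    # Step S6.3: fusion FC layer keeps shape [1, 3*N_category + 1], then prediction FC layer
    logits = predict_fc(torch.relu(fusion_fc(v_fuse)))
    loss = nn.functional.cross_entropy(logits, label)       # label: class index tensor of shape [1]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example layer shapes for n_category scene classes (illustrative only):
# fusion_fc  = nn.Linear(3 * n_category + 1, 3 * n_category + 1)
# predict_fc = nn.Linear(3 * n_category + 1, n_category)
# optimizer  = torch.optim.Adam(list(fusion_fc.parameters()) + list(predict_fc.parameters()))
```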
6. The method according to claim 1, wherein step S8 specifically comprises:
inputting the image information corresponding to the positioning point into the image feature extraction module obtained after model fine tuning, inputting the relation feature vector between the positioning point and each scene into the trained feature fusion decision module, outputting the probability values of the positioning point belonging to each scene, and taking the scene category corresponding to the highest probability value as the scene recognition result.
7. A scene recognition device based on indoor signal of opportunity enhancement, comprising:
the scene image set acquisition module is used for acquiring a typical scene image set aiming at a preset application scene;
the positioning information and image information acquisition module is used for acquiring, in combination with a scene base map corresponding to the scene to be researched, positioning information and image information of the mobile device on a main road of the scene to be researched, wherein the positioning information and the image information correspond to the positioning points;
the migration learning module is used for inputting the scene image set into a preset feature fusion neural network, performing model fine tuning on a convolutional neural network module of the preset feature fusion neural network, and obtaining a convolutional neural network model migrated to the scene to be researched, wherein the preset feature fusion neural network comprises an image feature extraction module and a feature fusion decision module, the image feature extraction module is realized by the convolutional neural network module, and image information corresponding to the positioning point is converted into an image feature vector;
the position feature vector calculation module is used for superposing the positioning points and the scene base map to obtain position feature vectors of the positioning points in the scene;
the relation characteristic vector calculation module is used for superposing an error circle which takes a positioning point as a center and takes a preset positioning error as a radius with the scene base map to obtain the intersection area of the error circle and each scene, and obtaining the relation characteristic vector of the positioning point and each scene according to the intersection area;
the feature fusion decision module training module is used for splicing the position feature vector of a positioning point in the scene and the relation feature vector between the positioning point and each scene into a positioning feature vector, splicing the positioning feature vector with the image feature vector corresponding to the positioning point, inputting the spliced feature vector into the feature fusion decision module, and training parameters of the feature fusion decision module;
the merging module is used for fixing the parameters of the image feature extraction module obtained after model fine tuning and the trained feature fusion decision module to obtain a merged feature fusion neural network;
and the prediction module is used for inputting the image information corresponding to the positioning points and the relation characteristic vectors of the positioning points and each scene into the combined characteristic fusion neural network to obtain scene prediction characteristic vectors, and taking the scene type corresponding to the item with the highest probability in the prediction characteristic vectors as a scene recognition result.
8. The apparatus of claim 7, wherein the migration learning module is specifically configured to:
inputting the scene image set into the preset feature fusion neural network, training the convolutional neural network module by updating the fully-connected layer parameters of the convolutional neural network module while retaining the parameters of the convolutional layers, obtaining a fine-tuned convolutional neural network module, and taking it as the image feature extraction module after model fine tuning;
and inputting the image information corresponding to the positioning points into the model fine-tuned convolutional neural network module to obtain an output tensor, wherein the output tensor is used as the image characteristic vector, and the image characteristic vector corresponds to the positioning points.
9. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed, implements the method of any one of claims 1 to 6.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 6 when executing the program.
CN201810972177.6A 2018-08-24 2018-08-24 Scene recognition method and device based on indoor opportunity signal enhancement Active CN109284687B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810972177.6A CN109284687B (en) 2018-08-24 2018-08-24 Scene recognition method and device based on indoor opportunity signal enhancement

Publications (2)

Publication Number Publication Date
CN109284687A CN109284687A (en) 2019-01-29
CN109284687B (en) 2020-08-07

Family

ID=65183082

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810972177.6A Active CN109284687B (en) 2018-08-24 2018-08-24 Scene recognition method and device based on indoor opportunity signal enhancement

Country Status (1)

Country Link
CN (1) CN109284687B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307978B (en) * 2020-10-30 2022-05-24 Tencent Technology (Shenzhen) Co., Ltd. Target detection method and device, electronic equipment and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104700078A (en) * 2015-02-13 2015-06-10 武汉工程大学 Scale-invariant feature extreme learning machine-based robot scene recognition method
CN105138963A (en) * 2015-07-31 2015-12-09 小米科技有限责任公司 Picture scene judging method, picture scene judging device and server
CN106792562A (en) * 2017-02-16 2017-05-31 南京大学 Indoor wireless networks localization method based on back propagation artificial neural network model
CN108363967A (en) * 2018-01-30 2018-08-03 何德珍 A kind of categorizing system of remote sensing images scene
CN108399366A (en) * 2018-01-30 2018-08-14 何德珍 It is a kind of based on the remote sensing images scene classification extracting method classified pixel-by-pixel

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8644624B2 (en) * 2009-07-28 2014-02-04 Samsung Electronics Co., Ltd. System and method for indoor-outdoor scene classification

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
G-MS2F: GoogLeNet based multi-stage feature fusion of deep CNN for scene recognition; Pengjie Tang et al.; Neurocomputing; 2016-11-19; pp. 188-197 *
LAHNet: A Convolutional Neural Network Fusing Low- and High-Level Features for Aerial Scene Classification; Yuansheng Hua et al.; IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium; 2018-07-27; pp. 4728-4731 *
Research on key technologies of action recognition and tourism scene classification based on deep learning; Qi Tangquan; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2018-05-15; pp. I138-20 *
Research on scene classification of high-resolution remote sensing images based on feature learning; Hu Fan; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2018-06-15; pp. I140-30 *
Scene classification of high-resolution imagery combining saliency and multi-layer convolutional neural networks; He Xiaofei et al.; Acta Geodaetica et Cartographica Sinica; 2016-09-30; Vol. 45, No. 9; pp. 1073-1080 *

Also Published As

Publication number Publication date
CN109284687A (en) 2019-01-29

Similar Documents

Publication Publication Date Title
CN108805016B (en) Head and shoulder area detection method and device
US20230237666A1 (en) Image data processing method and apparatus
CN109034086B (en) Vehicle weight identification method, device and system
CN115546601B (en) Multi-target recognition model and construction method, device and application thereof
CN111860138A (en) Three-dimensional point cloud semantic segmentation method and system based on full-fusion network
CN111738036B (en) Image processing method, device, equipment and storage medium
CN112132770A (en) Image restoration method and device, computer readable medium and electronic equipment
CN115984537A (en) Image processing method and device and related equipment
CN112150497A (en) Local activation method and system based on binary neural network
CN111242176B (en) Method and device for processing computer vision task and electronic system
CN111626098B (en) Method, device, equipment and medium for updating parameter values of model
CN109284687B (en) Scene recognition method and device based on indoor opportunity signal enhancement
CN112288702A (en) Road image detection method based on Internet of vehicles
CN113807457B (en) Method, device, equipment and storage medium for determining road network characterization information
CN109978058B (en) Method, device, terminal and storage medium for determining image classification
CN113537163B (en) Model training method and system for parking space detection
CN115100489A (en) Image processing method, device and equipment and readable storage medium
CN115049731A (en) Visual mapping and positioning method based on binocular camera
CN113537249A (en) Image determination method and device, storage medium and electronic device
CN114494893B (en) Remote sensing image feature extraction method based on semantic reuse context feature pyramid
KR101937585B1 (en) Cost Aggregation Apparatus and Method for Depth Image Generation, and Recording Medium thereof
CN110705695A (en) Method, device, equipment and storage medium for searching model structure
CN116009581A (en) Unmanned aerial vehicle inspection method for power transmission line, unmanned aerial vehicle control terminal and storage medium
CN113642627B (en) Deep learning-based image and decision multi-source heterogeneous information fusion identification method and device
CN115512353A (en) Remote sensing image labeling method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant