WO2021093679A1 - Visual positioning method and device
- Publication number: WO2021093679A1 (PCT application PCT/CN2020/127005)
- Authority: WIPO (PCT)
- Prior art keywords: pose, model, image, initial, poses
Classifications
- G01C11/00—Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
- G01C21/20—Instruments for performing navigational calculations
- G01C21/30—Map- or contour-matching
- G01C21/3602—Input other than that of destination using image analysis, e.g. detection of road signs, lanes, buildings, real preceding vehicles using a camera
- G01C21/3647—Guidance involving output of stored or live camera images or video streams
- G06T7/75—Determining position or orientation of objects or cameras using feature-based methods involving models
- G06V10/761—Proximity, similarity or dissimilarity measures
- G06V20/176—Urban or other man-made structures
- G06V20/39—Urban scenes
Description
- This application relates to an intelligent perception technology, in particular to a visual positioning method and device.
- Visual positioning uses the images or video captured by a camera to accurately determine the position and attitude of that camera in the real world. Visual positioning has been a hot topic in computer vision in recent years, and it is of great significance in many fields such as augmented reality, interactive virtual reality, robot visual navigation, public scene monitoring, and intelligent transportation.
- Visual positioning technology includes visual positioning methods based on drone base maps or satellite maps.
- The drone/satellite base map (aerial model) is mainly obtained either by oblique photography of the scene with a drone followed by Structure from Motion (SFM) three-dimensional reconstruction of the collected data, or by reconstructing a white model of the scene from satellite imagery.
- Visual positioning methods based on the drone base map or satellite map use the drone/satellite base map (aerial model) to locate the image or video taken by the camera, and obtain the camera's 6-degree-of-freedom (DoF) pose in the drone/satellite base map. This type of visual positioning technology can cope with the visual positioning of large-scale scenes.
- However, the above-mentioned visual positioning methods based on drone base maps or satellite maps suffer from a low positioning success rate and low positioning accuracy.
- The present application provides a visual positioning method and device to avoid wasting resources and to improve the positioning success rate and positioning accuracy.
- an embodiment of the present application provides a visual positioning method, and the method may include: acquiring a collected first image; determining the first pose according to the first image and the aerial model; determining whether a ground model corresponding to the first pose exists in the air-ground model; and, when a ground model corresponding to the first pose exists, determining the second pose according to the ground model.
- the air-ground model includes an aerial model and a ground model mapped to the aerial model.
- the coordinate system of the ground model is the same as the coordinate system of the aerial model, and the positioning accuracy of the second pose is higher than that of the first pose.
- the server determines the first pose according to the first image and the aerial model, and determines whether a ground model corresponding to the first pose exists in the air-ground model.
- when such a ground model exists, the second pose is determined according to the ground model; the first pose is determined based on the aerial model first.
- this achieves fast, efficient coarse positioning that can be applied to large-scale scenes and meets county-level and prefecture-level visual positioning requirements; then, based on the first pose, refined visual positioning is performed based on the ground model. Hierarchical visual positioning is thereby achieved, improving the positioning accuracy and success rate of visual positioning.
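- For illustration only, the coarse-then-fine flow described above can be sketched as follows; geo_localize, ground_localize, and ground_model_at are hypothetical helper names standing in for the first-pose determination, the ground-model positioning, and the ground-model lookup, not interfaces defined by this application:

```python
def hierarchical_visual_localization(first_image, aerial_model,
                                     air_ground_model, geo_localize,
                                     ground_localize):
    # Coarse stage: first pose from the aerial model (skyline plus building
    # line/plane semantics), usable over county/prefecture-scale scenes.
    first_pose = geo_localize(first_image, aerial_model)
    # Check whether the air-ground model contains a ground model covering
    # this pose.
    ground_model = air_ground_model.ground_model_at(first_pose)
    if ground_model is None:
        return first_pose  # only the coarse result is available
    # Fine stage: second pose from the ground model, which shares the aerial
    # model's coordinate system and has higher positioning accuracy.
    return ground_localize(first_image, ground_model, first_pose)
```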
- determining the first pose according to the first image and the aerial model may include: determining an initial pose set according to the position information and magnetometer angle deflection information of the terminal device corresponding to the first image; obtaining the skyline and the semantic information of building lines and planes of the first image from the first image; determining N initial poses in the initial pose set according to the skyline of the first image and the aerial model; and determining the first pose according to the semantic information of building lines and planes, the N initial poses, and the aerial model, where N is an integer greater than 1.
- This initial pose can also be referred to as a candidate pose.
- the method may further include: acquiring at least one collected second image, where the shooting fields of view of the first image and the at least one second image intersect; for example, the viewing angles of the first image and the at least one second image are different.
- according to the N initial poses and the relative poses between the first image and the at least one second image, the optimized N initial poses are determined.
- determining the first pose then includes: determining the first pose according to the semantic information of building lines and planes, the optimized N initial poses, and the aerial model.
- the relative pose between the first image and the at least one second image may be given by a Simultaneous Localization and Mapping (SLAM) algorithm.
- the method may further include: determining the optimized N initial poses according to the N initial poses and the relative poses between the first image and the at least one second image.
- the initial pose set includes multiple sets of initial poses; each set of initial poses includes initial position information and initial magnetometer angle deflection information. The initial position information falls within a first threshold range, which is determined based on the position information of the terminal device, and the initial magnetometer angle deflection information falls within a second threshold range, which is determined based on the magnetometer angle deflection information of the terminal device.
- the center value of the first threshold range is the position information of the terminal device, and the center value of the second threshold range is the magnetometer angle deflection information of the terminal device.
- determining N initial poses in the initial pose set according to the skyline of the first image and the aerial model includes: performing skyline rendering for each set of initial poses against the aerial model to obtain the skyline corresponding to each set of initial poses;
- calculating the matching degree between the skyline corresponding to each set of initial poses and the skyline of the first image, thereby determining the matching degree of each set of initial poses; and
- determining N initial poses in the initial pose set, where the N initial poses are the first N initial poses in the initial pose set sorted in descending order of matching degree.
- the method may further include: constructing the air-ground model based on a plurality of third images used to construct the ground model, and the aerial model.
- the third image may include a skyline.
- constructing the air-ground model based on the multiple third images of the ground model and the aerial model may include: determining, according to the aerial model, the poses of the multiple third images in the aerial model.
- the air-ground model is determined according to the poses of the multiple third images in the aerial model and the poses of the multiple third images in the ground model.
- determining the air-ground model according to the poses of the multiple third images in the aerial model and the poses of the multiple third images in the ground model includes: determining multiple coordinate conversion relationships according to the poses of the multiple third images in the aerial model and the poses of the multiple third images in the ground model.
- the semantic reprojection errors of the multiple third images in the aerial model are determined under each coordinate conversion relationship, and the optimal coordinate conversion relationship is selected from the multiple coordinate conversion relationships to obtain the air-ground model.
- the optimal coordinate conversion relationship is the coordinate conversion relationship that minimizes the semantic reprojection error.
- the method may further include: acquiring the multiple third images and gravity information corresponding to each third image, and constructing the ground model according to the third images and the gravity information corresponding to each third image.
- the gravity information can be obtained by SLAM, and the gravity information is used to obtain the roll angle and pitch angle of the camera coordinate system.
- constructing the air-ground model based on the multiple third images of the ground model and the aerial model may include: constructing the air-ground model according to the ground model and the aerial model.
- the method may further include: determining virtual object description information according to the first pose or the second pose, and sending the virtual object description information to the terminal device, where the virtual object description information is used to display the corresponding virtual object on the terminal device.
- an embodiment of the present application provides a visual positioning method, which may include: collecting a first image, and displaying the first image on a user interface, the first image including the captured skyline.
- receiving the first virtual object description information sent by the server, where the first virtual object description information is determined according to the first pose, and the first pose is determined according to the skyline and the semantic information of building lines and planes of the first image, and the aerial model.
- the virtual object corresponding to the first virtual object description information is superimposed and displayed on the user interface.
- the terminal device sends the first image to the server, receives the first virtual object description information sent by the server, and displays the virtual object corresponding to the first virtual object description information on the user interface.
- the first virtual object description information is determined according to the first pose, and the first pose is determined based on the skyline and the semantic information of building lines and planes of the first image, and the aerial model.
- the positioning accuracy of the first pose is higher than that of prior-art visual positioning methods, so the virtual object displayed based on the first pose is finer and more accurate.
- the method may further include: displaying first prompt information on the user interface, and the first prompt information is used to prompt the user to photograph the skyline.
- the method further includes: receiving an indication message sent by the server, where the indication message is used to indicate that a ground model corresponding to the first pose exists in the air-ground model, and the ground model is used to determine the second pose;
- the air-ground model includes an aerial model and a ground model mapped to the aerial model, and the coordinate system of the ground model is the same as the coordinate system of the aerial model.
- the second prompt information is displayed on the user interface, and the second prompt information is used to prompt the user of the operation modes available for selection.
- the terminal device can display prompt information indicating that the ground model exists, so that the user can choose whether to calculate the second pose, that is, whether to perform more refined visual positioning, to meet the needs of different users.
- the method further includes: receiving a relocation instruction input by the user through the user interface or a hardware button, and in response to the relocation instruction, sending a positioning optimization request message to the server, where the positioning optimization request message is used to request calculation of the second pose.
- receiving the second virtual object description information sent by the server, where the second virtual object description information is determined according to the second pose, the second pose is determined according to the ground model corresponding to the first pose, and the positioning accuracy of the second pose is higher than that of the first pose.
- an embodiment of the present application provides an air-ground model modeling method.
- the method may include: acquiring a plurality of third images used to construct a ground model; determining the first poses of the multiple third images in the aerial model; and aligning the aerial model and the ground model according to the first poses of the multiple third images in the aerial model and the second poses of the multiple third images in the ground model, to obtain the air-ground model.
- the air-ground model includes an aerial model and a ground model mapped to the aerial model, and the coordinate system of the ground model is the same as the coordinate system of the aerial model.
- the third image may include a skyline.
- aligning the aerial model and the ground model to obtain the air-ground model includes:
- determining multiple coordinate conversion relationships according to the poses of the multiple third images in the aerial model and the poses of the multiple third images in the ground model, where the coordinate conversion relationships are used to align the aerial model and the ground model;
- selecting the optimal coordinate conversion relationship and mapping the ground model to the aerial model according to it, to obtain the air-ground model;
- where the optimal coordinate conversion relationship is the coordinate conversion relationship that minimizes the semantic reprojection error.
- the method may further include: acquiring the multiple third images and gravity information corresponding to each third image, and constructing the ground model according to the third images and the gravity information corresponding to each third image.
- an embodiment of the present application provides a visual positioning device, which can serve as a server or an internal chip of the server, and the visual positioning device is used to implement the visual positioning method in the above-mentioned first aspect or any possible implementation manner of the first aspect.
- the visual positioning device may include a module or unit for executing the visual positioning method in the first aspect or any possible implementation of the first aspect, for example, a transceiver module or unit, and a processing module or unit.
- an embodiment of the present application provides a visual positioning device that can be used as a server or an internal chip of the server.
- the visual positioning device includes a memory and a processor; the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory, where execution of the instructions stored in the memory enables the processor to execute the visual positioning method in the above-mentioned first aspect or any possible implementation manner of the first aspect.
- an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method in the first aspect or any possible implementation manner of the first aspect is implemented.
- an embodiment of the present application provides a visual positioning device, which can serve as a terminal device, and the visual positioning device is used to perform the visual positioning method in the second aspect or any possible implementation of the second aspect.
- the visual positioning device may include a module or unit for executing the visual positioning method in the second aspect or any possible implementation of the second aspect, for example, a transceiver module or unit, and a processing module or unit.
- an embodiment of the present application provides a visual positioning device, which can serve as a terminal device.
- the visual positioning device includes a memory and a processor; the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory, where execution of the instructions stored in the memory enables the processor to execute the visual positioning method in the second aspect or any possible implementation of the second aspect.
- an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method in the second aspect or any possible implementation manner of the second aspect is implemented.
- an embodiment of the present application provides a visual positioning device, which can serve as a server or an internal chip of the server, and the visual positioning device is used to implement the air-ground model modeling method in the foregoing third aspect or any possible implementation manner of the third aspect.
- the visual positioning device may include modules or units for executing the air-ground model modeling method in the third aspect or any possible implementation of the third aspect, for example, an acquisition module or unit and a processing module or unit.
- an embodiment of the present application provides a visual positioning device that can be used as a server or an internal chip of the server.
- the visual positioning device includes a memory and a processor; the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory, where execution of the instructions stored in the memory causes the processor to execute the air-ground model modeling method in the above-mentioned third aspect or any possible implementation manner of the third aspect.
- an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method in the third aspect or any one of the possible implementation manners of the third aspect is implemented.
- the embodiments of the present application provide a computer program product, which includes a computer program; when the computer program is executed by a computer or a processor, it is used to execute the method in the first aspect or any possible implementation of the first aspect.
- an embodiment of the present application provides a visual positioning method, which may include: acquiring a first image and a second image that are collected.
- the initial pose set is determined according to the position information of the terminal device and the magnetometer angle deflection information corresponding to the first image.
- according to the skyline of the first image, the skyline of the second image, the relative pose between the two images, and the aerial model, N optimized candidate poses are determined in the initial pose set.
- the first pose of the first image is determined according to the semantic information of building lines and planes, the N optimized candidate poses, and the aerial model, where N is an integer greater than 1.
- the viewing angles of the first image and the second image are different.
- the skyline may include a vegetation skyline.
- the semantic information of the building line and surface may include the upper edge information of the building.
- the initial pose set includes multiple sets of initial poses; each set of initial poses includes initial position information and initial magnetometer angle deflection information. The initial position information falls within a first threshold range, which is determined according to the position information of the terminal device, and the initial magnetometer angle deflection information falls within a second threshold range, which is determined according to the magnetometer angle deflection information of the terminal device.
- the center value of the first threshold range is the position information of the terminal device, and the center value of the second threshold range is the magnetometer angle deflection information of the terminal device.
- determining the N optimized candidate poses in the initial pose set according to the skyline of the first image, the skyline of the second image, the relative pose, and the aerial model includes: rendering the skyline for each set of initial poses against the aerial model to obtain the skyline corresponding to each set of initial poses; calculating the matching degree between the skyline corresponding to each set of initial poses and the skyline of the first image; determining the weight of each set of initial poses according to the skyline matching degree, the skyline of the second image, and the relative pose; and determining the N optimized candidate poses in the initial pose set according to the weight of each set of initial poses.
- the N optimized candidate poses are the first N poses in the initial pose set sorted in descending order of weight.
- determining the first pose of the first image according to the semantic information of building lines and planes, the N optimized candidate poses, and the aerial model may include: calculating, according to each of the N optimized candidate poses, the semantic information of building lines and planes of the first image, and the aerial model, the semantic reprojection error corresponding to each optimized candidate pose, and selecting the candidate pose with the smallest semantic reprojection error as the first pose of the first image.
- the method may further include: judging whether there is a ground model corresponding to the first pose in the air-ground model.
- when a ground model corresponding to the first pose exists, the second pose is determined according to the ground model.
- the air-ground model includes an aerial model and a ground model mapped to the aerial model.
- the coordinate system of the ground model is the same as the coordinate system of the aerial model, and the positioning accuracy of the second pose is higher than that of the first pose.
- the method may further include: when there is no ground model corresponding to the first pose, determining the first virtual object description information according to the first pose, and sending the first virtual object description information to the terminal device, where the first virtual object description information is used to display the corresponding virtual object on the terminal device.
- the method may further include: when a ground model corresponding to the first pose exists, determining the second virtual object description information according to the second pose, and sending the second virtual object description information to the terminal device, where the second virtual object description information is used to display the corresponding virtual object on the terminal device.
- an embodiment of the present application provides an air-ground model construction method.
- the method may include: acquiring multiple third images and gravity information corresponding to each third image; constructing a ground model based on the third images and the gravity information; and constructing an air-ground model based on the aerial model and the ground model.
- at least one third image of the plurality of third images includes a skyline
- the gravity information can be obtained by SLAM
- the gravity information is used to obtain a roll angle and a pitch angle of the camera coordinate system.
- the above model can also be called a map, for example, an air-ground map, an aerial map, etc.
- constructing the air-ground model based on the aerial model and the ground model includes: determining, according to the aerial model, the poses in the aerial model of the third images that contain a skyline; and determining the air-ground model according to the poses in the aerial model of the third images containing a skyline and the poses in the ground model of the third images containing a skyline.
- determining the air-ground model includes: determining multiple coordinate conversion relationships according to the poses in the aerial model of the third images containing a skyline and the poses in the ground model of the third images containing a skyline;
- determining, under each coordinate conversion relationship, the semantic reprojection errors of the building lines and planes of the multiple third images in the aerial model, and selecting the optimal coordinate conversion relationship from the multiple coordinate conversion relationships, where the optimal coordinate conversion relationship is the one that minimizes the semantic reprojection error of the building lines and planes;
- converting the coordinate system of the ground model to the coordinate system of the aerial model to obtain the air-ground model, where the air-ground model includes the aerial model and the ground model whose coordinate system has been mapped to that of the aerial model, and the coordinate system of the ground model is the same as the coordinate system of the aerial model.
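- As an illustrative sketch (not the exact procedure of this application), selecting the optimal ground-to-aerial coordinate conversion relationship could look like the following, where candidate_transforms and reproj_error are hypothetical placeholders for the multiple coordinate conversion relationships and the building line/plane semantic reprojection error:

```python
def pick_alignment(candidate_transforms, mapping_images, reproj_error):
    # candidate_transforms: candidate ground->aerial coordinate conversions,
    # e.g. similarity transforms (scale s, rotation R, translation t).
    # Keep the transform whose total building line/plane semantic
    # reprojection error over the mapping images is smallest.
    def total_error(transform):
        return sum(reproj_error(transform, image) for image in mapping_images)
    return min(candidate_transforms, key=total_error)
```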
- the visual positioning method and device of the embodiments of the present application perform visual positioning based on the skyline and the semantic information of building lines and planes of the first image, and/or the ground model in the air-ground model, which can improve the success rate and accuracy of visual positioning.
- FIG. 1 is a schematic diagram of an aerial photography model provided by an embodiment of the application
- FIG. 2 is a schematic diagram of a ground model, an aerial photography model, and an air-ground model provided by an embodiment of the application;
- FIG. 3 is a schematic diagram of an application scenario provided by an embodiment of the application.
- FIG. 4A is a schematic diagram of a user interface displayed on a screen of a terminal device according to an embodiment of the application;
- FIG. 4B is a schematic diagram of a user interface displayed on the screen of the terminal device according to an embodiment of the application;
- FIG. 4C is a schematic diagram of a user interface displayed on a screen of a terminal device according to an embodiment of the application.
- FIG. 5 is a flowchart of a visual positioning method provided by an embodiment of the application.
- FIG. 6 is a flowchart of an improved Geo-localization method based on aerial photography model provided by an embodiment of the application
- FIG. 7A is a semantic segmentation effect diagram provided by an embodiment of this application.
- FIG. 7B is another semantic segmentation effect diagram provided by an embodiment of this application.
- FIG. 8 is a flowchart of another visual positioning method provided by an embodiment of the application.
- FIG. 9 is a schematic diagram of a user interface provided by an embodiment of the application.
- FIG. 10 is a flowchart of an air-ground model modeling method provided by an embodiment of the application.
- FIG. 11 is a schematic diagram of a user interface provided by an embodiment of this application.
- FIG. 12 is a schematic diagram of an air-ground model modeling provided by an embodiment of the application.
- FIG. 13 is a schematic structural diagram of a visual positioning device provided by an embodiment of the application.
- FIG. 14 is a schematic structural diagram of another visual positioning device provided by an embodiment of the application.
- FIG. 15 is a schematic structural diagram of another visual positioning device provided by an embodiment of the application;
- FIG. 16 is a schematic structural diagram of another visual positioning device provided by an embodiment of the application.
- At least one (item) refers to one or more, and “multiple” refers to two or more.
- "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: only A, only B, or both A and B, where A and B can be singular or plural.
- the character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
- "at least one of the following items" or similar expressions refer to any combination of these items, including any combination of a single item or a plurality of items.
- "at least one of a, b, or c" can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can each be single or multiple.
- Visual localization: in a visual localization system, the pose of the camera coordinate system of the terminal device is located in the real-world coordinate system, in order to seamlessly integrate the real world and the virtual world.
- Query image: the image collected by the terminal device, i.e. the current image frame used for visual positioning.
- Aerial model: also known as a drone/satellite base map, the aerial model can be obtained in the following two ways: 1) obliquely photograph the scene with a drone and perform Structure from Motion (SFM) three-dimensional reconstruction on the collected data, as shown in Figure 1(a) and Figure 2(b); 2) perform white-model reconstruction of the scene via satellite, as shown in Figure 1(b).
- Ground model: also called a map based on terminal-device mapping; data on the scene is collected by the terminal device, and SFM three-dimensional reconstruction is performed on the collected data to obtain the ground model; for example, the ground model can be as shown in Figure 2(a).
- Aerial-ground model: it can also be called an air-ground map.
- the aerial model and ground model are aligned through a similarity transformation to unify the two models into one global coordinate system.
- Figure 2(c) shows the point cloud of the air-ground model, and Figure 2(d) shows the meshes reconstructed from the air-ground model's point cloud (reconstructed meshes).
- Geo-localization based on the aerial model: based on the aerial model, locate the 6-DoF pose of the camera coordinate system of the terminal device in the aerial model.
- Ground-localization based on the ground model: based on the ground model, locate the 6-DoF pose of the camera coordinate system of the terminal device in the ground model.
- 6-degree-of-freedom (DoF) pose: includes the (x, y, z) coordinates and the angular deflections around the three coordinate axes, namely the yaw, pitch, and roll angles.
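- For concreteness, a 6-DoF pose as defined above could be represented by a simple structure such as the following illustrative sketch (not a structure defined by this application):

```python
from dataclasses import dataclass

@dataclass
class Pose6DoF:
    # Position coordinates in the model's world coordinate system.
    x: float
    y: float
    z: float
    # Angular deflections around the three coordinate axes, in degrees.
    yaw: float
    pitch: float
    roll: float
```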
- Terminal devices can be mobile phones, tablet personal computers, media players, smart TVs, laptop computers, personal digital assistants (PDAs), personal computers, smart watches, augmented reality (AR) glasses and other wearable devices, in-vehicle devices, or Internet of Things (IoT) devices, etc., which are not limited in the embodiments of the present application.
- FIG. 3 is a schematic diagram of an application scenario provided by an embodiment of the application.
- the application scenario may include a terminal device 11 and a server 12.
- the terminal device 11 and the server 12 may communicate with each other; the server 12 can provide a visual positioning service to the terminal device and, based on the visual positioning service, push virtual object description information to the terminal device 11 so that the terminal device can present the corresponding virtual object, which can be a virtual road sign, a virtual character, and the like.
- the embodiments of the present application provide a visual positioning method to improve the success rate and accuracy of visual positioning, so as to accurately push the corresponding virtual object description information to the terminal device.
- the visual positioning method in the embodiments of the present application can be applied to fields such as AR navigation, AR human-computer interaction, assisted driving, automatic driving, etc., where the position and posture of the camera of the terminal device need to be located.
- AR navigation refers to guiding the user to a certain destination through interactive methods such as augmented reality.
- the user can see the suggested walking direction, the distance to the destination, and other information on the screen of the terminal device in real time.
- for example, the virtual object is the walking direction to meeting room J2-1-1B16 displayed on the screen; that is, walking directions and the like are shown to the user through augmented reality.
- AR game interaction in a super-large scene can fix AR content at a specific geographic location.
- the terminal device used by the user can use the visual positioning method in the embodiment of this application to display the corresponding virtual objects on the screen, for example, the virtual character shown in Figure 4B or the virtual animation shown in Figure 4C.
- the user can interact with the virtual object by clicking or swiping the screen of the terminal device, which can guide the virtual object to interact with the real world.
- the terminal device 11 is usually provided with a camera, and the terminal device 11 can photograph the scene through the camera.
- the foregoing server 12 takes one server as an example for illustration, and this application is not limited thereto. For example, it may also be a server cluster including multiple servers.
- FIG. 5 is a flowchart of a visual positioning method provided by an embodiment of this application.
- the method in this embodiment involves a terminal device and a server. As shown in FIG. 5, the method in this embodiment may include:
- Step 101 The terminal device collects a first image.
- the terminal device collects a first image through a camera, and the first image may be a query image as described above.
- for example, a smartphone can start the shooting function when triggered by an application program to collect the first image.
- the first image can be collected periodically, for example, every 2 seconds or every 30 seconds, or the first image can be collected when a preset collection condition is met.
- for example, the preset collection condition can be that the GPS data of the smartphone falls within a preset range.
- Step 102 The terminal device sends the first image to the server.
- the server receives the first image sent by the terminal device.
- Step 103 The server determines the first pose according to the first image and the aerial model.
- the method of determining the first pose in the embodiment of the present application can be referred to as improved Geo-localization based on the aerial model; it can effectively combine the skyline and the semantic information of building lines and planes of the first image to determine the first pose and improve the positioning success rate and positioning accuracy.
- the server may determine N initial poses according to the skyline of the first image, and determine the first pose according to the semantic information of the building line and surface of the first image, the N initial poses and the aerial model. For example, traverse the N initial poses, calculate the semantic reprojection error of the N initial poses, and determine the first pose according to the semantic reprojection error.
- the method of calculating the semantic reprojection errors of the N initial poses may be: render the edges and planes of the buildings according to each of the N initial poses and the aerial model to obtain a rendered semantic segmentation map, and calculate the matching error between the rendered semantic segmentation map and the semantic information (for example, the semantic segmentation map) of the building lines and planes of the first image; this matching error is the semantic reprojection error.
- N is an integer greater than 1.
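- A minimal sketch of this semantic reprojection error, assuming a hypothetical renderer render_building_seg that projects the aerial model's building edges and planes into the image at a given pose, and using the fraction of mismatched pixels as one possible matching error:

```python
import numpy as np

def semantic_reprojection_error(pose, aerial_model, query_building_seg,
                                render_building_seg):
    # Render the building edges/planes of the aerial model at `pose`.
    rendered_seg = render_building_seg(aerial_model, pose)
    # Matching error against the first image's building segmentation:
    # here, simply the fraction of disagreeing pixels.
    return float(np.mean(rendered_seg != query_building_seg))

def pick_first_pose(initial_poses, aerial_model, query_building_seg, renderer):
    # Traverse the N initial poses and keep the one with the smallest error.
    return min(initial_poses,
               key=lambda p: semantic_reprojection_error(
                   p, aerial_model, query_building_seg, renderer))
```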
- in a possible implementation, the initial pose set is determined according to the position information and magnetometer angle deflection information of the terminal device corresponding to the first image; the skyline and the semantic information of building lines and planes of the first image are obtained from the first image; N initial poses are determined in the initial pose set according to the skyline of the first image and the aerial model; and the first pose is determined according to the semantic information of building lines and planes, the N initial poses, and the aerial model. For the specific implementation, refer to the explanation of the embodiment shown in FIG. 6.
- the server may also receive at least one second image collected by the terminal device; the server may optimize the N initial poses according to the at least one second image to determine the optimized N initial poses, and determine the first pose according to the semantic information of building lines and planes of the first image and the optimized N initial poses. That is, multiple frames of images are combined to assist in solving the pose of the first image. The shooting fields of view of the at least one second image and the first image intersect.
- the viewing angles of the at least one second image and the first image are different.
- Step 104 The server determines the first virtual object description information according to the first pose, and sends the first virtual object description information to the terminal device.
- the server may determine the first virtual object description information according to the first pose, and the first virtual object description information is used to display the corresponding virtual object on the terminal device, for example, the walking guide icon shown in FIG. 4A.
- the icon is displayed in the actual scene in the real world, that is, displayed on the street as shown in FIG. 4A.
- Step 105 The terminal device displays the virtual object corresponding to the first virtual object description information on the user interface.
- the terminal device displays the virtual object corresponding to the first virtual object description information on the user interface, the user interface displays the actual scene of the real world, and the virtual object may be displayed on the user interface in an augmented reality manner.
- the terminal device sends the first image to the server; the server determines the first pose according to the skyline and the semantic information of building lines and planes of the first image, and the aerial model; the server determines the first virtual object description information according to the first pose and sends it to the terminal device; and the terminal device displays the virtual object corresponding to the first virtual object description information on the user interface. Determining the first pose based on the skyline and the semantic information of building lines and planes of the first image can improve the success rate and accuracy of visual positioning.
- this embodiment can effectively combine the advantages of visual positioning based on aerial maps and visual positioning based on refined mobile-phone mapping.
- FIG. 6 is a flowchart of an improved Geo-localization method based on aerial photography model provided by an embodiment of the application.
- the execution subject of this embodiment may be a server or an internal chip of the server, as shown in FIG. 6,
- the method of this embodiment may include:
- Step 201 Determine an initial pose set according to the position information of the terminal device and the magnetometer angle deflection information corresponding to the first image.
- the position information of the terminal device corresponding to the first image may be Global Positioning System (GPS) information, and the magnetometer angle deflection information may be a yaw angle.
- the position information and the magnetometer angle deflection information may be the position information and the magnetometer angle deflection information when the terminal device collects the first image, which may be obtained through the wireless communication module and the magnetometer of the terminal device.
- the initial pose set may include multiple sets of initial poses, and each set of initial poses may include initial position information and initial magnetometer angle deflection information.
- the initial position information falls within a first threshold range, which is determined according to the position information of the terminal device, and the initial magnetometer angle deflection information falls within a second threshold range, which is determined according to the magnetometer angle deflection information of the terminal device.
- the terminal device may construct a position candidate set (T) and a yaw angle candidate set (Y) according to the position information and magnetometer angle deflection information of the terminal device corresponding to the first image; the position candidate set (T) includes multiple pieces of initial position information, and the yaw angle candidate set (Y) includes multiple yaw angles.
- each combination of an initial position and a yaw angle can form a set of initial poses, so that multiple sets of initial poses can be formed.
- an achievable way of constructing the position candidate set (T) is to select position points at intervals of the first preset interval within a certain area as the initial position information in the position candidate set (T).
- the area may be a circle centered on the position information (x, y) of the terminal device corresponding to the first image, with the first threshold as the radius; that is, the center value of the above-mentioned first threshold range is the position information of the terminal device.
- the first threshold may be 30 meters, 35 meters, and so on.
- the first preset interval may be 1 meter.
- similarly, the yaw angle candidate set (Y) can be constructed by selecting yaw angles at intervals of the second preset interval within an angle range, where the angle range may be the yaw angle of the terminal device corresponding to the first image plus or minus the second threshold; that is, the center value of the above second threshold range is the magnetometer angle deflection information of the terminal device.
- the second threshold may be 90 degrees, 85 degrees, and so on.
- the second preset interval may be 0.1 degree.
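- Putting these values together, the candidate-set construction might be sketched as follows, assuming the example values above (a 30-meter radius with 1-meter spacing for positions, and plus or minus 90 degrees with 0.1-degree spacing for yaw):

```python
import numpy as np

def build_initial_pose_set(x, y, yaw, radius=30.0, pos_step=1.0,
                           yaw_halfwidth=90.0, yaw_step=0.1):
    # Position candidate set (T): grid points within `radius` meters of (x, y).
    positions = []
    for dx in np.arange(-radius, radius + pos_step, pos_step):
        for dy in np.arange(-radius, radius + pos_step, pos_step):
            if dx * dx + dy * dy <= radius * radius:
                positions.append((x + dx, y + dy))
    # Yaw angle candidate set (Y): yaw +/- yaw_halfwidth at yaw_step spacing.
    yaws = np.arange(yaw - yaw_halfwidth, yaw + yaw_halfwidth + yaw_step,
                     yaw_step)
    # Each (position, yaw) combination forms one set of initial pose.
    return [(pos, float(psi)) for pos in positions for psi in yaws]
```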
- Step 202 Obtain the semantic information of the skyline and the building line surface of the first image according to the first image.
- semantic segmentation into different categories can be performed on the first image, and the skyline of the first image can be extracted.
- the different categories may include vegetation, buildings, sky, etc.
- semantic segmentation of the horizontal and vertical lines and planes of buildings can be performed on the first image to obtain the semantic information of the building lines and planes in the first image.
- the building lines and planes include the edges (horizontal and vertical edges) and the planes of the building.
- for example, the first image (for example, the leftmost image in FIG. 7A) is input to a first semantic segmentation network, which is used to distinguish buildings, sky, vegetation, ground, etc.; the network outputs a semantic segmentation effect map (for example, the middle image in FIG. 7A), and the skyline is extracted from the semantic segmentation effect map to obtain the skyline of the first image (for example, the rightmost image in FIG. 7A).
- the first semantic segmentation network can be any neural network, for example, a convolutional neural network.
- the first semantic segmentation network may be obtained after training using training data, that is, using the training data to train the first semantic segmentation network to distinguish buildings, sky, vegetation, ground, etc.
- the semantic segmentation task is a dense pixel-level classification task.
- the training strategy used is the standard cross-entropy loss, which measures the gap between the predicted values and the label values; the prediction quality of the network is improved by minimizing the loss:
- L = -(1/N) Σ_i log( exp(p_i) / Σ_j exp(p_j) )
- where N represents all pixels, p_i represents the predicted value of a pixel for its labeled (ground-truth) category, and p_j represents the predicted value of that pixel for each category j.
- the semantic segmentation network of the embodiment of this application computes a loss with two parts: the first part is the cross-entropy between the final output and the label map, L_final (the L in the above formula), and the second part is the regularization loss L_weight.
- the embodiments of the present application alleviate overfitting by reducing the weights of features or penalizing unimportant features; regularization penalizes feature weights, that is, the weights of features also become part of the model's loss function. Therefore, the overall loss of the semantic segmentation network is:
- Loss = L_final + λ·L_weight
- where λ is a hyperparameter used to control the relative importance of the regularization loss; for example, the value of λ is set to 1.
- the semantic segmentation network model is adjusted repeatedly and iteratively to minimize the overall loss (Loss) of the semantic segmentation network, thereby training the first semantic segmentation network.
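- A hedged PyTorch-style sketch of this two-part loss; the L2 form of L_weight is an assumption based on the description of penalizing feature weights:

```python
import torch
import torch.nn.functional as F

def segmentation_loss(logits, labels, model, lam=1.0):
    # L_final: pixel-wise cross-entropy between predictions and the label map.
    l_final = F.cross_entropy(logits, labels)
    # L_weight: regularization over the network weights (L2 penalty assumed).
    l_weight = sum(p.pow(2.0).sum() for p in model.parameters())
    # Overall loss: Loss = L_final + lambda * L_weight, with lambda = 1 here.
    return l_final + lam * l_weight
```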
- for example, the first image (for example, the leftmost image in FIG. 7B) is input to the second semantic segmentation network, which is used to distinguish horizontal lines, vertical lines, building planes, etc. in the image; it outputs the semantic information of building lines and planes, for example, a building semantic segmentation map (for example, the rightmost image in FIG. 7B).
- the second semantic segmentation network can be any neural network, for example, a convolutional neural network.
- the second semantic segmentation network may be obtained after training using training data, that is, using the training data to train the second semantic segmentation network to distinguish between horizontal lines, vertical lines, building surfaces, etc.
- the specific training method can be similar to the training method of the first semantic segmentation network, which will not be repeated here.
- the skyline of the first image can also be corrected and adjusted.
- the correction angle can be given by a Simultaneous Localization and Mapping (SLAM) algorithm; the angle given by the SLAM algorithm is used to represent the relative relationship between the camera coordinate system of the terminal device and the direction of gravity.
- Step 203 Determine N initial poses in the initial pose set according to the skyline and aerial model of the first image.
- as described above, each set of initial poses may include initial position information and initial magnetometer angle deflection information.
- for each set of initial poses, skyline rendering is performed according to the aerial model to obtain the skyline corresponding to that set of initial poses, and the matching degree between the skyline corresponding to each set of initial poses and the skyline of the first image is calculated to determine the matching degree of each set of initial poses.
- N initial poses are then determined in the initial pose set; the N initial poses may be the N initial poses with the highest matching degrees in the initial pose set.
- a specific implementation of calculating the matching degree between the skyline corresponding to each set of initial poses and the skyline of the first image can be: match the rendered skyline corresponding to the initial pose against the skyline of the first image in a sliding-window manner (using the L2 distance or another distance metric) to determine the matching degree.
- N initial poses are used to participate in the visual positioning, which can improve the positioning success rate of the visual positioning.
- for example, suppose a set of initial poses is ((x1, y1), yaw1); the initial pose is rendered based on the aerial model to obtain the skyline corresponding to the initial pose, and the skyline corresponding to the initial pose is matched against the skyline of the first image in a sliding-window manner to determine the matching degree of the initial pose.
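- One plausible reading of this sliding-window matching, treating each skyline as a per-column height profile and the window as a horizontal shift, is sketched below; inverting the error into a score so that larger values mean better matches is an assumption:

```python
import numpy as np

def skyline_matching_degree(rendered, observed, max_shift=50):
    # rendered, observed: 1-D arrays of skyline height per image column.
    best_err = np.inf
    for shift in range(-max_shift, max_shift + 1):
        shifted = np.roll(rendered, shift)        # slide the rendered skyline
        err = np.mean((shifted - observed) ** 2)  # L2 distance at this shift
        best_err = min(best_err, err)
    return 1.0 / (1.0 + best_err)  # larger value = better match
```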
- Step 204 Determine the optimized N initial poses according to the N initial poses, the skyline of the first image and the skyline of the at least one second image, and the relative poses between the first image and the at least one second image.
- if step 204 is not performed, step 205 may be directly executed to determine the first pose according to the semantic information of the building lines and planes, the N initial poses, and the aerial model.
- Step 204 is an optional step.
- after step 203 is executed, multi-frame joint optimization may be performed on the N initial poses determined in step 203 to obtain the optimized N initial poses.
- An achievable manner of multi-frame joint optimization may be: optimizing the N initial poses in step 203 in combination with at least one second image.
- for the at least one second image, reference may be made to the explanation of step 103 in the embodiment shown in FIG. 5, which will not be repeated here.
- here, two second images are taken as an example for description: let I0 denote the first image, and I1 and I2 denote the two second images.
- the N initial poses of I0 are to be optimized.
- the relative pose relationships of the three images can be given by SLAM, denoted T10 and T20, which represent the pose conversion relationships of I1 and I2 with respect to I0; I1 and I2 can therefore be used to assist in solving the pose of I0.
- the shooting fields of view of the three images have a certain intersection, and the overall field of view composed of the three images is relatively large.
- the specific optimization method can be: use the relative poses T10 and T20 to jointly evaluate each candidate pose of I0 across I0, I1, and I2, where P_opt_init represents the resulting optimized initial pose.
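- Under the assumption that each candidate pose of I0 is propagated to I1 and I2 through the SLAM relative poses and scored jointly by skyline matching, the optimization could be sketched as follows; compose and render_skyline are hypothetical helpers, and skyline_matching_degree reuses the earlier sketch:

```python
def joint_pose_score(p0, rel_poses, skylines, render_skyline, compose,
                     skyline_matching_degree):
    # p0: a candidate pose of the first image I0 in the aerial model.
    # rel_poses: SLAM relative poses [T10, T20] mapping I0's pose to I1, I2.
    # skylines: observed skylines [sky0, sky1, sky2] of I0, I1, I2.
    score = skyline_matching_degree(render_skyline(p0), skylines[0])
    for T_i0, sky_i in zip(rel_poses, skylines[1:]):
        p_i = compose(T_i0, p0)  # pose of frame Ii implied by candidate p0
        score += skyline_matching_degree(render_skyline(p_i), sky_i)
    return score  # candidates with the highest joint score are retained
```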
- Step 205 Determine the first pose according to the semantic information of the building line and surface, the optimized N initial poses and the aerial model.
- a more accurate pose can be solved by PnL (Perspective-n-Line); that is, the following steps are used to optimize the 6-DoF pose of the first image I0, and the optimized pose is used as the first pose.
- the steps (a)-(c) of determining the first pose involve the 3D line segment information (for example, the horizontal and vertical line segments of the building lines) from the aerial model, which is matched against the semantic information of the building lines and planes of the first image to solve the pose via PnL.
- some poses are then randomly sampled near the solved pose and steps (a)-(c) are repeated; if the newly computed pose is better (using the semantic reprojection matching error from the aerial model to the image taken by the mobile phone as the measure), the pose is updated, so as to keep the pose optimized in the above steps from falling into a local optimum.
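- A hedged sketch of this sample-and-refine loop; perturb and pnl_solve are hypothetical stand-ins for the random sampling near the current pose and the Perspective-n-Line solve described above:

```python
def refine_first_pose(init_pose, reproj_error, pnl_solve, perturb,
                      n_restarts=20):
    # Start from the PnL solution around the optimized initial pose.
    best = pnl_solve(init_pose)
    for _ in range(n_restarts):
        # Randomly sample a pose near the current best and re-solve (steps a-c).
        cand = pnl_solve(perturb(best))
        # Keep the candidate only if its semantic reprojection matching error
        # improves, which helps avoid local optima.
        if reproj_error(cand) < reproj_error(best):
            best = cand
    return best  # used as the first pose
```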
- Step 206 Determine the optimized first pose according to the first pose and the relative pose between the first image and the at least one second image.
- in this way, the best 6-DoF pose estimate of the first image I0 is obtained, that is, the optimized first pose.
- in this embodiment, N initial poses are determined according to the skyline of the first image, and the first pose is optimized on the basis of the N initial poses using the semantic information of the building lines and planes of the first image, thereby effectively combining the skyline and the building line and plane semantics of the first image to improve the positioning success rate and positioning accuracy of visual positioning.
- FIG. 8 is a flowchart of another visual positioning method provided by an embodiment of the application.
- the method in this embodiment involves terminal devices and servers.
- this embodiment is based on the embodiment shown in FIG. 5 and further combines the air-ground model to optimize the first pose of the first image, so as to achieve more accurate visual positioning.
- the method of this embodiment may include:
- Step 301 The terminal device collects a first image.
- Step 302 The terminal device sends the first image to the server.
- Step 303 The server determines the first pose according to the first image and the aerial model.
- Step 304 The server determines the first virtual object description information according to the first pose, and sends the first virtual object description information to the terminal device.
- Step 305 The terminal device displays the virtual object corresponding to the first virtual object description information on the user interface.
- For the explanation of steps 301 to 305, reference may be made to steps 101 to 105 of the embodiment shown in FIG. 5, which will not be repeated here.
- Step 306 The server judges whether there is a ground model corresponding to the first pose in the air-ground model. If yes, go to step 307.
- the air-ground model includes an aerial model and a ground model mapped to the aerial model, and the coordinate system of the ground model in the air-ground model is the same as the coordinate system of the aerial model.
- For the specific construction method of the air-ground model, refer to the explanation of the embodiment shown in FIG. 10 below.
- Step 307 When there is a ground model corresponding to the first pose, determine the second pose according to the ground model.
- the positioning accuracy of the second pose is higher than that of the first pose.
- Through the ground model corresponding to the first pose, refined visual positioning can be performed to determine the second pose.
- The refined visual positioning may include processing procedures such as image retrieval, feature point extraction, and feature point matching.
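- One plausible instantiation of these stages is sketched below with OpenCV. The embodiment only names the processing steps, so the concrete choices here (ORB features, brute-force Hamming matching, RANSAC PnP), the retrieve_candidates helper and the database layout are assumptions: each database entry is assumed to carry descriptors plus the 3D ground-model points they observe.

```python
# Refined positioning in the ground model: retrieval, matching, then PnP.
import cv2
import numpy as np

def refine_in_ground_model(query_bgr, db, K, retrieve_candidates):
    orb = cv2.ORB_create(2000)
    kp_q, des_q = orb.detectAndCompute(query_bgr, None)   # feature point extraction
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    pts2d, pts3d = [], []
    for entry in retrieve_candidates(query_bgr, db):      # image retrieval
        for m in matcher.match(des_q, entry.descriptors): # feature point matching
            pts2d.append(kp_q[m.queryIdx].pt)
            pts3d.append(entry.points3d[m.trainIdx])      # matched 3D ground-model point
    ok, rvec, tvec, _ = cv2.solvePnPRansac(
        np.float32(pts3d), np.float32(pts2d), K, None)    # solve the second pose
    return (rvec, tvec) if ok else None
```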
- Step 308 The server determines the second virtual object description information according to the second pose, and sends the second virtual object description information to the terminal device.
- For example, the server may determine the second virtual object description information according to the second pose. The second virtual object description information is used to display the corresponding virtual object on the terminal device, for example, the guide icon of the cafe shown in FIG. 4A; the guide icon is displayed in the actual scene of the real world, that is, on the building shown in FIG. 4A.
- Compared with the first virtual object description information, the second virtual object description information is determined based on the more refined second pose, so the virtual object corresponding to the second virtual object description information can be a more detailed virtual object. For example, the virtual object corresponding to the first virtual object description information may be a road guide icon, while the virtual object corresponding to the second virtual object description information may be a guide icon of a shop in the street.
- Step 309 The terminal device displays the virtual object corresponding to the second virtual object description information on the user interface.
- The terminal device displays the virtual object corresponding to the first virtual object description information and the virtual object corresponding to the second virtual object description information on the user interface. The user interface displays the actual scene of the real world, and the virtual objects can be displayed on the user interface in an augmented reality manner.
- In this embodiment, the server judges whether a ground model corresponding to the first pose exists in the air-ground model. When such a ground model exists, the second pose is determined according to the ground model, the server determines the second virtual object description information according to the second pose and sends it to the terminal device, and the terminal device displays the virtual object corresponding to the second virtual object description information on the user interface. This can improve the success rate and accuracy of visual positioning, as well as the accuracy of the virtual object description information pushed by the server to the terminal device.
- FIG. 9 is a schematic diagram of a user interface provided by an embodiment of this application. As shown in FIG. 9, user interfaces 901 to 904 are included.
- the terminal device may collect a first image, and the first image is presented in the user interface 901.
- Optionally, first prompt information may also be displayed in the user interface 901. The first prompt information is used to prompt the user to photograph the skyline; for example, the first prompt information may be "make sure the skyline is captured".
- the first image in the user interface 901 includes the skyline, so the visual positioning requirements can be met.
- the terminal device may send the first image to the server through the above step 302.
- The server may determine the first pose through the above steps 303 to 304, and send the first virtual object description information corresponding to the first pose to the terminal device.
- the terminal device may display a user interface 902 according to the first virtual object description information, and the user interface 902 presents a virtual object corresponding to the first virtual object description information, for example, a cloud.
- The server may also judge, according to the above step 306, whether a ground model corresponding to the first pose exists in the air-ground model. When such a ground model exists, the server may send an indication message to the terminal device, the indication message being used to indicate that the air-ground model contains a ground model corresponding to the first pose. According to the indication message, the terminal device can display second prompt information on the user interface.
- The second prompt information is used to prompt the user of the selectable operation modes. For example, referring to the user interface 903, the second prompt information is "whether further positioning is needed" together with the operation icons "Yes" and "No".
- the user can click the "Yes" operation icon, and the terminal device sends a positioning optimization request message to the server according to the user's operation.
- the positioning optimization request message is used to request the calculation of the second pose.
- The server determines the second pose through step 307 and step 308 and sends the second virtual object description information to the terminal device, and the terminal device presents the virtual object corresponding to the second virtual object description information on the user interface, for example, the user interface 904.
- the user interface 904 presents virtual objects corresponding to the first virtual object description information, such as clouds, and virtual objects corresponding to the second virtual object description information, such as the sun and lightning.
- The server in the embodiments of the present application performs two kinds of operations. One is online visual positioning computation, including the solution of the first pose and the second pose, as described in each of the foregoing embodiments. The other is offline air-ground map construction, which is detailed in FIG. 10 below.
- Offline air-ground map construction mainly means that the server obtains the multiple images uploaded by terminal devices for constructing the ground model, determines the first poses of these images in the aerial model through the improved Geo-localization visual positioning algorithm described in the embodiments of this application, and at the same time performs an SFM operation on the images to build the ground model and obtain their poses in the ground model. Using the first poses obtained in the aerial model and the corresponding poses in the ground model, the aerial model and the ground model are aligned through the semantic reprojection error to obtain the air-ground model.
- FIG. 10 is a flowchart of an air-ground model modeling method provided by an embodiment of the application. As shown in FIG. 10, the method may include:
- Step 401 Acquire multiple images for constructing a ground model.
- Illustratively, within a local area, users use terminal devices to collect images and upload them to the server, and the server performs 3D modeling based on the SFM algorithm to obtain the ground model; that is, the images are used to construct the ground model. The images need to capture the skyline.
- the prompt message on the user interface as shown in FIG. 11 may be used to prompt the user of the image shooting requirement.
- For example, the upper image in the first column of each row in FIG. 12 is an image used for constructing the ground model. The point cloud of the ground model constructed based on such images can be as shown in the second column.
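- As an illustration of the SFM-style reconstruction, the following two-view sketch matches features, recovers the relative pose from the essential matrix and triangulates a sparse point cloud. A real ground-model build runs incremental SFM with bundle adjustment over many images; this OpenCV-based pipeline is an assumed example, with K the camera intrinsic matrix.

```python
# Minimal two-view structure-from-motion sketch for a sparse point cloud.
import cv2
import numpy as np

def two_view_points(img1, img2, K):
    orb = cv2.ORB_create(4000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    p1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    p2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, mask = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC)
    _, R, t, mask = cv2.recoverPose(E, p1, p2, K, mask=mask)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])     # first camera at the origin
    P2 = K @ np.hstack([R, t])                            # second camera pose
    X = cv2.triangulatePoints(P1, P2, p1.T, p2.T)         # homogeneous 3D points
    return (X[:3] / X[3]).T                               # sparse point cloud
```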
- Step 402 Determine the poses of the multiple images for constructing the ground model in the aerial model.
- The aerial model may be obtained by first using a UAV/satellite to capture images of the application scene, and then constructing a 2.5D model based on oblique photography.
- For example, the lower image in the first column of each row in FIG. 12 is an image collected by a UAV/satellite for constructing the aerial model, and the point cloud of the constructed aerial model can be as shown in the third column.
- In this embodiment, the improved aerial-model-based visual positioning (Geo-localization) method shown in FIG. 6 can be used to determine the poses of the images in the aerial model. Specifically, the first image in the method shown in FIG. 6 is replaced with each image of step 401 of this embodiment, so as to determine the pose of each image in the aerial model.
- Step 403 According to the poses of the multiple images in the aerial model and the poses of the multiple images in the ground model, align the aerial model with the ground model to obtain an air-ground model.
- the air-ground model includes an aerial model and a ground model mapped to the aerial model, and the coordinate system of the ground model in the air-ground model is the same as the coordinate system of the aerial model.
- the point cloud of the constructed air-ground model may be shown in the image in the fourth column of each row as shown in FIG. 12, that is, the point cloud of the ground model and the point cloud of the aerial model are combined.
- Based on the point cloud of the air-ground model, the reconstructed meshes shown in the fifth column of each row in FIG. 12 can be obtained.
- An achievable way is as follows: determine multiple coordinate conversion relationships according to the poses of the multiple images in the aerial model and the poses of the multiple images in the ground model; determine, according to each coordinate conversion relationship, the semantic reprojection errors of the multiple images in the aerial model; and select the optimal coordinate conversion relationship among the multiple coordinate conversion relationships as the coordinate conversion relationship of the air-ground model, which is used to align the aerial model and the ground model.
- According to the coordinate conversion relationship of the air-ground model, the ground model is mapped into the aerial model to obtain the air-ground model.
- The optimal coordinate conversion relationship is the coordinate conversion relationship that minimizes the semantic reprojection error. That is, through the above method, the ground model is registered into the aerial model, and the air-ground model is obtained.
- A specific achievable way of aligning the aerial model and the ground model can be: through similarity transformation, based on the poses P_i^G (i = 0...M-1) of the images in the ground model and the poses P_i^A (i = 0...M-1) of the images in the aerial model, a coordinate system conversion relationship T_i^G2A (i = 0...M-1) between the ground model and the aerial model can be obtained for each image.
- Traverse T_i^G2A (i = 0...M-1): for each candidate, convert the poses in the ground model into the aerial model, reproject and render the semantic segmentation map on the corresponding 2.5D aerial model according to the converted poses, compare the rendered semantic segmentation map with the semantic information of the images to determine the projection error (here, the semantic reprojection error refers to the building horizontal/vertical line and plane error), and accumulate the semantic reprojection errors of all images.
- The T_i^G2A corresponding to the minimum accumulated error is the optimal coordinate system conversion relationship of the air-ground model.
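- A minimal sketch of this traversal is given below. render_semantics and semantic_error are hypothetical stand-ins for the 2.5D-model semantic renderer and the building line/plane matching error; the scale component of the similarity transformation is omitted for brevity (rigid 4x4 pose matrices are assumed).

```python
# Score every candidate T_i_G2A by accumulated semantic reprojection error.
import numpy as np

def align_ground_to_aerial(poses_ground, poses_aerial, images,
                           render_semantics, semantic_error):
    candidates = [Pa @ np.linalg.inv(Pg)          # one T_i_G2A per image pose pair
                  for Pg, Pa in zip(poses_ground, poses_aerial)]
    best_T, best_err = None, np.inf
    for T in candidates:                          # traverse all T_i_G2A
        err = 0.0
        for Pg, img in zip(poses_ground, images):
            seg = render_semantics(T @ Pg)        # reproject into the aerial model
            err += semantic_error(seg, img)       # accumulate line/plane error
        if err < best_err:
            best_T, best_err = T, err             # keep the minimizer
    return best_T                                 # optimal coordinate conversion
```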
- In this embodiment, the air-ground model is constructed by mapping the ground model into the aerial model.
- The air-ground model is a hierarchical model that fuses the large-scale scene information of the aerial model with the refined information of the ground model.
- A visual positioning method using the air-ground model can therefore perform fast and efficient coarse positioning applicable to large-scale scenes, meeting county-level and city-level visual positioning needs, and can perform refined visual positioning based on the coarse positioning result, thereby achieving hierarchical visual positioning and improving the accuracy of visual positioning.
- the embodiment of the present application also provides a visual positioning device for executing the method steps executed by the server or the processor of the server in the above method embodiments.
- the visual positioning device may include: a transceiver module 131 and a processing module 132.
- the processing module 132 is configured to obtain the first image collected by the terminal device through the transceiver module 131.
- The processing module 132 is also configured to determine the first pose according to the first image and the aerial model, to judge whether a ground model corresponding to the first pose exists in the air-ground model, and, when such a ground model exists, to determine the second pose according to the ground model.
- The air-ground model includes the aerial model and the ground model mapped into the aerial model.
- The coordinate system of the ground model in the air-ground model is the same as that of the aerial model, and the positioning accuracy of the second pose is higher than that of the first pose.
- In some embodiments, the processing module 132 is configured to: determine the initial pose set according to the position information and the magnetometer angle deflection information of the terminal device corresponding to the first image; obtain the skyline and the building line and surface semantic information of the first image according to the first image; determine N initial poses in the initial pose set according to the skyline of the first image and the aerial model; and determine the first pose according to the building line and surface semantic information, the N initial poses and the aerial model, where N is an integer greater than 1.
- the processing module 132 is further configured to obtain at least one second image collected by the terminal device through the transceiver module 131, and the shooting fields of the first image and the at least one second image overlap.
- The processing module 132 is further configured to determine the optimized N initial poses according to the N initial poses, the skyline of the first image and the skyline of the at least one second image, and the relative poses between the first image and the at least one second image, and to determine the first pose according to the building line and surface semantic information, the optimized N initial poses and the aerial model.
- the processing module 132 is further configured to determine the optimized N initial poses according to the N initial poses and the relative poses between the first image and the at least one second image.
- In some embodiments, the initial pose set includes multiple sets of initial poses, and each set of initial poses includes initial position information and initial magnetometer angle deflection information.
- The initial position information falls within a first threshold range determined according to the position information of the terminal device, and the initial magnetometer angle deflection information falls within a second threshold range determined according to the magnetometer angle deflection information of the terminal device.
- The center value of the first threshold range is the position information of the terminal device, and the center value of the second threshold range is the magnetometer angle deflection information of the terminal device.
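- For illustration, such a candidate set can be enumerated as below; the defaults mirror the example values used in this application (30 m radius / 1 m step, plus or minus 90 degrees / 0.1 degree step) and are not mandatory, and a practical system would prune or coarsen this grid rather than enumerate it exhaustively.

```python
# Enumerate the initial pose set around the GPS fix and magnetometer yaw.
import numpy as np

def initial_pose_set(gps_xy, yaw_deg, radius=30.0, pos_step=1.0,
                     yaw_span=90.0, yaw_step=0.1):
    xs = np.arange(gps_xy[0] - radius, gps_xy[0] + radius + 1e-9, pos_step)
    ys = np.arange(gps_xy[1] - radius, gps_xy[1] + radius + 1e-9, pos_step)
    yaws = np.arange(yaw_deg - yaw_span, yaw_deg + yaw_span + 1e-9, yaw_step)
    return [((x, y), yaw)
            for x in xs for y in ys
            if (x - gps_xy[0]) ** 2 + (y - gps_xy[1]) ** 2 <= radius ** 2
            for yaw in yaws]          # each entry: (initial position, initial yaw)
```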
- In some embodiments, the processing module 132 is configured to: perform skyline rendering according to each set of initial poses and the aerial model to obtain the skyline corresponding to each set of initial poses; calculate the matching degree between the skyline corresponding to each set of initial poses and the skyline of the first image to determine the matching degree of each set of initial poses; and determine N initial poses in the initial pose set according to the matching degree of each set of initial poses, the N initial poses being the first N initial poses of the initial pose set sorted by matching degree in descending order.
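- A sketch of the rendering-and-matching loop follows: a skyline is represented as a 1D array of upper-contour heights per image column and matched with a sliding window under an L2 measure. render_skyline is a hypothetical aerial-model renderer, and the score convention is an assumption.

```python
# Sliding-window skyline matching and top-N pose selection, as a sketch.
import numpy as np

def skyline_match(rendered, query, max_shift=40):
    best = np.inf
    w = min(len(rendered), len(query))
    for s in range(-max_shift, max_shift + 1):     # slide one curve over the other
        r = rendered[max(0, s):w + min(0, s)]
        q = query[max(0, -s):w - max(0, s)]
        n = min(len(r), len(q))
        if n:
            best = min(best, float(np.mean((r[:n] - q[:n]) ** 2)))
    return -best                                   # higher score = better match

def top_n_poses(pose_set, query_skyline, render_skyline, n):
    scores = [skyline_match(render_skyline(p), query_skyline) for p in pose_set]
    order = np.argsort(scores)[::-1]               # sort by matching degree, descending
    return [pose_set[i] for i in order[:n]]        # the first N initial poses
```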
- In some embodiments, the processing module 132 is further configured to construct the air-ground model based on the aerial model and the multiple third images used for constructing the ground model.
- In some embodiments, the processing module 132 is configured to: determine the poses of the multiple third images in the aerial model according to the aerial model; and determine the air-ground model according to the poses of the multiple third images in the aerial model and the poses of the multiple third images in the ground model.
- In some embodiments, the processing module 132 is configured to: determine multiple coordinate conversion relationships according to the poses of the multiple third images in the aerial model and the poses of the multiple third images in the ground model; determine, according to each coordinate conversion relationship, the semantic reprojection errors of the multiple third images in the aerial model; and select the optimal coordinate conversion relationship among the multiple coordinate conversion relationships as the coordinate conversion relationship of the air-ground model, the optimal coordinate conversion relationship being the coordinate conversion relationship that minimizes the semantic reprojection error.
- the processing module 132 is further configured to determine the first virtual object description information according to the first pose.
- the first virtual object description information is sent to the terminal device through the transceiver module 131, and the first virtual object description information is used to display the corresponding virtual object on the terminal device.
- the processing module 132 is further configured to: determine the second virtual object description information according to the second pose.
- the second virtual object description information is sent to the terminal device through the transceiver module 131, and the second virtual object description information is used to display the corresponding virtual object on the terminal device.
- the visual positioning device provided in the embodiment of the present application can be used to execute the above-mentioned visual positioning method, and its content and effects can be referred to the method part, which will not be repeated in the embodiment of the present application.
- An embodiment of the present application also provides a visual positioning device.
- the visual positioning device includes a processor 1401 and a transmission interface 1402, and the transmission interface 1402 is used to obtain the first image collected.
- The transmission interface 1402 may include a sending interface and a receiving interface. Illustratively, the transmission interface 1402 may be any type of interface according to any proprietary or standardized interface protocol, such as a High Definition Multimedia Interface (HDMI), a Mobile Industry Processor Interface (MIPI), a MIPI-standardized Display Serial Interface (DSI), a Video Electronics Standards Association (VESA)-standardized Embedded Display Port (eDP), a Display Port (DP), or a V-By-One interface.
- The V-By-One interface is a digital interface standard developed for image transmission; various wired or wireless interfaces, optical interfaces, and the like may also be used.
- the processor 1401 is configured to call the program instructions stored in the memory to execute the visual positioning method as in the above method embodiment.
- the device further includes a memory 1403.
- The processor 1401 may be a single-core processor or a multi-core processor group, the transmission interface 1402 is an interface for receiving or sending data, and the data processed by the visual positioning device may include audio data, video data, or image data.
- Illustratively, the visual positioning device may be a processor chip.
- the embodiments of the present application also provide a computer storage medium.
- The computer storage medium may include computer instructions which, when run on an electronic device, cause the electronic device to perform each step performed by the server in the foregoing method embodiments.
- Other embodiments of the present application also provide a computer program product which, when run on a computer, causes the computer to execute each step performed by the server in the foregoing method embodiments.
- Other embodiments of the present application also provide a device that has the function of realizing the server behavior in the foregoing method embodiments.
- the function can be realized by hardware, or by hardware executing corresponding software.
- the hardware or software includes one or more modules corresponding to the above-mentioned functions, for example, an acquisition unit or module, and a determination unit or module.
- the embodiment of the present application also provides a visual positioning device, which is used to execute the method steps executed by the terminal device or the processor of the terminal device in the above method embodiments.
- the visual positioning device may include: a processing module 151 and a transceiver module 152.
- the processing module 151 is configured to collect a first image and display the first image on the user interface, the first image including the captured skyline.
- the processing module 151 is also used to send the first image to the server through the transceiver module 152.
- The transceiver module 152 is also configured to receive the first virtual object description information sent by the server. The first virtual object description information is determined according to the first pose, and the first pose is determined according to the skyline and the building line and surface semantic information of the first image, as well as the aerial model.
- the processing module 151 is also configured to superimpose and display the virtual object corresponding to the first virtual object description information on the user interface.
- the processing module 151 is further configured to: before acquiring the first image, display first prompt information on the user interface, and the first prompt information is used to prompt the user to photograph the skyline.
- the transceiver module 152 is further configured to receive an indication message sent by the server, the indication message is used to indicate that there is a ground model corresponding to the first pose in the air-ground model, and the ground model is used to determine the second pose,
- the air-ground model includes an aerial model and a ground model mapped to the aerial model.
- the coordinate system of the ground model is the same as the coordinate system of the aerial model.
- the processing module 151 is further configured to display second prompt information on the user interface according to the instruction message, and the second prompt information is used to prompt the user of the operation modes available for selection.
- In some embodiments, the processing module 151 is further configured to: receive a relocation instruction input by the user through the user interface or on a hardware button, and, in response to the relocation instruction, send a positioning optimization request message to the server through the transceiver module 152, the positioning optimization request message being used to request calculation of the second pose.
- The transceiver module 152 is also configured to receive the second virtual object description information sent by the server. The second virtual object description information is determined according to the second pose, the second pose is determined according to the ground model corresponding to the first pose, and the positioning accuracy of the second pose is higher than that of the first pose.
- the visual positioning device provided in the embodiment of the present application can be used to execute the above-mentioned visual positioning method, and its content and effects can be referred to the method part, which will not be repeated in the embodiment of the present application.
- FIG. 16 is a schematic structural diagram of a vision processing device according to an embodiment of the application.
- the visual processing apparatus 1600 may be the terminal device involved in the foregoing embodiment.
- the vision processing device 1600 includes a processor 1601 and a transceiver 1602.
- the vision processing device 1600 further includes a memory 1603.
- the processor 1601, the transceiver 1602, and the memory 1603 can communicate with each other through an internal connection path to transfer control signals and/or data signals.
- the memory 1603 is used to store computer programs.
- the processor 1601 is configured to execute a computer program stored in the memory 1603, so as to implement various functions in the foregoing device embodiments.
- the memory 1603 may also be integrated in the processor 1601 or independent of the processor 1601.
- Optionally, the visual processing device 1600 may further include an antenna 1604 for transmitting the signals output by the transceiver 1602; alternatively, the transceiver 1602 receives signals through the antenna.
- the vision processing apparatus 1600 may further include a power supply 1605 for providing power to various devices or circuits in the terminal equipment.
- In addition, in order to make the functions of the terminal device more complete, the visual processing device 1600 may also include one or more of an input unit 1606, a display unit 1607 (which can also be regarded as an output unit), an audio circuit 1608, a camera 1609, a sensor 1610, and the like.
- The audio circuit may further include a speaker 16081, a microphone 16082, and so on, which will not be described in detail.
- the embodiments of the present application also provide a computer storage medium.
- The computer storage medium may include computer instructions which, when run on an electronic device, cause the electronic device to perform each step performed by the terminal device in the foregoing method embodiments.
- Other embodiments of the present application also provide a computer program product which, when run on a computer, causes the computer to execute each step performed by the terminal device in the foregoing method embodiments.
- Other embodiments of the present application also provide a device that has the function of realizing the terminal device behavior in the foregoing method embodiments.
- the function can be realized by hardware, or by hardware executing corresponding software.
- the hardware or software includes one or more modules corresponding to the above-mentioned functions, for example, an acquisition unit or module, a sending unit or module, and a display unit or module.
- the embodiment of the present application also provides an air-ground model modeling device for executing the method steps of the embodiment shown in FIG. 10 above.
- the air-ground model modeling device may include: an acquisition module and a processing module.
- the acquisition module is used to acquire multiple images for constructing a ground model.
- the processing module is used to determine the poses of the multiple images used to construct the ground model in the aerial model.
- the processing module is further configured to align the aerial model and the ground model to obtain an air-ground model according to the poses of the multiple images in the aerial model and the poses of the multiple images in the ground model.
- the air-ground model includes an aerial model and a ground model mapped to the aerial model, and the coordinate system of the ground model in the air-ground model is the same as the coordinate system of the aerial model.
- In some embodiments, the processing module is configured to: determine multiple coordinate conversion relationships according to the poses of the multiple images in the aerial model and the poses of the multiple images in the ground model; determine, according to each coordinate conversion relationship, the semantic reprojection errors of the multiple images in the aerial model; select the optimal coordinate conversion relationship among the multiple coordinate conversion relationships as the coordinate conversion relationship of the air-ground model, which is used to align the aerial model and the ground model; and map the ground model into the aerial model according to the coordinate conversion relationship of the air-ground model to obtain the air-ground model, the optimal coordinate conversion relationship being the coordinate conversion relationship that minimizes the semantic reprojection error.
- the visual positioning device provided in the embodiment of the present application can be used to execute the method steps shown in FIG. 10, and the content and effect of the visual positioning device can be referred to the method part, which will not be repeated in the embodiment of the present application.
- After the air-ground model modeling device obtains the air-ground model, the air-ground model can be configured in a corresponding server, and the server provides a visual positioning service to the terminal device.
- the processor mentioned in the above embodiments may be an integrated circuit chip with signal processing capability.
- the steps of the foregoing method embodiments can be completed by hardware integrated logic circuits in the processor or instructions in the form of software.
- The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
- the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware encoding processor, or executed and completed by a combination of hardware and software modules in the encoding processor.
- The software module may be located in a mature storage medium in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
- the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
- the memory mentioned in the above embodiments may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
- The non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory.
- The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
- the disclosed system, device, and method may be implemented in other ways.
- The device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- Based on such an understanding, the technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to make a computer device (a personal computer, a server, a network device, or the like) execute all or part of the steps of the methods described in the embodiments of the present application.
- The aforementioned storage media include various media that can store program code, such as a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Abstract
A visual positioning method and apparatus. The method includes: obtaining a collected first image; determining a first pose according to the first image and an aerial model; and judging whether a ground model corresponding to the first pose exists in an air-ground model, and, when such a ground model exists, determining a second pose according to the ground model. The air-ground model includes the aerial model and the ground model mapped into the aerial model, the coordinate system of the ground model is the same as that of the aerial model, and the positioning accuracy of the second pose is higher than that of the first pose. Performing refined visual positioning based on the ground model can improve the accuracy and success rate of positioning.
Description
本申请要求于2019年11月15日提交中国专利局、申请号为201911122668.2、申请名称为“视觉定位方法和装置”,以及于2020年2月27日提交中国专利局、申请号为202010126108.0、申请名称为“视觉定位方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
本申请涉及一种智能感知技术,尤其涉及一种视觉定位方法和装置。
视觉定位是使用相机所拍摄的图像或者视频来进行定位,精确定位出相机在真实世界中的位置和姿态。视觉定位是近些年来计算机视觉领域的热点问题,其在增强现实、交互虚拟现实、机器人视觉导航、公共场景监控、智能交通等诸多领域都具有十分重要的意义。
视觉定位技术包括基于无人机基础地图或者卫星地图的视觉定位方法。无人机/卫星基础地图(Aerial Model)主要通过无人机对场景进行倾斜摄影,根据采集到的数据进行运动恢复结构(Structure from Motion,SFM)三维重建得到的,或者,通过卫星对场景进行白模重建得到的。基于无人机基础地图或者卫星地图的视觉定位方法,使用该无人机/卫星基础地图(Aerial Model)对相机所拍摄的图像或者视频进行定位,获取相机在无人机/卫星基础地图中的6个自由度(Degree of freedom,DoF)位姿(Pose)。该类视觉定位技术可以应对大规模场景的视觉定位。
上述无人机基础地图或者卫星地图的视觉定位方法,存在定位成功率较低和定位精度不高的问题。
发明内容
本申请提供一种视觉定位方法和装置,以避免资源浪费,提升定位成功率和定位精度。
第一方面,本申请实施例提供一种视觉定位方法,该方法可以包括:获取采集的第一图像。根据第一图像和航拍模型,确定第一位姿。判断空地模型中是否存在第一位姿对应的地面模型。当存在第一位姿对应的地面模型时,根据地面模型确定第二位姿。其中,空地模型包括航拍模型和映射至航拍模型中的地面模型,该地面模型的坐标系与该航拍模型的坐标系相同,该第二位姿的定位精度高于该第一位姿的定位精度。
本实现方式,服务器根据第一图像和航拍模型,确定第一位姿,判断空地模型中是否存在第一位姿对应的地面模型。当存在第一位姿对应的地面模型时,根据地面模型确定第二位姿,先基于航拍模型确定第一位姿,可以实现快捷、高效、可适用于大范围场景的粗定位,以满足县区级、地市级的视觉定位需求,再根据第一位姿,基于地面模型进行精细化视觉定位,从而实现层次化视觉定位,提升视觉定位的定位精度和成功率。
在一种可能的设计中,根据第一图像和航拍模型,确定第一位姿,可以包括:根据第 一图像对应的终端设备的位置信息和磁力计角度偏转信息,确定初始位姿集合。根据第一图像获取第一图像的天际线和建筑物线面语义信息。根据第一图像的天际线和航拍模型,在初始位姿集合中确定N个初始位姿。根据建筑物线面语义信息、N个初始位姿和航拍模型,确定第一位姿。其中,N为大于1的整数。
该初始位姿也可以称之为候选位姿。
在一种可能的设计中,该方法还可以包括:获取采集的至少一个第二图像,第一图像和至少一个第二图像的拍摄视野存在交集。比如,第一图像和至少一个第二图像的视角不同。根据N个初始位姿、第一图像的天际线和至少一个第二图像的天际线、以及第一图像和至少一个第二图像之间的相对位姿,确定优化后的N个初始位姿。根据建筑物线面语义信息、N个初始位姿和航拍模型,确定所述第一位姿,包括:根据建筑物线面语义信息、优化后的N个初始位姿和航拍模型,确定第一位姿。
该第一图像和至少一个第二图像之间的相对位姿可以是同时定位与建图(Simultaneous Localization and Mapping,SLAM)算法给出的。
在一种可能的设计中,该方法还可以包括:根据N个初始位姿、以及第一图像和至少一个第二图像之间的相对位姿,确定优化后的N个初始位姿。
在一种可能的设计中,初始位姿集合包括多组初始位姿,每组初始位姿包括初始位置信息和初始磁力计角度偏转信息,初始位置信息属于第一阈值范围内,第一阈值范围为根据终端设备的位置信息确定的,初始磁力计角度偏转信息属于第二阈值范围内,第二阈值范围为根据终端设备的磁力计角度偏转信息确定的。
在一种可能的设计中,第一阈值范围的中心值为终端设备的位置信息,第二阈值范围的中心值为所述终端设备的磁力计角度偏转信息。
在一种可能的设计中,根据所述第一图像的天际线和航拍模型,在初始位姿集合中确定N个初始位姿,包括:分别根据每组初始位姿和所述航拍模型,进行天际线渲染,获取每组初始位姿对应的天际线。分别计算每组初始位姿对应的天际线与所述第一图像的天际线的匹配度,确定每组初始位姿的匹配度。根据每组初始位姿的匹配度,在初始位姿集合中确定N个初始位姿,N个初始位姿为所述初始位姿集合中匹配度从大到小排序的前N个初始位姿。
在一种可能的设计中,该方法还可以包括:基于构建地面模型的多个第三图像和航拍模型,构建空地模型。
该第三图像可以包括天际线。
在一种可能的设计中,基于构建地面模型的多个第三图像和所述航拍模型,构建所述空地模型,可以包括:根据所述航拍模型,确定多个第三图像在所述航拍模型中的位姿。根据多个第三图像在航拍模型中的位姿和多个第三图像在地面模型中的位姿,确定空地模型。
在一种可能的设计中,根据多个第三图像在航拍模型中的位姿和多个第三图像在地面模型中的位姿,确定空地模型,包括:根据多个第三图像在航拍模型中的位姿和多个第三图像在所述地面模型中的位姿,确定多种坐标转换关系。分别根据多种坐标转换关系,确定多个第三图像在航拍模型中的语义重投影误差,在多种坐标转换关系中选取最优的坐标转换关系作为空地模型。其中,该最优的坐标转换关系为使得语义重投影误差最小的坐标 转换关系。
在一种可能的设计中,该方法还可以包括:获取多个第三图像以及每个第三图像对应的重力信息;根据该第三图像和每个第三图像对应的重力信息构建该地面模型。该重力信息可由SLAM获取,该重力信息用于获取相机坐标系的侧倾(roll)角和俯仰(pitch)角。
该基于构建地面模型的多个第三图像和航拍模型,构建空地模型,可以包括:根据该地面模型和该航拍模型,构建空地模型。
在一种可能的设计中,该方法还可以包括:根据第一位姿或第二位姿确定虚拟物体描述信息。向终端设备发送虚拟物体描述信息,该第一虚拟物体描述信息用于在终端设备上显示对应的虚拟物体。
第二方面,本申请实施例提供一种视觉定位方法,该方法可以包括:采集第一图像,并在用户界面上显示第一图像,第一图像包括拍摄到的天际线。向服务器发送第一图像。接收服务器发送的第一虚拟物体描述信息,第一虚拟物体描述信息为根据第一位姿确定的,第一位姿为根据第一图像的天际线和建筑物线面语义信息、以及航拍模型确定的。在用户界面上叠加显示第一虚拟物体描述信息对应的虚拟物体。
本实现方式,终端设备向服务器发送第一图像,接收服务器发送的第一虚拟物体描述信息,终端设备在用户界面上显示该第一虚拟物体描述信息对应的虚拟物体,第一虚拟物体描述信息为根据第一位姿确定的,第一位姿为根据第一图像的天际线和建筑物线面语义信息、以及航拍模型确定的,该第一位姿的定位精度高于现有技术的视觉定位方法的定位精度,从而使得基于第一位姿所显示的虚拟物体更为精细和准确。
在一种可能的设计中,采集第一图像之前,该方法还可以包括:在用户界面上显示第一提示信息,该第一提示信息用于提示用户拍摄天际线。
在一种可能的设计中,该方法还包括:接收服务器发送的指示消息,该指示消息用于指示空地模型中存在第一位姿对应的地面模型,该地面模型用于确定第二位姿,该空地模型包括航拍模型和映射至航拍模型中的地面模型,该地面模型的坐标系与该航拍模型的坐标系相同。根据指示消息,在用户界面上显示第二提示信息,该第二提示信息用于提示用户可供选择的操作方式。
本实现方式,当存在第一位姿对应的地面模型时,终端设备可显示存在地面模型的提示信息,以供用户选择是否计算第二位姿,即是否进行更为精细的视觉定位,以满足不同用户的使用需求。
在一种可能的设计中,该方法还包括:接收用户通过用户界面或者在硬件按钮上输入的重定位指令,响应于重定位指令,向服务器发送定位优化请求消息,该定位优化请求消息用于请求计算第二位姿。接收服务器发送的第二虚拟物体描述信息,该第二虚拟物体描述信息为根据第二位姿确定的,该第二位姿为根据第一位姿对应的地面模型确定的,第二位姿的定位精度高于第一位姿的定位精度。
第三方面,本申请实施例提供一种空地模型建模方法,该方法可以包括:获取构建地面模型的多个第三图像。确定构建地面模型的多个第三图像在航拍模型中的第一位姿。根据多个第三图像在航拍模型中的第一位姿和多个第三图像在地面模型中的第二位姿,对齐航拍模型和地面模型以获取空地模型。其中,空地模型包括航拍模型和映射至航拍模型中的地面模型,地面模型的坐标系与航拍模型的坐标系相同。
该第三图像可以包括天际线。
在一种可能的设计中,根据多个第三图像在航拍模型中的位姿和多个第三图像在地面模型中的位姿,对齐航拍模型和地面模型,获取空地模型,包括:根据多个第三图像在航拍模型中的位姿和多个第三图像在地面模型中的位姿,确定多种坐标转换关系。分别根据多种坐标转换关系,确定多个第三图像在航拍模型中的语义重投影误差,在多种坐标转换关系中选取最优的坐标转换关系,作为空地模型的坐标转换关系,空地模型的坐标转换关系用于对齐所述航拍模型和所述地面模型。根据空地模型的坐标转换关系,将地面模型映射至航拍模型中,获取空地模型。其中,该最优的坐标转换关系为使得语义重投影误差最小的坐标转换关系。
在一种可能的设计中,该方法还可以包括:获取多个第三图像以及每个第三图像对应的重力信息;根据该第三图像和每个第三图像对应的重力信息构建该地面模型。
第四方面,本申请实施例提供一种视觉定位装置,该视觉定位装置可以作为服务器或服务器的内部芯片,该视觉定位装置用于执行上述第一方面或第一方面的任一可能的实现方式中的视觉定位方法。具体地,该视觉定位装置以包括用于执行第一方面或第一方面的任一可能的实现方式中的视觉定位方法的模块或单元,例如,收发模块或单元,处理模块或单元。
第五方面,本申请实施例提供一种视觉定位装置,该视觉定位装置可以作为服务器或服务器的内部芯片,该视觉定位装置包括存储器和处理器,该存储器用于存储指令,该处理器用于执行存储器存储的指令,并且对存储器中存储的指令的执行使得处理器执行上述第一方面或第一方面的任一可能的实现方式中的视觉定位方法。
第六方面,本申请实施例提供一种计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行时实现第一方面或第一方面的任一可能的实现方式中的方法。
第七方面,本申请实施例提供一种视觉定位装置,该视觉定位装置可以作为终端设备,该视觉定位装置用于执行上述第二方面或第二方面的任一可能的实现方式中的视觉定位方法。具体地,该视觉定位装置可以包括用于执行第二方面或第二方面的任一可能的实现方式中的视觉定位方法的模块或单元,例如,收发模块或单元,处理模块或单元。
第八方面,本申请实施例提供一种视觉定位装置,该通信装置可以作为终端设备,该视觉定位装置包括存储器和处理器,该存储器用于存储指令,该处理器用于执行存储器存储的指令,并且对存储器中存储的指令的执行使得处理器执行第二方面或第二方面的任一可能的实现方式中的视觉定位方法。
第九方面,本申请实施例提供一种计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行时实现第二方面或第二方面的任一可能的实现方式中的方法。
第十方面,本申请实施例提供一种视觉定位装置,该视觉定位装置可以作为服务器或服务器的内部芯片,该视觉定位装置用于执行上述第三方面或第三方面的任一可能的实现方式中的空地模型建模方法。具体地,该视觉定位装置以包括用于执行第三方面或第三方面的任一可能的实现方式中的空地模型建模方法的模块或单元,例如,获取模块或单元,处理模块或单元。
第十一方面,本申请实施例提供一种视觉定位装置,该视觉定位装置可以作为服务器或服务器的内部芯片,该视觉定位装置包括存储器和处理器,该存储器用于存储指令,该 处理器用于执行存储器存储的指令,并且对存储器中存储的指令的执行使得处理器执行上述第三方面或第三方面的任一可能的实现方式中的空地模型建模方法。
第十二方面,本申请实施例提供一种计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行时实现第三方面或第三方面的任一可能的实现方式中的方法。
第十三方面,本申请实施例提供一种计算机程序产品,计算机程序产品包括计算机程序,当计算机程序被计算机或处理器执行时,用于执行第一方面或第一方面的任一可能的实现方式中的方法,或者,用于执行第二方面或第二方面的任一可能的实现方式中的方法,或者,用于执行第三方面或第三方面的任一可能的实现方式中的方法。
第十四方面,本申请实施例提供一种视觉定位方法,该方法可以包括:获取采集的第一图像和第二图像。根据第一图像对应的终端设备的位置信息和磁力计角度偏转信息,确定初始位姿集合。根据第一图像获取第一图像的天际线和建筑物线面语义信息。根据该第二图像获取第二图像的天际线和建筑物线面语义信息。基于SLAM获取第一图像和第二图像之间的相对位姿。根据该第一图像的天际线、该第二图像的天际线、该相对位姿、和该航拍模型,在该初始位姿集合中确定N个优化后的候选位姿。根据建筑物线面语义信息、该N个优化后的候选位姿和该航拍模型,确定该第一图像的第一位姿,其中,N为大于1的整数。
在一种可能的设计中,该第一图像和该第二图像的视角不同。该天际线可以包括植被天际线。该建筑物线面语义信息可以包括建筑物的上边沿信息。
在一种可能的设计中,该初始位姿集合包括多组初始位姿,每组初始位姿包括初始位置信息和初始磁力计角度偏转信息,该初始位置信息属于第一阈值范围内,该第一阈值范围为根据终端设备的位置信息确定的,该初始磁力计角度偏转信息属于第二阈值范围内,该第二阈值范围为根据终端设备的磁力计角度偏转信息确定的。
在一种可能的设计中,该第一阈值范围的中心值为终端设备的位置信息,该第二阈值范围的中心值为该终端设备的磁力计角度偏转信息。
在一种可能的设计中,根据该第一图像的天际线、该第二图像的天际线、该相对位姿、和该航拍模型,在该初始位姿集合中确定N个优化后的候选位姿,包括:分别根据每组初始位姿和航拍地图,进行天际线渲染,获取每组初始位姿对应的天际线;分别计算每组初始位姿对应的天际线与第一图像的天际线的匹配度;根据天际线的匹配度、第二图像的天际线和相对位姿,确定每组初始位姿的权重;根据每组初始位姿的权重,在初始位姿集合中确定N个优化后的候选位姿,所述N个优化后的候选位姿为初始位姿集合中权重从小到大排序的前N个位姿。
在一种可能的设计中,根据建筑物线面语义信息、该N个优化后的候选位姿和该航拍模型,确定该第一图像的第一位姿,可以包括:根据该N个优化后的候选位姿、该第一图像的建筑物线面语义信息和该航拍模型,分别计算每个优化后的候选位姿对应的语义重投影误差,选择重投影误差最小的作为该第一图像的第一位姿。
在一种可能的设计中,该方法还可以包括:判断空地模型中是否存在该第一位姿对应的地面模型。当存在第一位姿对应的地面模型时,根据地面模型确定第二位姿。其中,空地模型包括航拍模型和映射至航拍模型中的地面模型,该地面模型的坐标系与该航拍模型的坐标系相同,该第二位姿的定位精度高于该第一位姿的定位精度。
在一种可能的设计中,该方法还可以包括:当不存在第一位姿对应的地面模型时,根据第一位姿确定第一虚拟物体描述信息;向终端设备发送该第一虚拟物体描述信息,该第一虚拟物体描述信息用于在终端设备上显示对应的虚拟物体。
在一种可能的设计中,该方法还可以包括:当存在第一位姿对应的地面模型时,根据第二位姿确定第二虚拟物体描述信息;向终端设备发送该第二虚拟物体描述信息,该第二虚拟物体描述信息用于在终端设备上显示对应的虚拟物体。
第十五方面,本申请实施例提供一种空地模型构建方法,该方法可以包括:获取多个第三图像以及每个第三图像对应的重力信息;根据该第三图像构建地面模型;根据航拍模型和地面模型,构建空地模型。其中,该多个第三图像中至少一个第三图像包括天际线,该重力信息可由SLAM获取,该重力信息用于获取相机坐标系的侧倾(roll)角和俯仰(pitch)角。
上述模型也可以称之为地图,例如,空地地图,航拍地图等。
在一种可能的设计中,根据航拍模型和地面模型,构建空地模型,包括:根据航拍模型,确定多个第三图像中包含天际线的第三图像在航拍模型中的位姿;根据多个第三图像中包含天际线的第三图像在航拍模型中的位姿和多个第三图像中包含天际线的第三图像在地面模型中的位姿,确定该空地模型。
在一种可能的设计中,根据多个第三图像中包含天际线的第三图像在航拍模型中的位姿和多个第三图像中包含天际线的第三图像在地面模型中的位姿,确定该空地模型,包括:根据多个第三图像中包含天际线的第三图像在航拍模型中的位姿和多个第三图像中包含天际线的第三图像在地面模型中的位姿,确定多种坐标转换关系;
分别根据多种坐标转换关系,确定多个第三图像在航拍模型中的建筑物线面语义重投影误差,在该多种坐标转换关系中选取最优的坐标转换关系;其中,该最优的坐标转换关系为使得建筑物线面语义重投影误差最小的坐标转换关系。
根据该最优的坐标系转换关系,将地面模型的坐标系转换到该航拍模型的坐标系,获取该空地模型;其中,该空地模型包括该航拍模型和坐标系映射至该航拍模型的该地面模型,该地面模型的坐标系与该航拍地图的坐标系相同。
本申请实施例的视觉定位方法和装置,基于第一图像的天际线和建筑物线面语义信息,和/或,空地模型中的地面模型,进行视觉定位,可以提升视觉定位的成功率和精度。
图1为本申请实施例提供的一种航拍模型的示意图;
图2为本申请实施例提供的一种地面模型、航拍模型和空地模型的示意图;
图3为本申请实施例提供的一种应用场景的示意图;
图4A为本申请实施例提供的终端设备的屏幕上显示的一种用户界面的示意图;
图4B为本申请实施例提供的终端设备的屏幕上显示的一种用户界面的示意图;
图4C为本申请实施例提供的终端设备的屏幕上显示的一种用户界面的示意图;
图5为本申请实施例提供的一种视觉定位方法的流程图;
图6为本申请实施例提供的一种改进的基于航拍模型的视觉定位(Geo-localization)方法的流程图
图7A为本申请实施例提供的一种语义分割效果图;
图7B为本申请实施例提供的另一种语义分割效果图;
图8为本申请实施例提供的另一种视觉定位方法的流程图;
图9为本申请实施例提供的一种用户界面示意图;
图10为本申请实施例提供的一种空地模型建模方法的流程图;
图11为本申请实施例提供的一种用户界面示意图;
图12为本申请实施例提供的一种空地模型建模的示意图;
图13为本申请实施例提供的一种视觉定位装置的结构示意图;
图14为本申请实施例提供的另一种视觉定位装置的结构示意图;
图15为本申请实施例提供的另一种视觉定位装置的结构示意图;
图16为本申请实施例提供的另一种视觉定位装置的结构示意图。
本申请实施例所涉及的术语“第一”、“第二”等仅用于区分描述的目的,而不能理解为指示或暗示相对重要性,也不能理解为指示或暗示顺序。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元。方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
应当理解,在本申请中,“至少一个(项)”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,用于描述关联对象的关联关系,表示可以存在三种关系,例如,“A和/或B”可以表示:只存在A,只存在B以及同时存在A和B三种情况,其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项(个)”或其类似表达,是指这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b或c中的至少一项(个),可以表示:a,b,c,“a和b”,“a和c”,“b和c”,或“a和b和c”,其中a,b,c可以是单个,也可以是多个。
首先对本申请实施例所涉及的几个名词进行解释:
视觉定位(Visual Localization):在视觉定位系统中,定位出终端设备的相机(camera)坐标系在真实世界坐标系中的位姿,目的是为了使真实世界与虚拟世界无缝融合。
查询(query)图像:终端设备采集的图像,用来进行视觉定位的当前图像帧。
航拍模型(Aerial Model):也称为无人机/卫星基础地图,该航拍模型主要可以通过如下两种方式获取:1)通过无人机对场景进行倾斜拍摄,根据拍摄所采集到的数据进行运动恢复结构(Structure from Motion,SFM)三维重建,如图1(a)以及图2(b)所示。2)通过卫星对场景进行白模重建,如图1(b)所示。
地面模型(Ground Model):也称之为基于终端设备建图的地图,基于终端设备对场景进行数据采集,根据采集到的数据进行SFM三维重建,获取该地面模型,示例性的,该地面模型可以如图2(a)所示。
空地模型(Aerial-Ground Model):也可以称之为空地地图,对航拍模型(Aerial Model)和地面模型(Ground Model)通过相似变换实现两个模型对齐,将两个模型统一到一个全局坐标系下,如图2(c)和图2(d)所示,其中,图2(c)为空地模型的点云,图2(d)为基于 空地模型的点云的重建网络(Reconstructed Meshes)。
基于航拍模型(Aerial Model)的视觉定位(Geo-localization):基于航拍模型(Aerial Model),定位出终端设备的相机坐标系在航拍模型中的6-dof位姿。
基于地面模型(Ground Model)的视觉定位(Ground-localization):基于地面模型(Ground Model),定位出终端设备的相机坐标系在地面模型中的6-dof位姿。
6个自由度(Degree of freedom,DoF)位姿(Pose):包括(x,y,z)坐标,以及环绕三个坐标轴的角度偏转,环绕三个坐标轴的角度偏转分别为偏航(yaw),俯仰(pitch),侧倾(roll)。
本申请实施例涉及终端设备。终端设备可以是移动电话、平板个人电脑(tablet personal computer)、媒体播放器、智能电视、笔记本电脑(laptop computer)、个人数字助理(personal digital assistant,PDA)、个人计算机(personal computer)、智能手表、增强现实(augmented reality,AR)眼镜等可穿戴式设备(wearable device)、车载设备、或物联网(the Internet of things,IOT)设备等,本申请实施例对此不作限定。
图3为本申请实施例提供的一种应用场景的示意图,如图3所示,该应用场景可以包括终端设备11和服务器12,示例性的,终端设备11与服务器12可以进行通信,服务器12可以向终端设备提供视觉定位服务,以及基于视觉定位服务,向终端设备11推送虚拟物体描述信息,以使得终端设备可以呈现相应的虚拟物体,该虚拟物体可以是虚拟路标、虚拟人物等。本申请实施例提供一种视觉定位方法,以提升视觉定位的成功率和准确率,从而准确地向终端设备推送相应的虚拟物体描述信息,其具体解释说明可以参见下述实施例。
本申请实施例的视觉定位方法可以应用于AR导航、AR人机交互、辅助驾驶、自动驾驶等需要定位终端设备的相机的位置和姿态的领域。例如,超大场景视觉导航系统,视觉导航指的是通过增强现实等交互方式将用户引导至某一个目的地点。用户可实时在终端设备的屏幕上看到建议的步行方向、离目的地的距离等信息,如图4A所示,虚拟物体为屏幕上显示的J2-1-1B16会议室的步行方向,即通过增强现实向用户展示步行方向等。再例如,超大场景AR游戏交互,如图4B和4C所示,AR游戏交互可以将AR内容固定在特定的地理位置,用户所使用的终端设备可以通过本申请实施例的视觉定位方法,在屏幕上显示相应的虚拟物体(例如,图4B所示的虚拟人物,图4C所示的虚拟动画),用户通过点击/滑动终端设备的屏幕等方式实现和虚拟物体的互动,可以引导虚拟物体和真实世界发生交互。
需要说明的是,终端设备11通常设置有摄像头,终端设备11可以通过摄像头对场景进行拍摄。上述服务器12以一个服务器为例进行举例说明,本申请不以此作为限制,例如,其也可以是包括多个服务器的服务器集群。
图5为本申请实施例提供的一种视觉定位方法的流程图,本实施例的方法涉及终端设备和服务器,如图5所示,本实施例的方法可以包括:
步骤101、终端设备采集第一图像。
终端设备通过摄像头采集第一图像,该第一图像可以是如上所述的查询(query)图像。
以终端设备是智能手机为例,智能手机可以根据应用程序的触发,启动拍摄功能,采集该第一图像。例如,可以周期性采集第一图像,例如,2秒,30秒等,也可以是满足预设 采集条件时,采集第一图像,该预设采集条件可以是智能手机的GPS数据在预设范围内。终端设备采集的每一个第一图像均可以通过如下步骤,以实现视觉定位。
步骤102、终端设备向服务器发送第一图像。
服务器接收终端设备发送的第一图像。
步骤103、服务器根据第一图像和航拍模型,确定第一位姿。
本申请实施例的确定第一位姿的方式可以称之为改进的基于航拍模型(Aerial Model)的视觉定位(Geo-localization),该改进的基于航拍模型(Aerial Model)的视觉定位(Geo-localization)可以有效结合第一图像的天际线和建筑物线面语义信息,确定第一位姿,提升定位成功率和定位精度。
示例性的,服务器可以根据第一图像的天际线确定N个初始位姿,根据第一图像的建筑物线面语义信息、该N个初始位姿和航拍模型,确定第一位姿。例如,遍历该N个初始位姿,计算该N个初始位姿的语义重投影误差,根据语义重投影误差,确定第一位姿。其中,计算该N个初始位姿的语义重投影误差的方式可以是,分别根据该N个初始位姿和航拍模型渲染建筑物的边和面,得到渲染出的语义分割图,计算渲染出的语义分割图与该第一图像的建筑物线面语义信息(例如,语义分割图)的匹配误差,该匹配误差即为语义重投影误差。N为大于1的整数。
一种可实现方式,根据第一图像对应的终端设备的位置信息和磁力计角度偏转信息,确定初始位姿集合。根据第一图像获取第一图像的天际线和建筑物线面语义信息。根据第一图像的天际线和航拍模型,在初始位姿集合中确定N个初始位姿。根据建筑物线面语义信息、N个初始位姿和航拍模型,确定第一位姿。其具体实施方式可以参见图6所示实施例的解释说明。
在一些实施例中,服务器还可以接收终端设备采集的至少一个第二图像,服务器可以根据该至少一个第二图像,对N个初始位姿进行优化,确定优化后的N个初始位姿,根据第一图像的建筑物线面语义信息和该优化后的N个初始位姿,确定该第一位姿。即结合多帧图像辅助第一图像的位姿求解。该至少一个第二图像和第一图像的拍摄视野存在交集。
可选的,该至少一个第二图像和第一图像的拍摄视野也可以不存在交集。换言之,该至少一个第二图像和第一图像的视角不同。
步骤104、服务器根据第一位姿确定第一虚拟物体描述信息,向终端设备发送该第一虚拟物体描述信息。
例如,服务器可以根据第一位姿确定第一虚拟物体描述信息,该第一虚拟物体描述信息用于在终端设备上显示相应的虚拟物体,例如,如图4A所示的步行引导图标,该引导图标显示在真实世界的实际场景中,即显示在如图4A所示的街道上。
步骤105、终端设备在用户界面上显示该第一虚拟物体描述信息对应的虚拟物体。
终端设备在用户界面上显示该第一虚拟物体描述信息对应的虚拟物体,该用户界面中显示有真实世界的实际场景,该虚拟物体可以采用增强现实的方式显示在该用户界面上。
本实施例,终端设备向服务器发送第一图像,服务器根据第一图像的天际线和建筑物线面语义信息、以及航拍模型,确定第一位姿,服务器根据第一位姿确定第一虚拟物体描述信息,向终端设备发送该第一虚拟物体描述信息,终端设备在用户界面上显示该第一虚拟物体描述信息对应的虚拟物体,以基于第一图像的天际线和建筑物线面语义信息,确定 第一位姿,可以提升视觉定位的成功率和精度。
进一步的,本实施例可以有效结合基于航拍地图的视觉定位和基于手机精细化建图的视觉定位各自的优势,通过构建空地地图和层次化视觉定位,有效缓解大场景下采集成本与定位精度之间的矛盾。
下面采用图6所示实施例对上述步骤103的一种具体的可实现方式进行解释说明。
图6为本申请实施例提供的一种改进的基于航拍模型的视觉定位(Geo-localization)方法的流程图,本实施例的执行主体可以是服务器或服务器的内部芯片,如图6所示,本实施例的方法可以包括:
步骤201、根据第一图像对应的终端设备的位置信息和磁力计角度偏转信息,确定初始位姿集合。
该第一图像对应的终端设备的位置信息可以是全球定位系统(Global Positioning System,GPS)信息,该磁力计角度偏转信息可以是偏航(yaw)角。该位置信息和磁力计角度偏转信息可以是终端设备采集该第一图像时的位置信息和磁力计角度偏转信息,其可以通过终端设备的无线通信模块和磁力计获取。
该初始位姿集合可以包括多组初始位姿,每组初始位姿可以包括初始位置信息和初始磁力计角度偏转信息,该初始位置信息属于第一阈值范围内,该第一阈值范围为根据终端设备的位置信息确定的,该初始磁力计角度偏转信息属于第二阈值范围内,该第二阈值范围为根据终端设备的磁力计角度偏转信息确定的。
示例性的,终端设备可以根据第一图像对应的终端设备的位置信息和磁力计角度偏转信息,分别构建位置备选集合(T)和偏航(yaw)角备选集合(Y),位置备选集合(T)包括多个初始位置信息,偏航(yaw)角备选集合(Y)包括多个偏航(yaw)角,T中的一个初始位置信息和Y中的一个偏航(yaw)角可以组成一组初始位姿,从而可以组成多组初始位姿。
构建位置备选集合(T)的一种可实现方式为,在一个区域范围内,以第一预设间隔为间隔,选取位置点作为位置备选集合(T)中的初始位置信息,该区域范围可以是以第一图像对应的终端设备的位置信息(x,y)为圆心,半径为第一阈值的范围。即上述第一阈值范围的中心值为终端设备的位置信息。例如,该第一阈值可以是30米、35米等。该第一预设间隔可以是1米。
构建偏航(yaw)角备选集合(Y)的一种可实现方式为,在一个角度范围内,以第二预设间隔为间隔,选取角度作为偏航(yaw)角备选集合(Y)中的偏航(yaw)角,该角度范围可以是以第一图像对应的终端设备的偏航(yaw)角的正负第二阈值的范围。即上述第二阈值范围的中心值为终端设备的磁力计角度偏转信息。例如,该第二阈值可以是90度、85度等。该第二预设间隔可以是0.1度。
上述构建位置备选集合(T)和偏航(yaw)角备选集合(Y)的可实现方式为一种举例说明,本申请实施例不以此作为限制。
步骤202、根据第一图像获取第一图像的天际线和建筑物线面语义信息。
在本步骤中,可以对第一图像进行不同类别的语义分割,并提取第一图像的天际线。该不同类别可以包括植被、建筑物、天空等。可以对第一图像进行建筑物横竖线面的语义分割,获取第一图像的建筑物线面语义信息。建筑物横竖线面包括建筑物的边缘(横边和 竖边)和平面。
示例性的,以图7A为例进行举例说明,将第一图像(例如,图7A最左侧的图像)输入至第一语义分割网络,该第一语义分割网络用于区分出建筑物、天空、植被和地面等,输出语义分割效果图(例如,图7A中间的图像),基于该语义分割效果图提取天际线,得到第一图像的天际线(例如,图7A最右侧的图像)。
该第一语义分割网络可以是任意神经网络,例如,卷积神经网络等。
该第一语义分割网络可以是使用训练数据训练后得到的,即使用训练数据训练第一语义分割网络区分出建筑物、天空、植被和地面等。语义分割任务是密集的像素级别的分类任务,在训练时采用的训练策略是标准的交叉熵损失,用以衡量预测值与标签值之间的差距,通过最小化该损失,提高网络的预测效果:
其中,N代表所有的像素,p
i代表任一像素预测与标注(ground truth)为同一类别的预测值,p
j代表任一像素各类别的预测值。本申请实施例的语义分割网络共计算了两部分的损失(Loss),第一部分是最终的输出与标签图的交叉熵L
final,即上述公式中的L,第二部分是正则化损失L
weight。本申请实施例通过减少特征或者惩罚不重要特征的权重来缓解过拟合,而正则化就是帮助惩罚特征权重的,即特征的权重也会成为模型的损失函数一部分。因此,语义分割网络的总体损失(Loss)如下:
L
total=L
final+γL
weight
其中γ是超参数,用于控制损失的重要程度。例如,γ的值设为1。
通过重复迭代调整语义分割网络模型,以最小化语义分割网络的总体损失(Loss),从而训练得到该第一语义分割网络。
示例性的,以图7B为例进行举例说明,将第一图像(例如,图7B最左侧的图像)输入至第二语义分割网络,该第二语义分割网络用于区分图像中的建筑物横线、竖线、建筑物面等,输出建筑物线面语义信息,例如,建筑物语义分割图(例如,图7B最右侧的图像)。
该第二语义分割网络可以是任意神经网络,例如,卷积神经网络等。
该第二语义分割网络可以是使用训练数据训练后得到的,即使用训练数据训练第二语义分割网络区分出建筑物横线、竖线、建筑物面等。具体的训练方式可以采用类似第一语义分割网络的训练方式,此处不再赘述。
需要说明的是,采用如上述方式获取第一图像的天际线后,还可以对该天际线进行校正调整,例如,在重力方向旋转一定角度,该角度可以是采用同时定位与建图(Simultaneous Localization and Mapping,SLAM)算法确定的,例如,SLAM算法给出的角度,该角度用于表示终端设备的相机坐标系与重力方向的相对关系。
步骤203、根据第一图像的天际线和航拍模型,在初始位姿集合中确定N个初始位姿。
例如,可以遍历上述位置备选集合(T)和偏航(yaw)角备选集合(Y)中的各个元素,组成初始位姿集合中的多组初始位姿,每组初始位姿可以包括初始位置信息和初始磁力计角度偏转信息。针对每组初始位姿,根据航拍模型,进行天际线渲染,获取每组初始 位姿对应的天际线,分别计算每组初始位姿对应的天际线与第一图像的天际线的匹配度,确定每组初始位姿的匹配度,根据每组初始位姿的匹配度,在初始位姿集合中确定N个初始位姿,该N个初始位姿可以是该初始位姿集合中匹配度较高的N个初始位姿。
计算每组初始位姿对应的天际线与第一图像的天际线的匹配度的具体实现方式可以为:将渲染好的初始位姿对应的天际线和第一图像的天际线用滑动窗口的形式进行匹配(L2距离或者其他距离度量),确定匹配度。
本申请实施例采用N个初始位姿参与视觉定位,可以提升视觉定位的定位成功率。
示例性的,一组初始位姿为((x
1,y
1),yaw1),基于航拍模型对该初始位姿进行天际线渲染,获取该初始位姿对应的天际线,将该初始位姿对应的天际线和第一图像的天际线用滑动窗口的形式进行匹配,确定该初始位姿的匹配度。
步骤204、根据N个初始位姿、第一图像的天际线和至少一个第二图像的天际线、以及第一图像和至少一个第二图像之间的相对位姿,确定优化后的N个初始位姿。
在执行步骤203之后,可以直接执行步骤205,以根据建筑物线面语义信息、该N个初始位姿和航拍模型,确定第一位姿。步骤204作为一个可选的步骤,可以在执行步骤203之后,对步骤203所确定N个初始位姿进行多帧联合优化,以得到优化后的N个初始位姿。
多帧联合优化的一种可实现方式可以为:结合至少一个第二图像对上述步骤203的N个初始位姿进行优化。该至少一个第二图像的解释说明可以参见图5所示实施例的步骤103的解释说明,此处不再赘述。本实施例以两个第二图像为例进行举例说明。以I
0表示第一图像,I
1和I
2表示两个第二图像。例如,基于三个图像的天际线,以及SLAM给出的三个图像之间的相对位姿,优化I
0的N个初始位姿。
示例性的,为了更为精准地确定第一图像的第一位姿,该三个图像进行的拍摄视野有一定交集,且三个图像组成的整体视野较大。优化方式具体可以是:
将步骤203获取的I
0的N个初始位姿记为
基于
计算得到I
1和I
2在航拍模型中的位姿,分别根据I
1的位姿和I
2的位姿进行天际线渲染,得到I
1的渲染的天际线和I
2的渲染的天际线,计算I
1的渲染的天际线与I
1的天际线的匹配度,计算I
2的渲染的天际线与I
2的天际线的匹配度,将I
0、I
1和I
2的天际线的匹配度加在一起,记为
用来衡量
的估计精度。举例来说,如果n=3时,对应的
最高,则对应的初始位姿
优于其他初始位姿。
针对I
1和I
2进行和上述方式相同的处理方式,例如,如果获取的I
1的N个初始位姿记为
该I
1的N个初始位姿可以针对I
1采用上述步骤201至步骤203的方式获取该I
1的N个初始位姿,
表示I
0和I
2相对于I
1的位姿转换关系,
可以由SLAM给出,基于
计算得到I
0和I
2在航拍模型中的位姿,分别根据I
0的位姿和I
2的位姿进行天际线渲染,得到I
0的渲染的天际线和I
2的渲染的天际线,计算I
0的渲染的天际线与I
0的天际线的匹配度,计算I
2的渲染的天际线与I
2的天际线的匹配度,然后计算I
0和I
2在该位姿下的天际线匹配度,最后将I
0、I
1和I
2的天际线匹配值加在一起,记为
用来衡量
的估计精度。举例来说,如果n=3时,对应的
最高,则对应的初始位姿
优于其他初始位姿。
步骤205、根据建筑物线面语义信息、优化后的N个初始位姿和航拍模型,确定第一位姿。
遍历上一步骤得到的优化后的N个初始位姿,基于航拍模型,分别渲染出N个初始位姿的建筑物的线面语义信息,根据渲染出的建筑物的线面语义信息和上述步骤202得到的建筑物线面语义信息,计算各个初始位姿的语义重投影误差,选取一个语义重投影误差最小的位姿,得到该第一图像I
0的3-dof位姿。结合SLAM给出的相机坐标系和重力方向的关系,得到该第一图像的另外三个自由度的信息,得到第一图像I
0的6dof位姿(相对于世界坐标系的6dof位姿),即上述第一位姿。
可选的,还可以通过PnL(Perspective-n-Line)求解更精确的位姿,即使用下述步骤优化上述第一图像I
0的6dof位姿,将优化后的位姿作为第一位姿。
例如,具体优化上述第一图像I
0的6dof位姿,确定第一位姿的步骤可以为:
a.根据当前的位姿(上述第一图像I
0的6dof位姿),从航拍模型中提取当前视角下的3D线段信息(例如,建筑物线的横竖线段);
b.将3D线段信息输入至PnL算法中,输出优化的位姿;
c.根据优化的位姿,在对应的航拍模型上重投影渲染出语义分割图,计算该图与图像对应的语义分割图的匹配误差;重复b-c,直至匹配误差收敛,得到更精细的6-dof位姿;
d.在求解得到的位姿附近随机采样一些位姿,重复步骤a-c,如果计算的位姿更优(以航拍模型到手机拍摄图像的语义重投影匹配误差为衡量标准),则更新位姿,尽量避免以上步骤优化的位姿陷入局部最优。
步骤206、根据第一位姿、以及第一图像和至少一个第二图像之间的相对位姿,确定优化后的第一位姿。
示例性的,根据SLAM给出的第一图像和至少一个第二图像之间(也称帧间)的相对位姿,通过求解PoseGraph优化,得到第一图像I
0的最佳6dof位姿估计,即优化后的第一位姿。
本实施例,根据第一图像的天际线,确定N个初始位姿,根据第一图像的建筑物线面语义信息,在N个初始位姿的基础上,优化得到第一位姿,从而可以有效结合第一图像的天际线和建筑物线面语义信息,提升视觉定位的定位成功率和定位精度。
在视觉定位过程中,还可以通过结合至少一个第二图像的天际线和建筑物线面语义信息,优化位姿,提升定位精度。
图8为本申请实施例提供的另一种视觉定位方法的流程图,本实施例的方法涉及终端设备和服务器,本实施例在图5所示实施例的基础上,进一步结合空地模型,优化第一图像的第一位姿,以实现更为精确的视觉定位,如图8所示,本实施例的方法可以包括:
步骤301、终端设备采集第一图像。
步骤302、终端设备向服务器发送第一图像。
步骤303、服务器根据第一图像和航拍模型,确定第一位姿。
步骤304、服务器根据第一位姿确定第一虚拟物体描述信息,向终端设备发送该第一虚拟物体描述信息。
步骤305、终端设备在用户界面上显示该第一虚拟物体描述信息对应的虚拟物体。
其中,步骤301至步骤305的解释说明可以参见图5所示实施例的步骤101至步骤105,此处不再赘述。
步骤306、服务器判断空地模型中是否存在第一位姿对应的地面模型。若是,则执行步骤307。
其中,空地模型包括航拍模型和映射至航拍模型中的地面模型,该空地模型中的地面模型的坐标系与航拍模型的坐标系相同。其中,空地模型的具体构建方式可以参见下述图10所示实施例的具体解释说明。
步骤307、当存在第一位姿对应的地面模型时,根据地面模型确定第二位姿。
其中,第二位姿的定位精度高于第一位姿的定位精度。
通过第一位姿对应的地面模型,可以进行精细化的视觉定位,确定第二位姿。该精细化的视觉定位可以包括图像检索、特征点提取和特征点匹配等处理过程。
步骤308、服务器根据第二位姿确定第二虚拟物体描述信息,向终端设备发送该第二虚拟物体描述信息。
例如,服务器可以根据第二位姿确定第一虚拟物体描述信息,该第二虚拟物体描述信息用于在终端设备上显示相应的虚拟物体,例如,如图4A所示的咖啡馆的引导图标,该引导图标显示在真实世界的实际场景中,即显示在如图4A所示的建筑物上。
相较于第一虚拟物体描述信息,第二虚拟物体描述信息是基于更为精细化的第二位姿确定的,该第二虚拟物体描述信息对应的虚拟物体可以是更为细节的虚拟物体,例如,第一虚拟物体描述信息对应的虚拟物体可以是道路引导图标,第二虚拟物体描述信息对应的虚拟物体可以是街道内的店铺的引导图标。
步骤309、终端设备在用户界面上显示该第二虚拟物体描述信息对应的虚拟物体。
终端设备在用户界面上显示该第一虚拟物体描述信息对应的虚拟物体和该第二虚拟物体描述信息对应的虚拟物体,该用户界面中显示有真实世界的实际场景,该虚拟物体可以采用增强现实的方式显示在该用户界面上。
本实施例,服务器判断空地模型中是否存在第一位姿对应的地面模型,当存在第一位姿对应的地面模型时,根据地面模型确定第二位姿,服务器根据第二位姿确定第二虚拟物体描述信息,向终端设备发送该第二虚拟物体描述信息,终端设备在用户界面上显示该第二虚拟物体描述信息对应的虚拟物体,可以提升视觉定位的成功率和精度,以及服务器向终端设备推送虚拟物体描述信息的准确性。
下面结合图9,通过具体示例,对上述图8所示实施例的视觉定位方法进行说明。
图9为本申请实施例提供的一种用户界面示意图。如图9所示,包括用户界面901-用户界面904。
如用户界面901所示,终端设备可以采集第一图像,该第一图像呈现在用户界面901中。
可选的,在用户界面901中还可以显示第一提示信息,该第一提示信息用于提示用户拍摄天际线,例如,该第一提示信息可以是“保证拍摄到天际线”。
用户界面901中的第一图像包括天际线,所以可以满足视觉定位需求。终端设备可以通过上述步骤302将第一图像发送给服务器。服务器可以通过上述步骤303至304,确定第一位姿,将终端设备发送第一位姿对应的第一虚拟物体描述信息。终端设备根据该第一虚拟物体描述信息可以显示用户界面902,用户界面902中呈现了第一虚拟物体描述信息对应的虚拟物体,例如,云朵。
服务器还可以根据上述步骤306判断空地模型中是否存在第一位姿对应的地面模型,当存在第一位姿对应的地面模型时,可以向终端设备发送指示消息,该指示消息用于指示空地模型中存在第一位姿对应的地面模型,终端设备根据该指示消息,可以在用户界面上显示第二提示信息,该第二提示信息用于提示用户可供选择的操作方式,例如,请参见用户界面903,该第二提示信息为“是否需要进一步定位”以及操作图标“是”、“否”。
用户可以点击“是”的操作图标,终端设备根据用户的操作,向服务器发送定位优化请求消息,该定位优化请求消息用于请求计算第二位姿。服务器通过步骤307和步骤308,确定第二位姿,并向终端设备发送第二虚拟物体描述信息,终端设备在用户界面上呈现该第二虚拟物体描述信息对应虚拟物体,例如,用户界面904,用户界面904中呈现了第一虚拟物体描述信息对应的虚拟物体,例如,云朵,以及第二虚拟物体描述信息对应的虚拟物体,例如,太阳和闪电。
本申请实施例的服务器进行两方面的操作,一方面是在线视觉定位计算,包括第一位姿和第二位姿的求解,如上述各个实施例所述。另一方面是离线的空地地图构建,具体可以参见下图10所示。离线的空地地图构建主要指的是:服务器端获取终端设备上传的构建地面模型的多个图像,通过本申请实施例所述改进的Geo-localization视觉定位算法,确定多个图像在航拍模型中的第一位姿,同时多个图像进行SFM操作构建地面模型并获取多个图像在地面模型中的位姿,通过多个图像在航拍模型中获取的第一位姿和对应地面模型中的相应位姿,通过语义重投影误差对齐航拍模型和地面模型以获取空地模型。
图10为本申请实施例提供的一种空地模型建模方法的流程图,如图10所示,该方法可以包括:
步骤401、获取构建地面模型的多个图像。
示例性的,在局部区域内,用户使用终端设备采集图像并上传到服务器,服务器基于SFM算法进行3D建模,得到地面模型。即该图像用于构建地面模型。该图像需要采集到天际线。在用户采集该图像过程中,可以通过如图11所示的用户界面上的提示信息,以提示用户图像的拍摄需求。
例如,如图12所示的每一行的第一列的上面的图像的示例,该图像即为构建地面模型的一个图像。基于图像所构建的地面模型的点云可以如第二列所示。
步骤402、确定构建地面模型的多个图像在航拍模型中的位姿。
其中,航拍模型可以是,首先使用无人机/卫星对应用场景进行图像采集,然后基于倾 斜摄影进行2.5D模型构建,得到的该航拍模型。例如,如图12所示的每一行的第一列的下面的图像的示例,该图像即为一个无人机/卫星所采集的图像,用于构建航拍模型,所构建的航拍模型的点云可以如第三列所示。
本实施例可以通过如图6所示的改进的基于航拍模型的视觉定位(Geo-localization)方法确定图像在航拍模型中的位姿,具体的,将图6所示的方法中的第一图像替换为本实施例的步骤401中的每一个图像,从而确定每一个图像在航拍模型中的位姿。
步骤403、根据多个图像在航拍模型中的位姿和多个图像在地面模型中的位姿,对齐航拍模型和地面模型以获取空地模型。
其中,空地模型包括航拍模型和映射至航拍模型中的地面模型,空地模型中的地面模型的坐标系与航拍模型的坐标系相同。
例如,构建的空地模型的点云可以如图12所示的每一行的第四列的图像所示,即融合了地面模型的点云和航拍模型的点云。基于空地模型的点云可以得到如图12所示的每一行的第五列所示的重建网络。
一种可实现方式,根据多个图像在航拍模型中的位姿和多个图像在地面模型中的位姿,确定多种坐标转换关系。分别根据多种坐标转换关系,确定多个图像在航拍模型中的语义重投影误差,在多种坐标转换关系中选取最优的坐标转换关系,作为空地模型的坐标转换关系,空地模型的坐标转换关系用于对齐航拍模型和地面模型。根据空地模型的坐标转换关系,将地面模型映射至航拍模型中,获取空地模型。其中,最优的坐标转换关系为使得语义重投影误差最小的坐标转换关系。即通过上述方式,将地面模型注册至航拍模型中,获取空地模型。
对齐航拍模型和地面模型的具体的可实现方式可以为:
通过相似变换,基于P
i
G(i=0...M-1)和P
i
A(i=0...M-1),针对每幅图像可以得到地面模型和航拍模型之间的坐标系转换关系T
i
G2A(i=0...M-1)。
遍历T
i
G2A(i=0...M-1),举例而言,基于
计算
在航拍模型中的语义重投影误差,这里的语义重投影误差指的是建筑物横竖线面误差,并将语义重投影误差累加得到
遍历T
i
G2A(i=0...M-1)得到不同的语义重投影误差
本实施例,通过将地面模型映射至航拍模型中,构建空地模型,该空地模型是一种层 次化的模型,融合了航拍模型的大范围场景的信息和地面模型的精细化的信息,使得使用该空地模型的视觉定位方法,可以进行快捷、高效、可适用于大范围场景的粗定位,以满足县区级、地市级的视觉定位需求,并基于粗定位的结果,进行精细化视觉定位,从而实现层次化视觉定位,提升视觉定位的准确率。
本申请实施例还提供一种视觉定位装置,用于执行以上各方法实施例中服务器或服务器的处理器执行的方法步骤。如图13所示,该视觉定位装置可以包括:收发模块131和处理模块132。
处理模块132,用于通过收发模块131获取终端设备采集的第一图像。
处理模块132,还用于根据第一图像和航拍模型,确定第一位姿。判断空地模型中是否存在第一位姿对应的地面模型;当存在第一位姿对应的地面模型时,根据地面模型确定第二位姿。
其中,空地模型包括航拍模型和映射至航拍模型中的地面模型,空地模型中的地面模型的坐标系与航拍模型的坐标系相同,第二位姿的定位精度高于第一位姿的定位精度。
在一些实施例中,处理模块132用于:根据第一图像对应的终端设备的位置信息和磁力计角度偏转信息,确定初始位姿集合。根据第一图像获取第一图像的天际线和建筑物线面语义信息。根据第一图像的天际线和航拍模型,在初始位姿集合中确定N个初始位姿。根据建筑物线面语义信息、N个初始位姿和航拍模型,确定第一位姿。其中,N为大于1的整数。
在一些实施例中,处理模块132还用于通过收发模块131获取终端设备采集的至少一个第二图像,第一图像和至少一个第二图像的拍摄视野存在交集。处理模块132还用于根据N个初始位姿、第一图像的天际线和至少一个第二图像的天际线、以及第一图像和至少一个第二图像之间的相对位姿,确定优化后的N个初始位姿。根据建筑物线面语义信息、优化后的N个初始位姿和航拍模型,确定所述第一位姿。
在一些实施例中,处理模块132还用于:根据N个初始位姿、以及第一图像和至少一个第二图像之间的相对位姿,确定优化后的N个初始位姿。
在一些实施例中,初始位姿集合包括多组初始位姿,每组初始位姿包括初始位置信息和初始磁力计角度偏转信息,初始位置信息属于第一阈值范围内,第一阈值范围为根据终端设备的位置信息确定的,初始磁力计角度偏转信息属于第二阈值范围内,第二阈值范围为根据终端设备的磁力计角度偏转信息确定的。
在一些实施例中,第一阈值范围的中心值为终端设备的位置信息,第二阈值范围的中心值为终端设备的磁力计角度偏转信息。
在一些实施例中,处理模块132用于:分别根据每组初始位姿和所述航拍模型,进行天际线渲染,获取每组初始位姿对应的天际线;分别计算每组初始位姿对应的天际线与第一图像的天际线的匹配度,确定每组初始位姿的匹配度;根据每组初始位姿的匹配度,在初始位姿集合中确定N个初始位姿,该N个初始位姿为初始位姿集合中匹配度从大到小排序的前N个初始位姿。
In some embodiments, the processing module 132 is further configured to construct the air-ground model based on the aerial model and multiple third images used to construct the ground model.

In some embodiments, the processing module 132 is configured to: determine, according to the aerial model, the poses of the multiple third images in the aerial model; and determine the air-ground model according to the poses of the multiple third images in the aerial model and their poses in the ground model.

In some embodiments, the processing module 132 is configured to: determine multiple coordinate transformations according to the poses of the multiple third images in the aerial model and their poses in the ground model; compute, separately under each coordinate transformation, the semantic reprojection errors of the multiple third images in the aerial model; and select the optimal coordinate transformation, namely the one minimizing the semantic reprojection error, as the coordinate transformation of the air-ground model.

In some embodiments, the processing module 132 is further configured to: determine first virtual object description information according to the first pose, and send it to the terminal device through the transceiver module 131, the first virtual object description information being used to display the corresponding virtual object on the terminal device.

In some embodiments, the processing module 132 is further configured to: determine second virtual object description information according to the second pose, and send it to the terminal device through the transceiver module 131, the second virtual object description information being used to display the corresponding virtual object on the terminal device.

The visual localization apparatus provided in this embodiment of this application may be configured to perform the foregoing visual localization method; for its content and effects, refer to the method part, which is not repeated here.
An embodiment of this application further provides a visual localization apparatus. As shown in Figure 14, the visual localization apparatus includes a processor 1401 and a transmission interface 1402, the transmission interface 1402 being configured to obtain a captured first image.

The transmission interface 1402 may include a transmitting interface and a receiving interface. For example, the transmission interface 1402 may be any type of interface according to any proprietary or standardized interface protocol, such as a high definition multimedia interface (HDMI), a Mobile Industry Processor Interface (MIPI), a MIPI-standardized Display Serial Interface (DSI), a Video Electronics Standards Association (VESA)-standardized Embedded Display Port (eDP), a Display Port (DP), or a V-By-One interface (a digital interface standard developed for image transmission), as well as any of various wired or wireless interfaces and optical interfaces.

The processor 1401 is configured to invoke program instructions stored in a memory to perform the visual localization method of the foregoing method embodiments; for its content and effects, refer to the method part, which is not repeated here. Optionally, the apparatus further includes a memory 1403. The processor 1401 may be a single-core processor or a multi-core processor group, the transmission interface 1402 is an interface for receiving or sending data, and the data processed by the visual localization apparatus may include audio data, video data, or image data. For example, the visual localization apparatus may be a processor chip.
Some other embodiments of this application further provide a computer storage medium that may include computer instructions which, when run on an electronic device, cause the electronic device to perform the steps performed by the server in the foregoing method embodiments.

Some other embodiments of this application further provide a computer program product which, when run on a computer, causes the computer to perform the steps performed by the server in the foregoing method embodiments.

Some other embodiments of this application further provide an apparatus having the function of implementing the server behavior in the foregoing method embodiments. The function may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the function, for example, an obtaining unit or module and a determining unit or module.
An embodiment of this application further provides a visual localization apparatus, configured to perform the method steps performed by the terminal device, or by the processor of the terminal device, in the foregoing method embodiments. As shown in Figure 15, the visual localization apparatus may include a processing module 151 and a transceiver module 152.

The processing module 151 is configured to capture a first image and display the first image on a user interface, the first image including a captured skyline.

The processing module 151 is further configured to send the first image to a server through the transceiver module 152.

The transceiver module 152 is further configured to receive first virtual object description information sent by the server, the first virtual object description information being determined according to a first pose, and the first pose being determined according to the skyline and building line-plane semantic information of the first image together with an aerial model.

The processing module 151 is further configured to display, overlaid on the user interface, the virtual object corresponding to the first virtual object description information.

In some embodiments, the processing module 151 is further configured to display first prompt information on the user interface before the first image is captured, the first prompt information prompting the user to photograph the skyline.

In some embodiments, the transceiver module 152 is further configured to receive an indication message sent by the server, the indication message indicating that the air-ground model contains a ground model corresponding to the first pose, the ground model being used to determine a second pose, the air-ground model including the aerial model and the ground model mapped into the aerial model, and the coordinate system of the ground model being the same as that of the aerial model. The processing module 151 is further configured to display, according to the indication message, second prompt information on the user interface, the second prompt information prompting the user about the available operations.
In some embodiments, the processing module 151 is further configured to receive a relocalization instruction entered by the user through the user interface or on a hardware button and, in response to the relocalization instruction, send a localization optimization request message to the server through the transceiver module 152, the localization optimization request message requesting computation of the second pose. The transceiver module 152 is further configured to receive second virtual object description information sent by the server, the second virtual object description information being determined according to the second pose, the second pose being determined according to the ground model corresponding to the first pose, and the localization precision of the second pose being higher than that of the first pose.
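The terminal-side exchange described above can be sketched as follows; the prompt text, message fields, and the `ui`/`server` helper objects are hypothetical placeholders rather than a protocol defined by the patent.

```python
def on_indication_message(ui, server):
    """Terminal-side handling of the server's indication message: show the
    second prompt and, on "Yes", request the refined second pose."""
    choice = ui.show_prompt("Refine the localization?", options=["Yes", "No"])
    if choice != "Yes":
        return
    server.send({"type": "localization_optimization_request"})   # ask for pose 2
    reply = server.receive()                     # second virtual object description
    ui.overlay_virtual_objects(reply["virtual_objects"])          # e.g. sun, lightning
```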
The visual localization apparatus provided in this embodiment of this application may be configured to perform the foregoing visual localization method; for its content and effects, refer to the method part, which is not repeated here.
Figure 16 is a schematic structural diagram of a visual processing apparatus according to an embodiment of this application. As shown in Figure 16, the visual processing apparatus 1600 may be the terminal device involved in the foregoing embodiments. The visual processing apparatus 1600 includes a processor 1601 and a transceiver 1602.

Optionally, the visual processing apparatus 1600 further includes a memory 1603. The processor 1601, the transceiver 1602, and the memory 1603 may communicate with one another through an internal connection path to transfer control and/or data signals.

The memory 1603 is configured to store a computer program, and the processor 1601 is configured to execute the computer program stored in the memory 1603, thereby implementing the functions of the foregoing apparatus embodiments.

Optionally, the memory 1603 may be integrated into the processor 1601 or be independent of the processor 1601.

Optionally, the visual processing apparatus 1600 may further include an antenna 1604, configured to transmit signals output by the transceiver 1602; alternatively, the transceiver 1602 receives signals through the antenna.

Optionally, the visual processing apparatus 1600 may further include a power supply 1605, configured to supply power to the various components or circuits in the terminal device.

In addition, to make the functions of the terminal device more complete, the visual processing apparatus 1600 may further include one or more of an input unit 1606, a display unit 1607 (which may also be regarded as an output unit), an audio circuit 1608, a camera 1609, a sensor 1610, and the like. The audio circuit may further include a speaker 16081, a microphone 16082, and so on, which are not described further.
Some other embodiments of this application further provide a computer storage medium that may include computer instructions which, when run on an electronic device, cause the electronic device to perform the steps performed by the terminal device in the foregoing method embodiments.

Some other embodiments of this application further provide a computer program product which, when run on a computer, causes the computer to perform the steps performed by the terminal device in the foregoing method embodiments.

Some other embodiments of this application further provide an apparatus having the function of implementing the terminal device behavior in the foregoing method embodiments. The function may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the function, for example, a capture unit or module, a sending unit or module, and a display unit or module.
An embodiment of this application further provides an air-ground model modeling apparatus, configured to perform the method steps of the embodiment shown in Figure 10 above. The air-ground model modeling apparatus may include an obtaining module and a processing module.

The obtaining module is configured to obtain multiple images for constructing a ground model.

The processing module is configured to determine the poses, in an aerial model, of the multiple images used to construct the ground model.

The processing module is further configured to align the aerial model and the ground model according to the poses of the multiple images in the aerial model and the poses of the multiple images in the ground model, to obtain an air-ground model.

The air-ground model includes the aerial model and the ground model mapped into the aerial model, and the coordinate system of the ground model within the air-ground model is the same as that of the aerial model.

In some embodiments, the processing module is configured to: determine multiple coordinate transformations according to the poses of the multiple images in the aerial model and their poses in the ground model; compute, separately under each coordinate transformation, the semantic reprojection errors of the multiple images in the aerial model; select the optimal coordinate transformation, namely the one minimizing the semantic reprojection error, as the coordinate transformation of the air-ground model, which is used to align the aerial model and the ground model; and map the ground model into the aerial model according to this coordinate transformation to obtain the air-ground model.

The air-ground model modeling apparatus provided in this embodiment of this application may be configured to perform the method steps shown in Figure 10 above; for its content and effects, refer to the method part, which is not repeated here.

After the air-ground model modeling apparatus obtains the air-ground model, the air-ground model may be deployed to a corresponding server, and that server provides the visual localization service to terminal devices.
The processors mentioned in the foregoing embodiments may be integrated circuit chips with signal processing capability. During implementation, the steps of the foregoing method embodiments may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of this application may be directly performed by a hardware coding processor, or performed by a combination of hardware and software modules in a coding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the foregoing methods in combination with its hardware.

The memories mentioned in the foregoing embodiments may be volatile memories or non-volatile memories, or may include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as a static random access memory (static RAM, SRAM), a dynamic random access memory (dynamic RAM, DRAM), a synchronous dynamic random access memory (synchronous DRAM, SDRAM), a double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), an enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), a synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and a direct rambus random access memory (direct rambus RAM, DR RAM). It should be noted that the memories of the systems and methods described herein are intended to include, but are not limited to, these and any other suitable types of memory.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether the functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of this application.

A person skilled in the art may clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.

The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Claims (30)
- A visual localization method, comprising: obtaining a captured first image; determining a first pose according to the first image and an aerial model; judging whether an air-ground model contains a ground model corresponding to the first pose; and, when a ground model corresponding to the first pose exists, determining a second pose according to the ground model; wherein the air-ground model comprises the aerial model and the ground model mapped into the aerial model, the coordinate system of the ground model is the same as that of the aerial model, and the localization precision of the second pose is higher than that of the first pose.
- The method according to claim 1, wherein the determining a first pose according to the first image and an aerial model comprises: determining an initial pose set according to position information and magnetometer angular deflection information of a terminal device corresponding to the first image; obtaining, from the first image, the skyline and building line-plane semantic information of the first image; determining N initial poses from the initial pose set according to the skyline of the first image and the aerial model; and determining the first pose according to the building line-plane semantic information, the N initial poses, and the aerial model, wherein N is an integer greater than 1.
- The method according to claim 2, further comprising: obtaining at least one captured second image, the first image and the at least one second image having different viewing angles; and determining N optimized initial poses according to the N initial poses, the skyline of the first image and the skyline of the at least one second image, and the relative poses between the first image and the at least one second image; wherein the determining the first pose according to the building line-plane semantic information, the N initial poses, and the aerial model comprises: determining the first pose according to the building line-plane semantic information, the N optimized initial poses, and the aerial model.
- The method according to claim 3, further comprising: determining the N optimized initial poses according to the N initial poses and the relative poses between the first image and the at least one second image.
- The method according to any one of claims 2 to 4, wherein the initial pose set comprises multiple groups of initial poses, each group comprising initial position information and initial magnetometer angular deflection information, the initial position information falling within a first threshold range determined according to the position information of the terminal device, and the initial magnetometer angular deflection information falling within a second threshold range determined according to the magnetometer angular deflection information of the terminal device.
- The method according to claim 5, wherein the center value of the first threshold range is the position information of the terminal device, and the center value of the second threshold range is the magnetometer angular deflection information of the terminal device.
- The method according to claim 5 or 6, wherein the determining N initial poses from the initial pose set according to the skyline of the first image and the aerial model comprises: rendering a skyline according to each group of initial poses and the aerial model to obtain the skyline corresponding to each group of initial poses; computing the match degree between the skyline corresponding to each group of initial poses and the skyline of the first image to determine the match degree of each group of initial poses; and determining the N initial poses from the initial pose set according to the match degree of each group of initial poses, the N initial poses being the top N initial poses of the initial pose set sorted by match degree in descending order.
- The method according to any one of claims 1 to 7, further comprising: constructing the air-ground model based on the aerial model and multiple third images used to construct the ground model.
- The method according to claim 8, wherein the constructing the air-ground model based on the multiple third images used to construct the ground model and the aerial model comprises: determining, according to the aerial model, the poses of the multiple third images in the aerial model; and determining the air-ground model according to the poses of the multiple third images in the aerial model and the poses of the multiple third images in the ground model.
- The method according to claim 9, wherein the determining the air-ground model according to the poses of the multiple third images in the aerial model and the poses of the multiple third images in the ground model comprises: determining multiple coordinate transformations according to the poses of the multiple third images in the aerial model and the poses of the multiple third images in the ground model; computing, separately under each of the multiple coordinate transformations, the semantic reprojection errors of the multiple third images in the aerial model; and selecting, among the multiple coordinate transformations, the optimal coordinate transformation for the air-ground model, wherein the optimal coordinate transformation is the coordinate transformation minimizing the semantic reprojection error.
- The method according to any one of claims 1 to 10, further comprising: determining virtual object description information according to the first pose or the second pose; and sending the virtual object description information to a terminal device, the virtual object description information being used to display the corresponding virtual object on the terminal device.
- A visual localization method, comprising: capturing a first image and displaying the first image on a user interface, the first image including a captured skyline; sending the first image to a server; receiving first virtual object description information sent by the server, the first virtual object description information being determined according to a first pose, and the first pose being determined according to the skyline and building line-plane semantic information of the first image together with an aerial model; and displaying, overlaid on the user interface, the virtual object corresponding to the first virtual object description information.
- The method according to claim 12, wherein before the first image is captured, the method further comprises: displaying first prompt information on the user interface, the first prompt information prompting the user to photograph the skyline.
- The method according to claim 12 or 13, further comprising: receiving an indication message sent by the server, the indication message indicating that an air-ground model contains a ground model corresponding to the first pose, the ground model being used to determine a second pose, the air-ground model comprising the aerial model and the ground model mapped into the aerial model, and the coordinate system of the ground model being the same as that of the aerial model; and displaying, according to the indication message, second prompt information on the user interface, the second prompt information prompting the user about the available operations.
- The method according to claim 14, further comprising: receiving a relocalization instruction entered by the user through the user interface or on a hardware button, and, in response to the relocalization instruction, sending a localization optimization request message to the server, the localization optimization request message requesting computation of the second pose; and receiving second virtual object description information sent by the server, the second virtual object description information being determined according to the second pose, the second pose being determined according to the ground model corresponding to the first pose, and the localization precision of the second pose being higher than that of the first pose.
- A visual localization apparatus, comprising: a processing module, configured to obtain, through a transceiver module, a first image captured by a terminal device; the processing module being further configured to determine a first pose according to the first image and an aerial model; and the processing module being further configured to judge whether an air-ground model contains a ground model corresponding to the first pose and, when a ground model corresponding to the first pose exists, determine a second pose according to the ground model; wherein the air-ground model comprises the aerial model and the ground model mapped into the aerial model, the coordinate system of the ground model is the same as that of the aerial model, and the localization precision of the second pose is higher than that of the first pose.
- The apparatus according to claim 16, wherein the processing module is configured to: determine an initial pose set according to position information and magnetometer angular deflection information of the terminal device corresponding to the first image; obtain, from the first image, the skyline and building line-plane semantic information of the first image; determine N initial poses from the initial pose set according to the skyline of the first image and the aerial model; and determine the first pose according to the building line-plane semantic information, the N initial poses, and the aerial model, wherein N is an integer greater than 1.
- The apparatus according to claim 17, wherein the processing module is further configured to obtain, through the transceiver module, at least one second image captured by the terminal device, the first image and the at least one second image having different viewing angles; and the processing module is further configured to determine N optimized initial poses according to the N initial poses, the skyline of the first image and the skyline of the at least one second image, and the relative poses between the first image and the at least one second image, and to determine the first pose according to the building line-plane semantic information, the N optimized initial poses, and the aerial model.
- The apparatus according to claim 18, wherein the processing module is further configured to determine the N optimized initial poses according to the N initial poses and the relative poses between the first image and the at least one second image.
- The apparatus according to any one of claims 17 to 19, wherein the initial pose set comprises multiple groups of initial poses, each group comprising initial position information and initial magnetometer angular deflection information, the initial position information falling within a first threshold range determined according to the position information of the terminal device, and the initial magnetometer angular deflection information falling within a second threshold range determined according to the magnetometer angular deflection information of the terminal device.
- The apparatus according to claim 20, wherein the center value of the first threshold range is the position information of the terminal device, and the center value of the second threshold range is the magnetometer angular deflection information of the terminal device.
- The apparatus according to claim 20 or 21, wherein the processing module is configured to: render a skyline according to each group of initial poses and the aerial model to obtain the skyline corresponding to each group of initial poses; compute the match degree between the skyline corresponding to each group of initial poses and the skyline of the first image to determine the match degree of each group of initial poses; and determine the N initial poses from the initial pose set according to the match degree of each group of initial poses, the N initial poses being the top N initial poses of the initial pose set sorted by match degree in descending order.
- The apparatus according to any one of claims 16 to 22, wherein the processing module is further configured to construct the air-ground model based on the aerial model and multiple third images used to construct the ground model.
- The apparatus according to claim 23, wherein the processing module is configured to: determine, according to the aerial model, the poses of the multiple third images in the aerial model; and determine the air-ground model according to the poses of the multiple third images in the aerial model and the poses of the multiple third images in the ground model.
- The apparatus according to claim 24, wherein the processing module is configured to: determine multiple coordinate transformations according to the poses of the multiple third images in the aerial model and the poses of the multiple third images in the ground model; compute, separately under each of the multiple coordinate transformations, the semantic reprojection errors of the multiple third images in the aerial model; and select, among the multiple coordinate transformations, the optimal coordinate transformation for the air-ground model, wherein the optimal coordinate transformation is the coordinate transformation minimizing the semantic reprojection error.
- The apparatus according to any one of claims 16 to 25, wherein the processing module is further configured to: determine virtual object description information according to the first pose or the second pose, and send the virtual object description information to the terminal device through the transceiver module, the virtual object description information being used to display the corresponding virtual object on the terminal device.
- A visual localization apparatus, comprising: a processing module, configured to capture a first image and display the first image on a user interface, the first image including a captured skyline; the processing module being further configured to send the first image to a server through a transceiver module; the transceiver module being further configured to receive first virtual object description information sent by the server, the first virtual object description information being determined according to a first pose, and the first pose being determined according to the skyline and building line-plane semantic information of the first image together with an aerial model; and the processing module being further configured to display, overlaid on the user interface, the virtual object corresponding to the first virtual object description information.
- The apparatus according to claim 27, wherein the processing module is further configured to display, before the first image is captured, first prompt information on the user interface, the first prompt information prompting the user to photograph the skyline.
- The apparatus according to claim 27 or 28, wherein the transceiver module is further configured to receive an indication message sent by the server, the indication message indicating that an air-ground model contains a ground model corresponding to the first pose, the ground model being used to determine a second pose, the air-ground model comprising the aerial model and the ground model mapped into the aerial model, and the coordinate system of the ground model being the same as that of the aerial model; and the processing module is further configured to display, according to the indication message, second prompt information on the user interface, the second prompt information prompting the user about the available operations.
- The apparatus according to claim 29, wherein the processing module is further configured to receive a relocalization instruction entered by the user through the user interface or on a hardware button and, in response to the relocalization instruction, send a localization optimization request message to the server through the transceiver module, the localization optimization request message requesting computation of the second pose; and the transceiver module is further configured to receive second virtual object description information sent by the server, the second virtual object description information being determined according to the second pose, the second pose being determined according to the ground model corresponding to the first pose, and the localization precision of the second pose being higher than that of the first pose.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20887084.0A EP4050305A4 (en) | 2019-11-15 | 2020-11-06 | METHOD AND DEVICE FOR VISUAL POSITIONING |
US17/743,892 US20220375220A1 (en) | 2019-11-15 | 2022-05-13 | Visual localization method and apparatus |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911122668.2 | 2019-11-15 | ||
CN201911122668 | 2019-11-15 | ||
CN202010126108.0 | 2020-02-27 | ||
CN202010126108.0A CN112815923B (zh) | 2019-11-15 | 2020-02-27 | Visual localization method and apparatus |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/743,892 Continuation US20220375220A1 (en) | 2019-11-15 | 2022-05-13 | Visual localization method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021093679A1 (zh) | 2021-05-20 |
Family
ID=75852968
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/127005 WO2021093679A1 (zh) | 2019-11-15 | 2020-11-06 | Visual localization method and apparatus |
Country Status (4)
Country | Link |
---|---|
US (1) | US20220375220A1 (zh) |
EP (1) | EP4050305A4 (zh) |
CN (1) | CN112815923B (zh) |
WO (1) | WO2021093679A1 (zh) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113393515B (zh) * | 2021-05-21 | 2023-09-19 | 杭州易现先进科技有限公司 | Visual localization method and system combining scene annotation information |
CN113776533A (zh) * | 2021-07-29 | 2021-12-10 | 北京旷视科技有限公司 | Relocalization method and apparatus for a movable device |
CN117311837A (zh) * | 2022-06-24 | 2023-12-29 | 北京字跳网络技术有限公司 | Visual localization parameter update method and apparatus, electronic device, and storage medium |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7061510B2 (en) * | 2001-03-05 | 2006-06-13 | Digimarc Corporation | Geo-referencing of aerial imagery using embedded image identifiers and cross-referenced data sets |
JP2012511697A (ja) | 2008-12-09 | 2012-05-24 | トムトム ノース アメリカ インコーポレイテッド | Method for generating a geodetic reference database |
US8249302B2 (en) * | 2009-06-30 | 2012-08-21 | Mitsubishi Electric Research Laboratories, Inc. | Method for determining a location from images acquired of an environment with an omni-directional camera |
US8963943B2 (en) * | 2009-12-18 | 2015-02-24 | Electronics And Telecommunications Research Institute | Three-dimensional urban modeling apparatus and method |
AU2011256984B2 (en) * | 2010-05-28 | 2014-03-20 | Bae Systems Plc | Simulating a terrain view from an airborne point of view |
CN103635935B (zh) | 2011-03-18 | 2016-10-12 | 苹果公司 | 3D streets |
US20150213590A1 (en) * | 2011-07-29 | 2015-07-30 | Google Inc. | Automatic Pose Setting Using Computer Vision Techniques |
TWI486556B (zh) * | 2013-01-04 | 2015-06-01 | Univ Nat Central | Integration of Radar and Optical Satellite Image for Three - dimensional Location |
US9251419B2 (en) * | 2013-02-07 | 2016-02-02 | Digitalglobe, Inc. | Automated metric information network |
US9418472B2 (en) * | 2014-07-17 | 2016-08-16 | Google Inc. | Blending between street view and earth view |
US9858669B2 (en) * | 2015-10-23 | 2018-01-02 | The Boeing Company | Optimized camera pose estimation system |
US10430961B2 (en) * | 2015-12-16 | 2019-10-01 | Objectvideo Labs, Llc | Using satellite imagery to enhance a 3D surface model of a real world cityscape |
US10627232B2 (en) * | 2017-06-27 | 2020-04-21 | Infatics, Inc. | Method and system for aerial image processing |
US11676296B2 (en) * | 2017-08-11 | 2023-06-13 | Sri International | Augmenting reality using semantic segmentation |
CN109141442B (zh) | 2018-09-07 | 2022-05-17 | 高子庆 | Navigation method based on UWB positioning and image feature matching, and mobile terminal |
CN109556596A (zh) | 2018-10-19 | 2019-04-02 | 北京极智嘉科技有限公司 | Navigation method, apparatus, device and storage medium based on ground texture images |
CN109669533B (zh) | 2018-11-02 | 2022-02-11 | 北京盈迪曼德科技有限公司 | Motion capture method, apparatus and system based on vision and inertia |
CN110108258B (zh) | 2019-04-09 | 2021-06-08 | 南京航空航天大学 | Monocular visual odometry localization method |
CN110223380B (zh) | 2019-06-11 | 2021-04-23 | 中国科学院自动化研究所 | Scene modeling method, system and apparatus fusing aerial and ground-view images |
CN110375739B (zh) | 2019-06-26 | 2021-08-24 | 中国科学院深圳先进技术研究院 | Visual fusion localization method and system for mobile terminals, and electronic device |
Application events:
- 2020-02-27: CN application CN202010126108.0A filed; granted as CN112815923B (status: Active)
- 2020-11-06: EP application EP20887084.0A filed; published as EP4050305A4 (status: Pending)
- 2020-11-06: PCT application PCT/CN2020/127005 filed; published as WO2021093679A1 (status: unknown)
- 2022-05-13: US application US17/743,892 filed; published as US20220375220A1 (status: Pending)
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102084398A (zh) * | 2008-06-25 | 2011-06-01 | 微软公司 | Alignment of street-level imagery to 3D building models |
JP2012127896A (ja) * | 2010-12-17 | 2012-07-05 | Kumamoto Univ | Mobile body position measurement device |
CN103424113A (zh) * | 2013-08-01 | 2013-12-04 | 毛蔚青 | Indoor positioning and navigation method for mobile terminals based on image recognition technology |
CN104422439A (zh) * | 2013-08-21 | 2015-03-18 | 希姆通信息技术(上海)有限公司 | Navigation method, apparatus, server, navigation system and use method thereof |
CN103900539A (zh) * | 2014-03-27 | 2014-07-02 | 北京空间机电研究所 | Target positioning method for aerial cube panoramic imaging |
US20190094027A1 (en) * | 2016-03-30 | 2019-03-28 | Intel Corporation | Techniques for determining a current location of a mobile device |
CN105953798A (zh) * | 2016-04-19 | 2016-09-21 | 深圳市神州云海智能科技有限公司 | Pose determination method and device for a mobile robot |
CN109099888A (zh) * | 2017-06-21 | 2018-12-28 | 中兴通讯股份有限公司 | Pose measurement method, device and storage medium |
CN109357673A (zh) * | 2018-10-30 | 2019-02-19 | 上海仝物云计算有限公司 | Image-based visual navigation method and apparatus |
Non-Patent Citations (2)
- LI HAIFENG: "Research on Visual Localization for Mobile Robot in Urban Environment", Information & Technology, China Doctoral Dissertations Full-Text Database, 1 May 2012 (2012-05-01), XP055813493
- See also references of EP4050305A4
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113643434A (zh) * | 2021-07-12 | 2021-11-12 | 广东省国土资源测绘院 | Three-dimensional modeling method based on air-ground collaboration, intelligent terminal and storage device |
CN113643434B (zh) * | 2021-07-12 | 2022-11-15 | 广东省国土资源测绘院 | Three-dimensional modeling method based on air-ground collaboration, intelligent terminal and storage device |
CN113838129A (zh) * | 2021-08-12 | 2021-12-24 | 高德软件有限公司 | Method, apparatus and system for obtaining pose information |
CN113838129B (zh) * | 2021-08-12 | 2024-03-15 | 高德软件有限公司 | Method, apparatus and system for obtaining pose information |
Also Published As
Publication number | Publication date |
---|---|
US20220375220A1 (en) | 2022-11-24 |
CN112815923B (zh) | 2022-12-30 |
EP4050305A1 (en) | 2022-08-31 |
EP4050305A4 (en) | 2023-04-05 |
CN112815923A (zh) | 2021-05-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11165959B2 (en) | Connecting and using building data acquired from mobile devices | |
WO2021093679A1 (zh) | Visual localization method and apparatus | |
US11557083B2 (en) | Photography-based 3D modeling system and method, and automatic 3D modeling apparatus and method | |
Chen et al. | Rise of the indoor crowd: Reconstruction of building interior view via mobile crowdsourcing | |
RU2741443C1 (ru) | Method and apparatus for planning sampling points for surveying and mapping, control terminal, and data storage medium | |
WO2019238114A1 (zh) | Method, apparatus and device for three-dimensional reconstruction of a dynamic model, and storage medium | |
WO2019242262A1 (zh) | Augmented-reality-based remote guidance method and apparatus, terminal, and storage medium | |
US9699375B2 (en) | Method and apparatus for determining camera location information and/or camera pose information according to a global coordinate system | |
US10127721B2 (en) | Method and system for displaying and navigating an optimal multi-dimensional building model | |
Li et al. | Camera localization for augmented reality and indoor positioning: a vision-based 3D feature database approach | |
CN104169965A (zh) | System, method and computer program product for runtime adjustment of image warping parameters in a multiple capture device system | |
CN107566793A (zh) | Method, apparatus, system and electronic device for remote assistance | |
CN112927363A (zh) | Voxel map construction method and apparatus, computer-readable medium, and electronic device | |
CA3069813C (en) | Capturing, connecting and using building interior data from mobile devices | |
WO2021244114A1 (zh) | Visual positioning method and apparatus | |
CN116858215B (zh) | AR navigation map generation method and apparatus | |
US11418716B2 (en) | Spherical image based registration and self-localization for onsite and offsite viewing | |
CN111652831B (zh) | Object fusion method and apparatus, computer-readable storage medium, and electronic device | |
CN115731406A (zh) | Visual difference detection method, apparatus and device based on page graphs | |
CA3102860C (en) | Photography-based 3d modeling system and method, and automatic 3d modeling apparatus and method | |
US20240312136A1 (en) | Automated Generation Of Building Floor Plans Having Associated Absolute Locations Using Multiple Data Capture Devices | |
CN114119737A (zh) | Visual positioning method for indoor navigation and related device | |
CN115760584A (zh) | Image processing method and related device | |
EP4285326A1 (en) | Systems and methods for image capture |
Legal Events
- 121 (EP: the EPO has been informed by WIPO that EP was designated in this application): Ref document number: 20887084; Country of ref document: EP; Kind code of ref document: A1
- ENP (Entry into the national phase): Ref document number: 2020887084; Country of ref document: EP; Effective date: 20220523
- NENP (Non-entry into the national phase): Ref country code: DE