
CN114842085B - Full-scene vehicle attitude estimation method - Google Patents

Full-scene vehicle attitude estimation method

Info

Publication number
CN114842085B
Authority
CN
China
Prior art keywords
image
vehicle
layer
key point
full
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210780438.0A
Other languages
Chinese (zh)
Other versions
CN114842085A (en)
Inventor
刘寒松
王永
王国强
刘瑞
翟贵乾
李贤超
焦安健
谭连胜
董玉超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sonli Holdings Group Co Ltd
Original Assignee
Sonli Holdings Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sonli Holdings Group Co Ltd filed Critical Sonli Holdings Group Co Ltd
Priority to CN202210780438.0A priority Critical patent/CN114842085B/en
Publication of CN114842085A publication Critical patent/CN114842085A/en
Application granted granted Critical
Publication of CN114842085B publication Critical patent/CN114842085B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/08 Detecting or categorising vehicles
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of vehicle attitude estimation, and relates to a full-scene vehicle attitude estimation method.

Description

Full-scene vehicle attitude estimation method
Technical Field
The invention belongs to the technical field of vehicle attitude estimation, and relates to a full-scene vehicle attitude estimation method.
Background
Autonomous driving has broad prospects and is the development trend of future automobiles. It requires a vehicle to judge its surrounding environment clearly, select the correct driving route and driving behavior, and assist the driver in controlling the vehicle. Real-world driving scenes are complex and changeable, and each complex scene calls for different countermeasures. Vehicle attitude estimation, an important task in autonomous driving technology, aims to locate the key points of vehicles in images or videos and helps judge the driving states of surrounding vehicles.
At present, the main challenge of vehicle attitude estimation is occlusion. Occlusion exists in every driving scene, for example between vehicles, between pedestrians and vehicles, and between other objects and vehicles, yet existing vehicle attitude estimation methods struggle to recognize the vehicle attitude in occluded scenes. A vehicle attitude estimation method oriented to the full scene is therefore urgently needed.
Convolutional neural networks have achieved excellent performance in pose estimation, but most work treats the deep convolutional neural network as a powerful black-box predictor, and how it captures the spatial relationships between parts remains unclear. From both a scientific and a practical standpoint, model interpretability helps explain how the model relates variables to reach its final prediction and how the pose estimation algorithm processes various input images. In the vehicle attitude estimation task, a Transformer can capture long-range relationships and thereby reveal the dependencies between key points.
Since the advent of the Transformer, its high computational efficiency and scalability have made it dominant in natural language processing. It is a deep neural network based mainly on the self-attention mechanism, and owing to its strong performance, researchers have been looking for ways to apply it to computer vision tasks, where Transformer-based models perform as well as or better than other types of networks (such as convolutional and recurrent networks) on various vision benchmarks. However, no report or application of such models in vehicle attitude estimation is known at present.
Disclosure of Invention
The invention aims to overcome the defects in the prior art by designing and providing a full-scene vehicle attitude estimation method that realizes efficient vehicle attitude estimation. The method takes the Swin Transformer as the backbone network for feature extraction, uses a Transformer encoder to encode the feature map information into position representations of the key points, obtains the key point dependency terms by computing attention scores, and predicts the final key point positions, thereby effectively handling vehicle occlusion and realizing full-scene vehicle attitude estimation.
To achieve this purpose, the Swin Transformer is introduced as the backbone network and the network structure is optimized according to the characteristics of the vehicle attitude estimation task. The original image information is compressed into a compact sequence of key point positions, converting the vehicle attitude estimation task into an encoding task; the key point dependency terms are obtained by computing attention scores, and the final key point positions are predicted. The specific process comprises the following steps:
(1) data set construction:
selecting vehicle images from open-source data sets, collecting images of various vehicles from traffic monitoring and parking-lot scenes, constructing a vehicle data set, and dividing the vehicle data set into a training set, a validation set and a test set;
(2) image segmentation: each image in the vehicle data set is split by a slice segmentation module into non-overlapping image slices, and each image slice is regarded as a token represented by the concatenated RGB values of its pixels;
(3) hierarchical feature extraction by the backbone network: the image slice tokens obtained in step (2) first pass through the linear embedding layer of the first stage of the backbone network, which changes the feature dimension to an arbitrary dimension C; the two Swin Transformer blocks and the second stage then perform hierarchical feature extraction to obtain a feature map;
(4) position coding: the feature map obtained in step (3) is input into the position coding layer for position coding; the feature map passes through a 1×1 convolution or a linear layer and is flattened into H/8 × W/8 vectors of dimension 2C, which pass through four attention layers and a feed-forward neural network before the feature vectors are output, where H and W are the height and width of the image respectively;
(5) generating a key point heat map: the feature vectors obtained in step (4) are reshaped back to H/8 × W/8 × 2C, the channel dimension is then reduced from 2C to K, and K predicted key point heat maps are generated, where K is the number of key points per vehicle and has a value of 78;
(6) result output: non-maximum suppression is applied to the key point heat maps to obtain the key point coordinates, and the key point positions are marked in the original image, realizing full-scene vehicle attitude estimation.
Further, 78 key points are defined for each vehicle in the vehicle images in step (1), and the bounding box of the vehicle, namely its minimum enclosing rectangle, and its category are labeled.
Further, the size of each image slice in step (2) is 4×4 pixels, with a feature dimension of 4×4×3 = 48.
Further, the backbone network in step (3) is a Swin Transformer backbone network. The first stage comprises a linear embedding layer and two Swin Transformer blocks, and the number of tokens processed by the two Swin Transformer blocks is H/4 × W/4, where H and W are the height and width of the input image. The second stage comprises a linear merging layer and two Swin Transformer blocks; the image slice tokens produced by the feature extraction of the first stage are reduced by the linear merging layer, which concatenates the features of each group of 2×2 adjacent tokens and applies a linear layer to the resulting 4C-dimensional features, so that the number of tokens is reduced by a factor of 4 and the output dimension becomes 2C. Feature transformation is then carried out by the two Swin Transformer blocks, the resolution of the obtained feature map being H/8 × W/8, realizing hierarchical feature extraction.
Further, the position coding layer in step (4) adopts an encoder with the standard Transformer architecture. The position coding layer treats the feature map as dynamic weights determined by the specific image content and re-weights the information flow in forward propagation; the key point dependencies are obtained by computing the scores of the last attention layer, where a higher attention score at a position in the image indicates a larger contribution to predicting a key point, so that occluded key points can be predicted through the dependencies of the key points. An illustrative sketch of reading out these attention scores is given below.
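As an illustrative sketch only (the tensor shapes, the averaging over attention heads and the function name are assumptions of this sketch, not details given by the patent), the dependency terms described above could be read out of the last attention layer roughly as follows:

import torch

def keypoint_dependencies(last_layer_attn: torch.Tensor,
                          keypoint_token_idx: torch.Tensor) -> torch.Tensor:
    """Collect the attention rows of the tokens at the predicted key point locations.

    last_layer_attn: (heads, N, N) attention scores of the final attention layer,
                     where N = (H/8) * (W/8) is the number of tokens.
    keypoint_token_idx: (K,) flattened token indices of the K predicted key points.
    Returns: (K, N) dependency scores; positions with high scores contribute most
             to each key point, which is what lets an occluded key point be
             predicted from its visible dependencies.
    """
    attn = last_layer_attn.mean(dim=0)             # average over attention heads
    deps = attn[keypoint_token_idx]                # rows of the key point tokens
    return deps / deps.sum(dim=-1, keepdim=True)   # normalize to a distribution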
Compared with the prior art, the method replaces the traditional convolutional neural network with the Swin Transformer and adopts a hierarchical Transformer as the backbone network, which improves computational efficiency and keeps the computational complexity linear. The encoder of a standard Transformer is used to capture long-range relationships within the image and reveal the dependencies of the predicted key points, and the final positions of the predicted key points are formed by collecting, through the last attention layer, the dependency terms that contribute most to each key point, which solves the occlusion problem. The method thus achieves a better balance between detection accuracy and speed and has high practical application value.
Drawings
Fig. 1 is a schematic structural framework diagram of a vehicle attitude estimation system provided by the present invention.
Fig. 2 is a schematic structural diagram of a first stage of the backbone network according to the present invention.
Fig. 3 is a schematic structural diagram of a second stage of the backbone network according to the present invention.
FIG. 4 is a structural diagram of a single coding layer according to the present invention.
FIG. 5 is a block flow diagram of a vehicle attitude estimation method according to the present invention.
Detailed Description
The invention will be further described by way of examples, without in any way limiting the scope of the invention, with reference to the accompanying drawings.
Example:
This embodiment provides a full-scene vehicle attitude estimation method based on a Transformer backbone and a position encoder. It introduces the Swin Transformer as the backbone network, converts the vehicle attitude estimation task into an encoding task by compressing the original image information into a compact sequence of key point positions, obtains the key point dependency terms by computing attention scores, and predicts the final key point positions, so that the positions of occluded vehicle key points can be predicted effectively and full-scene vehicle attitude estimation is realized. As shown in FIGS. 1-5, the method specifically comprises the following steps:
(1) data set construction:
Vehicle images are selected from open-source data sets, and images containing various vehicles are collected from real scenes such as traffic monitoring and parking lots to construct a vehicle data set. 78 key points are defined on each vehicle; taking a car as an example, points with strong local texture information are chosen, mainly corner points (the 4 corner points of the car lamps, the 4 corner points of the front and rear windshields, and the like). The bounding box of the vehicle, namely its minimum enclosing rectangle, and its category are also labeled. Finally the data set is divided into a training set, a validation set and a test set. An illustrative annotation record is sketched below.
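For illustration only, one vehicle instance in such a data set might be stored as the following record; the field names, the COCO-style (x, y, visibility) triplets and the key point ordering are assumptions of this sketch, not requirements of the patent:

# Hypothetical annotation record for one vehicle instance; field names and key
# point ordering are illustrative assumptions, not part of the patent.
vehicle_annotation = {
    "image_id": 1024,
    "category": "car",                      # vehicle class label
    "bbox": [412.0, 188.0, 236.0, 145.0],   # minimum enclosing rectangle: x, y, w, h
    "num_keypoints": 78,
    # 78 key points, each stored as (x, y, visibility); visibility 0 marks an
    # occluded point, which the model is still expected to predict.
    "keypoints": [(430.5, 201.2, 1), (455.0, 199.8, 0)] + [(0.0, 0.0, 0)] * 76,
}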
(2) image segmentation:
The vehicle image is split by the slice segmentation module into non-overlapping image slices of size 4×4 pixels with a feature dimension of 4×4×3 = 48; each image slice is regarded as a token represented by the concatenated RGB values of its pixels. A minimal sketch of this partition appears below.
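The following sketch illustrates the slice segmentation described above in PyTorch, assuming the 4×4 slice size of this embodiment; it is an illustrative reading of the description, not the patented implementation:

import torch
import torch.nn as nn

class SliceSegmentation(nn.Module):
    """Split an image into non-overlapping 4x4 slices (tokens) whose features are
    the concatenated RGB values, i.e. 4*4*3 = 48 values per token."""
    def __init__(self, patch_size: int = 4):
        super().__init__()
        self.patch_size = patch_size

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # img: (B, 3, H, W) -> tokens: (B, (H/4)*(W/4), 48)
        b, c, h, w = img.shape
        p = self.patch_size
        x = img.reshape(b, c, h // p, p, w // p, p)
        x = x.permute(0, 2, 4, 3, 5, 1)          # (B, H/4, W/4, p, p, 3)
        return x.reshape(b, (h // p) * (w // p), p * p * c)

# Example: a 224x224 RGB image becomes 56*56 = 3136 tokens of dimension 48.
tokens = SliceSegmentation()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 3136, 48])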
(3) hierarchical feature extraction by the backbone network:
The backbone network is divided into two stages, and the image slice tokens first pass through the first stage. As shown in fig. 2, the first stage comprises a linear embedding layer and two Swin Transformer blocks; the linear embedding layer is applied to the raw slice features and maps them to an arbitrary dimension C, and the number of tokens processed by the Transformer blocks is H/4 × W/4, where H and W are the height and width of the input image. This is followed by the second stage, shown in fig. 3, which reduces the number of tokens through a linear merging layer as the network deepens: the linear merging layer concatenates the features of each group of 2×2 adjacent tokens and applies a linear layer to the resulting 4C-dimensional features, so that the number of tokens is reduced by a factor of 4 and the output dimension becomes 2C. Feature transformation is then carried out by two Swin Transformer blocks, the resolution of the obtained feature map being H/8 × W/8, realizing hierarchical feature extraction. A sketch of the merging step appears after this paragraph.
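A minimal sketch of the linear merging layer described above, assuming the standard Swin Transformer formulation (a 2×2 neighbourhood concatenated to 4C dimensions and projected to 2C); the Swin Transformer blocks themselves and the normalisation details are assumptions of this sketch:

import torch
import torch.nn as nn

class LinearMerging(nn.Module):
    """Concatenate the features of each 2x2 group of adjacent tokens (4C dims)
    and project them to 2C, quartering the number of tokens."""
    def __init__(self, dim_c: int):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim_c)
        self.reduction = nn.Linear(4 * dim_c, 2 * dim_c, bias=False)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, h*w, C) token grid at resolution h x w -> (B, h*w/4, 2C)
        b, _, c = x.shape
        x = x.reshape(b, h, w, c)
        # gather the four tokens of every 2x2 neighbourhood
        x = torch.cat([x[:, 0::2, 0::2], x[:, 1::2, 0::2],
                       x[:, 0::2, 1::2], x[:, 1::2, 1::2]], dim=-1)  # (B, h/2, w/2, 4C)
        x = x.reshape(b, (h // 2) * (w // 2), 4 * c)
        return self.reduction(self.norm(x))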
(4) position coding:
The feature map output by the backbone network is input into the coding layers; this embodiment has 4 coding layers, each as shown in fig. 4. The feature map first passes through a 1×1 convolution or a linear layer and is flattened into H/8 × W/8 vectors of dimension 2C, which are then subjected to the 4 attention layers and a feed-forward neural network to obtain the feature vectors. A sketch of one coding layer follows.
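A minimal sketch of one coding layer, under the assumption that it follows a standard Transformer encoder block (multi-head self-attention plus a feed-forward network, with a learned position embedding added to the queries and keys); the dimensions, head count and residual arrangement are illustrative choices, not specified by the patent:

import torch
import torch.nn as nn

class CodingLayer(nn.Module):
    """One position-coding layer: self-attention over the H/8 x W/8 tokens
    followed by a feed-forward network, with residual connections."""
    def __init__(self, dim: int, num_heads: int = 8, ffn_dim: int = 1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, ffn_dim), nn.GELU(),
                                 nn.Linear(ffn_dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        # x, pos: (B, N, dim) with N = (H/8) * (W/8); pos is a learned position embedding
        q = k = x + pos                      # positions injected into queries and keys
        attn_out, _ = self.attn(q, k, x)
        x = self.norm1(x + attn_out)
        return self.norm2(x + self.ffn(x))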
(5) generating a key point heat map:
The feature vectors output by the coding layers are reshaped back to H/8 × W/8 × 2C, the channel dimension is then reduced from 2C to K (K being the number of key points per vehicle, with a value of 78), and the K predicted key point heat maps are generated;
(6) result output: non-maximum suppression is applied to the key point heat maps generated in step (5) to obtain the key point coordinates, and the key point positions are marked in the original image, realizing full-scene vehicle attitude estimation. A sketch of this decoding step follows.
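A minimal sketch of steps (5)-(6), assuming the channel reduction is a 1×1 convolution and that the non-maximum suppression reduces to taking the peak of each heat map; both choices are illustrative, not mandated by the patent:

import torch
import torch.nn as nn

class KeypointHead(nn.Module):
    """Reduce the encoder output from 2C channels to K = 78 heat maps and decode
    each heat map to an (x, y) key point in original-image coordinates."""
    def __init__(self, dim_2c: int, num_keypoints: int = 78, stride: int = 8):
        super().__init__()
        self.to_heatmaps = nn.Conv2d(dim_2c, num_keypoints, kernel_size=1)
        self.stride = stride  # the feature map has resolution H/8 x W/8

    def forward(self, feats: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # feats: (B, 2C, H/8, W/8)
        heatmaps = self.to_heatmaps(feats)                     # (B, K, H/8, W/8)
        b, k, h, w = heatmaps.shape
        flat = heatmaps.reshape(b, k, -1)
        idx = flat.argmax(dim=-1)                              # peak of each heat map
        ys, xs = idx // w, idx % w
        coords = torch.stack([xs, ys], dim=-1) * self.stride   # back to image pixels
        return heatmaps, coords                                # (B, K, h, w), (B, K, 2)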
Structures, algorithms, and computational processes not described in detail herein are all common in the art.
It is noted that the disclosed embodiments are intended to aid in further understanding of the invention, but those skilled in the art will appreciate that: various substitutions and modifications are possible without departing from the spirit and scope of the invention and appended claims. Therefore, the invention should not be limited to the disclosure of the embodiment examples, but the scope of the invention is defined by the appended claims.

Claims (5)

1. A full scene vehicle attitude estimation method is characterized by comprising the following steps:
(1) data set construction:
selecting vehicle images from open-source data sets, collecting images of various vehicles from traffic monitoring and parking-lot scenes, constructing a vehicle data set, and dividing the vehicle data set into a training set, a validation set and a test set;
(2) image segmentation: each image in the vehicle data set is split by a slice segmentation module into non-overlapping image slices, and each image slice is regarded as a token represented by the concatenated RGB values of its pixels;
(3) hierarchical feature extraction by the backbone network: the image slice tokens obtained in step (2) first pass through the linear embedding layer of the first stage of the backbone network, which changes the feature dimension to an arbitrary dimension C; the two Swin Transformer blocks and the second stage then perform hierarchical feature extraction to obtain a feature map;
(4) position coding: the feature map obtained in step (3) is input into the position coding layer for position coding; the feature map passes through a 1×1 convolution or a linear layer and is flattened into H/8 × W/8 vectors of dimension 2C, which pass through four attention layers and a feed-forward neural network before the feature vectors are output, wherein H and W are the height and width of the image respectively;
(5) generating a key point heat map: the feature vectors obtained in step (4) are reshaped back to H/8 × W/8 × 2C, the channel dimension is then reduced from 2C to K, and K key point heat maps are generated, wherein K is the number of key points per vehicle and has a value of 78;
(6) result output: non-maximum suppression is applied to the key point heat maps to obtain the key point coordinates, and the key point positions are marked in the original image, realizing full-scene vehicle attitude estimation.
2. The full-scene vehicle pose estimation method according to claim 1, wherein 78 key points are defined for each vehicle in the vehicle images in step (1), and the bounding box and the category of the vehicle are labeled.
3. The full-scene vehicle pose estimation method according to claim 2, wherein the size of each image slice in step (2) is 4×4 pixels, with a feature dimension of 4×4×3 = 48.
4. The full-scene vehicle attitude estimation method according to claim 3, wherein the backbone network in step (3) is a Swin Transformer backbone network; the first stage comprises a linear embedding layer and two Swin Transformer blocks, and the number of tokens processed by the two Swin Transformer blocks is H/4 × W/4, where H and W are the height and width of the input image; the second stage comprises a linear merging layer and two Swin Transformer blocks, the image slice tokens produced by the feature extraction of the first stage are reduced by the linear merging layer, which concatenates the features of each group of 2×2 adjacent tokens and applies a linear layer to the resulting 4C-dimensional features, so that the number of tokens is reduced by a factor of 4 and the output dimension becomes 2C; feature transformation is then carried out by the two Swin Transformer blocks, the resolution of the obtained feature map being H/8 × W/8, realizing hierarchical feature extraction.
5. The full-scene vehicle attitude estimation method according to claim 4, wherein the position coding layer in step (4) adopts an encoder with the standard Transformer architecture; the position coding layer treats the feature map as dynamic weights determined by the specific image content and re-weights the information flow in forward propagation; the key point dependencies are obtained by computing the scores of the last attention layer, a higher attention score at a position in the image indicating a larger contribution to predicting a key point, and the occluded key points are predicted through the dependencies of the key points.
CN202210780438.0A 2022-07-05 2022-07-05 Full-scene vehicle attitude estimation method Active CN114842085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210780438.0A CN114842085B (en) 2022-07-05 2022-07-05 Full-scene vehicle attitude estimation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210780438.0A CN114842085B (en) 2022-07-05 2022-07-05 Full-scene vehicle attitude estimation method

Publications (2)

Publication Number Publication Date
CN114842085A (en) 2022-08-02
CN114842085B (en) 2022-09-16

Family

ID=82574897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210780438.0A Active CN114842085B (en) 2022-07-05 2022-07-05 Full-scene vehicle attitude estimation method

Country Status (1)

Country Link
CN (1) CN114842085B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272992B (en) * 2022-09-30 2023-01-03 松立控股集团股份有限公司 Vehicle attitude estimation method
CN116758341B (en) * 2023-05-31 2024-03-19 北京长木谷医疗科技股份有限公司 GPT-based hip joint lesion intelligent diagnosis method, device and equipment
CN117352120B (en) * 2023-06-05 2024-06-11 北京长木谷医疗科技股份有限公司 GPT-based intelligent self-generation method, device and equipment for knee joint lesion diagnosis
CN116740714B (en) * 2023-06-12 2024-02-09 北京长木谷医疗科技股份有限公司 Intelligent self-labeling method and device for hip joint diseases based on unsupervised learning
CN116894973B (en) * 2023-07-06 2024-05-03 北京长木谷医疗科技股份有限公司 Integrated learning-based intelligent self-labeling method and device for hip joint lesions

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792669A (en) * 2021-09-16 2021-12-14 大连理工大学 Pedestrian re-identification baseline method based on hierarchical self-attention network
CN114663917A (en) * 2022-03-14 2022-06-24 清华大学 Multi-view-angle-based multi-person three-dimensional human body pose estimation method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200020117A1 (en) * 2018-07-16 2020-01-16 Ford Global Technologies, Llc Pose estimation
CN109598339A (en) * 2018-12-07 2019-04-09 电子科技大学 A kind of vehicle attitude detection method based on grid convolutional network
CN113591936B (en) * 2021-07-09 2022-09-09 厦门市美亚柏科信息股份有限公司 Vehicle attitude estimation method, terminal device and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792669A (en) * 2021-09-16 2021-12-14 大连理工大学 Pedestrian re-identification baseline method based on hierarchical self-attention network
CN114663917A (en) * 2022-03-14 2022-06-24 清华大学 Multi-view-angle-based multi-person three-dimensional human body pose estimation method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DST3D: DLA-Swin Transformer for Single-Stage Monocular 3D Object Detection; Zhihong Wu et al.; 2022 IEEE Intelligent Vehicles Symposium (IV); 2022-06-09; entire document *
SWIN-POSE: Swin Transformer Based Human Pose Estimation; Zinan Xiong et al.; arXiv:2201.07384v1; 2022-01-19; entire document *

Also Published As

Publication number Publication date
CN114842085A (en) 2022-08-02

Similar Documents

Publication Publication Date Title
CN114842085B (en) Full-scene vehicle attitude estimation method
CN111598030B (en) Method and system for detecting and segmenting vehicle in aerial image
CN111126359B (en) High-definition image small target detection method based on self-encoder and YOLO algorithm
CN112396607B (en) Deformable convolution fusion enhanced street view image semantic segmentation method
CN113486726A (en) Rail transit obstacle detection method based on improved convolutional neural network
CN110781850A (en) Semantic segmentation system and method for road recognition, and computer storage medium
CN111401436B (en) Streetscape image segmentation method fusing network and two-channel attention mechanism
CN112990065B (en) Vehicle classification detection method based on optimized YOLOv5 model
CN113870335A (en) Monocular depth estimation method based on multi-scale feature fusion
CN116797787B (en) Remote sensing image semantic segmentation method based on cross-modal fusion and graph neural network
CN111696038A (en) Image super-resolution method, device, equipment and computer-readable storage medium
CN117975418A (en) Traffic sign detection method based on improved RT-DETR
CN115588126A (en) GAM, CARAFE and SnIoU fused vehicle target detection method
CN115171074A (en) Vehicle target identification method based on multi-scale yolo algorithm
Yu et al. Intelligent corner synthesis via cycle-consistent generative adversarial networks for efficient validation of autonomous driving systems
CN113096133A (en) Method for constructing semantic segmentation network based on attention mechanism
CN118071643A (en) Optical image deblurring method obtained by hydropower station dam underwater robot
CN111626298A (en) Real-time image semantic segmentation device and segmentation method
CN112733934B (en) Multi-mode feature fusion road scene semantic segmentation method in complex environment
CN116228626A (en) Magnetic inductance element surface defect detection method based on improved YOLOv5
CN114187569A (en) Real-time target detection method integrating Pearson coefficient matrix and attention
CN114863132A (en) Method, system, equipment and storage medium for modeling and capturing image spatial domain information
CN118154655B (en) Unmanned monocular depth estimation system and method for mine auxiliary transport vehicle
CN116486203B (en) Single-target tracking method based on twin network and online template updating
CN117541922B (en) SF-YOLOv-based power station roofing engineering defect detection method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant