Inter-modal masked autoencoder for self-supervised learning on point clouds
Liu et al., 2023
- Document ID: 16265910150963681401
- Authors: Liu J, Wu Y, Gong M, Liu Z, Miao Q, Ma W
- Publication year: 2023
- Publication venue: IEEE Transactions on Multimedia
Snippet
Masked autoencoder (MAE) is a self-supervised learning method that has recently been widely adopted and has achieved great success in NLP and computer vision. However, the potential advantages of masked pre-training for point cloud understanding have not been fully explored. There is …
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
            - G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
              - G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
          - G06K9/6267—Classification techniques
            - G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
          - G06K9/6201—Matching; Proximity measures
            - G06K9/6202—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
        - G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
          - G06K9/46—Extraction of features or characteristics of the image
            - G06K9/4604—Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes, intersections
        - G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
      - G06N99/00—Subject matter not provided for in other groups of this subclass
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
        - G06F17/20—Handling natural language data
      - G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T7/00—Image analysis
Similar Documents
Publication | Title |
---|---|
Han et al. | A survey on vision transformer |
Aggarwal et al. | Generative adversarial network: An overview of theory and applications |
Wang et al. | Learning visual relationship and context-aware attention for image captioning |
Liu et al. | Temporal decoupling graph convolutional network for skeleton-based gesture recognition |
Tang et al. | CTFN: Hierarchical learning for multimodal sentiment analysis using coupled-translation fusion network |
Hu et al. | SignBERT+: Hand-model-aware self-supervised pre-training for sign language understanding |
Zou et al. | 6D-ViT: Category-level 6D object pose estimation via transformer-based instance representation learning |
Zuo et al. | Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities |
Liu et al. | Inter-modal masked autoencoder for self-supervised learning on point clouds |
Zhang et al. | Uncovering prototypical knowledge for weakly open-vocabulary semantic segmentation |
Gao et al. | PE-Transformer: Path enhanced transformer for improving underwater object detection |
CN112651940A (en) | Collaborative visual saliency detection method based on dual-encoder generation type countermeasure network |
Fang et al. | GroupTransNet: Group transformer network for RGB-D salient object detection |
Yao et al. | Transformers and CNNs fusion network for salient object detection |
Gao et al. | 3D interacting hand pose and shape estimation from a single RGB image |
Xiao et al. | A survey of label-efficient deep learning for 3D point clouds |
Zheng et al. | SAR: Spatial-aware regression for 3D hand pose and mesh reconstruction from a monocular RGB image |
Wang et al. | Dual-perspective fusion network for aspect-based multimodal sentiment analysis |
Li et al. | Sequential interactive biased network for context-aware emotion recognition |
Li et al. | Exploiting global and instance-level perceived feature relationship matrices for 3D face reconstruction and dense alignment |
Fan et al. | Multi-level contrastive learning: Hierarchical alleviation of heterogeneity in multimodal sentiment analysis |
Liu et al. | Deep Fuzzy Multi-Teacher Distillation Network for Medical Visual Question Answering |
Chen et al. | Learning point-language hierarchical alignment for 3D visual grounding |
Peng et al. | Pattern Recognition and Computer Vision: Third Chinese Conference, PRCV 2020, Nanjing, China, October 16–18, 2020, Proceedings, Part III |
Yang et al. | Language-aware vision transformer for referring segmentation |