Parvin et al., 2023 - Google Patents
Transformer-based local-global guidance for image captioning
Parvin et al., 2023
- Document ID
- 3255039449670770759
- Author
- Parvin H
- Naghsh-Nilchi A
- Mohammadi H
- Publication year
- 2023
- Publication venue
- Expert Systems with Applications
Snippet
Image captioning is a difficult problem for machine learning algorithms, which must compress large amounts of image data into descriptive language. Recurrent models are widely used as the decoder to generate the caption and achieve significant performance, while these models have …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/30675—Query execution
- G06F17/30684—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | | Application of convolutional neural network in natural language processing |
Lopez et al. | | Deep Learning applied to NLP |
Karpathy et al. | | Deep visual-semantic alignments for generating image descriptions |
Li et al. | | Context-aware emotion cause analysis with multi-attention-based neural network |
CN112131350B (en) | | Text label determining method, device, terminal and readable storage medium |
Huang et al. | | Multimodal continuous emotion recognition with data augmentation using recurrent neural networks |
Parvin et al. | | Transformer-based local-global guidance for image captioning |
He et al. | | VGSG: Vision-Guided Semantic-Group Network for Text-Based Person Search |
Liu et al. | | Attribute-guided attention for referring expression generation and comprehension |
Mozafari et al. | | BAS: an answer selection method using BERT language model |
Khan et al. | | A deep neural framework for image caption generation using GRU-based attention mechanism |
Abdar et al. | | A review of deep learning for video captioning |
Guo et al. | | Implicit discourse relation recognition via a BiLSTM-CNN architecture with dynamic chunk-based max pooling |
do Carmo Nogueira et al. | | Reference-based model using multimodal gated recurrent units for image captioning |
Wang et al. | | Reasoning like humans: on dynamic attention prior in image captioning |
Li et al. | | Graph convolutional network meta-learning with multi-granularity POS guidance for video captioning |
do Carmo Nogueira et al. | | A reference-based model using deep learning for image captioning |
Qiu et al. | | Semantics-consistent cross-domain summarization via optimal transport alignment |
Jia et al. | | Semantic association enhancement transformer with relative position for image captioning |
Wu et al. | | Learning cooperative neural modules for stylized image captioning |
Qiu et al. | | SCCS: Semantics-Consistent Cross-domain Summarization via Optimal Transport Alignment |
Bacharidis et al. | | Improving deep learning approaches for human activity recognition based on natural language processing of action labels |
Lei et al. | | Multimodal Sentiment Analysis Based on Composite Hierarchical Fusion |
Dey et al. | | How Machine Learning is Innovating Today's World: A Concise Technical Guide |
Jeevitha et al. | | Natural language description for videos using NetVLAD and attentional LSTM |