Parvin et al., 2023 - Google Patents

Transformer-based local-global guidance for image captioning

Document ID: 3255039449670770759
Author: Parvin H; Naghsh-Nilchi A; Mohammadi H
Publication year: 2023
Publication venue: Expert Systems with Applications

Snippet

Image captioning is a difficult problem for machine learning algorithms, which must compress large amounts of image data into descriptive language. Recurrent models are widely used as the decoder to generate the caption and achieve strong performance, while these models have …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/30675—Query execution
- G06F17/30684—Query execution using natural language analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR

Similar Documents

Publication	Publication Date	Title
Wang et al.	2018	Application of convolutional neural network in natural language processing
Lopez et al.	2017	Deep Learning applied to NLP
Karpathy et al.	2015	Deep visual-semantic alignments for generating image descriptions
Li et al.	2019	Context-aware emotion cause analysis with multi-attention-based neural network
CN112131350B (en)	2024-04-30	Text label determining method, device, terminal and readable storage medium
Huang et al.	2018	Multimodal continuous emotion recognition with data augmentation using recurrent neural networks
Parvin et al.	2023	Transformer-based local-global guidance for image captioning
He et al.	2023	VGSG: Vision-Guided Semantic-Group Network for Text-Based Person Search
Liu et al.	2020	Attribute-guided attention for referring expression generation and comprehension
Mozafari et al.	2019	BAS: an answer selection method using BERT language model
Khan et al.	2022	A deep neural framework for image caption generation using gru-based attention mechanism
Abdar et al.	2024	A review of deep learning for video captioning
Guo et al.	2019	Implicit discourse relation recognition via a BiLSTM-CNN architecture with dynamic chunk-based max pooling
do Carmo Nogueira et al.	2020	Reference-based model using multimodal gated recurrent units for image captioning
Wang et al.	2021	Reasoning like humans: on dynamic attention prior in image captioning
Li et al.	2022	Graph convolutional network meta-learning with multi-granularity POS guidance for video captioning
do Carmo Nogueira et al.	2023	A reference-based model using deep learning for image captioning
Qiu et al.	2022	Semantics-consistent cross-domain summarization via optimal transport alignment
Jia et al.	2022	Semantic association enhancement transformer with relative position for image captioning
Wu et al.	2022	Learning cooperative neural modules for stylized image captioning
Qiu et al.	2023	SCCS: Semantics-Consistent Cross-domain Summarization via Optimal Transport Alignment
Bacharidis et al.	2020	Improving deep learning approaches for human activity recognition based on natural language processing of action labels
Lei et al.	2024	Multimodal Sentiment Analysis Based on Composite Hierarchical Fusion
Dey et al.	2024	How Machine Learning is Innovating Today's World: A Concise Technical Guide
Jeevitha et al.	2020	Natural language description for videos using NetVLAD and attentional LSTM