[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

Parvin et al., 2023 - Google Patents

Transformer-based local-global guidance for image captioning

Parvin et al., 2023

Document ID
3255039449670770759
Author
Parvin H
Naghsh-Nilchi A
Mohammadi H
Publication year
Publication venue
Expert Systems with Applications

External Links

Snippet

Image captioning is a difficult problem for machine learning algorithms to compress huge amounts of images into descriptive languages. The recurrent models are popularly used as the decoder to extract the caption with significant performance, while these models have …
Continue reading at www.sciencedirect.com (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30634Querying
    • G06F17/30657Query processing
    • G06F17/30675Query execution
    • G06F17/30684Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/27Automatic analysis, e.g. parsing
    • G06F17/2765Recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/27Automatic analysis, e.g. parsing
    • G06F17/2705Parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6217Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30781Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F17/30784Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/28Processing or translating of natural language
    • G06F17/2809Data driven translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6267Classification techniques
    • G06K9/6268Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/36Image preprocessing, i.e. processing the image information without deciding about the identity of the image
    • G06K9/46Extraction of features or characteristics of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00Subject matter not provided for in other groups of this subclass
    • G06N99/005Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computer systems utilising knowledge based models
    • G06N5/02Knowledge representation
    • G06N5/022Knowledge engineering, knowledge acquisition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06QDATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR

Similar Documents

Publication Publication Date Title
Wang et al. Application of convolutional neural network in natural language processing
Lopez et al. Deep Learning applied to NLP
Karpathy et al. Deep visual-semantic alignments for generating image descriptions
Li et al. Context-aware emotion cause analysis with multi-attention-based neural network
CN112131350B (en) Text label determining method, device, terminal and readable storage medium
Huang et al. Multimodal continuous emotion recognition with data augmentation using recurrent neural networks
Parvin et al. Transformer-based local-global guidance for image captioning
He et al. VGSG: Vision-Guided Semantic-Group Network for Text-Based Person Search
Liu et al. Attribute-guided attention for referring expression generation and comprehension
Mozafari et al. BAS: an answer selection method using BERT language model
Khan et al. A deep neural framework for image caption generation using gru-based attention mechanism
Abdar et al. A review of deep learning for video captioning
Guo et al. Implicit discourse relation recognition via a BiLSTM-CNN architecture with dynamic chunk-based max pooling
do Carmo Nogueira et al. Reference-based model using multimodal gated recurrent units for image captioning
Wang et al. Reasoning like humans: on dynamic attention prior in image captioning
Li et al. Graph convolutional network meta-learning with multi-granularity POS guidance for video captioning
do Carmo Nogueira et al. A reference-based model using deep learning for image captioning
Qiu et al. Semantics-consistent cross-domain summarization via optimal transport alignment
Jia et al. Semantic association enhancement transformer with relative position for image captioning
Wu et al. Learning cooperative neural modules for stylized image captioning
Qiu et al. SCCS: Semantics-Consistent Cross-domain Summarization via Optimal Transport Alignment
Bacharidis et al. Improving deep learning approaches for human activity recognition based on natural language processing of action labels
Lei et al. Multimodal Sentiment Analysis Based on Composite Hierarchical Fusion
Dey et al. How Machine Learning is Innovating Today's World: A Concise Technical Guide
Jeevitha et al. Natural language description for videos using NetVLAD and attentional LSTM