Xue et al., 2018 - Google Patents

A better way to attend: Attention with trees for video question answering

Document ID: 10876338189840103258
Author: Xue H; Chu W; Zhao Z; Cai D
Publication year: 2018
Publication venue: IEEE Transactions on Image Processing

Snippet

We propose a new attention model for video question answering. The main idea of attention models is to locate the most informative parts of the visual data. Attention mechanisms are quite popular these days. However, most existing visual attention …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/24—Editing, e.g. insert/delete
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models

Similar Documents

Publication	Publication Date	Title
Xue et al.	2018	A better way to attend: Attention with trees for video question answering
Li et al.	2020	Oscar: Object-semantics aligned pre-training for vision-language tasks
Deng et al.	2021	Syntax-guided hierarchical attention network for video captioning
Gao et al.	2021	Hierarchical representation network with auxiliary tasks for video captioning and video question answering
Yang et al.	2022	Reformer: The relational transformer for image captioning
CN111738004A (en)	2020-10-02	Training method of named entity recognition model and named entity recognition method
Yu et al.	2020	Bridging text and knowledge with multi-prototype embedding for few-shot relational triple extraction
Braud et al.	2016	Multi-view and multi-task training of RST discourse parsers
CN111666758B (en)	2022-03-22	Chinese word segmentation method, training device and computer readable storage medium
Chen et al.	2019	Generating video descriptions with latent topic guidance
Liu et al.	2022	Uamner: uncertainty-aware multimodal named entity recognition in social media posts
CN113449801B (en)	2023-05-02	Image character behavior description generation method based on multi-level image context coding and decoding
CN116958997B (en)	2024-01-23	Graphic summary method and system based on heterogeneous graphic neural network
Niu et al.	2023	A multi-layer memory sharing network for video captioning
Heo et al.	2019	Multimodal neural machine translation with weakly labeled images
Mishra et al.	2023	Dynamic convolution-based encoder-decoder framework for image captioning in Hindi
Xue et al.	2023	Lcsnet: End-to-end lipreading with channel-aware feature selection
CN114490954B (en)	2022-07-15	Document level generation type event extraction method based on task adjustment
Baruah et al.	2023	Character coreference resolution in movie screenplays
Li et al.	2024	Exploring Visual Relationships via Transformer-based Graphs for Enhanced Image Captioning
CN113901813A (en)	2022-01-07	Event extraction method based on topic features and implicit sentence structure
Jia et al.	2018	Improved discourse parsing with two-step neural transition-based model
CN111813927A (en)	2020-10-23	Sentence similarity calculation method based on topic model and LSTM
Dharaniya et al.	2022	Automatic scene generation using sentiment analysis and bidirectional recurrent neural network with multi-head attention
Vaishnavi et al.	2024	Video captioning–a survey