Xue et al., 2018 - Google Patents
A better way to attend: Attention with trees for video question answering
- Document ID
- 10876338189840103258
- Author
- Xue H
- Chu W
- Zhao Z
- Cai D
- Publication year
- 2018
- Publication venue
- IEEE Transactions on Image Processing
Snippet
We propose a new attention model for video question answering. The main idea of attention models is to locate the most informative parts of the visual data. Attention mechanisms are quite popular these days. However, most existing visual attention …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/24—Editing, e.g. insert/delete
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
Similar Documents
Publication | Title |
---|---|
Xue et al. | A better way to attend: Attention with trees for video question answering |
Li et al. | Oscar: Object-semantics aligned pre-training for vision-language tasks |
Deng et al. | Syntax-guided hierarchical attention network for video captioning |
Gao et al. | Hierarchical representation network with auxiliary tasks for video captioning and video question answering |
Yang et al. | Reformer: The relational transformer for image captioning |
CN111738004A (en) | Training method of named entity recognition model and named entity recognition method |
Yu et al. | Bridging text and knowledge with multi-prototype embedding for few-shot relational triple extraction |
Braud et al. | Multi-view and multi-task training of RST discourse parsers |
CN111666758B (en) | Chinese word segmentation method, training device and computer readable storage medium |
Chen et al. | Generating video descriptions with latent topic guidance |
Liu et al. | Uamner: uncertainty-aware multimodal named entity recognition in social media posts |
CN113449801B (en) | Image character behavior description generation method based on multi-level image context coding and decoding |
CN116958997B (en) | Graphic summary method and system based on heterogeneous graphic neural network |
Niu et al. | A multi-layer memory sharing network for video captioning |
Heo et al. | Multimodal neural machine translation with weakly labeled images |
Mishra et al. | Dynamic convolution-based encoder-decoder framework for image captioning in Hindi |
Xue et al. | Lcsnet: End-to-end lipreading with channel-aware feature selection |
CN114490954B (en) | Document level generation type event extraction method based on task adjustment |
Baruah et al. | Character coreference resolution in movie screenplays |
Li et al. | Exploring Visual Relationships via Transformer-based Graphs for Enhanced Image Captioning |
CN113901813A (en) | Event extraction method based on topic features and implicit sentence structure |
Jia et al. | Improved discourse parsing with two-step neural transition-based model |
CN111813927A (en) | Sentence similarity calculation method based on topic model and LSTM |
Dharaniya et al. | Automatic scene generation using sentiment analysis and bidirectional recurrent neural network with multi-head attention |
Vaishnavi et al. | Video captioning–a survey |