Xue et al., 2018 - Google Patents

A better way to attend: Attention with trees for video question answering

Document ID
10876338189840103258
Author
Xue H
Chu W
Zhao Z
Cai D
Publication year
2018
Publication venue
IEEE Transactions on Image Processing

Snippet

We propose a new attention model for video question answering. The main idea of attention models is to locate the most informative parts of the visual data. Attention mechanisms are quite popular these days. However, most existing visual attention …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2705 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/21 Text processing
    • G06F17/22 Manipulating or registering by use of codes, e.g. in sequence of text characters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2765 Recognition
    • G06F17/277 Lexical analysis, e.g. tokenisation, collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/21 Text processing
    • G06F17/24 Editing, e.g. insert/delete
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/2809 Data driven translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30781 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F17/30784 Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/50 Computer-aided design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass
    • G06N99/005 Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computer systems utilising knowledge based models

Similar Documents

Publication Publication Date Title
Xue et al. A better way to attend: Attention with trees for video question answering
Li et al. Oscar: Object-semantics aligned pre-training for vision-language tasks
Deng et al. Syntax-guided hierarchical attention network for video captioning
Gao et al. Hierarchical representation network with auxiliary tasks for video captioning and video question answering
Yang et al. Reformer: The relational transformer for image captioning
CN111738004A (en) Training method of named entity recognition model and named entity recognition method
Yu et al. Bridging text and knowledge with multi-prototype embedding for few-shot relational triple extraction
Braud et al. Multi-view and multi-task training of RST discourse parsers
CN111666758B (en) Chinese word segmentation method, training device and computer readable storage medium
Chen et al. Generating video descriptions with latent topic guidance
Liu et al. Uamner: uncertainty-aware multimodal named entity recognition in social media posts
CN113449801B (en) Image character behavior description generation method based on multi-level image context coding and decoding
CN116958997B (en) Graphic summary method and system based on heterogeneous graphic neural network
Niu et al. A multi-layer memory sharing network for video captioning
Heo et al. Multimodal neural machine translation with weakly labeled images
Mishra et al. Dynamic convolution-based encoder-decoder framework for image captioning in Hindi
Xue et al. Lcsnet: End-to-end lipreading with channel-aware feature selection
CN114490954B (en) Document level generation type event extraction method based on task adjustment
Baruah et al. Character coreference resolution in movie screenplays
Li et al. Exploring Visual Relationships via Transformer-based Graphs for Enhanced Image Captioning
CN113901813A (en) Event extraction method based on topic features and implicit sentence structure
Jia et al. Improved discourse parsing with two-step neural transition-based model
CN111813927A (en) Sentence similarity calculation method based on topic model and LSTM
Dharaniya et al. Automatic scene generation using sentiment analysis and bidirectional recurrent neural network with multi-head attention
Vaishnavi et al. Video captioning–a survey