Nian et al., 2017 - Google Patents

Learning explicit video attributes from mid-level representation for video captioning

Document ID: 13746635633709813387
Author: Nian F; Li T; Wang Y; Wu X; Ni B; Xu C
Publication year: 2017
Publication venue: Computer Vision and Image Understanding

Snippet

Recent works on video captioning mainly learn the mapping from low-level visual features to language description directly, without explicitly representing the high-level semantic video concepts (e.g., objects and actions in the annotated sentences). To bridge the semantic gap, in …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
- G06F17/30023—Querying
- G06F17/30038—Querying based on information manually generated or based on information not derived from the media content, e.g. tags, keywords, comments, usage information, user ratings
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce, e.g. shopping or e-commerce
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass

Similar Documents

Publication	Publication Date	Title
Nian et al.	2017	Learning explicit video attributes from mid-level representation for video captioning
Li et al.	2019	Know more say less: Image captioning based on scene graphs
Li et al.	2019	Visual to text: Survey of image and video captioning
Gao et al.	2021	Hierarchical representation network with auxiliary tasks for video captioning and video question answering
Rohrbach et al.	2016	Grounding of textual phrases in images by reconstruction
US11409791B2 (en)	2022-08-09	Joint heterogeneous language-vision embeddings for video tagging and search
Tu et al.	2021	Enhancing the alignment between target words and corresponding frames for video captioning
Liu et al.	2017	Hierarchical & multimodal video captioning: Discovering and transferring multimodal knowledge for vision to language
WO2020199904A1 (en)	2020-10-08	Video description information generation method, video processing method, and corresponding devices
Guo et al.	2020	LD-MAN: Layout-driven multimodal attention network for online news sentiment recognition
Islam et al.	2021	Exploring video captioning techniques: A comprehensive survey on deep learning methods
Yan et al.	2022	Multimodal sentiment analysis using multi-tensor fusion network with cross-modal modeling
Liu et al.	2021	Aligning source visual and target language domains for unpaired video captioning
Zhang et al.	2020	Image captioning via semantic element embedding
CN113392265A (en)	2021-09-14	Multimedia processing method, device and equipment
CN116958997B (en)	2024-01-23	Graphic summary method and system based on heterogeneous graphic neural network
Bansal et al.	2024	Multilingual personalized hashtag recommendation for low resource Indic languages using graph-based deep neural network
Perez-Martin et al.	2022	A comprehensive review of the video-to-text problem
Predić et al.	2022	Automatic image caption generation based on some machine learning algorithms
Zhao et al.	2022	Research on video captioning based on multifeature fusion
Wu et al.	2018	Hashtag recommendation with attention-based neural image hashtagging network
Yang et al.	2019	Visual Skeleton and Reparative Attention for Part-of-Speech image captioning system
Li et al.	2020	Screencast tutorial video understanding
Chu et al.	2018	The forgettable-watcher model for video question answering
Wang et al.	2022	MIVCN: Multimodal interaction video captioning network based on semantic association graph