Nian et al., 2017: Learning explicit video attributes from mid-level representation for video captioning
- Document ID
- 13746635633709813387
- Author
- Nian F
- Li T
- Wang Y
- Wu X
- Ni B
- Xu C
- Publication year
- 2017
- Publication venue
- Computer Vision and Image Understanding
Snippet
Recent works on video captioning mainly learn the map from low-level visual features to language description directly without explicitly representing the high-level semantic video concepts (e.g., objects, actions in the annotated sentences). To bridge the semantic gap, in …
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/20—Handling natural language data
          - G06F17/27—Automatic analysis, e.g. parsing
            - G06F17/2705—Parsing
          - G06F17/21—Text processing
            - G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
            - G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
          - G06F17/30861—Retrieval from the Internet, e.g. browsers
          - G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
            - G06F17/30023—Querying
              - G06F17/30038—Querying based on information manually generated or based on information not derived from the media content, e.g. tags, keywords, comments, usage information, user ratings
          - G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
          - G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
      - G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
        - G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
      - G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    - G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
      - G06Q30/00—Commerce, e.g. shopping or e-commerce
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N99/00—Subject matter not provided for in other groups of this subclass
Similar Documents
| Publication | Title |
|---|---|
| Nian et al. | Learning explicit video attributes from mid-level representation for video captioning |
| Li et al. | Know more say less: Image captioning based on scene graphs |
| Li et al. | Visual to text: Survey of image and video captioning |
| Gao et al. | Hierarchical representation network with auxiliary tasks for video captioning and video question answering |
| Rohrbach et al. | Grounding of textual phrases in images by reconstruction |
| US11409791B2 (en) | Joint heterogeneous language-vision embeddings for video tagging and search |
| Tu et al. | Enhancing the alignment between target words and corresponding frames for video captioning |
| Liu et al. | Hierarchical & multimodal video captioning: Discovering and transferring multimodal knowledge for vision to language |
| WO2020199904A1 (en) | Video description information generation method, video processing method, and corresponding devices |
| Guo et al. | LD-MAN: Layout-driven multimodal attention network for online news sentiment recognition |
| Islam et al. | Exploring video captioning techniques: A comprehensive survey on deep learning methods |
| Yan et al. | Multimodal sentiment analysis using multi-tensor fusion network with cross-modal modeling |
| Liu et al. | Aligning source visual and target language domains for unpaired video captioning |
| Zhang et al. | Image captioning via semantic element embedding |
| CN113392265A (en) | Multimedia processing method, device and equipment |
| CN116958997B (en) | Graphic summary method and system based on heterogeneous graphic neural network |
| Bansal et al. | Multilingual personalized hashtag recommendation for low resource Indic languages using graph-based deep neural network |
| Perez-Martin et al. | A comprehensive review of the video-to-text problem |
| Predić et al. | Automatic image caption generation based on some machine learning algorithms |
| Zhao et al. | Research on video captioning based on multifeature fusion |
| Wu et al. | Hashtag recommendation with attention-based neural image hashtagging network |
| Yang et al. | Visual Skeleton and Reparative Attention for Part-of-Speech image captioning system |
| Li et al. | Screencast tutorial video understanding |
| Chu et al. | The forgettable-watcher model for video question answering |
| Wang et al. | MIVCN: Multimodal interaction video captioning network based on semantic association graph |