Nian et al., 2017 - Google Patents

Learning explicit video attributes from mid-level representation for video captioning

Document ID
13746635633709813387
Author
Nian F
Li T
Wang Y
Wu X
Ni B
Xu C
Publication year
2017
Publication venue
Computer Vision and Image Understanding

Snippet

Recent works on video captioning mainly learn the map from low-level visual features to language description directly without explicitly representing the high-level semantic video concepts (e.g., objects, actions in the annotated sentences). To bridge the semantic gap, in …
Continue reading at www.sciencedirect.com

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2705 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/21 Text processing
    • G06F17/22 Manipulating or registering by use of codes, e.g. in sequence of text characters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30781 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F17/30784 Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30861 Retrieval from the Internet, e.g. browsers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30017 Multimedia data retrieval; Retrieval of more than one type of audiovisual media
    • G06F17/30023 Querying
    • G06F17/30038 Querying based on information manually generated or based on information not derived from the media content, e.g. tags, keywords, comments, usage information, user ratings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06Q DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce, e.g. shopping or e-commerce
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass

Similar Documents

Publication Publication Date Title
Nian et al. Learning explicit video attributes from mid-level representation for video captioning
Li et al. Know more say less: Image captioning based on scene graphs
Li et al. Visual to text: Survey of image and video captioning
Gao et al. Hierarchical representation network with auxiliary tasks for video captioning and video question answering
Rohrbach et al. Grounding of textual phrases in images by reconstruction
US11409791B2 (en) Joint heterogeneous language-vision embeddings for video tagging and search
Tu et al. Enhancing the alignment between target words and corresponding frames for video captioning
Liu et al. Hierarchical & multimodal video captioning: Discovering and transferring multimodal knowledge for vision to language
WO2020199904A1 (en) Video description information generation method, video processing method, and corresponding devices
Guo et al. LD-MAN: Layout-driven multimodal attention network for online news sentiment recognition
Islam et al. Exploring video captioning techniques: A comprehensive survey on deep learning methods
Yan et al. Multimodal sentiment analysis using multi-tensor fusion network with cross-modal modeling
Liu et al. Aligning source visual and target language domains for unpaired video captioning
Zhang et al. Image captioning via semantic element embedding
CN113392265A (en) Multimedia processing method, device and equipment
CN116958997B (en) Graphic summary method and system based on heterogeneous graphic neural network
Bansal et al. Multilingual personalized hashtag recommendation for low resource Indic languages using graph-based deep neural network
Perez-Martin et al. A comprehensive review of the video-to-text problem
Predić et al. Automatic image caption generation based on some machine learning algorithms
Zhao et al. Research on video captioning based on multifeature fusion
Wu et al. Hashtag recommendation with attention-based neural image hashtagging network
Yang et al. Visual Skeleton and Reparative Attention for Part-of-Speech image captioning system
Li et al. Screencast tutorial video understanding
Chu et al. The forgettable-watcher model for video question answering
Wang et al. MIVCN: Multimodal interaction video captioning network based on semantic association graph