Jin et al., 2024 - Google Patents
MtArtGPT: A multi-task art generation system with pre-trained transformer
- Document ID
- 16921547868524398835
- Author
- Jin C
- Zhu R
- Zhu Z
- Yang L
- Yang M
- Luo J
- Publication year
- 2024
- Publication venue
- IEEE Transactions on Circuits and Systems for Video Technology
Snippet
Instruction-tuned large language models are making rapid advances in the field of artificial intelligence, where GPT-4 models have exhibited impressive multi-modal perception capabilities. Such models have been used as the core assistant for many tasks including art …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/3066—Query translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30731—Creation of semantic tools
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
- G06F17/30873—Retrieval from the Internet, e.g. browsers by navigation, e.g. using categorized browsing, portals, synchronized browsing, visual networks of documents, virtual worlds or tours
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
- G06F17/30023—Querying
- G06F17/30038—Querying based on information manually generated or based on information not derived from the media content, e.g. tags, keywords, comments, usage information, user ratings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Gozalo-Brizuela et al. | ChatGPT is not all you need. A State of the Art Review of large Generative AI models | |
Gao et al. | Hierarchical representation network with auxiliary tasks for video captioning and video question answering | |
Yang et al. | Multi-sentence auxiliary adversarial networks for fine-grained text-to-image synthesis | |
Hao et al. | Integrating both visual and audio cues for enhanced video caption | |
He et al. | MF-BERT: Multimodal fusion in pre-trained BERT for sentiment analysis | |
Zhang et al. | Effective subword segmentation for text comprehension | |
Song et al. | Memorial GAN with joint semantic optimization for unpaired image captioning | |
Lu et al. | Sentiment analysis: Comprehensive reviews, recent advances, and open challenges | |
Jin et al. | Weakening the dominant role of text: CMOSI dataset and multimodal semantic enhancement network | |
Wang et al. | A text-guided generation and refinement model for image captioning | |
Jin et al. | MtArtGPT: A Multi-task Art Generation System with Pre-Trained Transformer | |
Huang et al. | Recent advances in artificial intelligence for video production system | |
Ai et al. | DER-GCN: Dialog and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Dialog Emotion Recognition | |
Mai et al. | A unimodal representation learning and recurrent decomposition fusion structure for utterance-level multimodal embedding learning | |
CN117216234A (en) | Artificial intelligence-based speaking operation rewriting method, device, equipment and storage medium | |
Zhao et al. | Leveraging pre-trained language model for summary generation on short text | |
Chen et al. | Robotic musicianship based on least squares and sequence generative adversarial networks | |
Fang et al. | Sense-aware BERT and multi-task fine-tuning for multimodal sentiment analysis | |
Zhou et al. | Let’s all dance: Enhancing amateur dance motions | |
Kizhner et al. | The history and context of the digital humanities in Russia | |
Guo et al. | PrompTHis: Visualizing the Process and Influence of Prompt Editing during Text-to-Image Creation | |
Yi et al. | Diffusion models in text generation: a survey | |
Lu et al. | Multi-dimensional fusion: transformer and GANs-based multimodal audiovisual perception robot for musical performance art | |
Kleinberger et al. | Voice at NIME: a Taxonomy of New Interfaces for Vocal Musical Expression | |
Li et al. | AIGC in China: Current developments and future outlook |