Generative AI models can produce high-quality images from text prompts, and these images often resemble work by human artists or photographers. The technology is impressive, but it also raises security concerns: AI-generated images can be used for fraud, misinformation, and unauthorized art creation. In this project, we systematically study and detect AI-generated images in adversarial scenarios. We have created the ARIA dataset, which contains over 140,000 images in categories such as artworks, social media images, news photos, disaster scenes, and anime pictures. You can access our dataset here: ARIA Dataset. For more details, see our paper: The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking.
- Total Images: 144,175
- Categories:
  - Social Media Fraud:
    - Ins: 26,265 Instagram-style images
  - Fake News and Misinformation:
    - News: 26,559 news images
    - Disaster: 25,288 disaster scene images
  - Unauthorized Art Style Imitation:
    - Art: 43,075 artworks
    - Pixiv: 22,988 anime pictures
- Sources:
  - Art: Best Artworks of All Time Kaggle Dataset
  - Disaster: Disaster Dataset Kaggle Dataset
  - News: N24News GitHub Repository
  - Pixiv: Pixiv Top Daily Illustration 2018 Kaggle Dataset
  - Instagram: Instagram Images Kaggle Dataset
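
As a quick sanity check, the per-category counts listed above add up to the stated total. The snippet below is only a minimal sketch of that arithmetic; the dictionary keys are illustrative labels taken from the list and are not confirmed folder or file names in the dataset.

```python
# Per-category image counts as listed in the statistics above.
# Keys are illustrative labels (assumption), not confirmed directory names.
CATEGORY_COUNTS = {
    "Ins": 26_265,
    "News": 26_559,
    "Disaster": 25_288,
    "Art": 43_075,
    "Pixiv": 22_988,
}
STATED_TOTAL = 144_175


def counts_match_total(counts, stated_total):
    """Return True if the category counts sum to the stated total."""
    return sum(counts.values()) == stated_total


print(sum(CATEGORY_COUNTS.values()))
print(counts_match_total(CATEGORY_COUNTS, STATED_TOTAL))
```
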
- Collect image-to-text prompts
- Collect text-to-image data
- Collect image-to-image data
- User study
- Benchmarking detection
- ResNet-50 Classifier
Data Collection
- Each script supports three phases: "test", "text", and "image". "test" generates a single random image using both text and image input, as a quick check. "text" runs exhaustive text-to-image generation, and "image" runs exhaustive image-to-image generation.
- Example of running the "test" phase:
python collect_dreamstudio.py test
- New prompts can be collected with "scrape_prompt", "refine_prompt", and "rewrite_prompt". "refine_prompt" cleans the collected data, and "rewrite_prompt" feeds the prompts to GPT-4 to be rewritten.
- Collected image data will be saved in ./scrape_text2image and ./scrape_image2image.
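
The three phases described above could be dispatched from a single command-line argument roughly as follows. This is a minimal sketch and not the code in collect_dreamstudio.py: the function names (run_test, run_text, run_image, plan_jobs), the placeholder prompt and image lists, and the job tuples are all hypothetical, and no generator API is called.

```python
import argparse
import random


def run_test(prompts, images):
    """'test' phase: one random prompt paired with one random image."""
    return [(random.choice(prompts), random.choice(images))]


def run_text(prompts, images):
    """'text' phase: one text-to-image job per prompt."""
    return [(prompt, None) for prompt in prompts]


def run_image(prompts, images):
    """'image' phase: one image-to-image job per source image.

    A real image-to-image call may also need a prompt; that pairing
    is not specified here, so the prompt slot is left as None.
    """
    return [(None, image) for image in images]


# Map each phase name to the function that plans its jobs.
PHASES = {"test": run_test, "text": run_text, "image": run_image}


def plan_jobs(phase, prompts, images):
    """Return the list of (prompt, image) jobs for the given phase."""
    return PHASES[phase](prompts, images)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Phase dispatch sketch")
    parser.add_argument("phase", choices=sorted(PHASES))
    args = parser.parse_args(argv)
    # Placeholder inputs (assumption); a real script would load its own data.
    prompts = ["a watercolor landscape", "a city street at night"]
    images = ["example_1.jpg", "example_2.jpg"]
    jobs = plan_jobs(args.phase, prompts, images)
    for job in jobs:
        print(job)
    return jobs
```

Calling `main(["test"])` here mirrors the shape of `python collect_dreamstudio.py test`; wiring `main()` to the real command line is left out so the sketch stays side-effect free.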
ResNet-50 Classifier
To run the code, use the following command, replacing your_script.py with the name of the classifier script:
python your_script.py --mode [train|test|both] --num_epochs 300 --batch_size 64 --learning_rate 0.005 --early_stopping_patience 5
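
To illustrate how the flags in the command above might be consumed, here is a minimal sketch of an argument parser and a patience-based early-stopping helper. It is not the repository's classifier code: the defaults are simply copied from the example command (an assumption), the EarlyStopping class is hypothetical, and the early-stopping rule shown (stop after a given number of consecutive epochs without a lower validation loss) is one common interpretation of a "patience" setting.

```python
import argparse


def build_parser():
    """Parser for the flags shown in the example command (defaults assumed)."""
    parser = argparse.ArgumentParser(description="ResNet-50 classifier flags (sketch)")
    parser.add_argument("--mode", choices=["train", "test", "both"], default="both")
    parser.add_argument("--num_epochs", type=int, default=300)
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--learning_rate", type=float, default=0.005)
    parser.add_argument("--early_stopping_patience", type=int, default=5)
    return parser


class EarlyStopping:
    """Hypothetical helper: stop when validation loss stops improving."""

    def __init__(self, patience):
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `step()` would be called once per epoch with the validation loss, and the loop would break as soon as it returns True, so `--num_epochs 300` acts as an upper bound rather than a fixed count.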
For detailed parameter settings, see Section 3.3 (AI-Art Generation) of our paper. API information for each generator we used:
- DreamStudio / stability.ai: gRPC API parameters
- OpenAI: API reference, Docs
- Midjourney: Parameters
- StarryAI: Guide
- To ensure the collection goes smoothly, the sleep time in the scripts may need to be changed according to your machine and network environment.
- Make sure the file paths in the scripts match your local directory layout.
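
One way to make the sleep time easy to tune is to route every request through a small helper with a configurable delay. The sketch below is an assumption about how this could be done, not how the collection scripts work: call_with_retries and its parameters are hypothetical, and it uses exponential backoff (the wait doubles after each failure) rather than a fixed sleep.

```python
import time


def call_with_retries(request_fn, base_sleep=1.0, max_retries=3, sleep_fn=time.sleep):
    """Call request_fn(), sleeping and retrying when it raises.

    The wait doubles after each failed attempt: base_sleep, 2 * base_sleep, ...
    After max_retries failed retries, the last exception is re-raised.
    sleep_fn is injectable so the delay can be replaced in tests.
    """
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except Exception:
            if attempt == max_retries:
                raise
            sleep_fn(base_sleep * (2 ** attempt))
```

Raising `base_sleep` is the single knob to adjust on a slow network or a rate-limited API.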