This project is a prototype of a Fashion AI system that processes images, generates semantic relationships between them, simulates user interactions, and provides personalized recommendations. It uses a PostgreSQL database to store image metadata, navigation paths, user interactions, and generated recommendations.
- Generate placeholder fashion images.
- Extract metadata for these images using a Vision Transformer (ViT) model.
- Create semantic links (navigation paths) between images.
- Simulate user browsing behavior.
- Generate personalized recommendations based on user behavior and image semantics.
- `images/`: Contains placeholder images. The `generate_images.py` script (run by the system during initial setup) populates this directory.
- `config.py`: Configuration file for Hugging Face model details (e.g., model name, image size) used by `FashionTagger`.
- `fashion_tagger.py`: Module containing the `FashionTagger` class, which loads a pre-trained Vision Transformer (ViT) model from Hugging Face and extracts metadata (features, dominant colors) from images.
- `generate_images.py`: (Helper script, not part of the main ETL flow but used for initial setup) Generates placeholder fashion images.
- `etl.py`: Extracts metadata for images using `FashionTagger` and loads it into the `image_metadata` table in PostgreSQL.
- `semantic_enrichment.py`: Analyzes image metadata to create potential navigation paths between images, storing them in `image_navigation_paths`.
- `user_simulator.py`: Simulates user browsing sessions and interactions (clicks) with images, storing data in `user_interactions`.
- `recommendation_engine.py`: Generates personalized recommendations for specific users based on their interaction history and the semantic navigation paths, storing results in the `recommendations` table.
- `schema.sql`: Contains the SQL DDL statements to create all necessary tables in the PostgreSQL database.
- `db_setup_instructions.md`: Detailed instructions for setting up the PostgreSQL database using Docker.
- `requirements.txt`: Lists the Python dependencies for this project.
- `README.md`: This file, providing an overview and instructions.
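For orientation, here is a minimal sketch of what a `FashionTagger`-style class could look like. This is an illustration, not the project's actual implementation: the class and method names are invented, and it loads the ImageNet-1k fine-tuned checkpoint `google/vit-base-patch16-224` rather than the in21k checkpoint referenced in `config.py`, since the latter ships without a fine-tuned classification head.

```python
from collections import Counter

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification


class FashionTaggerSketch:
    """Illustrative stand-in for fashion_tagger.FashionTagger."""

    def __init__(self, model_name="google/vit-base-patch16-224"):
        self.processor = AutoImageProcessor.from_pretrained(model_name)
        self.model = AutoModelForImageClassification.from_pretrained(model_name)
        self.model.eval()

    def tag(self, image_path, top_k=3):
        image = Image.open(image_path).convert("RGB")
        inputs = self.processor(images=image, return_tensors="pt")
        with torch.no_grad():
            logits = self.model(**inputs).logits
        top = logits.squeeze(0).topk(top_k)
        # Top-k ImageNet labels double as coarse "style tags".
        tags = [self.model.config.id2label[i.item()] for i in top.indices]
        # Dominant colors: shrink the image, then count the most frequent pixels.
        small = image.resize((16, 16))
        colors = [rgb for rgb, _ in Counter(small.getdata()).most_common(3)]
        return {"style_tags": tags, "dominant_colors": colors}
```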
- Python 3.8+
- Docker (e.g., Docker Desktop) for running PostgreSQL.
- Internet connection (for the first run of `etl.py` to download the Hugging Face model).
- Clone the repository (if applicable) or ensure all project files are in a local directory.
- Install Python packages:
  Open your terminal or command prompt, navigate to the project's root directory, and run:

  ```bash
  pip install -r requirements.txt
  ```

  This will install all necessary packages, including `psycopg2-binary` (for PostgreSQL interaction), `Pillow` (for image manipulation), and critically, `transformers` and `torch` (for loading and using pre-trained models from Hugging Face).
- Set up PostgreSQL using Docker: Follow the detailed instructions in `db_setup_instructions.md` to get a PostgreSQL instance running in Docker. This guide includes commands to start the container and set up the initial database.
- Set Environment Variables: Before running any of the Python scripts, you need to set the following environment variables in your terminal session. These variables allow the scripts to connect to the PostgreSQL database. Replace the example values with those you configured during the Docker setup (see `db_setup_instructions.md`).

  For Linux/macOS:

  ```bash
  export DB_HOST="localhost"
  export DB_NAME="fashion_db"
  export DB_USER="myuser"          # Or your chosen user from db_setup_instructions.md
  export DB_PASSWORD="mypassword"  # Or your chosen password
  ```

  For Windows (Command Prompt, which treats quotes as part of the value, so omit them):

  ```bat
  set DB_HOST=localhost
  set DB_NAME=fashion_db
  set DB_USER=myuser
  set DB_PASSWORD=mypassword
  ```

  For Windows (PowerShell):

  ```powershell
  $env:DB_HOST="localhost"
  $env:DB_NAME="fashion_db"
  $env:DB_USER="myuser"
  $env:DB_PASSWORD="mypassword"
  ```

  The scripts have default fallbacks for these variables (e.g., "postgres" for user/password, "fashion_db" for dbname, "localhost" for host), but it's best practice to set them explicitly, especially if your Docker setup uses different credentials (see the sketch below for how the fallbacks work).
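That fallback behavior presumably amounts to something like the following inside each script (the helper name `get_db_connection` is invented for illustration):

```python
import os

import psycopg2


def get_db_connection():
    """Open a PostgreSQL connection, preferring environment variables
    and falling back to the defaults described above."""
    return psycopg2.connect(
        host=os.getenv("DB_HOST", "localhost"),
        dbname=os.getenv("DB_NAME", "fashion_db"),
        user=os.getenv("DB_USER", "postgres"),
        password=os.getenv("DB_PASSWORD", "postgres"),
    )
```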
Execute the scripts from the project's root directory in the following order. Each script builds upon the data generated by the previous ones. The `generate_images.py` script is assumed to have been run by the system to create the initial images in the `./images` folder.
- Run `python etl.py` to populate `image_metadata`:
  This script now uses the `FashionTagger` module to extract metadata from images in the `./images` directory using a pre-trained Hugging Face Vision Transformer model. The extracted metadata (including image features converted to tags, and dominant colors) is loaded into the `image_metadata` table. This script also creates all tables defined in `schema.sql` if they don't exist.

  ```bash
  python etl.py
  ```

  Important Notes for `etl.py`:
  - First Run: The first time you run `etl.py`, it will download the pre-trained Vision Transformer model specified in `config.py` (e.g., `google/vit-base-patch16-224-in21k`). This model can be several hundred megabytes, so an internet connection is required. Subsequent runs will use the cached model.
  - Processing Time: Image processing will take longer than plain file scanning due to the model inference step for each image.
  - Metadata Quality: The default model (`google/vit-base-patch16-224-in21k`) is a general-purpose Vision Transformer trained on ImageNet-21k. As such, the `description` and `style_tags` it generates will be based on general ImageNet categories. Specific fashion attributes like `garment_type`, `accessories`, and `gender` are currently set to placeholder values (e.g., "unknown", `[]`, "unisex") by `FashionTagger`, as this base model is not fine-tuned for detailed fashion classification.
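  To make the load step concrete, here is a minimal sketch of the kind of insert `etl.py` performs, reusing the hypothetical `tag` interface from the `FashionTagger` sketch earlier. The column names are assumptions inferred from the attributes described in this README; the authoritative definitions live in `schema.sql`.

  ```python
  import json
  import os

  # Hypothetical column set; see schema.sql for the real DDL.
  INSERT_SQL = """
      INSERT INTO image_metadata (image_id, style_tags, dominant_colors, garment_type, gender)
      VALUES (%s, %s, %s, %s, %s)
      ON CONFLICT (image_id) DO NOTHING  -- assumes image_id is the primary key
  """


  def load_metadata(conn, tagger, image_dir="./images"):
      """Tag every image in image_dir and insert one metadata row per image."""
      with conn.cursor() as cur:
          for name in sorted(os.listdir(image_dir)):
              meta = tagger.tag(os.path.join(image_dir, name))
              cur.execute(INSERT_SQL, (
                  name,
                  json.dumps(meta["style_tags"]),
                  json.dumps(meta["dominant_colors"]),
                  "unknown",  # placeholder, per the notes above
                  "unisex",   # placeholder, per the notes above
              ))
      conn.commit()
  ```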
- Run `python semantic_enrichment.py` to populate `image_navigation_paths`:
  This script analyzes the metadata in `image_metadata` (now including ViT-based features) to establish semantic links, or potential navigation paths, between images. Results are stored in `image_navigation_paths`.

  ```bash
  python semantic_enrichment.py
  ```
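  The linking heuristic itself is not documented in this README; one plausible approach, assuming links are derived from overlapping `style_tags`, is a simple Jaccard similarity over tag sets:

  ```python
  from typing import Dict, Iterator, Set, Tuple


  def jaccard(a: Set[str], b: Set[str]) -> float:
      """Tag-overlap similarity in [0, 1]."""
      if not a and not b:
          return 0.0
      return len(a & b) / len(a | b)


  def candidate_paths(tags_by_image: Dict[str, Set[str]],
                      threshold: float = 0.3) -> Iterator[Tuple[str, str, float]]:
      """Yield (source, target, score) for image pairs above the threshold."""
      ids = sorted(tags_by_image)
      for i, src in enumerate(ids):
          for dst in ids[i + 1:]:
              score = jaccard(tags_by_image[src], tags_by_image[dst])
              if score >= threshold:
                  yield src, dst, score
  ```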
- Run `python user_simulator.py` to populate `user_interactions`:
  This script simulates user browsing behavior. It generates mock user sessions in which users "view" and "click" on images, often following the paths defined in `image_navigation_paths`. These interactions are recorded in the `user_interactions` table.

  ```bash
  python user_simulator.py
  ```
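  A hypothetical core for such a simulator is a short random walk over the navigation paths; the session length and click probability below are invented parameters:

  ```python
  import random


  def simulate_session(user_id, start_image, paths, steps=5, click_prob=0.4):
      """Walk the navigation graph, emitting (user_id, image_id, clicked) events.

      `paths` maps image_id -> list of reachable image_ids."""
      events = []
      current = start_image
      for _ in range(steps):
          events.append((user_id, current, random.random() < click_prob))
          neighbors = paths.get(current)
          if not neighbors:
              break  # dead end: no outgoing navigation paths
          current = random.choice(neighbors)
      return events
  ```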
- Run `python recommendation_engine.py` to generate recommendations:
  This script uses the data from `image_metadata`, `image_navigation_paths`, and `user_interactions` to generate personalized recommendations for a predefined set of target users. The recommendations and the reasoning behind them are stored in the `recommendations` table.

  ```bash
  python recommendation_engine.py
  ```
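  The scoring logic is not spelled out in this README; a minimal hypothetical version recommends unseen neighbors of the images a user clicked, weighted by how many clicked images link to them:

  ```python
  from collections import Counter


  def recommend(clicked_images, paths, top_n=5):
      """Rank unseen images by how many clicked images link to them."""
      seen = set(clicked_images)
      scores = Counter()
      for image_id in clicked_images:
          for neighbor in paths.get(image_id, []):
              if neighbor not in seen:
                  scores[neighbor] += 1
      ranked = [img for img, _ in scores.most_common(top_n)]
      reasoning = "linked via navigation paths from previously clicked images"
      return ranked, reasoning
  ```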
- Model Selection: The Vision Transformer model used by `FashionTagger` is defined in `config.py` (`HUGGING_FACE_MODEL_NAME`). You can experiment by changing this to other models available on Hugging Face. However, be aware that if the output structure of a different model varies significantly, adjustments in `fashion_tagger.py` might be necessary to correctly interpret the model's predictions. (See the illustrative `config.py` sketch after this list.)
- Database Connection: Database connection parameters (host, name, user, password) are managed via environment variables, as detailed in the "Database Setup" section.
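For orientation, `config.py` presumably contains something like the following. `HUGGING_FACE_MODEL_NAME` is named in this README; the image-size constant is an assumption:

```python
# config.py (illustrative sketch, not the actual file)
HUGGING_FACE_MODEL_NAME = "google/vit-base-patch16-224-in21k"
IMAGE_SIZE = (224, 224)  # assumed; ViT-base patch16 models expect 224x224 inputs
```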
- Each script will print status messages, progress, and error information to the console.
- The primary output of the system is the data populated in the PostgreSQL database tables.
- The `recommendation_engine.py` script will print the generated recommendations for the target users to the console.
You can inspect the data in the database using `psql` or any PostgreSQL client. Refer to `db_setup_instructions.md` for `psql` connection commands.

Example `psql` queries:
- Check table structure (e.g., for `image_metadata`):

  ```sql
  \d image_metadata
  ```

- Count rows in a table (e.g., `user_interactions`):

  ```sql
  SELECT COUNT(*) FROM user_interactions;
  ```

- View sample recommendations:

  ```sql
  SELECT user_id, source_image_id, recommended_images, reasoning FROM recommendations LIMIT 5;
  ```

- View navigation paths for a specific image:

  ```sql
  SELECT * FROM image_navigation_paths WHERE source_image_id = 'img_001.jpg';
  ```

- View interactions for a specific user:

  ```sql
  SELECT image_id, clicked, timestamp FROM user_interactions WHERE user_id = 'user001' ORDER BY timestamp DESC;
  ```