GitHub - cmmyers/that_fashion_app: Trend analysis of user-generated social media content

This began as a capstone project for the Galvanize Data Science Immersive.

Guiding question: Can user-generated social media content be used to identify fashion trends?

Methodology: Using 8 years of user-created posts from the popular fashion site Chictopia, specifically users' textual descriptions of their outfits, I used NLP tools to identify references to garments and created bigrams to extract significant adjective-noun pairs (description-garment, e.g. 'floral jumpsuit'). I then analyzed year-over-year changes in term frequency to identify specific garments' trend cycles.
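To make the year-over-year comparison concrete, here is a minimal sketch of one way to measure it. It is not the code in `src`: the function names, the underscore-joined bigram format, and the toy posts are all assumptions made for illustration. Each term's share of all terms in a year is used rather than its raw count, so that years with more posts do not dominate.

```python
from collections import Counter, defaultdict


def relative_frequency_by_year(posts):
    """Return {year: {term: share of all terms seen that year}}.

    `posts` is an iterable of (year, list_of_terms) pairs; this input
    shape is a hypothetical one chosen for the example.
    """
    counts = defaultdict(Counter)
    for year, terms in posts:
        counts[year].update(terms)

    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {term: n / total for term, n in counter.items()}
    return shares


def year_over_year_change(shares, term):
    """Return {year: change in the term's share versus the previous year}."""
    years = sorted(shares)
    return {
        later: shares[later].get(term, 0.0) - shares[earlier].get(term, 0.0)
        for earlier, later in zip(years, years[1:])
    }


# Toy data, invented for illustration only.
posts = [
    (2009, ["floral_jumpsuit", "black_dress"]),
    (2010, ["floral_jumpsuit", "floral_jumpsuit", "black_dress", "black_dress"]),
    (2011, ["floral_jumpsuit", "floral_jumpsuit", "floral_jumpsuit", "black_dress"]),
]
shares = relative_frequency_by_year(posts)
changes = year_over_year_change(shares, "floral_jumpsuit")
print(changes)
```

A sustained run of positive changes followed by negative ones would be one rough signature of a trend cycle for that term.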

Process:

I collected data from approximately 1 million photo posts spanning March 2008 to November 2016. Each post consists of one or more photos, a title, a date, and a description that ranges from a few words to several paragraphs. The data was processed using BeautifulSoup and stored using MongoDB and JSON.
After adding year, quarter, and month features to my dataset and tokenizing the photo descriptions, I trained a Word2Vec model on the data from 2009 through 2012. I used this model to identify approximately 60 'garment words' based on their similarity to six 'basic' garment words, including 'dress', 'skirt', 'pants', 'shoes', and 'bag'.
Using my ~60 identified garment words, I created bigrams that paired every occurrence of a garment word with the word that immediately preceded it. I then created new W2V models for each year, 2009 through 2012. I treated this as my initial training set.
I performed EDA using my 'bigramified' text. I collected lists of bigrams that the W2V models considered 'most similar' to the basic garments and examined the respective differences in these lists from year to year. I created similar lists using simple term frequency of bigrams. I then plotted the members of these disjunctive lists to determine whether members of the list could accurately be considered 'trends'. Rather than use a hard definition of 'trend', I visually examined the plots for significant peaks.
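The bigram step described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the repository's implementation: the `bigramify` name, the underscore separator, and the small garment set (a stand-in for the ~60 words identified with Word2Vec) are all hypothetical.

```python
# Stand-in for the ~60 garment words; this short set is an assumption
# made for the example, not the list derived from the Word2Vec model.
GARMENT_WORDS = {"dress", "skirt", "pants", "shoes", "bag", "jumpsuit"}


def bigramify(tokens, garment_words=GARMENT_WORDS):
    """Pair every garment word with the token immediately before it.

    A garment word at the very start of a description has no preceding
    token, so it is skipped.
    """
    bigrams = []
    for i, token in enumerate(tokens):
        if token in garment_words and i > 0:
            bigrams.append(f"{tokens[i - 1]}_{token}")
    return bigrams


# Toy description, invented for illustration.
tokens = "i wore a floral jumpsuit with red shoes".split()
print(bigramify(tokens))
```

Note that this sketch pairs a garment word with whatever token precedes it, including stop words such as 'my' or 'the'; filtering those out would be a separate design decision.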

Moving forward, I intend to use natural language modeling alongside time-series clustering to attempt to predict emerging trends. With years 2014 through 2016 currently held out, I will have several opportunities to validate and refine my model.

