Run in a Python interpreter to download the required NLTK models:

>>> import nltk
>>> nltk.download('punkt')
>>> nltk.download('averaged_perceptron_tagger')
Also download the given JSON data files. Then run

python sample_subset.py gg2013.json

to generate a small data set for faster development runs, and

python main.py ggsample.json

which will load the data, preprocess it, and output some basic guesses.
Implement code to guess the host, awards, nominees, presenters, and winners, using the framework described below.
Note that most of the tweets are fairly useless, so the approach is to use regexes to look for the specific kinds of tweets that carry the information, such as "[someone] won the award for [award type]" (see the sketch below).
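As a minimal sketch of this kind of rule (the pattern and function name are illustrative assumptions, not part of the existing code):

import re

# Hypothetical rule for "<name> won (the award) for <award>" tweets.
WIN_PATTERN = re.compile(
    r"(?P<winner>[A-Z][\w.'\- ]+?)\s+won\s+(?:the\s+award\s+)?for\s+"
    r"(?P<award>best[\w,\- ]+)",
    re.IGNORECASE,
)

def extract_winner(text):
    """Return a (winner, award) guess from one tweet, or None."""
    match = WIN_PATTERN.search(text)
    if match is None:
        return None
    return match.group("winner").strip(), match.group("award").strip().lower()

# extract_winner("Daniel Day-Lewis won the award for Best Actor - Drama!")
# -> ("Daniel Day-Lewis", "best actor - drama")

Each rule only needs to fire on a small fraction of tweets; the votes accumulated across thousands of tweets are what make the guesses reliable.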
The basic framework is that the data is loaded, then some preprocessing is done, mostly tokenizing and removing duplicate tweets. Each tweet is then fed to a function that evaluates it and extracts useful information. The extracted information is stored in the "ideas" variable, which is essentially a dictionary holding the various kinds of guesses and the votes for each.
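A minimal sketch of that pipeline, assuming each JSON record carries a "text" field and representing "ideas" as a dictionary of Counters (the helper names are illustrative):

import json
from collections import Counter, defaultdict
from nltk.tokenize import word_tokenize

def load_tweets(path):
    """Load tweets, keeping only the first occurrence of each text."""
    with open(path) as f:
        records = json.load(f)
    seen, texts = set(), []
    for record in records:
        text = record["text"]  # assumes each record has a "text" field
        if text not in seen:
            seen.add(text)
            texts.append(text)
    return texts

# "ideas" maps a category key to a Counter of guesses and their votes.
ideas = defaultdict(Counter)

def evaluate(text, ideas):
    """Inspect one tweet and cast votes for anything it reveals."""
    tokens = word_tokenize(text)  # available to rules that need tokens/POS tags
    guess = extract_winner(text)  # the regex rule sketched above
    if guess is not None:
        winner, award = guess
        ideas["winner: " + award][winner] += 1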
As additional extraction rules are added, that single function will probably become many functions, each specializing in extracting a particular kind of information, such as presenters or award names (one possible layout is sketched below).
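One way to organize that split, again with illustrative names rather than existing code, is a registry that runs every specialized extractor over each tweet:

import re

# Each registered extractor inspects one tweet and votes into `ideas`.
EXTRACTORS = []

def extractor(func):
    EXTRACTORS.append(func)
    return func

@extractor
def extract_host(text, ideas):
    # Hypothetical host rule: "<name> is/are hosting ..." tweets.
    m = re.search(r"([A-Z][\w.'\- ]+?)\s+(?:is|are)\s+hosting", text)
    if m:
        ideas["host"][m.group(1).strip()] += 1

def process(texts, ideas):
    for text in texts:
        for extract in EXTRACTORS:
            extract(text, ideas)

# process(load_tweets("ggsample.json"), ideas)  # using the sketch above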
Once all the tweets are processed, the ideas variable should hold all the relevant information, and a final output-processing pass can select guesses based on votes, associate like terms, and output the required answers.
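For the vote-based selection, a minimal sketch assuming the Counter-of-votes structure above:

def best_guesses(ideas, category, n=1):
    """Return the top-n voted guesses for a category, most votes first."""
    return [guess for guess, votes in ideas[category].most_common(n)]

# host = best_guesses(ideas, "host")[0]
# winner = best_guesses(ideas, "winner: " + award)[0]

Associating like terms (e.g. "Tina Fey" vs. "tina fey") would happen before this selection; normalizing case when votes are cast is one cheap first approximation.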