Politiclimate 2.0

Politiclimate 1.0

Originally, I trained my model on 1.4 million tweets gathered from the 25 most liberal and 25 most conservative urban counties, filtered only by the location of the tweet. After training on these collections of tweets, my models could predict whether a county was conservative or liberal based on a newly gathered corpus of tweets.

I discovered some very important issues in my original process:

There was far too much noise. Sports teams, local events and countless non-political tweets created so much irrelevant data that the models did not train or test effectively. I needed to isolate the political tweets.

Twitter’s streaming API does not allow filtering by both location AND topic at the same time.

The original tweets only reflect the period in which they were scraped and its political climate. For example, the Alabama election was mentioned notably often in the training set, but much less in the test set gathered 2 weeks later.
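One possible way around the first two issues is to collect tweets by location and then filter them by topic locally. The sketch below is only an illustration of that idea, not the project's actual code: the term list and the function names `is_political` and `filter_political` are hypothetical, and tweets are assumed to be dicts with a `text` key.

```python
import re

# Hypothetical term list; in practice these would come from a topic scrape.
POLITICAL_TERMS = {"election", "senate", "tax reform", "roy moore"}


def is_political(text, terms=POLITICAL_TERMS):
    """Return True if the text mentions any term as a whole word or phrase."""
    lowered = text.lower()
    return any(
        # Lookarounds stop "election" from matching inside "selection".
        re.search(r"(?<!\w)" + re.escape(term) + r"(?!\w)", lowered)
        for term in terms
    )


def filter_political(tweets, terms=POLITICAL_TERMS):
    """Keep only tweets (dicts with a 'text' key) that match a political term."""
    return [t for t in tweets if is_political(t.get("text", ""), terms)]


sample = [
    {"text": "Huge turnout for the Senate election today"},
    {"text": "Crimson Tide win again!"},
]
print(filter_political(sample))
```

Simple keyword matching like this will miss political tweets that use none of the listed terms, so the term list needs to be refreshed as the conversation shifts.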

The Original Politiclimate Presentation

Politiclimate 2.0

Politiclimate 2.0 is the continuation of my General Assembly Capstone Project. The new goal is to create a website with a map-based GUI showing a live feed of politically charged tweets categorized as “Red” or “Blue” (Conservative or Liberal). Users can explore the hot-button issues discussed on social media at a given time and location and filter by political affiliation (with a margin of error, of course).

Currently, the script can:

Run a daily scrape of relevant topical information using politically themed subreddits

Extract the most prominent topics using word counts and LDA (Latent Dirichlet Allocation)

Gather tweets based on up to 50 locations and 500 filters (topics scraped from political subreddits)

Store tweets in a remote MongoDB database

Clean the tweets using NLTK and custom scripts

Convert emojis into human-readable expressions and extract topics of interest
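The cleaning, emoji conversion and word-count steps above could look roughly like the following. This is a minimal standard-library sketch, not the project's actual scripts: the function names and the tiny stop-word list are assumptions (NLTK's English stop-word list would be the fuller choice), and treating every Unicode "So" (other symbol) character as an emoji is only a rough proxy. LDA is not shown.

```python
import re
import unicodedata
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def emoji_to_text(text):
    """Replace symbol characters (Unicode category 'So') with their Unicode names."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            # e.g. "FACE WITH TEARS OF JOY" -> " face_with_tears_of_joy "
            out.append(" " + name.lower().replace(" ", "_") + " " if name else " ")
        else:
            out.append(ch)
    return "".join(out)


def clean_tweet(text):
    """Convert emojis, strip URLs and @mentions, lowercase, and tokenize."""
    text = emoji_to_text(text)
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    tokens = re.findall(r"[a-z0-9_']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def top_terms(tweets, n=5):
    """Return the n most frequent cleaned tokens across all tweets."""
    counts = Counter(tok for tweet in tweets for tok in clean_tweet(tweet))
    return counts.most_common(n)


print(clean_tweet("Vote today 🔥 https://t.co/abc @user"))
print(top_terms(["The election is on", "Election day 🔥"], n=1))
```

Using Unicode character names keeps the sketch dependency-free; a dedicated emoji library would give friendlier labels and handle multi-character emoji sequences.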

The next steps are:

Implement a training and testing pipeline for political leaning, built on a model for targeted sentiment analysis of topics from both conservative and liberal subreddits

Create a Graphical User Interface

Live feed of “Red” and “Blue” tweets populating a map

Charts and tables that display statistics of the data on Politiclimate.com (using React and possibly Django)

For example, explore the topical analysis of tweets in Alabama during the Moore/Jones election

Automate the process efficiently in 24-hour cycles:

Scrape new Liberal and Conservative topics (using LDA on headlines from political subreddits)