Leon Yin

About

I'm a Data Scientist and Research Engineer at the Social Media and Political Participation (SMaPP) lab and Center for Data Science at NYU.
I develop data-driven tools and methods to measure, mine, and model net politics.
I work on tangential projects as a volunteer at DataKind and as a research affiliate with Data & Society's Media Manipulation team.
Previously, I wrote scientific software at NASA, and wrangled data for Sony.

SmappDragon is a low-level JSON parser for working with large files (like Twitter dumps) that don't fit in memory. It uses generators and streaming (de)compression to turn so-called "big data" problems into smart data problems. The tutorial below shows how to use SmappDragon to analyze links from questionable media sources listed on opensources.co.
Jupyter Notebook Tutorial GitHub Repo PyPI Page
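
To illustrate the general idea of generators plus streaming decompression, here is a minimal sketch using only the Python standard library. It is not SmappDragon's actual API: the function names `open_maybe_compressed`, `stream_json`, and `count_where` are hypothetical, and it assumes the file holds one JSON object per line (as many Twitter dumps do).

```python
import bz2
import gzip
import json
from typing import Callable, Iterator


def open_maybe_compressed(path: str):
    """Open a text file, decompressing on the fly based on its extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def stream_json(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per line, never loading the whole file."""
    with open_maybe_compressed(path) as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def count_where(path: str, predicate: Callable[[dict], bool]) -> int:
    """Count records matching a predicate while holding one record at a time."""
    return sum(1 for record in stream_json(path) if predicate(record))
```

Because `stream_json` is a generator, memory use stays roughly constant no matter how large the file is: each line is decompressed, parsed, handed to the caller, and then discarded.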

I explore 15 years of biogeochemical seawater measurements along the sample grid at Antarctica's Palmer Station. I analyze spatiotemporal variation within the water column and calculate mixed layer depth and net community production.
Jupyter Notebook

What Makes a Drop of Seawater Unique?

The answer is in its chemistry! If we inspect the relationship between isotope enrichment and salinity, we can trace the droplet to its landfall origin. Scientists have been collecting these data as far back as 1949, but the measurements were scattered across disparate sources. In 1999, a group of scientists led by Gavin Schmidt centralized these sources. A decade and a half later, I contributed by building a data pipeline, performing anomaly detection, and making some visualizations.
I did not know it at the time, but this was my genesis in data science.
Jupyter Notebook Presentation Poster at AGU 2015 d3.js map

Data Pipes and Web Scrapers

Coming soon!

Technology adopts historical mistakes and bias.
If any of these projects have room to improve, please let me know via email or GitHub :)
The next section contains a JavaScript app that cycles through a collection of quotes I like.