Issue #259

Nov 8 2018

Editor Picks

That Time the City of Seattle Accidentally Gave Me 32m Emails for 40 Dollars
The first large batch of requests for email metadata was sent to the largest cities of fourteen arbitrary states in a trial run of sorts. At the end of that batch, only two cities were willing to continue with the request: Houston and Seattle. Houston complied surprisingly quickly and snail mailed the metadata for 6m emails.
Seattle on the other hand...

The Dankstimate: Cannabis Price Estimation
Five years later, cannabis is legal for medicinal use in 31 states and recreational use in 9 states. There are thousands of dispensaries from which one can obtain pricing data to analyze. I thought it might be a good time to revisit cannabis pricing to build a model that outputs a price benchmark for dispensaries (a “dankstimate” in the vein of Zillow’s “zestimate”)...

A Message from this week's Sponsor:

Business Analytics at Clark University will give you the skills employers demand by teaching you how to synthesize data into powerful information. Whether you enroll in a full- or part-time master’s or accelerated certificate program, you will be equipped to transform data into something meaningful.

You don’t need a background in statistics or science to succeed here. We offer:

Data Science Articles & Videos

Using LDA to Build a Missing Yelp Feature
Why is it that Yelp does not have a feature where the user could provide inputs on bars that they enjoy, and then expand on that search at a location of the user’s choosing? This search method would also be able to capture less tangible aspects like ambience. Therefore I decided that for my 4th Metis project, I would build this feature using unsupervised learning and NLP techniques...

Predicting overdose mortality per US county
The opioid epidemic has turned into one of the major public health catastrophes for this generation of Americans. Similar to what tobacco/smoking or HIV/AIDS were to earlier generations, the opioid epidemic appears to be this era’s defining public health crisis. I wanted to see if it was possible to build a model to predict opioid-related mortality on a county-by-county basis, since this type of model might give insights into where and how to target interventions...

PocketFlow - from the Tencent AI Lab
An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications. PocketFlow aims at providing an easy-to-use toolkit for developers to improve inference efficiency with little or no performance degradation. Developers only need to specify the desired compression and/or acceleration ratios, and PocketFlow will then automatically choose proper hyper-parameters to generate a highly efficient compressed model for deployment...

NVIDIA Launches Year-Long Research Residency Program
If you’re a researcher looking to deepen your exposure to AI, NVIDIA invites you to apply to its new AI Research Residency program. During the one-year, paid program, residents will be paired with an NVIDIA research scientist on a joint project and have the opportunity to publish and present their findings at prominent research conferences such as CVPR, ICLR and ICML...

How to build your own Neural Network from scratch in Python
As part of my personal journey to gain a better understanding of Deep Learning, I’ve decided to build a Neural Network from scratch without a deep learning library like TensorFlow. I believe that understanding the inner workings of a Neural Network is important to any aspiring Data Scientist. This article contains what I’ve learned, and hopefully it’ll be useful for you as well!...

The MAME RL Algorithm Training Toolkit
A Python library that can provide a Gym-like API around almost any old arcade game. The authors show how to set up new ROMs and provide an RL example for “Street Fighter III Third Strike: Fight for the Future”...

Analyzing Experiment Outcomes: Beyond Average Treatment Effects
We have found that calculating quantile treatment effects (QTEs) allows us to effectively and efficiently characterize the full distribution of treatment effects and thus capture the inherent heterogeneity in treatment effects when thousands of riders and drivers interact within Uber’s marketplace. In this article, we describe what QTEs are, how exactly they provide additional insights beyond ATEs, why they are relevant for a business such as Uber’s, and how we calculate them...
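
For readers new to the idea, here is a minimal sketch of how a QTE differs from an ATE. It is not the method from the linked article: the outcome values are made up, and the helper name `quantile_treatment_effects` is hypothetical. It simply compares the quartiles of a treatment group with those of a control group, using only Python's standard library.

```python
from statistics import mean, quantiles


def quantile_treatment_effects(control, treatment, n=4):
    """Difference between treatment and control outcomes at each of the n-1 quantile cut points."""
    q_control = quantiles(control, n=n, method="inclusive")
    q_treatment = quantiles(treatment, n=n, method="inclusive")
    return [t - c for c, t in zip(q_control, q_treatment)]


# Hypothetical outcomes (illustrative only, not real experiment data).
control = [1, 2, 3, 4, 5]
treatment = [1, 2, 3, 6, 9]

# Average treatment effect: a single number for the whole population.
ate = mean(treatment) - mean(control)

# Quartile treatment effects: one number per quartile (25%, 50%, 75%).
qtes = quantile_treatment_effects(control, treatment)

print(f"ATE: {ate:.2f}")
print(f"QTEs at quartiles: {qtes}")
```

In this toy data the ATE is positive, yet the lower two quartiles do not move at all; the whole effect sits in the upper quartile, which is the kind of heterogeneity an average alone hides.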

Jobs

Want to build an RL system with real money against business experts? Apply now!

PepsiCo operates in an environment undergoing immense and rapid change, driven by eCommerce and emergent retail technologies. To ensure continued success in the food and beverage space, PepsiCo has assembled a dedicated eCommerce team, tasked with optimizing eCommerce operations and developing innovations that will give PepsiCo a sustainable competitive advantage. While tied closely to broader PepsiCo, the eCommerce group more closely resembles a start-up environment, embracing the core values of having a bias for action, being results-oriented, maintaining a community focus, and prioritizing people.

PepsiCo’s Data Science and Analytics group is a team of data scientists, technology specialists, and business innovators who operate within eCommerce to build industry-leading systems and solutions. By focusing on machine learning and automation, the Data Science & Analytics group is pushing the bounds of possibility for PepsiCo and its strategic partners...

Training & Resources

Apply Transforms To PyTorch Torchvision Datasets
Learn how to use the Torchvision Transforms Parameter in the initialization function to apply transforms to PyTorch Torchvision Datasets during the data import process, via a screencast video and full tutorial transcript...

Books

"The best single book on Data Science today. I handle the data analysis and BI for the delivery side of a huge internet-based retail company, and have been a fan of Foreman's since his "Analytics Made Skeezy" blog days. His explanations are clear, his examples are to the point, and throughout it all, he is results-oriented."...