Earlier this month, I gave an introductory talk at Data Philly on deep reinforcement learning. The talk followed Google DeepMind's Nature paper on teaching neural networks to play Atari games and was intended as a crash course on deep reinforcement learning for the uninitiated. Get the slides below!

Attention has gotten plenty of attention lately, after yielding state-of-the-art results in multiple fields of research. From image captioning and language translation to interactive question answering, Attention has quickly become a key tool to which researchers must attend. Some have taken notice and even postulate that attention is all you need. But what is Attention anyway? Should you pay attention to Attention? Attention enables the model to focus on important pieces of the feature space. In this post, we explain how the Attention mechanism works mathematically and then implement the equations using Keras. We conclude by discussing how to “see” the Attention mechanism at work by identifying important words for a classification task.
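
To make "focusing on important pieces of the feature space" concrete, here is a minimal NumPy sketch of one common additive formulation of attention: score each timestep, softmax the scores into weights, and take the weighted sum of the inputs. This is an illustration under assumptions, not necessarily the formulation or the Keras code from the post; the function name `attention_pool` and the parameters `W`, `b`, and `u` are hypothetical.

```python
import numpy as np


def attention_pool(H, W, b, u):
    """Pool a sequence of feature vectors into one context vector.

    H: (T, d) sequence of T feature vectors (e.g. one per word).
    W: (d, k), b: (k,), u: (k,) are hypothetical learned parameters.
    Returns the (d,) context vector and the (T,) attention weights.
    """
    # One scalar score per timestep.
    scores = np.tanh(H @ W + b) @ u
    # Softmax over timesteps (shifted by the max for numerical stability).
    exp_scores = np.exp(scores - scores.max())
    weights = exp_scores / exp_scores.sum()
    # Weighted sum of the inputs: high-weight timesteps dominate.
    context = weights @ H
    return context, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.normal(size=(5, 4))  # e.g. 5 words, 4 features each
    W = rng.normal(size=(4, 3))
    b = np.zeros(3)
    u = rng.normal(size=3)
    context, weights = attention_pool(H, W, b, u)
    print(weights)  # one non-negative weight per word, summing to 1
```

Because the weights are non-negative and sum to one, they can be read as a rough measure of how much each word contributes to the pooled representation, which is the idea behind inspecting them for a classification task.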

Everyone has heard of the feats of Google’s “dreaming” neural network. Today, we’re going to define a special loss function so that we can dream adversarially: that is, we will dream in a way that fools the InceptionV3 image classifier into classifying an image of a dreamy cat as a coffeepot.

Hogwild! is an asynchronous stochastic gradient descent algorithm. The Hogwild! approach utilizes “lock-free” gradient updates. For a machine learning model, this means that the weights of a model are updated by multiple processes at the same time, with the possibility of overwriting each other's updates. In this post, we will use the multiprocessing library to implement Hogwild! in Python for training a linear regression model.
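
The lock-free idea can be sketched in a few lines. The example below is a minimal illustration, not the implementation from the post: several processes run plain SGD on a linear regression model and write to one shared weight vector without any lock. The names `hogwild_fit` and `_worker`, the hyperparameters, and the choice of the `fork` start method (which assumes a POSIX system) are all assumptions made for this sketch.

```python
import multiprocessing as mp

import numpy as np


def _worker(shared_w, X, y, lr, n_steps, seed):
    # View the shared buffer as a NumPy array; writes go straight to shared memory.
    w = np.frombuffer(shared_w, dtype=np.float64)
    rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        i = rng.integers(len(X))
        # Gradient of 0.5 * (x_i . w - y_i)^2 with respect to w.
        grad = (X[i] @ w - y[i]) * X[i]
        # In-place update with no lock: other workers may overwrite it.
        w -= lr * grad


def hogwild_fit(X, y, n_workers=4, lr=0.01, n_steps=2000):
    # Assumption: "fork" is available (Linux/macOS), so workers inherit X and y.
    ctx = mp.get_context("fork")
    # RawArray has no lock attached; it starts out filled with zeros.
    shared_w = ctx.RawArray("d", X.shape[1])
    procs = [
        ctx.Process(target=_worker, args=(shared_w, X, y, lr, n_steps, seed))
        for seed in range(n_workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return np.frombuffer(shared_w, dtype=np.float64).copy()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, -2.0, 0.5])  # noiseless synthetic targets
    print(hogwild_fit(X, y))
```

Using `RawArray` rather than `Array` is the design choice that makes this lock-free: `Array` wraps the buffer in a lock by default, while `RawArray` exposes the shared memory directly, so concurrent updates can race.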

In many classification problems, time is not a feature (literally). Case in point: healthcare. Say you want to predict if a person has a deadly disease. You probably have some historical data that you plan to use to train a model and then deploy it into production. Great, but what happens if the distribution of people you predict on changes by the time your model starts (mis)flagging people? This covariate shift can be a real issue. In healthcare, the Affordable Care Act completely reshaped the landscape of the insured population, so there were definitely models out there that faced a healthier population than they were trained on. The question is: Is there anything you can do during training to correct for covariate shift? The answer is yes.