Building Intelligent Applications

Jon Peck

2 years ago

Algorithmia was delighted to speak at Seattle’s Building Intelligent Applications meetup last month. We provided attendees with an introductory view of machine learning, walked through a bit of sample code, touched on deep learning, and talked about various tools for training and deploying models.

For those who were able to attend, we wanted to send out a big “thank you!” for being a great audience. For those who weren’t able to make it, you can find our slides and notes below, and we hope to see you at the next meetup on Wednesday, April 26. Data Scientists Emre Ozdemir and Stephanie Peña will be presenting two Python-based recommender systems at Galvanize in Pioneer Square.

Before diving into a machine or deep learning project, the first step to consider is building a basic understanding of statistics and linear algebra.

It helps you to understand the math behind the models and algorithms you are using.

You’ll recognize when your models are too good to be true (called overfitting).

It will help you understand other people’s implementations of models.

You’ll have a better understanding of how to evaluate different models to find which ones are best suited to your use case.

BUT…more important than the math: understand the big picture of the models and how they apply to use cases and different datasets versus memorizing the math behind every algorithm.

The next step is to learn a programming language conducive to using common machine learning libraries. While there are machine learning libraries in Ruby and even Node, they aren’t as fully developed, nor do they have the community support, of those in languages like Python, Java, Scala and R.

While technical skills are important, so is having domain knowledge. Whether your problem is in finance, education or real estate, you should understand your business problem very well.

Finally, we should talk about software engineering. No, you don’t have to be a software engineer to tackle machine learning, but you will need to understand how a model will integrate into your current ecosystem and how it will impact performance.

For example how much CPU does your recommendation engine take?

Or GPU?

Will it create bottlenecks in your pipeline?

Does it need access to a database?

Is it processing user information in real time?

Do you need big data technologies such as Hadoop/Spark? Will you need Kafka for real-time streaming data?

What makes something an “intelligent” application? There are certain tasks which, until recently, we’ve always relied on humans to do. Many of these require some form of pattern recognition: reading a handwritten note, categorizing images, or understanding the meaning of a sentence. Writing explicit code to accurately perform these tasks is very hard (or impossible), but if we can teach a computer to recognize certain patterns, then it can learn how to perform them without a programmer writing (much) discrete code.

Intelligent applications already saturate our day-to-day world: consider video filters which must recognize and track faces, smartphone assistants which understand our meaning (mostly) and perform tasks for us, drug-discovery tools which identify molecules that might bind to certain receptor sites, or image enhancers which can figure out which colors ought to be present in a greyscale image.

When we build a classical software application, we pipe some user input into some explicit logic that a programmer has hand-coded, and this yields some immediately useful output. By contrast, Machine Learning is a two-step process.

First, we take some set of data we have on-hand, clean it up and split it into training and test data, pick a mathematical model to use, and then train the model to recognize patterns in the data. For example, we might have a set of images which we know to be either cats or dogs. We feed these images into our model, training it (via various techniques) to recognize whether each image it sees is either a cat or a dog, and build from this a “trained model”.

Next, we take this trained model and use it to react to new user input. If a user presents us with a new image, our model will give us back a best guess of “that’s a cat!” or “that’s a dog!”. We can do this repeatedly without retraining our model, so long as all we want to do is differentiate cats from dogs.

Of course, if our user gives us a picture of a hedgehog, our model will fail, because it only knows how to recognize cats and dogs. If we want to fix that, we’ll need to go back and retrain the model, including some images of hedgehogs this time around. Then we’ll take our retrained model and drop it into our application where the old model used to be.

Here’s a concrete example of replacing a “classical” software app with a machine-learning solution, using the task of detecting whether an image is likely to contain nudity — a problem which websites and social utilities often face if they need to ensure that their content is “safe for work”.

One classical approach first locates any face(s) in the image, samples the pixels around the nose area (which is usually unclothed) in order to determine the individual’s skin tone, then considers what percentage of the overall image is skin-colored. This works reasonably well for many images, but fails in a significant number of cases (for example, when there are many other objects near, but not covering, the nude individual).

A machine-learning approach which performs much better uses image tagging. First, we feed our model a large set of images, both nude and non-nude. We’ve tagged these images beforehand with the names of the items they contain: “car”, “scarf”, “nipples”, etc. The machine learns how to identify specific items, and when training is complete, it can be fed new images and it will give back a list of items it thinks the image probably contains, along with a confidence level for each item (“I’m 99% sure there’s a face in this image, but only 20% sure there’s a pair of pants”). Since we know that certain items imply nudity, we can then create a composite score for the image (“It contains a face, shirt, and pants, and no nipples… probably not nude”).
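The composite-score step can be sketched in a few lines of Python. This is only an illustration: the tag names, the weights, and the 0.5 threshold are assumptions made up for the example, not values from any real tagging service.

```python
# Hypothetical tagger output: item name -> confidence between 0 and 1.
tags = {"face": 0.99, "shirt": 0.85, "pants": 0.20, "nipples": 0.02}

# Assumed, hand-picked weights: positive values suggest nudity,
# negative values suggest clothing. Items not listed count as neutral.
WEIGHTS = {"nipples": 1.0, "shirt": -0.5, "pants": -0.5}


def nudity_score(tags, weights=WEIGHTS):
    """Sum each tag's confidence times its weight into one composite score."""
    return sum(conf * weights.get(item, 0.0) for item, conf in tags.items())


def is_probably_nude(tags, threshold=0.5):
    """Flag the image when the composite score exceeds an assumed threshold."""
    return nudity_score(tags) > threshold


score = nudity_score(tags)
print(score, is_probably_nude(tags))
```

In practice the weights and threshold would be tuned against images whose correct answer is already known.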

ML has a lot of sub-fields and uses, but some common ones that have broad applications are:

Image recognition, categorizing or finding specific features in images

Prediction, recognizing trends in data and figuring out what might happen next

Agents, which react to speech or text or actions with often human-like responses

Broadly, we can think of ML as having two main categories, Supervised and Unsupervised Learning.

In Supervised Learning, our data has usually been pre-tagged with Features (for example, the list of items in an image) and Labels (the outcome we expect the machine to give us).

Unsupervised learning doesn’t require pre-tagged data; instead, it is used to find patterns in data. For example, we might provide a set of customer demographics (age, weight, gender, etc) and ask the model to group them by similarity. Often, unsupervised learning is a precursor to a supervised step, but sometimes it stands on its own.

Before you begin working with your data, however, you’ll almost always need to clean it up. In both Supervised and Unsupervised learning, you may need to standardize the data (for example, grouping continuous data into buckets such as ages 10-20, 21-30, etc.) or get rid of datapoints which are clearly invalid (I’m only training a cat vs dog classifier, but somebody threw in a single hedgehog!).
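Both cleanup steps can be sketched in plain Python. The bucket boundaries, file names, and helper names here are assumptions chosen for illustration, and ages are assumed to be whole numbers.

```python
VALID_LABELS = {"cat", "dog"}  # the only classes this classifier is trained on


def bucket_age(age):
    """Group a whole-number age into a ten-year bucket, e.g. 25 -> '21-30'."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age <= 10:
        return "0-10"
    low = ((age - 1) // 10) * 10 + 1
    return f"{low}-{low + 9}"


def drop_invalid(samples):
    """Keep only (name, label) pairs whose label is one we train on."""
    return [(name, label) for name, label in samples if label in VALID_LABELS]


samples = [("img1.jpg", "cat"), ("img2.jpg", "dog"), ("img3.jpg", "hedgehog")]
clean = drop_invalid(samples)  # the hedgehog is removed
buckets = [bucket_age(a) for a in (12, 20, 25, 31)]
print(clean, buckets)
```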

With Supervised Learning, you’ll also need to label your data (“this picture is a cat, but that one is a dog”) and possibly narrow down the variables/information which the model will actually consider (for example, pre-cropping images or discarding certain irrelevant columns of the dataset) — having too many variables to consider can lead to overfitting, as well as slowing down the training process.

A few examples of Supervised Learning types/models:

In classification, we’re looking to place an inputted item into a single category: is this a cat or a dog? Does this image contain nudity?

In prediction, we’re looking to find trends, so that we can know what the next likely values will be.

Neural Nets and Support Vector Machines are powerful models which can solve many problems, but often require more data and processing time than more narrow techniques.

Unsupervised Learning is used to find similarities and group things together, or identify the specific characteristics which cause those groupings. We might be looking to find out what types of customers behave most like each other, or figure out which features tend to differ between those groups of individuals.
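As a rough sketch of grouping by similarity, here is K-Means clustering from scikit-learn applied to a handful of customers. The customer rows are invented for the example, and the choice of two clusters is an assumption; real data would usually be scaled first and might need a different number of groups.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up customer rows: [age, yearly spend]. Values are illustrative only.
customers = np.array([
    [22, 300], [25, 350], [24, 320],
    [58, 2100], [61, 2000], [63, 2250],
])

# Ask for two groups; no labels are supplied, only the raw rows.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
groups = kmeans.fit_predict(customers)

print(groups)                   # one cluster id per customer
print(kmeans.cluster_centers_)  # the "typical" customer of each group
```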

There are a lot of libraries available to help you in your ML pursuits (see the Resources section at the end for just a few). One common starting place is scikit-learn, a readily available Python package. Simply “pip install scikit-learn” to begin using it.

We’ll walk through a code example using the popular “iris” dataset (available inside scikit-learn, but also downloadable elsewhere if you’re using a different language or library). This data set pertains to three specific species of irises (the flower, not the eyes): I. setosa, I. versicolor, and I. virginica. The lengths and widths of these flowers’ petals and sepals tend to differ between species, so we’ll train a model to identify which species a given flower is, given these dimensions.

Our dataset has a bunch of rows, each of which is an individual flower which somebody measured in the field. The “iris.data” array contains the dimensions of the petals and sepals, and “iris.target” tells us what species each one is (encoded as 0, 1, or 2 for setosa, versicolor, or virginica).

We begin by importing sklearn’s datasets and loading the iris dataset into memory.
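A minimal version of that loading step might look like this (the variable name iris is our choice):

```python
from sklearn import datasets

# Load the bundled iris dataset into memory.
iris = datasets.load_iris()

print(iris.data.shape)    # (150, 4): 150 flowers, 4 measurements each
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']
print(iris.data[0], iris.target[0])  # first flower's measurements and species code
```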

Before we actually make use of the data, though, we want to break it into a “training” portion (about 80% of the data) which we’ll use for creating our model, and a “testing” portion (the remaining 20%) which we’ll use to verify that the model is working properly. But… before doing even that, we need to shuffle the data to ensure that it is randomly distributed across our training vs testing portions. Now we have two groups of data: data_train (the petal and sepal dimensions) and target_train (the species of flower) will be used for training. Data_test and target_test will be used for testing the model.
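One way to shuffle and split in a single call is scikit-learn's train_test_split. This is a sketch rather than the exact code from the talk; the random_state value is an arbitrary assumption that just makes the shuffle repeatable.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Shuffle the rows, then keep 80% for training and 20% for testing.
data_train, data_test, target_train, target_test = train_test_split(
    iris.data, iris.target, test_size=0.2, shuffle=True, random_state=42
)

print(len(data_train), len(data_test))
```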

Next, we’ll create a Support Vector Machine and train it on our data. This is a deceptively simple one-liner… we bring in svm from sklearn and call .fit(), handing it our training data and a bunch of other parameters. In reality, you may need to do this many times, changing the parameters until you get a model which fits your data well; look up the documentation for scikit-learn SVM to see exactly what those parameters are, and learn how to adjust them.
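Here is a sketch of that training step. The parameter values (kernel, C, gamma) are assumptions shown only to illustrate where tuning happens; they are not the settings used in the talk.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
data_train, data_test, target_train, target_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Create the Support Vector Machine; these parameters are the ones
# you would typically adjust between training runs.
model = svm.SVC(kernel="rbf", C=1.0, gamma="scale")

# Train it on the measurements and their known species.
model.fit(data_train, target_train)
```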

Once we’re done training, we have a model which can be used to predict the species for new rows of data. If we intend to use this model elsewhere, we can serialize it using pickle.dumps() (or write it straight to a file with pickle.dump()), and reload it using pickle.loads(). This is pretty common practice, since you’ll often train your model on one machine, then use/host the model elsewhere in your user-facing application.
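A small round-trip sketch, assuming a model trained on the full iris dataset for brevity:

```python
import pickle

from sklearn import datasets, svm

iris = datasets.load_iris()
model = svm.SVC(gamma="scale").fit(iris.data, iris.target)

# Serialize the trained model to bytes; these bytes can be written to a file
# and copied to the machine that hosts your application.
saved = pickle.dumps(model)

# Later (possibly elsewhere), rebuild the model from those bytes.
restored = pickle.loads(saved)

# The restored model should make the same predictions as the original.
same = bool((restored.predict(iris.data) == model.predict(iris.data)).all())
print(same)
```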

We’ll also test our model (in reality, we’d do this before ever bothering to pickle it, so rearrange this code sample to fit your needs). This is done by simply calling model.predict() and handing it one or more rows of petal and sepal dimensions. It gives back row(s) of predicted species. Here, we compare these results against our known-correct target_test to see how well our model behaved. If it was poor, we go back and tweak our parameters.
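The testing step might be sketched as follows, using scikit-learn's accuracy_score to compare predictions with the known answers. The split and model parameters are the same illustrative assumptions as before.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
data_train, data_test, target_train, target_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = svm.SVC(gamma="scale").fit(data_train, target_train)

# Predict a species for each held-out flower.
predicted = model.predict(data_test)

# Fraction of test flowers whose predicted species matches the known one.
accuracy = accuracy_score(target_test, predicted)
print(f"accuracy: {accuracy:.2f}")
```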

If you’re looking to work with natural (human) languages, the Python Natural Language ToolKit has some great tools for parsing text, generating trees representing each sentence’s structure, identifying parts-of-speech, and other language-specific activities… as well as a bunch of sample texts to play with.

If you’re a Java fan, or want to start with a more visual exploration of your data, take a look at Weka. It contains a bunch of ML packages you can run in your Java app, but also has both command-line and visual user-interface modes of operation.

Deep learning is a type of machine learning that relies on an architecture that is modeled after the human brain. The way that information is passed between neurons in your brain and the pathways between them inspired an artificial neural network architecture that is currently being used to classify images, analyze sentiment, and transform audio to text.

Why model after the human brain?

Humans are really good at taking an abstract problem and inferring information from it.

For example, a person can easily distinguish an image of a cat from an image of a dog.

Until recently, machines weren’t very good at this, but with the use of neural networks, made usable with the advent of GPUs to run computationally intensive training, there has been a lot of progress in deep learning.

The reason we use neural networks instead of other machine learning models is that deep learning is often able to generalize better than other models.

What makes deep learning special?

It’s the idea of starting with small features to build a bigger picture, just like the human brain does.

Deep learning essentially learns from examples. Say you are trying to classify an image as Superman. But some images have the legs cut off from the viewpoint while others are of him flying and still others are of him standing or saving someone. A deep learning algorithm will try to learn the different representations of what Superman looks like from your data and then will be able to classify new data based on what it inferred as the features regarding what makes Superman, Superman.

There are of course some differences in how the human brain works and how artificial neural networks are designed.

For instance in certain deep learning architectures like the one in the slide (well, this one is kinda shallow since it has only a couple of layers), the information flows one way and each node can only be targeted by the layer before it. This particular example is called a feedforward neural network.

There are 3 basic layers in a neural network: the input, output and middle or hidden layers.

Using the example from the beginner Kaggle competition that attempts to predict Titanic survival, we’ll predict whether a passenger survived (1) or died (0) based on three features: age, ticket price and sex, where sex is set to 0 for male and 1 for female.

Our net’s goal is to try to learn what features are important in determining the survival rate of passengers.

Something to remember is that the inputs get the numeric values, but no computation is done in the input layer.

Notice the inputs are modified with unique weights that are randomly assigned by the algorithm.

When the weighted inputs get passed along to the hidden layer, each node in the hidden layer applies an activation function which determines what, if anything, is passed on to the next layer of nodes.

The next hidden layer of nodes (hidden layer 2) will receive the values (the weighted sums) of the previous layer. Note that the learning happens when the error between the model’s output and the known sample output data is calculated and the model adjusts the weights accordingly.

Finally, when the difference between the known output and the model’s output is small enough, training stops. The final output layer gives a confidence score.

Then you can pickle your model and classify your data!
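To make the forward pass and weight updates described above concrete, here is a small NumPy sketch of a feedforward network with one hidden layer. The passenger rows, the layer sizes, the sigmoid activation, the learning rate, and the number of iterations are all assumptions picked for illustration; a real project would normally use a library such as TensorFlow rather than hand-written updates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative passengers: [age, ticket price, sex (0 = male, 1 = female)].
X = np.array([[22, 7.25, 0], [38, 71.3, 1], [26, 7.9, 1],
              [35, 53.1, 1], [35, 8.05, 0], [54, 51.9, 0]], dtype=float)
y = np.array([[0], [1], [1], [1], [0], [0]], dtype=float)  # 1 = survived

# Scale each feature to the 0..1 range so no single input dominates.
X = X / X.max(axis=0)


def sigmoid(z):
    """Activation function: squashes any value into the range 0..1."""
    return 1.0 / (1.0 + np.exp(-z))


# Randomly assigned starting weights: 3 inputs -> 4 hidden nodes -> 1 output.
W1 = rng.normal(scale=0.5, size=(3, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))
b2 = np.zeros((1, 1))


def forward(inputs):
    hidden = sigmoid(inputs @ W1 + b1)  # weighted sums through the activation
    output = sigmoid(hidden @ W2 + b2)  # confidence that the passenger survived
    return hidden, output


learning_rate = 0.3
losses = []
for _ in range(3000):
    hidden, output = forward(X)
    error = output - y                       # model output vs known output
    losses.append(float(np.mean(error ** 2)))
    # Backpropagation: push the error back through each layer
    # (constant factors are folded into the learning rate).
    grad_out = error * output * (1 - output)
    grad_hidden = (grad_out @ W2.T) * hidden * (1 - hidden)
    W2 -= learning_rate * hidden.T @ grad_out
    b2 -= learning_rate * grad_out.sum(axis=0, keepdims=True)
    W1 -= learning_rate * X.T @ grad_hidden
    b1 -= learning_rate * grad_hidden.sum(axis=0, keepdims=True)

_, confidence = forward(X)
print(losses[0], losses[-1])
```

The loss recorded at the end of training should be lower than at the start, which is the "adjust the weights until the error is small" loop in miniature.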

In the above slide there is a screenshot of playground.tensorflow, an interactive site that allows you to try different datasets to solve for different problems.

Notice the inputs x1 and x2. These inputs could be any numeric representation of data such as a vector that represents the numeric rgb values of an image.

In this example the inputs represent x, y spatial coordinates of randomly generated numbers from a gaussian distribution.

These inputs are being fed into a middle layer called the hidden layer which does the processing of data and within the network it discovers features that most likely represent the original dataset.

For instance, the first neuron in the first hidden layer in the slide shows that the neuron is looking for a data point in the bottom left corner, the next one checks to see if it’s in the upper right, and so on.

The hidden layer is learning the important features of where a data point belongs in a Gaussian distribution.

The more neurons and more layers you have, the more features the model can learn – each layer builds more meaningful abstraction based on the features the layer before it learned.

And that’s your high level intro to deep learning and deep learning architecture.

Now that you know a little about machine and deep learning, I’ll describe how to select your model, where to train it and where to deploy it to make it useable on new data.

Size of Data

Smaller datasets will be managed better with certain models like a Naive Bayes classifier.

Smaller data does better with models that have high bias and low variance, such as a Naive Bayes classifier, to avoid overfitting (when your model cannot generalize past the original dataset to unseen data).

Type of Data

Certain machine learning models are known to work well with certain types of data.

Image data often uses a convolutional neural network, which is a deep learning model.

What are your goals?

Are you interested in predicting y given x?

Linear regression

Finding hidden patterns in data?

Unsupervised models such as K-Means clustering algorithm

Classifying data?

Naive Bayes
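Two of the goal-to-model pairings above can be sketched with scikit-learn. The data points are invented for the example, and Gaussian Naive Bayes is an assumed variant; other Naive Bayes flavors suit other kinds of data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.naive_bayes import GaussianNB

# Predicting y given x: fit a line to points generated from y = 2x + 1.
x = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
line = LinearRegression().fit(x, y)
next_y = line.predict([[4.0]])[0]  # close to 9.0

# Classifying data: Gaussian Naive Bayes on two well-separated groups.
features = np.array([[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]])
labels = np.array([0, 0, 1, 1])
classifier = GaussianNB().fit(features, labels)
guess = classifier.predict([[4.8, 5.0]])[0]  # falls in the second group

print(next_y, guess)
```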

Resources:

Time – do you want a pre-trained, pre-built solution that’s easy to implement?

Money – how much money do you have to throw at the problem?

Knowledge – do you like devops? If you do, then you can choose whatever model you want and deploy it on any platform that meets your specs, otherwise you’ll want to look at one of the pre-trained, pre-built solutions that are easier to implement and are managed for you.

Local Machine Option:

You’ve chosen your model and now want to train it on your dataset. While you’re comfortable with installing and managing dependencies, you don’t want to pay for a cloud instance, so you figure you can train it on your local machine. When is this a good idea?

When you have a smaller dataset (depends on how much RAM you have to spare).

With self-managed cloud instances, you are responsible for managing dependencies, but also for figuring out instance sizes and security configurations.

Benefits similar to local machine: install basically anything you want.

All of the self-managed services listed here have great documentation and tutorials. They all provide the ability to manage instances through both a console environment and an API, and you can create custom instances in all of them as well:

AWS EC2/P2

Pay as you go – rounded up to the hour.

Pricing differs between instance sizes. They have on-demand instances and spot instances; spot instances let you bid on spare capacity, which can save you up to 90% of EC2 costs.

Flexible – configurable. AWS has a ton of services available that integrate together to give you the stack and control you want.

Azure

Pay as you go – rounded up to the minute.

Pricing is broken down by category, such as General Purpose or High Performance Compute, and by instance size. Azure also provides a description of which category would be useful in particular use cases.

Integration with Microsoft products like Visual Studio is easy.

Google Cloud

Pay as you go – by minute-level increments.

Pricing differs between instance sizes, and they offer predefined machines and custom machines which differ in price (custom often being cheaper since predefined machines are a set price).

Tends to be less flexible in its offerings, which can be good if this is your first time working with VMs, because they narrow down your choices regarding which Google products you should use with your VM.

Rackspace (kind of falls between self managed and managed since they don’t offer any training models to use, but they are a managed cloud provider)

They are all reliable to a point, so they offer discounts when there are outages. Some don’t offer all of the add-ons you want, so make sure when you’re deciding between them that they support the pipeline you want.

Managed/UI:

There is also the option of the more plug-and-play variety. This option is suited to a specific use case (due to the limitations on models available), but it is great if you don’t necessarily have the expertise to code from scratch to train your model. These services all have basic models to train on your data, and you can access your trained model via a REST API. They offer basic model evaluation and versioning if you want to update your model, and the pricing tends to be broken down into training and prediction pricing.

Model evaluation available via basic training and testing set validation.

Access via their REST API.

Store your data in S3 or Redshift.

Pricing for AWS ML is broken down to training prices and prediction pricing.

Azure ML

Drag and drop – you can use the UI for model building, training.

Or train your model using R or Python.

You can create an R model and upload that into Azure ML then train.

Basic models are available like regression, K-means, boosted decision tree.

Model evaluation & visualization.

If you’re using Azure’s ML Studio, it has a free tier and a standard tier for training.

Google Cloud ML Engine

In beta.

Python.

Looks like right now it only supports Tensorflow models.

Prices broken down into training tiers and prediction pricing.

Ability to tune hyperparameters (for instance layers in neural net).

Integration with Jupyter notebooks.

When you have a ton of data:

All of the above support running Spark.

Azure VMs, AWS P2 and Google Compute Engine offer GPU instances.

Note that out of the managed options only Google Cloud ML offers GPUs.

Rackspace only offers GPU on hosted machines not cloud instances.

Note: for all of the self-managed and managed options, you will also need to store your data, models and resources on their platform, which sometimes adds to the price and does not include deployment costs.

Whether you have an unsupervised model that tags documents on the fly or a trained model that finds the sentiment of user input to properly route them in a call center, you’ll need a platform to deploy your model to make inferences or predictions. The next slide will review your options.

For deployment we’ll only cover the managed services rather than the self-managed solutions.

What we mean by deployment is that you take your trained model and you integrate it as a service into your current stack to make some sort of predictions or inferences on new unseen data.

For example, you might have a recommendation engine that, when a user signs in, uses a trained model to predict what products the user would like based on features such as the user’s past purchase history.

All of these examples of managed services deploy your model via an API endpoint, which is how you would use it to make predictions or inferences. Most use pay-as-you-go prediction pricing, which is billed separately from training and hosting pricing even when you use the same platform. The exceptions are Algorithmia and Google Cloud ML, where you can host for free and are then charged for predictions.
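Calling a deployed model through such an endpoint usually amounts to sending a JSON request over HTTP. The sketch below only builds the request using Python's standard library; the endpoint URL, the API key, and the payload shape are all hypothetical, since each provider defines its own.

```python
import json
import urllib.request

# Hypothetical endpoint and key: substitute the values your provider gives you.
ENDPOINT = "https://api.example.com/v1/models/recommender/predict"
API_KEY = "YOUR_API_KEY"


def build_prediction_request(features):
    """Package the input features as a JSON POST request for the model endpoint."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )


request = build_prediction_request(
    {"user_id": 42, "past_purchases": ["tent", "stove"]}
)

# Sending it would look like this (not run here, since the endpoint is made up):
# with urllib.request.urlopen(request) as response:
#     prediction = json.loads(response.read())
```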