Making Algorithms Discoverable and Composable

Stephanie Kim

1 year ago

Just as a music producer creates a beat, then combines it with instrumentals and a bassline to form something catchy that lyrics can be set to, developers need a way to compose algorithms cleanly and elegantly.

Whether you’re creating a sentiment analysis pipeline for your social data or doing image processing on thousands of photos, you’ll need an easy way to combine the various tools available so you aren’t writing spaghetti code.

It isn’t always easy to combine the libraries you need. Sometimes a library or machine learning model is written in a different language than the one you’re using. Other times there is simply a performance difference between languages (which is why we chose Rust to create a Video Metadata Extraction pipeline). And even though GitHub offers thousands of libraries, frameworks, and models to choose from, it’s sometimes difficult to find the one that solves your problem.

To solve these problems, and to let you write elegant code while using machine learning models, Algorithmia provides an easy way to find, combine, and reuse models regardless of language. Each one gets a REST API endpoint, so you can mix and match them with each other and with external code.
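To make the endpoint idea concrete, here is a minimal Java sketch that builds (but does not send) an HTTP request for one algorithm call, using only the JDK’s java.net.http classes (Java 11+). The base URL, the "Simple" authorization scheme, the demo/Hello path, and the AlgorithmRequest helper itself are assumptions for illustration, so check the API docs for the exact values.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class AlgorithmRequest {
    // Assumed base URL for algorithm endpoints; confirm against the API docs.
    static final String BASE_URL = "https://api.algorithmia.com/v1/algo/";

    // Builds a POST request for one algorithm call. Nothing is sent here.
    // algoPath is "owner/name" (hypothetical example: "demo/Hello").
    static HttpRequest buildRequest(String algoPath, String apiKey, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + algoPath))
                .header("Content-Type", "application/json")
                // Assumed auth scheme; the key itself is a placeholder.
                .header("Authorization", "Simple " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

Because the request is plain HTTP with a JSON body, any language that can make a web request can call the same algorithm, which is what makes mixing and matching possible.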

For instance, say you have social data that needs to be classified into categories based on locations. For each location, you want to find out the sentiment of the document. We’re going to work through how you might solve this problem with Algorithmia.

If you aren’t signed up for an account, go ahead and do so. Once you do that you’ll see the homepage that looks like this:

So notice that not only is there an “Explore Algorithms” tab in the top middle panel, but below there are tabs that say “Featured Algorithms”, “Starred Algorithms”, and “My Algorithms”.

Featured Algorithms are the ones we surface based on popularity, measured by number of API calls, while Starred Algorithms are ones that you’ve found and want to save in your own personal registry. My Algorithms are the ones that you created!

In the screenshot above, you’ll want to click on the Explore Algorithms panel which will take you to the algorithm hub:

We know that our document labeling and classification problem falls under natural language processing, so we’ll check out Text Analysis, but notice there are also Machine Learning, Computer Vision, and Deep Learning panels. Below that there are other categories like Utilities and Time Series, as well as a section to browse all algorithms by Top Rated, Most Called, and Recently Added.

Next, click on the Text Analysis panel so you can reach this page:

Above, you’ll see some general information about natural language processing, interactive demos, and recipes that show how to combine algorithms together to build out pipelines.

This is also a good time to point out that there is a search bar at the top navbar. This is great when you know the name of the algorithm you’re searching for or the general topic such as “image classification”.

Now, let’s get back to our problem. We know that we have social data that we want organized based on location, but we don’t have any labeled articles to work with.

First we know we are going to have to get the documents labeled with a bit of information extraction. If we check under the Recipes section we can see that conveniently, there is one for finding the “Named Entities of Tweets”. Go ahead and click on that and you’ll end up on the recipe page in Dev Center where there is a link to the GitHub repo and the original blog post.

After learning more about the Named Entity Recognition (NER) algorithm, you realize that’s exactly what you need to get the location noun phrases in tweets. But — wait! — you need to call the algorithm in Java, not Python like the tutorial shows.

Luckily, at the bottom of each algorithm description page there are code snippets showing how to call the algorithm in each supported language:

You can now group your documents by the key-value pair {"entity": "LOCATION"}.
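The grouping step might look something like the sketch below. It assumes the NER results have already been parsed into one list of maps per tweet, and that each map carries the entity label under "entity" plus the matched text under a hypothetical "word" key. The real output shape may differ, so adjust the keys to whatever the algorithm returns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LocationGrouping {
    // Groups tweets by each LOCATION entity found in them.
    // entitiesPerTweet.get(i) holds the parsed entity maps for tweets.get(i).
    // Assumed keys: "entity" (label) and "word" (matched text, hypothetical).
    static Map<String, List<String>> groupByLocation(
            List<String> tweets, List<List<Map<String, String>>> entitiesPerTweet) {
        Map<String, List<String>> byLocation = new LinkedHashMap<>();
        for (int i = 0; i < tweets.size(); i++) {
            for (Map<String, String> entity : entitiesPerTweet.get(i)) {
                // Skip PERSON, ORGANIZATION, and any other non-location labels.
                if ("LOCATION".equals(entity.get("entity"))) {
                    byLocation.computeIfAbsent(entity.get("word"), k -> new ArrayList<>())
                              .add(tweets.get(i));
                }
            }
        }
        return byLocation;
    }
}
```

A tweet that mentions two locations lands in both groups, and one that mentions the same location twice is listed twice under it; deduplicate if that matters for your counts.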

Now you decide you are ready to peruse the sentiment analysis algorithms so you type that into the search bar and find this:

Wow! Twenty algorithms for your sentiment analysis query. You decide to use one of the top two that are most frequently called, and since the second one called Social Sentiment Analysis looks to be trained on social data, you check that one out.

Notice that both the named entity recognition algorithm and the social sentiment analysis algorithm take simple JSON-formatted objects as input. Those will be easy to work with in Java using a HashMap.
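As a rough illustration of how a map turns into that kind of input, here is a small sketch that serializes a flat map of strings to a JSON object by hand. The "sentence" field name used in the usage note is an assumption, and the JsonPayload helper is hypothetical; in practice a JSON library or the client library would do this for you.

```java
import java.util.Map;
import java.util.StringJoiner;

class JsonPayload {
    // Wraps a string in quotes and escapes characters JSON does not allow raw.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Remaining control characters use the \\uXXXX form.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Serializes a flat map of string fields as a single JSON object.
    static String toJson(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            joiner.add(quote(e.getKey()) + ":" + quote(e.getValue()));
        }
        return joiner.toString();
    }
}
```

For example, a HashMap with the single entry "sentence" mapped to a tweet’s text serializes to a one-field JSON object that can be sent as the request body.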

So you pass in your data for each labeled location you got from the Named Entity Recognition algorithm and get the sentiment scores for the tweets falling under each location.
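One way to sketch that step in Java: take the tweets already grouped by location, score each tweet, and average the scores per location. The scorer parameter stands in for the call to the sentiment algorithm. It is a hypothetical hook, not part of any client library, and averaging is just one reasonable way to summarize the scores.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

class LocationSentiment {
    // Averages per-tweet sentiment scores for each location.
    // scorer is a stand-in for the remote sentiment call (hypothetical hook).
    static Map<String, Double> averageByLocation(
            Map<String, List<String>> tweetsByLocation, ToDoubleFunction<String> scorer) {
        Map<String, Double> averages = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : tweetsByLocation.entrySet()) {
            double average = entry.getValue().stream()
                    .mapToDouble(scorer)
                    .average()
                    .orElse(0.0); // a location with no tweets gets a neutral 0.0
            averages.put(entry.getKey(), average);
        }
        return averages;
    }
}
```

Keeping the scorer as a parameter means you can test the aggregation with a stub before wiring in the real algorithm call.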

That’s great, but you heard that a team down the hall wants to use your script that consumes the Twitter data, strips out unnecessary characters, aggregates tweets based on location and finds the sentiment of the tweets in each location.

Instead of handing them your script, you decide to create an algorithm in Java that will host your chained algorithms and the rest of your code. You aren’t worried that the other team only knows Python because Algorithmia supports Python and many other language clients. Once you create an algorithm, Algorithmia will convert it into an API endpoint so the other team will be able to call it in Python.
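Before hosting it, it can help to shape the script as a single composed function. The sketch below shows one way to do that with java.util.function.Function: a local cleaning step chained to grouping and scoring stages that the caller supplies. The TweetPipeline class, the choice of what counts as "unnecessary characters" (URLs and @mentions here), and the stage signatures are all assumptions; the entry point a hosted algorithm actually requires is not shown.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class TweetPipeline {
    // Step 1: drop URLs and @mentions, then collapse runs of whitespace.
    // What counts as "unnecessary" is an assumption; adjust to your data.
    static List<String> clean(List<String> tweets) {
        return tweets.stream()
                .map(t -> t.replaceAll("https?://\\S+|@\\w+", " ")
                           .replaceAll("\\s+", " ")
                           .trim())
                .collect(Collectors.toList());
    }

    // Chains cleaning with two later stages supplied by the caller, e.g.
    // "group tweets by location" and "score each group" (both hypothetical).
    static <G, R> Function<List<String>, R> build(
            Function<List<String>, G> groupByLocation, Function<G, R> scoreGroups) {
        Function<List<String>, List<String>> cleaner = TweetPipeline::clean;
        return cleaner.andThen(groupByLocation).andThen(scoreGroups);
    }
}
```

With the stages composed this way, the whole pipeline is one function from raw tweets to results, which is a convenient shape to expose behind a single endpoint.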

To share it easily and make it more discoverable to your team, you decide to create an organization, which you can do by clicking the “CREATE ORGANIZATION” link under the “ORGANIZATIONS” section of your profile:

Now you have an organization that you and your co-workers belong to, so they can see and use the algorithm you hosted. For more information about teams or organizations, check out the docs.

That made it pretty easy to find the algorithms you needed, learn how to use them, and build a pipeline of NLP algorithms hosted on Algorithmia. You made it discoverable to your team and, best of all, you didn’t have to beg the devops team to host your script and scale it for the various groups using your algorithm. Algorithmia handled the devops for you.

If you want more ideas for chaining algorithms together, check out our Use Cases, which highlight several image recognition algorithms, or our Recipes, where you’ll find tutorials on everything from text analysis to computer vision.