Introducing Jupyter Notebooks in Azure ML Studio

Posted by Shahrokh Mortazavi, Partner Director of Program Management at Microsoft.

Azure ML Studio is a powerful canvas for composing machine learning experiments and then operationalizing and consuming them. Although the Studio provides an easy-to-use yet powerful drag-and-drop style of creating experiments, you sometimes need a good old “REPL” to have a tight loop where you enter some script code and get a response. I am delighted to announce that we’ve now integrated this functionality into ML Studio through Jupyter Notebooks.

Jupyter Notebooks run on any OS and modern browser. Notebooks, at a high level, consist of two main types of “cells” – markdown cells for documentation and executable code cells. After editing a cell, press Shift+Enter to run it:

Jupyter Notebooks also provide special commands (“magics”) that act as macros:
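As a rough illustration, the built-in `%timeit` magic times a statement. Magics only work inside IPython/Jupyter, so the sketch below shows the magic in a comment and a plain-Python approximation using the standard `timeit` module. The function `sum_of_squares` is a made-up workload, not something from the Studio.

```python
import timeit


def sum_of_squares(n):
    """Hypothetical workload used only for illustration."""
    return sum(i * i for i in range(n))


# In a notebook cell you could write:
#   %timeit sum_of_squares(1000)
# Outside IPython, the standard library gives a comparable measurement:
elapsed = timeit.timeit(lambda: sum_of_squares(1000), number=100)
print("100 runs took {:.4f} seconds".format(elapsed))
```

The magic chooses the number of repetitions for you and reports a per-loop figure, which is why it is usually the more convenient option inside a notebook.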

And also an escape character (“!”) to access the shell:
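For example, in a notebook cell `!python --version` runs that command in the shell, and `files = !ls` captures the output lines in a Python variable. The sketch below is a plain-Python approximation of the same idea using `subprocess`; the command it runs is an arbitrary example chosen so that it behaves the same on any machine.

```python
import subprocess
import sys

# In a notebook cell you could write:
#   !python --version
# A plain-Python approximation runs a command and captures what it prints.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,  # collect stdout/stderr instead of printing them
    text=True,            # decode bytes to str
    check=True,           # raise if the command exits with an error
)
output = result.stdout.strip()
print(output)
```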

This only scratches the surface of what you can do with Notebooks – for a few short and longer tutorials, see the links at the bottom of this post.

Integration with Studio

In this preview, the Notebook service supports several core scenarios:

Want to know more about a dataset? Simply select it, then choose to open it in a Notebook and explore away. Your dataset is automatically available as a Pandas dataframe:
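A first look at such a dataframe might go something like the sketch below. The frame is built by hand here as a stand-in for the one the Notebook provides, and its column names (`age`, `income`, `label`) are invented for illustration.

```python
import pandas as pd

# Stand-in for the frame the Notebook would hand you; the columns are made up.
frame = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [40000, 52000, 88000, 61000],
    "label": ["no", "no", "yes", "yes"],
})

print(frame.head())    # first few rows
print(frame.dtypes)    # column types

# Count, mean, std, min, quartiles and max for the numeric columns
summary = frame.describe()
print(summary)

# Average income per class
by_label = frame.groupby("label")["income"].mean()
print(by_label)
```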

Inspect Intermediate Data in an Experiment

Sometimes you need to check out a dataset in between phases. There is now an easy way to do this: first, add a convert-to-csv node, then right-click it and open it in a Notebook. Your data will be available as a Pandas dataframe, as in the above case:
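Once the intermediate data is in a dataframe, a quick sanity check usually covers shape and missing values. The sketch below simulates the step with a small inline CSV, since the real frame comes from your experiment; the `id` and `score` columns are hypothetical.

```python
import io

import pandas as pd

# Inline CSV standing in for the output of the convert-to-csv node.
csv_text = "id,score\n1,0.5\n2,0.75\n3,\n"
frame = pd.read_csv(io.StringIO(csv_text))

print(frame.shape)                        # (rows, columns)
missing = frame["score"].isnull().sum()   # rows where score is empty
print("rows with a missing score:", missing)
```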

Author Code Snippets for Python Modules in Experiments

Currently you can add R and Python code modules in your experiments by editing them directly in the embedded editor. While convenient for short snippets, it does not provide an execution environment. You can use Notebooks to author and debug your modules and then paste them back into the experiment nodes instead. In the future we’ll provide a way to insert the code directly into the script node in an experiment:
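One way to work is to write the module body as an ordinary function in the Notebook, exercise it against a small frame, and only then paste it into the experiment. The sketch below assumes the Python script module's entry point is a function named `azureml_main` that takes up to two data frames and returns a sequence containing one data frame; the columns `a`, `b` and `ratio` are invented for the example.

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    """Assumed entry-point shape for a Python script module."""
    result = dataframe1.copy()                    # leave the input untouched
    result["ratio"] = result["a"] / result["b"]   # hypothetical derived column
    return result,                                # a one-element tuple


# Exercise the function in the Notebook before pasting it into the module.
sample = pd.DataFrame({"a": [1.0, 2.0], "b": [2.0, 4.0]})
(output,) = azureml_main(sample)
print(output)
```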

Your notebooks are persisted in your workspace and can be used in subsequent sessions. You can see a list of your notebooks by clicking on the Notebook tab. Notebooks can be renamed, deleted, copied, etc. from either the Studio or from Jupyter directly and both environments will sync up.

Azure ML Client SDK

Enumerating and exploring your datasets and experiments from within the notebook (or any IDE for that matter) is pretty easy:

You can actually slice, dice, and store the modified dataset back into Azure ML. This and similar functionality is available via the recently enhanced Azure ML Client SDK.
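The round trip might look roughly like the sketch below. It assumes the `azureml` client library's `Workspace` class, `to_dataframe()` and `datasets.add_from_dataframe()` as described in that library's README; the function names `load_dataset` and `store_dataset`, and all argument values, are placeholders. The SDK import sits inside the functions so the block can be loaded even where the package is not installed.

```python
def load_dataset(workspace_id, authorization_token, dataset_name):
    """List the workspace's datasets and return one as a pandas frame."""
    from azureml import Workspace  # assumed API of the Azure ML Client SDK

    ws = Workspace(workspace_id=workspace_id,
                   authorization_token=authorization_token)
    for ds in ws.datasets:
        print(ds.name)
    return ws.datasets[dataset_name].to_dataframe()


def store_dataset(workspace_id, authorization_token, frame, name, description):
    """Save a modified frame back to the workspace as a new CSV dataset."""
    from azureml import DataTypeIds, Workspace  # assumed API, see lead-in

    ws = Workspace(workspace_id=workspace_id,
                   authorization_token=authorization_token)
    return ws.datasets.add_from_dataframe(
        dataframe=frame,
        data_type_id=DataTypeIds.GenericCSV,
        name=name,
        description=description,
    )
```

Both functions need a real workspace ID and authorization token to do anything, so treat this as a shape to adapt rather than something to run as-is.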

Additionally, you can use the Python Azure SDK to access a wide variety of services in Azure. These include Storage, Service Management, etc.:

Note: Both the Azure SDK and the Azure ML Client SDK are preinstalled for you.

Execution Environment

The Notebook environment currently supports Python 2 and Python 3. We will be adding full R support in the near future. When you start up a Notebook, you have the full Anaconda 64-bit distro available to you. The full list of pkgs can be found here. The most relevant ones are: numpy/scipy, pandas, matplotlib, scikit-learn. For the curious, the Notebook service runs on Ubuntu 14.04.02 under Docker. Shell commands are available via the “!” escape character.
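If you want to confirm what a given session actually has, a short check like the one below works in any Python 3 kernel. It uses only the standard library; the package list simply mirrors the ones named above (scikit-learn is imported as `sklearn`).

```python
import importlib.util
import platform

# Import names of the packages mentioned above.
packages = ["numpy", "scipy", "pandas", "matplotlib", "sklearn"]

print("Python", platform.python_version())

# find_spec returns None when a top-level package cannot be found.
available = {
    name: importlib.util.find_spec(name) is not None for name in packages
}
for name, found in sorted(available.items()):
    print("{:12s} {}".format(name, "available" if found else "missing"))
```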

If you are inactive for more than one hour, your Notebook Server will be reclaimed. Notebooks are check-pointed regularly and the latest saved version will appear in your Studio workspace. You can also manually click Save on the menu bar as well as download the Notebook to your local machine.

Preview Limitations

The following limitations currently exist and will likely change in the future:

Network access is limited to Azure. You can place your data in various stores in Azure and access them in Python (Azure SDK) or Azure ML Studio.

While the notebooks support Python 2 and Python 3, operationalization (web service) only supports Python 2.

Some of the Azure ML algorithms are not yet available in Notebooks (use scikit-learn, pybrain, statsmodels, etc.).

You can’t upload text files or create folders or terminals.

Roadmap

While there is a lot of functionality already available in this preview, we consider this a baby step. We have a lot of exciting plans for Notebook scenarios in the coming year. We have a close working relationship with the Jupyter team and will work with them to incorporate and roll out updated versions as soon as they’re stable. Some of the ideas we’re exploring include:

Full R support (RRE, RRO)

Deeper Azure ML Studio integration

Deeper intellisense

Integrated debugging

Dashboarding

Authoring experiments and publishing entirely from within Notebooks

PowerShell integration

Improved Notebook sharing support

Git integration

Publishing your sample Notebooks in the Marketplace

Help Us Improve Jupyter Notebooks on Azure

Want to make sure your idea is on the roadmap? Want to help us prioritize features? Please check out this 1 minute survey and let us know what you think:

Conclusion

Jupyter is one of the most important innovations in the data science and technical computing space in recent years. You now have full access to its power from any OS, from any modern browser directly from inside Azure ML Studio. You can choose whichever canvas makes the most sense at that particular moment. The two work together hand in hand to ensure a productive and delightful experience for you.

Any mechanism that I can use to install my own packages? Specifically, want to install Caffe….

3 years ago

Neil

Should this be showing up right now in my lab? I don’t see it there.

3 years ago

Shahrokh

@Govert – big fan of F# too. we are going to do R next. note that what we add in the notebook needs to be available on the ML’s backend side too. Right now the backend supports R & Python. please take the survey: https://www.surveymonkey.com/s/JupyterOnAzureML and vote for your next language. thx.

3 years ago

Shahrokh

@chris it’s a temporary limitation due to azure-only network access. we’ll remove it soon & then you can.

3 years ago

Shahrokh

@neil – when you go to studio.azureml.net you don’t see a "Notebook" tab on the left?

3 years ago

Ali Ahmadi - Bing Relevance

Any chance you can add SFrames support from GraphLab (Dato)?

3 years ago

Shahrokh

@Ali – we’ll take a look. ideally if we can get it thru Anaconda, then it’ll be much easier. so please ping/poke them as well 🙂

3 years ago

Dan - MSFT

@Neil – Are you an "owner" of the workspace? Only those invited as "owners" are able to use notebooks within the workspace. If you are not an owner, you must ask another owner to invite you again as an owner.

3 years ago

Anonymous

This post is authored by Raymond Laghaeian, Senior Program Manager at Microsoft.