New Year & New Updates to the Windows Data Science Virtual Machine

This post is authored by Gopi Kumar, Principal Program Manager in the Data Group at Microsoft.

First of all, a big thank you to all users of the Data Science Virtual Machine (DSVM) for your tremendous response to our offering in 2016. We’re looking forward to a similarly great year in 2017.

The new year also brings some interesting new tools to DSVM users to help you be more productive with data science. In this post, we summarize the key recent changes to the Windows Server edition of the DSVM.

Microsoft R Server 9.0.1 (MRS9) developer edition, a major update to the enterprise-scalable R extension from Microsoft, is now available on the VM. This version brings many exciting changes, including several fast ML and deep learning algorithms developed by Microsoft in a new library called MicrosoftML. There is also a new architecture and interface for deploying R models and functions as web services, which follows a paradigm and interface library very similar to Azure ML operationalization. The library is called mrsdeploy. We have some R deployment samples for notebooks, R Tools for Visual Studio (RTVS), and RStudio. The olapR package in Microsoft R Server lets you run MDX queries and connect directly to OLAP cubes on SQL Server 2016 Analysis Services from your R solution. SQL Server 2016 Developer edition and the associated Microsoft R In-DB analytics are also updated to Service Pack 1.

RStudio Desktop open source edition is now preinstalled on the VM, by popular demand.

R Tools for Visual Studio is now updated to version 0.5, bringing in multi-window plotting and SQL tooling to run R code on SQL Server 2016.

Microsoft Cognitive Toolkit (formerly called CNTK) is now at version 2 Beta 6, and features several improvements and sample notebooks for fast deep learning using either the Python interface or the CNTK BrainScript interface.

Apache Drill, a SQL-based query tool that can work with various data sources and formats (e.g. JSON, CSV), was part of our previous update. We now prepackage and configure drivers to access various Azure data services such as Blobs, SQL DW/Azure SQL, HDInsight (HDI) and DocumentDB. See this tutorial in our gallery for information on how to query data in various Azure data sources using the Drill SQL query language.
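
As a rough illustration of what a Drill query looks like from code, here is a minimal Python sketch that submits SQL through Drill's REST endpoint. It assumes that Drill is already running on the VM with its web server on the default port 8047; the file path in the example query is hypothetical, and the helper names `build_drill_request` and `run_drill_query` are our own, not part of Drill.

```python
import json
import urllib.request

# Assumption: Drill's web server is running locally on its default port.
DRILL_URL = "http://localhost:8047/query.json"


def build_drill_request(sql, url=DRILL_URL):
    """Build (but do not send) an HTTP POST request carrying a Drill SQL query."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql, url=DRILL_URL, timeout=30):
    """Send the query to Drill and return the result rows as a list of dicts."""
    request = build_drill_request(sql, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response).get("rows", [])


# Hypothetical file path; `dfs` is Drill's file-system storage plugin.
EXAMPLE_SQL = "SELECT * FROM dfs.`C:/data/sample.json` LIMIT 5"
```

With Drill running, `run_drill_query(EXAMPLE_SQL)` would return up to five rows from the file. Querying the Azure data services would need the storage plugin names configured on your VM, which this sketch does not assume.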

JuliaPro is now preinstalled and preconfigured on the VM, thanks to Julia Computing (a company founded by the creators of the Julia programming language). JuliaPro is a curated distribution of the open source Julia language along with a set of popular packages for scientific computing, data science, AI and optimization. The JuliaPro distribution comes with an Atom-based IDE, Jupyter notebooks and several sample notebooks on the DSVM Jupyter instance to help you get started. Julia Computing also provides an Enterprise edition with commercial support.

The Deep Learning Toolkit for the Windows DSVM is an extension that helps you jump-start deep learning on Azure GPU VMs without having to spend time installing GPU framework dependencies and drivers or configuring the various deep learning tools. This extension has been updated to include the latest versions of CNTK 2 and MXNet for GPU, along with new samples. It also features the Windows version of TensorFlow.

We also offer a Linux edition of the Data Science Virtual Machine, and a separate post will cover the major updates there.

I’d like to end this post with a graphical summary of the DSVM, showing a [non-exhaustive] list of the various tools that are preinstalled. DSVM helps you focus more on data science and spend less time on installing, configuring and administering tools, thereby making you more productive. Give DSVM a shot today and send us feedback on how we can make it even better for your data science needs.

Thanks for the input, Subbu. We will consider offering a Windows test drive. In the meantime, you can use the Azure free trial to spin up a Windows DSVM if you are new to Azure.

José Almeida

From my perspective, in a world of Data Science where tools like RStudio or Weka are completely free, this VM should also be offered free to developers.
The usage fees that apply to this Virtual Machine are an obstacle for those who want to give it a try.
I hope MSFT can change this.
Thanks

We appreciate your feedback, José, and understand the obstacle for those who want to try the VM. As mentioned, you can use the free Azure trial to try the DSVM and some of the other Azure services. We will also consider offering a Windows DSVM test drive. We are currently piloting a free test drive (8 hours at this point) for the Linux version of the Data Science VM. Thanks again for your interest. By the way, if you have an MSDN subscription, some free Azure credits may be available on your account each month while your subscription is active. The credit amount depends on the subscription level; please check your MSDN subscription account to confirm.

Thanks for bringing up this question. Unfortunately, we don't currently have an easy or automated way to upgrade existing instances, because some of our refreshes involve major changes and there are unknown dependencies on libraries and configuration settings. Also, in many cases an upgrade takes a lot more time than spinning up a new instance. In fact, a lot of folks here at Microsoft use the DSVM almost as an on-demand compute resource: they keep code outside the VM in repositories such as Git or Visual Studio Online, and datasets on shared Azure infrastructure such as blobs, Azure file store, or SQL DW. This way, nothing permanent lives on the VM other than programs. I also understand that you would have to reinstall custom libraries and tools that are not already built in, and reapply other configuration tweaks. Hopefully the package managers in R, Python, and Linux, together with Windows Update, make that a little less cumbersome (once you have an inventory or metadata of third-party libraries and packages saved somewhere). Of course, compatibility issues are tricky. Sorry that I did not exactly address your concerns; hopefully some of the points above help. We would love to hear suggestions on how we can support upgrades better.
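
For the Python side, one possible way to keep such an inventory is sketched below. It is only an illustration under stated assumptions, not a DSVM feature: the function names and the JSON layout are hypothetical, and it records only the packages visible to the Python environment that runs it. It uses the standard library's `importlib.metadata`, so it needs Python 3.8 or later.

```python
import json
from importlib import metadata


def installed_packages():
    """Return a sorted list of (name, version) pairs for the current Python environment."""
    pairs = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata has no name
            pairs.add((name, dist.version))
    return sorted(pairs, key=lambda pair: pair[0].lower())


def save_inventory(path):
    """Write the inventory to `path` as JSON and return the number of packages recorded."""
    packages = [{"name": n, "version": v} for n, v in installed_packages()]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(packages, f, indent=2)
    return len(packages)


def to_requirements(path):
    """Turn a saved inventory into pinned `name==version` lines for pip."""
    with open(path, encoding="utf-8") as f:
        return [f"{p['name']}=={p['version']}" for p in json.load(f)]
```

Saving the JSON file outside the VM (for example, next to your code in a repository) lets you write the output of `to_requirements` to a requirements file on a fresh instance and reinstall from it. Pinned versions may still conflict with what the new image ships, so treat the list as a starting point.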

Thanks Christopher. Our goal is to support a wide range of tools and languages to help you be most productive with data science and analytics.

All, feel free to give us feedback on other popular tools that would be great candidates for the Data Science VM. We also have a forum where you can share that input and ask any other questions: http://aka.ms/dsvm/forum