Talking About R… And SQL Server

Getting good insights from your data is essential for making the right business decisions and providing your customers with valuable services. That is why at Predica we focus on R Services implementation in various analytical projects.

This article starts the Talking About R series. In my next few posts, I will cover the application of R to the Microsoft environment. Today I will focus on using R Services in combination with SQL Server.

In the data-driven world, R is one of the most important programming languages for processing, analyzing and modeling data. Its applications range from relational databases to advanced and complex analytical solutions, including machine learning modeling.

Microsoft has massively changed its analytics offering after acquiring Revolution Analytics in 2015. R has subsequently been implemented in Power BI, SQL Server, in Azure as R Server, and in Azure Machine Learning. We can now also run R code in Azure Functions.

Taken together, these technologies give you countless ways to create an advanced analytical solution that is easy to manage and monitor.

Why R?

R is an open-source technology originally aimed at data scientists, statisticians and people doing advanced analysis. Since the 1990s, R has steadily grown in popularity and range of applications. As a result, it now has a huge online community, with many forums, blogs and so on.

Currently, a number of Microsoft technologies have an R extension:

R Services

R Open and R Client

Power BI (R script, R visuals)

Azure Machine Learning (R script, R model)

R Server (Standalone)

Azure Functions

Moreover, you can use R to apply advanced analytics to on-premises, hybrid or cloud-based data.

Why run R scripts in SQL Server?

The use of R Services introduces many conveniences and removes many restrictions. Here are five reasons to work with R in SQL Server using R Services:

Bringing analytics to where the data is stored, so no data extraction is required

Overcoming limitations associated with performance and data size

Greatly reducing security risks

Avoiding costs of data duplication

Having an environment that is easy to manage and monitor.

#CostEffectiveness

This solution is very cost-effective for three reasons: it is included in SQL Server (starting with the 2016 version), you can reuse and optimize R code you have prepared earlier, and, most importantly, it reduces costs related to data movement.

#SimplicityAndAgility

The simplicity of this solution makes it possible to combine SQL skills with R programming, and that is enough to work smoothly and implement an advanced analytics solution close to the database. Moreover, it gives you the possibility of parallel threading and processing.

#ScalabilityAndChoice

Using R Services, you can write once and deploy anywhere, because you can easily switch the compute context or environment for solution deployment. The native open-source implementation of R reads data into a data frame held entirely in memory, which limits the size of data you can process – R Services removes this limitation.

#CompatibilityWithRevoScaleR

The introduction of R to the Microsoft stack led to the development of the RevoScaleR package. In combination with R Server or R Services, it yields very good performance results. See the comparison on the linear model example below.

Open source R and RevoScaleR performance comparison
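To illustrate the difference, here is a minimal sketch (assuming the RevoScaleR package is available, e.g. via Microsoft R Client or R Services) of fitting the same linear model with base R and with the scalable RevoScaleR equivalent:

```r
# Base R: lm() requires the whole data set to fit in memory
fit_lm <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)

# RevoScaleR: rxLinMod() processes data in chunks,
# so it also works on data sets larger than available RAM
fit_rx <- rxLinMod(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)

summary(fit_lm)
summary(fit_rx)
```

On a small data set like this both return equivalent coefficients; the difference shows up on large data, where `lm()` fails or slows down while `rxLinMod()` keeps streaming through chunks.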

R Services – where to start?

Now we can explore how to begin the journey with R Services and plan work with this extension.

Here is the current R code implementation life cycle in the database:

Let’s go through it step by step.

1. Install R extension for SQL Server

The R extension in SQL Server has been available since the 2016 version, so you should have one of the following components installed before using this feature:

SQL Server 2016 R Services

SQL Server 2017 Machine Learning Services, with R language installed

If you need more detailed information about the installation process, click on this link – Microsoft has prepared a great guide to setting up this environment.

#Hint: Since SQL Server 2017, you can also configure Python in a very similar way – it is worth considering!

2. Enable and verify

The next step is to enable the ability to run external scripts because it is disabled by default.

To do this, you should use this SQL command:
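This is the standard configuration command for enabling external scripts:

```sql
-- Enable execution of external (R) scripts on the instance
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;
```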

Once you run the above command, you need to restart the SQL Server instance; then you can move on to verification. The administrator should also consider the security aspects of external script execution and assign appropriate roles and privileges to users.

To verify that your configuration is working correctly, you can use a very simple “Hello world” method.
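For example, a minimal check looks like this:

```sql
-- A simple smoke test: run an R script from T-SQL
EXEC sp_execute_external_script
    @language = N'R',
    @script   = N'print("Hello world")';
```

If the configuration is correct, the R output appears in the Messages pane.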

3. Develop R Script

Now you can prepare the R script in an external IDE (e.g. RStudio or R Tools for Visual Studio), where you will be able to switch the compute context between the local environment and SQL Server.

It is advisable to test your code before implementing it in a SQL Server stored procedure, because in the IDE you can dynamically debug and correct it.

To learn more about switching contexts, please read this article about compute context and RevoScaleR package.

Context switching can take place not only in the SQL Database but also in other well-known engines like Hadoop or Teradata, on premises or in the cloud – the choice is yours.

There are many ways to switch compute contexts between your resources
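As a sketch of how this switching works (assuming the RevoScaleR package; the connection string below is an example to adjust to your environment):

```r
# Define a SQL Server compute context (example connection string)
sqlConnString <- "Driver=SQL Server;Server=mySqlServer;Database=myDB;Trusted_Connection=True"
sqlCompute <- RxInSqlServer(connectionString = sqlConnString)

# From now on, Rx* functions execute inside SQL Server...
rxSetComputeContext(sqlCompute)

# ...and this switches execution back to the local machine
rxSetComputeContext("local")
```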

4. Train and save the model

Preparing input for the stored procedure is an important step. It is much harder to test and debug code after it has been wrapped in a procedure, which is why you should prepare the input first.

Moreover, at this stage the model is prepared, tested, and parameterized; only in its final configuration is it saved to the models table in the database – of course, in serialized form.
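A minimal sketch of this serialization step in plain R (the clustering model here is only an example):

```r
# Train a model and serialize it to a raw vector,
# suitable for storing in a VARBINARY(MAX) column
model <- kmeans(iris[, 1:4], centers = 3)
serialized <- serialize(model, connection = NULL)

# Later, restore the model from the stored value
restored <- unserialize(serialized)
```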

5. Operationalize the model

At this point, the model script and the other scripts used to prepare the modeling data are encapsulated in a stored procedure. They can be run ad hoc or on a schedule. The structure of the sp_execute_external_script procedure parameters is presented below, with an example of a clustering implementation.

@input_data_1 – specifies the input data used by the external script in the form of SELECT statement.

For all the above parameters, remember to pass NVARCHAR values by adding N before the quotation marks.

There are more parameters, among others for assigning aliases to the input data, but the ones I have mentioned are enough to run a script with input data. You can read more about the parameters here.
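A sketch of such a clustering procedure call (the table and column names are hypothetical; adjust them to your schema):

```sql
-- Run k-means clustering on data selected from the database
-- and return the cluster assignments as a result set.
-- InputDataSet / OutputDataSet are the default R variable names.
EXEC sp_execute_external_script
    @language     = N'R',
    @script       = N'
        clusters <- kmeans(InputDataSet, centers = 3)
        OutputDataSet <- cbind(InputDataSet, cluster = clusters$cluster)',
    @input_data_1 = N'SELECT Feature1, Feature2 FROM dbo.CustomerFeatures'
WITH RESULT SETS ((Feature1 FLOAT, Feature2 FLOAT, cluster INT));
```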

6. Tune and configure for production

This is the last step, where the administrator assigns privileges to the appropriate users to invoke the procedure. Administrators also have to handle all activities related to scheduling, provide the solution to the right group, and impose the rules for securing it.

#Hint: You can version your R code using Team Foundation Server, because Visual Studio has the R Tools for Visual Studio extension.

Conclusion

Using R Services, you can get better data insights with fast computation and little (if any) data movement. It also allows you to implement advanced analytics, machine learning and even visualization in an environment consistent with SQL Server.

If you need any help with R Services implementation or you need advice on how to use it, then let me know. Together we will create a great solution that will ensure your business is one step ahead!

Key takeaways

Using R programming can significantly enhance the data analytics capabilities of many Microsoft services

A combination of R and SQL Server allows you to create a powerful, cost-effective analytics solution that reduces security risks

RevoScaleR can be used to enhance the performance of your solution and to switch compute contexts if required


Hello!

Hey, my name is Dawid, and at Predica I play the role of "Data Philosopher". My main duties are all around BI and data analytics projects. What's it all about? Designing and implementing advanced analytics solutions, including machine learning algorithms and multidimensional/tabular OLAP data models; designing, developing and implementing data warehouses and databases; and building self-service reporting platforms and analytical dashboards (while coding in T-SQL, C#, R, DAX and MDX). Lots of incredible stuff to share with the world!