Improvements to machine learning capabilities in SQL Server 2019

Many organizations seek to do more with their data than pump out dashboards and reports. Applying advanced analytical approaches such as machine learning is becoming an essential area of knowledge for any data professional. While database administrators (DBAs) don’t necessarily have to become data scientists, they should have a deep understanding of the machine learning technologies at their disposal and how to use them in collaboration with other domain experts.

For those of us who work with SQL Server, there are many cool new capabilities to get familiar with in SQL Server 2019. At the heart of it all is a solution called Big Data Clusters, allowing you to create scalable clusters of SQL Server, Apache Spark, and HDFS containers running on Kubernetes.

That means flexibility in how you access big data and relational data side by side. Through the cluster, you can query data from external sources. You can also store big data in HDFS managed by SQL Server. At the end of the day, this makes more of your data available, faster and more easily, for machine learning, artificial intelligence, and other advanced analytical tasks.

SQL Server 2019 also provides expanded machine learning capabilities built in. It adds commonly requested features related to the use of R and Python for machine learning. For example, SQL Server 2019 enables SQL Server Machine Learning Services to be installed on Linux. Failover clusters are supported for greater reliability, and new and improved scripting capabilities open new options for generating and enhancing models.

Integration of Python with the SQL Server database engine enables you to perform advanced machine learning tasks close to the data rather than moving it around. Insights generated from the Python runtime can be accessed by production applications using standard SQL Server data access methods.
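
To make "close to the data" a little more concrete, here is a minimal sketch of how a client might assemble a call to the sp_execute_external_script stored procedure, which is how Machine Learning Services runs Python inside the database. The helper name build_external_script_call, the dbo.Sales table, and the amount column are all made up for illustration; only the procedure name and its @language, @script, and @input_data_1 parameters come from SQL Server itself. Treat it as one possible shape, not a recommended pattern.

```python
def build_external_script_call(python_script: str, input_query: str) -> str:
    """Return T-SQL text that runs python_script in-database over input_query.

    Hypothetical helper for illustration only; it builds a string and does
    not connect to any server.
    """
    def quote(text: str) -> str:
        # T-SQL escapes a single quote inside a string literal by doubling it.
        return "N'" + text.replace("'", "''") + "'"

    return (
        "EXEC sp_execute_external_script\n"
        "    @language = N'Python',\n"
        f"    @script = {quote(python_script)},\n"
        f"    @input_data_1 = {quote(input_query)};"
    )


# Inside the server, the rows from the input query arrive as a pandas
# DataFrame named InputDataSet, and whatever DataFrame the script assigns
# to OutputDataSet comes back to the caller as an ordinary result set.
SCRIPT = "OutputDataSet = InputDataSet[['amount']].describe()"
QUERY = "SELECT amount FROM dbo.Sales"  # assumed table and column

tsql = build_external_script_call(SCRIPT, QUERY)
print(tsql)
```

Because the result comes back as a regular result set, an application can read it with whatever data access library it already uses for SQL Server.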

With the addition of partition-based modeling, you can train many small models instead of one large model when using partitioned data. If you have data that breaks out easily using categories such as demographics or regions, partitioning enables you to get more granular with your models without having to break the dataset apart.
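
The idea behind partition-based modeling can be sketched in a few lines of plain Python: group the rows by a partition key, then fit one small model per group. This is only an illustration of the concept, with made-up data and hypothetical function names. In SQL Server 2019 the server does the grouping for you when you pass a partition column to sp_execute_external_script (through its @input_data_1_partition_by_columns parameter), and your script runs once per partition.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept.

    Assumes at least two points with distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_per_partition(rows):
    """Group rows by partition key, then fit one small model per group."""
    partitions = defaultdict(list)
    for region, x, y in rows:
        partitions[region].append((x, y))
    return {region: fit_line(points) for region, points in partitions.items()}


# Made-up rows for illustration: (region, ad_spend, sales)
rows = [
    ("east", 1.0, 3.0), ("east", 2.0, 5.0), ("east", 3.0, 7.0),
    ("west", 1.0, 10.0), ("west", 2.0, 9.0), ("west", 3.0, 8.0),
]

models = train_per_partition(rows)
for region, (slope, intercept) in sorted(models.items()):
    print(f"{region}: sales = {slope:.1f} * ad_spend + {intercept:.1f}")
```

In this toy data the two regions trend in opposite directions, which is exactly the situation where a single model over the whole dataset would blur the picture and per-partition models stay sharp.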

As the line between DBA and data scientist continues to blur, most of us will be expected to understand and manage these types of solutions. Microsoft clearly recognizes the importance of machine learning and the need to apply it more easily across different data types—while maintaining the performance and manageability benefits of using SQL Server.