SQL Server is one of Microsoft's strongest enterprise software products, both in terms of revenue and industry reputation. It is widely adopted for its ease of use while at the same time offering high-end security, performance, availability and reliability features.

The SQL Server Family of Products
Nowadays, when I think of SQL Server, I think of it as a family of SQL Server products:
1. The retail SQL Server product used on-premises or in IaaS cloud environments.
2. The fully managed Database as a Service: Azure SQL Database.
3. The fully managed Data Warehouse as a Service: Azure SQL Data Warehouse.
And now we’re going to have a new member of the family inside the Azure SQL Database service!

Today at the Data Amp event, Scott Guthrie (Executive Vice President of the Cloud and Enterprise group at Microsoft) announced this new offering; here's the slide he showed:

Currently being called Azure SQL Managed Instance (final name to be determined), it is here to make migrations of SQL Server to Azure easier than ever. And once you’ve migrated, it will make management easier than ever as well!

What is the use case of this new offering?
Let's step back and look at why Azure SQL Managed Instances are being released. Currently there is a functionality gap between the retail SQL Server product and Azure SQL Database. For example, consider a client with an application that uses multiple databases and lots of code using three-part names. Azure SQL Database doesn't support three-part names, so if you wanted to migrate this instance to Azure, your only choice would be to deploy and manage it as a SQL Server VM on IaaS. And yes, you can enable backup and patching agents, but at the end of the day, the configuration of the VM, maintenance of the OS and all the responsibility for SQL Server are still in your hands.

This is where the Azure SQL Managed Instance comes in. It provides a fully managed instance with much higher feature compatibility, a lot closer to the SQL Server instance concept that we all know from the retail product. You don't have to be responsible for every bit like with IaaS, and at the same time you can continue running as an instance, with the features that are not compatible with the Azure SQL Database single-database model. This means migration is straightforward, without having to worry about compatibility or unsupported features. It's also easier because you don't have to build every single piece of IaaS.

If you have experience with Azure SQL Database you might be thinking: how does this compare to elastic database pools? The difference is that the pools are still created from databases that are single entities, and the only thing they share are the elastic resources. On the other hand, Azure SQL Managed Instance will hold the databases inside the instance container, so they will be able to use three-part names, linked servers, and other features that we are accustomed to using on on-premises SQL Server.

Best of all, because Azure SQL Managed Instance is built on top of the PaaS framework that powers Azure SQL Database, once it’s migrated, it is fully managed in terms of OS, built-in HA, backups, etc.

As we can see in the diagram below, Azure SQL Managed Instance sits between having a full VM running SQL Server and the fully contained individual database paradigm of Azure SQL Database. And just like Azure SQL Db, with a Managed Instance the client can enjoy the continuous improvement and release of features that come with Microsoft’s cloud-first development model.

Bottom Line

Here at Pythian we have seen an exponential increase in projects to help clients adopt and move applications and database workloads to the public cloud. We are very excited about this new offering that will make migrating and managing SQL Server in Azure that much easier. We also anticipate that a very large percentage of our clients’ current SQL Server workloads on-premises will be a perfect fit for Azure SQL Managed Instance.

If you are running SQL Server and planning to migrate to the cloud, I hope I have piqued your interest. In the coming months we will have blog posts sharing more details in terms of implementation, features, migration path and more. Stay tuned!

Storage Spaces Direct (S2D) is a new feature of Windows Server 2016 that takes the 'software-defined storage' concept to the next level. The value proposition is that instead of deploying expensive SAN hardware, similar results can be achieved by pooling commodity storage and controlling it with commodity Windows Servers. Whereas the 2012 Storage Spaces release required SATA- or SAS-connected disks or JBOD enclosures, the 2016 release can work with SATA, SAS and NVMe drives locally attached to each node in the cluster.

This means that not only can cheap storage be used but it also means that we can deploy this feature with Virtual Disks in the cloud! The value here is that we have a cluster with non-shared storage that acts like a shared storage one, all using Microsoft software end to end. And with this capability in place, we can then put a cluster-aware application like SQL Server on top of it and have a Failover Cluster Instance in the cloud.

This scenario hasn't been supported before without the use of 3rd party products like SIOS DataKeeper. Microsoft has referenced mixing SQL Server and Storage Spaces Direct on this blog post, where it mentions that support for this configuration is indeed planned. I hope down the line they will update the SQL Server support in Azure KB article to add support for S2D as well as SIOS.

Doing Failover Clustering in the cloud opens new deployment possibilities like providing SQL Server HA for databases like the distribution database on a replication topology or providing local HA while protecting the entire instance instead of the databases-only approach of Availability Groups.

I believe this same setup can be duplicated in AWS; however, this example will be for building this type of cluster in Azure. With the disclaimer out of the way, let's get started!

Section A: Base Infrastructure and Machines

First, I built this infrastructure from scratch with a new resource group and storage account. Then I proceeded with the following:

1- Created a Virtual Network – 10.0.0.0/16 (65K addresses) with one Subnet, 10.0.0.0/24 (256 addresses). In a Prod environment you would likely have one Public Subnet with a jumpbox and a Private Subnet with the actual SQL cluster, but for demo simplicity I have only one Public Subnet.
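For reference, here's a rough PowerShell sketch of that first step using the AzureRM cmdlets (the resource group, network and subnet names, and the location, are hypothetical placeholders):

# Sketch: resource group plus virtual network with a single subnet
New-AzureRmResourceGroup -Name "sqlfci-rg" -Location "East US"
$subnet = New-AzureRmVirtualNetworkSubnetConfig -Name "Public" -AddressPrefix "10.0.0.0/24"
New-AzureRmVirtualNetwork -Name "sqlfci-vnet" -ResourceGroupName "sqlfci-rg" -Location "East US" -AddressPrefix "10.0.0.0/16" -Subnet $subnet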

2- Created a Windows Server 2016 Standard D1V2 VM; this will be our Active Directory VM. Nothing too powerful is needed for an AD and DNS VM.

3- Added a disk to the AD VM

4- Formatted the disk to an NTFS volume F:

5- Changed the IP for the AD VM to be static – 10.0.0.4. We can use the portal or PowerShell to set the NIC to a static IP.
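If you go the PowerShell route, a sketch along these lines would do it (the NIC name is a hypothetical placeholder):

# Sketch: flip the AD VM's NIC to a static private IP
$nic = Get-AzureRmNetworkInterface -Name "ad-vm-nic" -ResourceGroupName "sqlfci-rg"
$nic.IpConfigurations[0].PrivateIpAllocationMethod = "Static"
$nic.IpConfigurations[0].PrivateIpAddress = "10.0.0.4"
Set-AzureRmNetworkInterface -NetworkInterface $nic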

6- Added the AD Domain Services and DNS roles and went through the installer. I created a domain called lab.warnerlab.com.
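If you prefer PowerShell over the GUI installer, the equivalent is roughly this (I'm assuming the AD database and SYSVOL go on the F: volume we formatted earlier, and you will be prompted for the DSRM password):

# Sketch: promote the VM to the first DC of a new forest
Install-WindowsFeature AD-Domain-Services, DNS -IncludeManagementTools
Install-ADDSForest -DomainName "lab.warnerlab.com" -DatabasePath "F:\NTDS" -LogPath "F:\NTDS" -SysvolPath "F:\SYSVOL" -InstallDns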

7- Restart.

If you need a more detailed guide to setting up your domain, you can do that on your own following a guide like THIS ONE.

8- Once the domain is up and running, go back to the Virtual Network in the portal and assign the DC as the DNS server.

Before we go into the steps for creating the SQL cluster node VMs there is one major thing I want to point out. I used DV2 machines and regular HDD storage for the disks. I did this because I’m setting this up as a DEMO and don’t want to break my Azure bank. In a real production scenario I would recommend people use DSV2 or GSV2 VMs so that they can use Premium Storage. Then you can pool SSD disks for high performance cluster storage or mix some HDD in there as well for archive filegroups or whatever your workload demands.

I assume some familiarity with working with Azure so I’m not going to walk step by step on how to create the SQL VMs. Here are the high level steps for the SQL node machines:

1- Created a new D2V2 VM called sqlfcinode1.

2- Add it to the virtual network and subnet we created previously.

3- Add it to an availability set called sqlfci (2 fault and 2 update domains). This will protect our cluster VMs from simultaneous failures.

4- Add two disks to the VM to be part of the cluster storage pool. S2D requires at least 3 disks for a pool, so I'm putting 2 on each node. Again, I'm using 2 HDDs for the demo; on a real Production deployment you could have more and use SSD. Only attach them to the VM, do not format them! S2D will not add drives that have already been formatted to a pool.
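As a hypothetical sketch with unmanaged disks on the storage account (disk names, sizes and the VHD URIs are placeholders):

# Sketch: attach two empty data disks to sqlfcinode1 and leave them unformatted
$vm = Get-AzureRmVM -ResourceGroupName "sqlfci-rg" -Name "sqlfcinode1"
Add-AzureRmVMDataDisk -VM $vm -Name "sqlfcinode1-data1" -Lun 0 -DiskSizeInGB 128 -CreateOption Empty -VhdUri "https://<storageaccount>.blob.core.windows.net/vhds/sqlfcinode1-data1.vhd"
Add-AzureRmVMDataDisk -VM $vm -Name "sqlfcinode1-data2" -Lun 1 -DiskSizeInGB 128 -CreateOption Empty -VhdUri "https://<storageaccount>.blob.core.windows.net/vhds/sqlfcinode1-data2.vhd"
Update-AzureRmVM -ResourceGroupName "sqlfci-rg" -VM $vm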

5- Set the NIC of this node to be STATIC – 10.0.0.5.

6- On the VM, go to Server Manager – Local Server – Workgroup – Change the server to be joined to the LAB domain.

7- Repeat the steps above for a second node: sqlfcinode2 – 10.0.0.6.

Section B: Setting up the Cluster and Storage Spaces Direct

At this point, we have our virtual network, DNS, Active Directory and two Windows Server VMs with two data disks each. It's time to tie this all together into the actual cluster, enable S2D and create the storage pool and Cluster Shared Volumes we will use for the SQL instance! I will be doing some steps in PowerShell and others in the Windows GUI. These can be run on either node, but for consistency I'm running all the configuration from node 1 and only switching to node 2 when necessary. If something needs to be done on BOTH nodes it will say so as well. Let's start:

1- On both nodes we need to enable the Clustering and File Services roles:
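In PowerShell, that's along the lines of:

# Run on BOTH nodes
Install-WindowsFeature -Name Failover-Clustering, FS-FileServer -IncludeManagementTools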

2- Validate the cluster configuration.

3- Create the cluster. Note that I'm using a static address for the cluster itself, as well as specifying that there is no storage to pull in as available drives at the moment.
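As a sketch, those two steps in PowerShell look roughly like this (the cluster name and its management IP are hypothetical placeholders; the point is the -StaticAddress and -NoStorage switches mentioned above):

Test-Cluster -Node sqlfcinode1, sqlfcinode2
New-Cluster -Name sqlfciclus -Node sqlfcinode1, sqlfcinode2 -StaticAddress 10.0.0.10 -NoStorage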

4- We have a two-node cluster so we need a witness to have a Node Majority with Witness Quorum. I will take advantage of this opportunity to try out the new Azure cloud witness as well!

On Failover Cluster Manager, connect to the cluster. Go to More Actions on the right panel.

Go to Configure Cluster Quorum Wizard – Select the quorum witness.

Then we can select to configure a cloud witness.

You will need your Azure storage account name and access key; you can get them from the portal.
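If you'd rather skip the wizard, the PowerShell equivalent is a one-liner (fill in your own storage account name and key):

Set-ClusterQuorum -CloudWitness -AccountName "<storage account name>" -AccessKey "<storage account key>"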

5- Now that we have a functioning cluster, it's time to set up Storage Spaces Direct:

Enable-ClusterS2D

As a word of warning, this cmdlet will fail if you have fewer than 3 disks across the cluster nodes. In this scenario I have two per node for a total of 4.

Once it completes, it will create a storage pool in the cluster, which you can see on the Failover Cluster Manager GUI.

You can also verify it with PowerShell:

Get-StorageSubsystem Clus* | Get-PhysicalDisk

In our example, I get the 4 virtual disk drives as the members:

A small note on S2D storage options: it offers both two-way and three-way mirroring, and by default two-way is selected unless specified otherwise through the PhysicalDiskRedundancy or NumberOfDataCopies parameters. A two-way mirror leaves more space available but can only sustain the loss of one disk, whereas a three-way mirror can sustain the loss of two drives at the cost of more storage used.

There is also the option of parity versus mirroring. Parity can be more space-efficient but will use more compute; parity also requires 3 servers for single parity and 4 for dual parity. The whole gamut of options for configuring your storage tier using S2D is well beyond the scope of this guide, but I will refer you to this great blog post.

6- Now, we can create a couple of cluster shared volumes on these drives:

I'm creating two volumes, one for data and one for logs, using the usual best practice of 64KB allocation unit size, fixed provisioning and NTFS. SQL Server 2016 also supports ReFS; however, I still haven't seen any definitive documentation that recommends ReFS over NTFS for SQL Server workloads.
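As a sketch, the New-Volume calls reflecting those settings would look roughly like this (the volume names and sizes are hypothetical; the S2D* wildcard matches the pool that Enable-ClusterS2D created):

New-Volume -StoragePoolFriendlyName S2D* -FriendlyName SQLData -FileSystem CSVFS_NTFS -AllocationUnitSize 65536 -Size 100GB
New-Volume -StoragePoolFriendlyName S2D* -FriendlyName SQLLog -FileSystem CSVFS_NTFS -AllocationUnitSize 65536 -Size 50GB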

We can verify the cluster shared volumes were created in the FCM GUI

You can also verify with PowerShell:

Get-ClusterSharedVolume

7- The CSV disks will show in Windows as two mountpoints under C:\ClusterStorage called Volume1 and Volume2.
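You can give the mountpoints friendlier names with a couple of Rename-Item commands; the names here are just placeholders:

Rename-Item -Path C:\ClusterStorage\Volume1 -NewName SQLData
Rename-Item -Path C:\ClusterStorage\Volume2 -NewName SQLLog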

Note that this is a Cluster Shared Volume managed by S2D, so when I run the rename commands above, it automatically reflects the changes even if I’m viewing the volumes from node #2.

At this point, we’re good to go on the cluster and storage side. Time to install SQL Server!!

Section C: Installing the SQL Server Failover Cluster Instance

With all the clustering and storage pieces in place, the SQL FCI installation is pretty much the same as on-premises, with a few exceptions that I will note below. Let's see:

1- From node sqlfcinode1, I downloaded the media for SQL Server 2016 SP1.

2- Start the setup and select the option to install a new SQL Server failover cluster.

3- You can go through the installation in the same way, with only some minor differences. I will be creating a SQL Network Name called SQLFCI.

4- The validation will complain about the cluster validation having some warnings; this is expected.

5- On the cluster disk selection, we select both data and log disks to be part of this SQL instance resource group.

6- On the network configuration, specify a non-used IP and note it down. I’m using 10.0.0.11.

This IP will be used later to create the Internal Load Balancer for the SQL cluster name.

7- The next thing that is worth noting is the data, log and tempdb directories. For data and logs, verify that they are using the CSV mountpoints we created before.

8- For tempdb I tried using the local temporary D drive with the installer, but it failed with permission errors.

So to be able to complete the install I just put tempdb on the CSV data mountpoint and then after the install we can move it to the D drive.

If you do want tempdb on that D drive then there are two options:

a) Make a folder and a script on both nodes as referenced here: https://blogs.technet.microsoft.com/dataplatforminsider/2014/09/25/using-ssds-in-azure-vms-to-store-sql-server-tempdb-and-buffer-pool-extensions/

b) Make the SQL service account a local admin on both nodes and move the files to the root of D.

Like I said, the installation failed for me when I tried to do this right from the installer, so I recommend using the data drive for tempdb and then moving the files afterwards if you want.
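If you do move it afterwards, a sketch of that change via Invoke-Sqlcmd would be something like the following (the D:\TempDB folder and its permissions must already exist on both nodes, any additional tempdb data files need the same treatment, and the instance has to be restarted for the change to take effect):

Invoke-Sqlcmd -ServerInstance "SQLFCI" -Query @"
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'D:\TempDB\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'D:\TempDB\templog.ldf');
"@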

9- At this point you can go through the rest of the installer and complete the installation on sqlfcinode1.

10- Once installation completed on this node, I moved on and ran the installer from node 2.

11- In this case we select the option of adding a node to an existing Failover cluster instance. For node 2 there are no differences from a regular cluster install.

Once node 2 has been installed we’re ALMOST done. This is what it looks like with a SQL name of SQLFCI on Failover Cluster Manager:

Section D: SQL Cluster Name Connectivity

The final step is to make sure connectivity works and we have an Internal Load Balancer setup to be the SQL cluster name endpoint.

Because of the way Azure networking is done (no Gratuitous ARP), the ILB is required; otherwise, clients will not be able to connect through the SQL cluster name after a failover.

To create the Internal Load Balancer, we can follow the instructions Microsoft has for creating an ILB for an Availability Groups Listener on this documentation page. To adapt to our use case, simply replace the Listener information with the information of the SQL cluster name (in our case the IP for SQLFCI is 10.0.0.11).

Here are the steps for the creation of the ILB:

1- Create the ILB resource.

We’re obviously placing the ILB on the same Virtual Network and Subnet. Also, we’re creating it with the same IP we used for the SQL Cluster name (10.0.0.11) and we are setting the IP assignment to be STATIC.
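Roughly, in PowerShell (resource group, names and location are placeholders):

# Sketch: ILB with a static frontend IP matching the SQL cluster name IP
$vnet = Get-AzureRmVirtualNetwork -Name "sqlfci-vnet" -ResourceGroupName "sqlfci-rg"
$feIP = New-AzureRmLoadBalancerFrontendIpConfig -Name "sqlfci-frontend" -PrivateIpAddress "10.0.0.11" -SubnetId $vnet.Subnets[0].Id
$ilb = New-AzureRmLoadBalancer -ResourceGroupName "sqlfci-rg" -Name "sqlfci-ilb" -Location "East US" -FrontendIpConfiguration $feIP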

2- Adding a backend pool to the ILB:

Here we make sure to add the Availability Set and the two VMs in our cluster.

3- Adding the health probe to the ILB (note this is NOT the SQL port, it’s a generic unused port number). In this case, we just use port 59999:

This health probe is used by the ILB to determine if a machine that is part of the backend pool is not healthy any more.
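A sketch of the pool and probe pieces (after adding the pool, each cluster VM's NIC still has to be associated with it, either in the portal or with Set-AzureRmNetworkInterface):

$ilb = Get-AzureRmLoadBalancer -Name "sqlfci-ilb" -ResourceGroupName "sqlfci-rg"
Add-AzureRmLoadBalancerBackendAddressPoolConfig -LoadBalancer $ilb -Name "sqlfci-backend"
Add-AzureRmLoadBalancerProbeConfig -LoadBalancer $ilb -Name "sqlfci-probe" -Protocol Tcp -Port 59999 -IntervalInSeconds 5 -ProbeCount 2
Set-AzureRmLoadBalancer -LoadBalancer $ilb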

4- Then add a Load Balancing Rule to the ILB (note this one IS the SQL port). Make sure you enable direct server return:

The ILB rule ties the ILB IP, the backend pool and the health probe together. Direct server return also allows the servers to continue working directly with the client once the ILB has done the routing.
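Again as a rough sketch, where -EnableFloatingIP is the direct server return setting:

$ilb = Get-AzureRmLoadBalancer -Name "sqlfci-ilb" -ResourceGroupName "sqlfci-rg"
Add-AzureRmLoadBalancerRuleConfig -LoadBalancer $ilb -Name "sqlfci-rule" `
    -FrontendIpConfiguration $ilb.FrontendIpConfigurations[0] `
    -BackendAddressPool $ilb.BackendAddressPools[0] `
    -Probe $ilb.Probes[0] `
    -Protocol Tcp -FrontendPort 1433 -BackendPort 1433 -EnableFloatingIP
Set-AzureRmLoadBalancer -LoadBalancer $ilb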

5- At this point, we need to run one final piece of PowerShell to configure the cluster to use the load balancer IP address.
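That script is along the lines of the one in the Microsoft documentation for Availability Group listeners; adjust the cluster network name, IP resource name and IP to your own values:

$ClusterNetworkName = "Cluster Network 1"
$IPResourceName = "SQL IP Address 1 (sqlfci)"
$ILBIP = "10.0.0.11"
Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{"Address"="$ILBIP";"ProbePort"=59999;"SubnetMask"="255.255.255.255";"Network"="$ClusterNetworkName";"EnableDhcp"=0}

Running it produces a warning like the one below, which is expected: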

WARNING: The properties were stored, but not all changes will take effect until SQL IP Address 1 (sqlfci) is taken offline and then online again.

So on Failover Cluster Manager, take the IP resource down and back up. This will also cause your SQL Server to go down because of the dependency; you can bring it back up as well.

6- There's one more security layer that needs to be opened, and that is the Windows Firewall on each node of the cluster. We need to open access to TCP 1433 (SQL Server) and TCP 59999 (the load balancer probe). If these are not opened, you will run into a 'Network Path Not Found' error when trying to connect from machines on the same Virtual Network other than the active node.
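On each node, something like this opens the two ports (rule names are just placeholders):

New-NetFirewallRule -DisplayName "SQL Server" -Direction Inbound -Protocol TCP -LocalPort 1433 -Action Allow
New-NetFirewallRule -DisplayName "SQL ILB Probe" -Direction Inbound -Protocol TCP -LocalPort 59999 -Action Allow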

Section E: Testing our SQL Server Failover

We've got the infrastructure, the SQL Server FCI and connectivity configured. Finally, we are ready to test!

Here’s the test:

1- I downloaded the new Wide World Importers sample database for SQL 2016 and restored it on the instance. There's nothing special about the RESTORE statement; it just does a MOVE of the files to the CSV mountpoint:
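As a sketch of the idea (the logical file names and mountpoint folder names here are placeholders; confirm the real logical names with RESTORE FILELISTONLY first):

Invoke-Sqlcmd -ServerInstance "SQLFCI" -QueryTimeout 3600 -Query @"
RESTORE DATABASE WideWorldImporters
FROM DISK = N'C:\ClusterStorage\SQLData\Backup\WideWorldImporters-Full.bak'
WITH MOVE N'WWI_Primary'  TO N'C:\ClusterStorage\SQLData\Data\WideWorldImporters.mdf',
     MOVE N'WWI_UserData' TO N'C:\ClusterStorage\SQLData\Data\WideWorldImporters_UserData.ndf',
     MOVE N'WWI_Log'      TO N'C:\ClusterStorage\SQLLog\Log\WideWorldImporters.ldf',
     MOVE N'WWI_InMemory_Data_1' TO N'C:\ClusterStorage\SQLData\Data\WideWorldImporters_InMemory_Data_1',
     RECOVERY;
"@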

Under the covers, S2D will replicate the data over to the second node of the cluster and work transparently with SQL Server.

2- I connect from SSMS on another VM.

3- I start a workload and leave it running.

4- I stop the active node from the Azure Portal to simulate a node going down.

5- As expected, we lose connection on our current session, however, we can easily retry and we reconnect once SQL is online on the other node.

In the videos below you can see the failover testing recordings. The first test moves the SQL resource gracefully, as if it were planned patching maintenance; in the second one I simply stop the active node VM from the Azure portal, as if the VM were lost.

Final Thoughts

There you have it: a SQL Server 2016 Failover Cluster Instance running on Windows Server 2016 in Azure, with no 3rd party products.

The caveat here of course is that Storage Spaces Direct is a Windows Server DataCenter Edition feature and thus it will be quite a bit more expensive than regular Windows Server Standard.

If you do require SQL Server Failover Clustering, then the choice is to either go with SIOS DataKeeper, which will likely be less expensive, or pay for Windows Datacenter and keep the whole software stack and support under the Microsoft umbrella. In all fairness, S2D has other features that I didn't explore that make it attractive on its own. For example, you could use the tiered storage capability of S2D to keep hot data on expensive P30 Premium Storage volumes and cold data on P10 or regular HDD volumes. You could also create larger pools of storage with Scale-Out File Server and create SQL data files through an SMB interface.

Whatever the case may be, it's very cool to see new technology synergies between the Windows Server 2016 and SQL Server 2016 products, expanding the reach of both tools.

Hello 2017!

Well, that went by quickly, didn't it? As I'm typing this blog post it's already November 2016 and it's time to look forward to our plans for 2017. With that in mind, I'm going to outline the goals I'm setting out for my clients in 2017. As with our personal goals, some of them will happen and some won't, but knowing where we want to go will help us progress regardless.

I have two caveats around this list. First, I'm a Microsoft Data Platform consultant at Pythian, so the examples will apply to that platform; however, most of the advice translates easily to other technology stacks. Second, these are my guidance and opinions; the beauty of working at a place like Pythian is that lively technology-related debates are encouraged and found around literally any corner. If any of the following resonates with you, please reach out to your Pythian main point of contact and start a conversation.

On to the list!

No more unsupported versions

For SQL Server shops this means that we urgently need to come up with a plan to get away from anything older than SQL 2008. So you have some legacy app that was never updated by the vendor and you're stuck on an older version. Well then, let's move to a newer version while trying to maintain the older compatibility model (if possible) or, worst case, virtualize it and keep it running with the smallest affordable footprint.

And if you are running SQL 2008, let’s start planning now to move to a newer version or a cloud PaaS service.

What’s in it for your business: unsupported versions put you in a difficult position in the event of hitting a major product bug or issue. More importantly, once the software is unsupported, the vendor stops issuing security patches.

New product versions will also bring in new features that you can leverage for faster application response, quicker insights or improving the developer experience.

Are you sick and tired of this version catch-up game? Then let’s talk about leveraging cloud PaaS services where patching and upgrading is done transparently by the provider.

Virtualize Everything

Now, I'm going to catch some heat for this one from some of my colleagues (you know who you are). Yes, if you're running a big SMP low-latency OLTP workload or a large 128-CPU analytical one, this will not apply. However, for many other small and medium workloads, there's simply no reason to deploy them directly on bare metal. Virtualization allows you to provision capacity more effectively, automate the creation and configuration of environments, prototype and test faster, and provide a built-in first layer of HA. You don't want to deploy virtualization on-premises? Great! Any of the public cloud providers will be more than happy to provide Infrastructure as a Service capabilities.

What’s in it for your business: better configuration management through golden images that are fast to deploy. This velocity decreases your time to develop, test, prototype (potentially fail) and release. Compute resources can be leveraged more efficiently and on top of that, you get a basic layer of HA.

Automate and Alert

You have proper run-books with well-documented procedures and deviations. That's awesome; let's take it to the next level and drive for automation. If you're hesitant about automating the processes that you know work very well in a manual fashion, that's OK. I understand your hesitation, and that's why all the automation catches exceptions and errors and sends them to an operations resource that understands what's going on. If everything goes well, no human intervention. On the odd occasion where it doesn't, you send an alert and then that person can improve the automation to catch that case next time.

What’s in it for your business: automation frees people up for more valuable work. It also reduces the tediousness of repetitive work and the potential for human error. The automate and alert feedback loop forces people to improve their understanding of the process and product (SQL Server, Oracle, MySQL, etc.), improve their error handling and detection of edge cases. In other words, it keeps them SHARP.

Effective and Actionable Monitoring

My colleagues on Pythian's Managed Services team and I have accumulated countless on-call hours for many Mission Critical systems. If there's one thing I really don't like, it's a noisy monitoring system that produces alerts that are not actionable. When your monitoring system is asking a human being (in-house or a service like us, it doesn't matter) to spend their precious time looking at an alert, you have to make sure that there is actually something valuable that the human can do to either improve the service or prevent an issue.

We don't want the human target of the alerts to constantly ignore the noise from the tool or to acknowledge situations that they can't fix. When that happens, the monitoring system is in need of optimization. In the case of Pythian, I personally don't want our teams to be spending time on noise; I want them actively engaged on real alerts as close to 100% of the time as possible.

What’s in it for your business: there are two benefits of tuning monitoring to the max. First, if the monitoring is generating lots of noise then people will get desensitized and will be more likely to miss a real alert when it happens. Second, you’re wasting valuable resource time and energy that would be better spent elsewhere.

Disaster Recovery

Yes, it's painful to have idle resources and rent data center space just in case a major disaster happens around your main data center. For years, this has been the bane of IT managers trying to justify any DR investment. Well, guess what: the cloud has killed all those excuses. Even if you don't want to have a warm standby copy in the cloud, I propose the following:

Pick a cloud region that makes sense for you latency-wise.

Upload your backups there.

Once a month boot up a VM, restore that backup and alert if something goes wrong.

Shut down the VM.

All automated.
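As a rough sketch of what that automation could look like (all the names here are hypothetical, and the restore and alerting details will depend on your environment):

# Sketch: monthly DR restore test - start the VM, restore and check, alert on failure, shut down
Start-AzureRmVM -ResourceGroupName "dr-rg" -Name "dr-restore-vm"
try {
    Invoke-Sqlcmd -ServerInstance "dr-restore-vm" -QueryTimeout 7200 -ErrorAction Stop -Query "RESTORE DATABASE ProdDB FROM DISK = N'D:\Backups\ProdDB_latest.bak' WITH REPLACE"
    Invoke-Sqlcmd -ServerInstance "dr-restore-vm" -ErrorAction Stop -Query "DBCC CHECKDB(ProdDB) WITH NO_INFOMSGS"
}
catch {
    Send-MailMessage -To "dba-team@example.com" -From "dr-test@example.com" -SmtpServer "smtp.example.com" -Subject "Monthly DR restore test FAILED" -Body $_.Exception.Message
}
Stop-AzureRmVM -ResourceGroupName "dr-rg" -Name "dr-restore-vm" -Force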

What's in it for your business: Not only do you get off-site backup storage at cheap cloud prices, you also get to test your restores and DR capability; that's a 3-for-1! And you end up with a DR strategy that makes sense cost-wise and provides a decent RTO and RPO.

Let’s talk Cloud

I left this one last because really, it could be a blog post on its own. Some clients are barely dipping their toes by offloading storage and backups. Others are starting to leverage multi-region redundancy in their IaaS workloads. Other clients are already knee deep in PaaS services for IoT streaming, data visualization, managed big data or data warehousing. Wherever it is that you find yourself, we want to sit down with you and draw out your 2017 plans. We are uniquely positioned to be exposed to all the providers, all their offerings and you can learn from the lessons we have learned.

What's in it for your business: I see the cloud as a vehicle for freeing people and resources from mundane tasks and allowing them to focus on delivering more value to the business. Some believe the cloud means running your business IT for fewer dollars; while that is possible, a lot of the time it won't be the case. However, properly implemented, a public cloud provider should be able to bring redundancy, faster time to market, freed-up resources and the lowest possible cost of curiosity.

Parting Words

Needless to say, this is not an exhaustive list. For other clients, the goals we've set up for 2017 involve In-Memory technologies, better analytics, applying machine learning, real-time data ingestion, moving systems to IaaS or trying out cloud PaaS services. Everyone is at a different point on this road and we love helping people navigate the challenges of the ever-expanding world of data.

Let’s start playing offense, tackle our technical debt and innovate in the New Year. Here’s to reaching all our goals in 2017!

It is no secret that we are generating increasing amounts of data every day. Hundreds of apps on the phones we carry with us every day amass a collection of data from connected devices. Hundreds of thousands of different computer systems enable the operations of every single company on the planet. We are at the point where we generate more data than we sometimes know what to do with. Previously, organizations would at best archive it and at worst simply delete it and carry on. The question is, though: what if that data actually held valuable insights that could provide tangible, practical benefits to the operation of the business?

What is Azure Data Lake?

Azure Data Lake (ADL) is a Platform as a Service offering on Microsoft Azure. The service is split into two independent but closely related subservices: Azure Data Lake Store and Azure Data Lake Analytics.

Azure Data Lake Store

Azure Data Lake Store was built to overcome some of the limitations that exist on regular Azure Blob Storage. First, there are no limits on the number of files, the size of a file, or the aggregated size of all files in the account. Second, the storage is optimized for Big Data and analytics. This means lots of writes and large read scans.

A great feature of ADL storage is that it provides a Web-HDFS API so you can create an HDInsight cluster, attach the ADL store and then run the entire set of Hadoop tools on your data.

That is the beauty of a service like Azure Data Lake. You can store and accumulate this data at a low cost in the cloud and not pay any charge to analyze it until you're ready to do so. Once you're ready to analyze, you could deploy a managed Hadoop cluster on Azure, AWS or GCP, but then you pay for the compute while the cluster is up, even if you're not actively using it. Azure Data Lake takes this concept to the next level with ADL Analytics.

Azure Data Lake Analytics

For the analytics piece of the service, Microsoft wanted users to have an experience that was cost efficient to scale and use, and easy to pick up for data pros that were not Hadoop experts.

To achieve the cost efficiency piece, the analytics service is 100% on-demand and does not require deploying a cluster or destroying it when you're done. You submit an analytics job and you pay by the minute based on the compute power you assign to the job. Compute power is measured in Analytics Units; each unit represents a compute container that can process data in parallel.

With the service, you can accumulate data continuously throughout the month and only pay for storage. You can then fire up an end-of-month analytics job, get your aggregations or any other results, and you only pay for that one job in terms of compute.

To achieve the second objective of ease of use for data pros, Microsoft released a new language to run analytics called U-SQL.

What is U-SQL?

U-SQL is the language used for writing ADL Analytics jobs. It's a mix of T-SQL and C#, so it is instantly familiar to anyone that has done any SQL or object-oriented programming with C# or Java. The use of U-SQL makes it easy for someone to approach the service without having to be aware of all the different tools and methods for running analytics in the Hadoop ecosystem; they only need to know some SQL and some basic programming and they can start coding their own Big Data analytics jobs. I can't stress enough how big this is for lowering the barrier of entry to doing Big Data analytics. I have personally demonstrated U-SQL to groups of DBAs, shown them how to get up and running with it, and they get very excited about the service. What looked like a long and intricate learning curve suddenly becomes very approachable through this new service and a familiar language.

Future Direction

At this point, the biggest gap that I can see in ADL is the lack of an interactive-session type of experience. Right now the analytics have to be run in batch mode through jobs, with the output always written as a file to ADL storage. Down the line, I hope we'll see support for interactive sessions similar to what you can do with Google BigQuery or with a Spark cluster and a notebook.

Demo Time!

Now, let’s go to the video and the demo and get an introduction of the Azure Portal experience for Azure Data Lake and a quick look at some basic U-SQL. Enjoy!

Once you’ve tackled the challenge of understanding if you should move to the public cloud, you’re faced with the next big question: which cloud provider and solution will meet my business’ needs?

There are many major providers in the market, each with their own unique features, strengths and weaknesses. It is important to understand the basics of all the available offerings so you do not get most of the way through your migration project only to find out you picked a suboptimal solution.

Database Platform as a Service offerings are very tempting because they enable developers to roll out their own database infrastructure in minutes with minimal management overhead. However, this same amount of flexibility also comes with the challenges of picking the right tool, on the right provider and with the proper expectations.

This presentation explores these solutions by discussing:

What Database-as-a-Service means

The key providers and their offerings

The primary purpose and strengths of each offering

Things to watch out for per provider

This presentation is perfect for anyone who is still deciding if the cloud is right for them. By listening to this presentation, you can begin to narrow down which technologies are best suited to meet your project’s unique requirements.

What comes next?

Now that you have chosen the right technology, are you ready to start planning your cloud project? If so, check out this proven five phase framework that our cloud experts follow to reduce risks and avoid costly mistakes associated with a cloud migration project.

SQL Server 2016 hit general availability on June 1st 2016 and comes with a big set of new features and improvements. From built-in JSON support to mobile and tablet friendly Reporting Services, as usual there is a lot of investment done by Microsoft before a major release of the flagship database product.

In this post I want to focus on the Columnstore indexes feature and give you an idea of why it’s important and where you can leverage this functionality. SQL 2016 continues to build on this feature and it’s one of the best ways for Enterprise license customers to get the most bang for their buck.

What is a Columnstore?

Columnstore indexes are a different way of organizing and processing data for a database. Instead of storing records with all their fields together, a Columnstore stores the values of each column together and then reconstructs the record from the different column values. Columnar storage has been implemented by most database vendors, especially those targeting the data warehousing space. Microsoft, Vertica, Teradata, Oracle, Amazon and Google have all implemented their version of this type of data storage.

There are two main advantages to running Columnstores in SQL Server:

Compression: because column values are stored together, data compression can yield great space savings and increase data density in RAM.

Batch mode: SQL Server has a special processing mode used for Columnstore data where it will process multiple rows (about 1,000) at a time as a batch instead of one row at a time. This yields great performance benefits.

The use case for Columnstores is large data warehousing tables or as an analytics-friendly index on top of OLTP data. It is not meant to be the main structure supporting an OLTP system as it’s not efficient in updating or deleting data.

What is the history of the Columnstore feature?

SQL Server first introduced columnar storage with the SQL 2012 Enterprise release. In that release, Columnstores were read-only indexes, so refreshing the data required dropping the index, loading the table or partition, and then rebuilding the index.

SQL Server 2014 upgraded Columnstores with full read-write capabilities, allowing the Columnstore to become the ‘clustered’ index for the table and hold all the data instead of just being one more index on top of row-organized data. 2014 also introduced many improvements to batch operations so more pieces of an execution plan could take advantage of this faster processing mode.

What’s new in SQL Server 2016?

With SQL 2016 the Columnstore story just keeps getting stronger. Here are the new features in this release:

Batch mode is enabled even for single-thread queries and covers more execution operators.

Updateable non-clustered Columnstore on top of a heap, row-based clustered index or In-Memory table. This is the scenario called Real-time analytics.

Non-clustered b-tree indexes on a clustered Columnstore. This allows creating PKs and FKs on the Columnstore, trading some performance for integrity enforcement.

Non-clustered Columnstores can have a filter predicate and a compression delay defined in order to fine-tune when a row is moved into the Columnstore.

Keep in mind that the latest Columnstore technology is available not only in SQL Server 2016 but also on Azure SQL Database (for real-time analytics or small datamarts) or Azure SQL DataWarehouse (full data warehouse support).

Demo

In the demo I’m going to highlight the new feature of PK and FK constraints on Columnstores. Check it out below!

Despite SQL injection being well documented for several years now, every now and then we still run into clients that have had bad experiences because of injection attacks. If you're not familiar, a SQL injection attack happens when an attacker exploits a vulnerability in how an application passes queries and data to the database, inserting their own malicious SQL code to be executed. If you want to see different examples and get the full details, the Wikipedia page is very comprehensive.

Depending on how the application is configured, this kind of attack can go all the way from enabling attackers to see data they shouldn’t, to dropping an entire database if your application is allowed to do so. The fact that it’s an application based vulnerability also means that it really depends on proper coding and testing of all inputs in the application to prevent it. In other words, it can be very time-consuming to go back and plug all the holes if the application wasn’t securely built from the ground up.

Built-in Threat Detection

To attack this issue, and as part of the ongoing security story of SQL Server, Microsoft has now invested in the feature called Database Threat Detection. When enabled, the service will automatically scan the audit records generated from the database and will flag any anomalies that it detects. There are many patterns of injections so it makes sense to have a machine be the one reading all the SQL and flagging them. MS is not disclosing the patterns or the algorithms in an effort to make working around the detection more difficult.

What about on-premises?

This feature right now is only available on Azure SQL Db. However, we all know that Azure SQL Db is basically the testing grounds for all major new features coming to the box product. I would not be surprised if the threat detection eventually makes it to the on-premises product as well (my speculation though, nothing announced about this).

Pre-requisites
For this new feature you will need Azure SQL Db, and you will also need to have auditing enabled on the database. The current way this works is by analyzing the audit records, so it's 100% reactive, nothing proactive. You will need a storage account as well, since that's where the audit logs get stored. The portal will walk you through this whole process; we'll see that in the demo video.

Current State
As I mentioned, right now this is more of a reactive tool, as it only lets you know after it has detected the anomaly. In the future, I would love to see a preventive configuration where one can specify a policy to completely prevent suspicious SQL from running. Sure, there can always be false alarms; however, if all the application query patterns are known, this number should be very low. If the database is open to ad-hoc querying, then a policy could be set to only block the suspicious queries, or even shut down the database after several different alerts have been generated. The more flexible the configuration the better, but in the end what I want to see is a move from alerting me to preventing the injection to begin with.

In the demo, I’m going to go through enabling Azure SQL threat detection, some basic injection patterns and what the alerts look like. Let’s check it out!

One of the cool things about SQL Server is that it comes bundled with all the Business Intelligence services with the core database engine license. Reporting Services (which includes the Mobile Report Publisher), Analysis Services, and Integration Services are all integrated, and are ready to get going as soon as you install SQL Server. This has made it not only cost-efficient for many organizations to deploy BI, but it has also contributed to a wide adoption among the SQL Server customer base.

What is the Mobile Report Publisher?

Currently in preview, the Mobile Report Publisher is a new report and dashboard editor that publishes reports to Reporting Services, and it's part of the bigger road map that Microsoft has for their Business Intelligence on-premises story. We all know that in the cloud, Power BI has been getting a large amount of investment, but on-premises there was a gap that kept getting wider and wider, until now.

With this upcoming SQL 2016 release, the Microsoft team is focusing on bringing Reporting Services into the future as a one-stop integrated solution for BI deployment so that cloud BI or a competitor’s product (*cough*Tableau*cough) are not the only modern alternatives.

This Reporting Services refactor is the biggest change made to the product since SQL Server 2005 was released over 10 years ago. Leveraging the best parts of the Datazen acquisition, the Microsoft team is looking to provide a cohesive BI story that integrates web-based and mobile reports, Report Builder reports, and Power View style of modern visuals.

How is it different?

You’re probably used to working with SSRS Report Builder and are wondering what the idea is with Mobile Report Publisher. The demo below will make it very clear, but let’s just say that reports have now been split in two types:

1. Paginated reports: this is the ‘legacy’ style report that is built by Report Builder. It looks more flat, has the same controls as before, and is fully compatible with all the existing reports you have already deployed.

2. Mobile reports: Even though the name says ‘mobile’ these reports work just as well on desktop, tablet and mobile. If you’re familiar with web design, then the appropriate term would be that these reports are “responsive”. They can be done once, and will display nicely across devices. This is the new experience that the Mobile Report Publisher is targeting.

Where do we go from here?

Right now the current version is SQL Server 2016 CTP 3.2. We still have several different versions that will be released before SQL 2016 goes RTM later this year. Currently, a lot of the planned functionality is not entirely there for the main SSRS interface, and you will often be asked to switch to the "classic" view.

The Mobile Report Publisher experience is also very much targeted towards creating a visualization, and there's no data modeling to be found. You pretty much need to have your datasets pre-made and in the format that the tool expects, or you won't get very far. Hopefully, at some point the team will add modeling capabilities like the robust experience we already have in the Power BI Desktop tool.

If you want to practice and get your feet wet with this release, you can do it now, for free, by visiting SQL Server 2016 CTP 3.2. You can also preview the Mobile Report Publisher. Like I said before, this is a big departure, so most people will want to train up to be ready when the new versions go live.

Demo

In the demo below I’m going to walk through the new SSRS interface preview in CTP 3.2, and then walk through creating a quick dashboard for both desktop and mobile consumption using the publisher. Let’s check it out!

As the amount of data generated around us continues to grow exponentially, organizations have to keep coming up with solutions for our new technological landscape. Data integration has been part of this challenge for many years now, and many tools have been developed specifically for these needs. Some tools are geared specifically toward moving data from point A to point B, while other tools provide a full ETL (Extract-Transform-Load) solution that can work with many products using all kinds of different drivers.

For many years, the first party tool of choice for SQL Server professionals has been SSIS. Interestingly, even though it’s called SQL Server Integration Services, SSIS is really a general purpose ETL tool. If you want to extract data from Oracle, transform it with the full expressive capabilities of .NET and then upload it to a partner’s FTP as a flat file, you can do it in SSIS!

As we continue our journey into the cloud and hybrid environments, more tools will start coming up that work as an ETL PaaS offering. You won't have to manage the pipeline's OS, hardware or underlying software; you'll just create your data pipelines and be off to the races.

What is it?
Azure Data Factory (ADF) is Microsoft's cloud offering for data integration and processing as a service. You don't have to install any bits or manage any software; you're only responsible for creating the pipelines. Since it's developed to run inside Azure, the tool also has some pre-made hooks that make it really easy to interoperate with other Azure services such as blob storage, HDInsight or Azure Machine Learning.

On-premises you would need a machine (VM or physical), you would need a license for your ETL tool (let's say SSIS), and then you would need to keep SSIS patched, the machine up to date, and think about software and hardware refreshes and so on. Using ADF, you can focus on the pipeline itself and not have to worry about what underlying software and hardware is actually making it work. The service supports a wide array of sources and targets (and continues to grow) as well as robust options for scheduling the pipeline or running it continuously to look for new slices of data.

When should you use it?
If you're thinking about creating a new SSIS package and find that your sources are all web or cloud based, then ADF is a good choice. Build a prototype of your pipeline, make sure that it supports your expected transformations, and then you can operationalize it in the cloud. As a PaaS offering, it takes away the time, cost and effort of having to deal with the underlying bits, and you can just focus on delivering quality data pipelines in a shorter timeframe.

Service Limitations

Like all new things in Azure, there are still some service limitations. The biggest one at the moment is that the service is only available in the West US and North Europe regions. If you don't have resources in those regions and will be moving a lot of data, then I would advise starting to learn the service and prototype with it, but not putting the pipelines in production. The reason is that any data movement from outside the region will incur an outbound transfer cost. If your resources are in those regions, then there's no charge and you can ignore this warning.

Demo
In the Demo video we’ll look at the user interface of Azure Data Factory, how to add a source and target, scheduling and checking the status of the pipeline. Enjoy!

Security is on everyone's mind these days in the IT (and the real) world, either because they're dealing with compliance, risks or mitigation at work, or because they just saw yet another news item about some big leak or breach. It is said that it's not a question of if your systems will be attacked but when. Microsoft has now released a new feature for the SQL product family called AlwaysEncrypted, to continue risk mitigation and strengthen the security story of the product. And I say the SQL 'product family' instead of just SQL Server because this feature is also available on Azure SQL Database.

What is it?
AlwaysEncrypted is the latest in the set of features that enables encryption inside SQL Server. Let’s look at the list so far:

Column level encryption: This targets specific columns in specific tables, with the encryption/decryption happening at the server.

Transparent Database Encryption (A.K.A TDE): This targets entire databases and is transparent to the calling application. It’s also transparent to any user with proper access to the data.

AlwaysEncrypted: This also targets specific columns in specific tables, with the encryption/decryption happening ON THE CLIENT.

The big difference with this new feature is that the operations to encrypt/decrypt happen on the client, NOT on SQL Server. That means that if your SQL Server is compromised, the key pieces to reveal the data are NOT with the server. This means that even if your DBA wants to see the data, if they don't have access to the CLIENT application then they won't be able to see the values.

How Does it Work?
This feature can be enabled through T-SQL or through a wizard in Management Studio. The actual data manipulation is done by the latest version of the ADO.NET client: during configuration, the client will read all of the data, perform the encryption and send it back to SQL Server for storage. The latest 4.6 release of the .NET Framework is required. There's a Column Master Key that will have to be stored in a Windows certificate store, Azure Key Vault or other 3rd party key storage software. During normal application operation, the ADO.NET client will read this master key and use it to decrypt and encrypt the values.

There are two options for this type of encryption:

Randomized: This will make the same source values encrypt into DIFFERENT encrypted values. Useful for columns that could be correlated by looking at them and that won't be used for searching.

Deterministic: This will make the same source values encrypt into the SAME encrypted values, thus allowing for indexing and searching.

For the demo, check the video below where we’ll use the SSMS Wizard to enable AlwaysEncrypted on a column and will show the decryption happening in SSIS using the ADO .NET client!