Archive for: June 15th, 2017

dplyr 0.7.0 is a major release including over 100 improvements and bug fixes, as described in the release notes. In this blog post, I want to discuss one big change and a handful of smaller updates. This version of dplyr also saw a major revamp of database connections. That’s a big topic, so it’ll get its own blog post next week.

Read on to learn about tidy evaluation and the Star Wars data set. There’s a lot to wrap your head around in this release.

The most glaring change from what was announced earlier, is Power BI Report Server can only connect to analysis services data sources, both tabular and multidimensional. If you want to connect to SQL Server, Oracle or Excel or all three, use the Power BI Web Service. Only going to the cloud version will users be able to create a data mashup or connect to anything but SQL Server.

Connecting to one data source is not what was promised when the Power BI Report Server was announced in May. Various Power BI Product members held a session at the Microsoft Data Summit where attendees were able to ask questions. I asked “When are we going to be able to use Power BI Report Server with data sources other than analysis services?” In a room full of people, I was assured that it was a top priority of the team to release the same data connectivity functionality for Power BI Report Server that currently exists for Power BI Services and the current plan was to release this functionality the next release.

This is the most glaring flaw with Power BI Report Server at the moment. Unfortunately, that probably makes it DOA for my purposes, at least until they introduce SQL Server relational as a valid data source.

I’m excited to share that a new technical whitepaper I co-authored with Chris Webb is published. It’s called Planning a Power BI Enterprise Deployment. It was really a fun experience to write something a bit more formal than blog posts. My interest in Power BI lies in how to successfully deploy it, manage it, and what the end-to-end story is especially from the perspective of integration with other data assets in an organization. Power BI has grown to be a huge, wide set of features so we got a little verbose at just over 100 pages.

It is a beefy whitepaper. I haven’t had a chance to read it yet, but it’s now on my list.

In rxInstallPackages function use computeContext parameter to set either to “Local” or to your “SqlServer” environment, you can also use scope as shared or private (difference is, if you install package as shared it can be used by different users across different databases, respectively for private). You can also specify owner if you are running this command out of db_owner role.

This updated installation method is certainly easier than the prior method, which included incantations and sacrificing a chicken.

Now, whether you go by the old standard or the new one, you do have to support the possibility that someone will use all the characters allowed. Which means you have to use 254 or 320 characters. But what I’ve seen people do is not bother researching the standard at all, and just assume that they need to support 1,000 characters, 4,000 characters, or even beyond.

So let’s take a look at what happens when we have tables with an e-mail address column of varying size, but storing the exact same data:

This is a good argument against automatically using VARCHAR(8000) (much less MAX) when creating columns.

I had a recent project to enable backup encryption on all our servers. Then question from the storage team came up will this required additional space. Well by then I had already enabled in all our test servers so I wrote a query that would compare the average size of backups before encryption to after encryption. Keep in mind we do keep only two weeks of history in our backup tables so this is a fair comparison. If you don’t have maintenance tasks to clean up your backup history then you should have backup_start_time to the where clauses to get more accurate numbers and setup a maintenance tasks to keep your msdb backup history in check.

Unfortunately, Tracy leaves us in suspense regarding whether they did increase.

If you’ve been using DBCC CLONEDATABASE at all, you might have run into a cannot insert duplicate key error (or something similar) when trying to clone a database:

Database cloning for ‘YourDatabase’ has started with target as ‘COPY_YourDatabase’.Msg 2601, Level 14, State 1, Line 1Cannot insert duplicate key row in object ‘sys.sysschobjs’ with unique index ‘clst’. The duplicate key value is (1977058079).

Most of us understand that clustered columnstore index can typically provide 10x data compression and can speed up query performance up to 100x. While this sounds all so good, the question is how do I know which tables in my database could potentially benefit from CCI? For a traditional DW scenario with star schema, the FACT table is an obvious choice to consider. However, many workloads including DW have grown organically and it is not trivial to identify tables that could benefit from CCI. So the question is how can I quickly identify a subset of tables suitable for CCI in my workload?

Interestingly, the answer lies in leveraging the DMVs that collect data access patterns in each of the tables. The following DMV query provides a first order approximation to identify list of tables suitable for CCI. It queries the HEAP or the rowstore Clustered index using DMV sys.dm_db_index_operational_stats to identify the access pattern on the base rowstore table to identify tables that meet the criteria listed in the comments below:

It was at this point that I realized just how many candidates there are for “most useless” things lying around the product. So I decided to create my own tweet. I asked for the most useless feature, anytime between version 7.0 (which I would call the beginning of SQL Server’s “modern era”) and now. I received quite a few suggestions, and so I have decided to catalog them here—along with a bit of personal commentary on each one.

The list that follows is mostly unordered and culled straight from what I received on Twitter. Hopefully I haven’t missed anything due to Twitter’s weird threading and “priority” mechanisms. And please let me know in the comments if your favorite useless feature is missing, or you’d like to add a comment and/or argument about one of these. Perhaps we can find some way to turn these dark and ugly corners into things of beauty? Well, we shall see…

I almost completely agree with Adam’s opinions on this long list. I’d emphasize, though, that In-Memory OLTP is by no means useless.