Let me just say, this thing is going to be great. It’s the biggest advancement in the platform for some years, and I don’t think any competitor will come close.

However, it is a preview feature, and it does have some issues. Hopefully some of these at least will get looked at before it goes GA.

Supported Syntax

I expected simple SELECT col1, ..., coln FROM table1 INNER JOIN table2 ON ... to work, but I didn’t expect to be able to use PIVOT, APPLY, sub-queries etc which do all seem to work as expected 😀

In fact, the only query syntax I regularly use that I’ve spotted not being supported yet is CTEs. This means that querying hierarchical data is still easier in FetchXML (for now), but for other ad-hoc queries then SQL seems the obvious way to go for me.

You need aliases!

You can run this query quite happily:

SELECT *
FROM account

This works perfectly too:

SELECT *
FROM account AS a
INNER JOIN
contact AS c
ON a.primarycontactid = c.contactid

Msg 40000, Level 16, State 1, Line 2
The multi-part identifier "account.primarycontactid" could not be bound.
The multi-part identifier "contact.contactid" could not be bound.

It appears that, whenever you use a join, the tables MUST be aliased. Not the end of the world, but in easy gotcha that could confuse new users.

Missing tables & fields

Some data is not accessible through this endpoint. My best guess is that it’s only the data that’s stored in SQL that’s available, as the ones I’ve spotted that are missing are stored elsewhere now, e.g.

annotation.documentbody

activitymimeattachment.body

audit

plugintracelog

Even more confusingly, the documentbody and body fields appear in the results if you run SELECT * FROM annotation or SELECT * FROM activitymimeattachment respectively (although they are always null), but they do not appear in the metadata for the tables.

Ideally this data would be available too, but if not I’d like some consistency between the actual results and the metadata.

OAuth timeouts

I’m not sure if this is a general problem with Active Directory authentication for SQL Server or something specific to this implementation, but once your connection has been open for an hour any more queries will receive the error:

Msg 40000, Level 16, State 1, Line 7
The OAuth token has expired

At this point you need to force it to reconnect (in SSMS, clicking “New Query” appears to be sufficient) and carry on.

Given that this is (presumably) not running direct SQL commands, I’ve been pleasantly surprised by the range of SQL that it implements.

The biggest announcement of MBAS for me by a long way was the new T-SQL endpoint. For those who missed it, go and check out the recordings.

In short, Microsoft have added a (preview) way of querying your CDS data using T-SQL, which anyone familiar with SQL Server will already know. It was originally done to support live querying from Power BI, but because they used the same TDS protocol that SQL Server uses, immediately the entire ecosystem of SQL Server tools is opened up. So you can now use SQL Server Management Studio amongst others to query your CDS / Dynamics 365 data. Cool!

I can’t deny I didn’t feel a bit put-out by this announcement coming only a few days after I’d released the latest version of my own SQL 4 CDS tool, but I’m sure I’ll get over it eventually…

Connecting

Obviously the first thing I did was carefully read the documentation 😜 before moving on to try and connect to my own instances.

The key bit of information you need when connecting is the server name (the same as you see in the address bar of your browser when you’re in your app) and the port number 5558. Put these together with a comma, select the Active Directory - Password authentication option and you should be away.

The result was a bit of a let-down:

TDS protocol endpoint is disabled for this organization

I assumed at this point that this new preview feature just wasn’t available for me yet. A few people suggested creating new instances in the Canada (crm3) region as they seem to get new features earlier, but I got the same error again and eventually gave up for a few days.

Update 2020-05-24: This error message has been updated to give some more help on fixing it now:

Enabling the TDS protocol endpoint

Andrew Bibby then helpfully pointed me to some more documentation on how to enable the new endpoint. Unfortunately, even when I got the correct command line command to run it still didn’t work. Instead it gave me the error:

Error occurred in OrgDBOrgSettings and the error details are GDS resource provider is unable to get instances for tenantId: '<guid>', response status:'NotFound' and reason 'Not Found'

Instead I ended up actually using SQL 4 CDS to update the orgdborgsettings attribute of my organization entity:

select orgdborgsettings, organizationid from organization

I then added <EnableTDSEndpoint>true</EnableTDSEndpoint> at the end of the list of settings and updated it using:

A bit of a mouthful of a title! I had a great time earlier today presenting at my first ever virtual event, D365UG UK. Many thanks to the organising team for a smoothly run event!

My topic today was using a variety of free tools to unearth some data quality issues in your CRM installation that you may be unaware of, and some suggestions on common causes and possible resolutions.

The tools I used were:

Advanced Find

FetchXML Builder

SQL 4 CDS

Data Export Validation Tool

Those last three are all free XrmToolBox tools, so head over to the tool store to install them today.

The slides should be available on the D365UG UK forum shortly, but in the meantime I promised to share the script that you can run in SQL 4 CDS to check many of the common scenarios we looked at. Many of these could also be run in either Advanced Find or FetchXML Builder if you prefer.

If you haven’t come across it before, SQL 4 CDS is an XrmToolBox tool to query and manipulate your CDS data using regular SQL queries, making it accessible to a much wider range of users. You can install it today from the XrmToolBox Tool Library.

It takes quite a lot to jump from a version 1.0.x to a version 2, so what’s new?

Beyond FetchXML

Version 1 translated your SQL query into FetchXML and executed it, which made it easier to run the queries but didn’t give you the full power of SQL. There were still plenty of queries you couldn’t run because there was no FetchXML equivalent.

While still not supporting every bit of T-SQL (that would be near-enough impossible!), this release brings it a lot closer. While it still uses FetchXML as much as possible, it can now also transform those results to implement more SQL queries than before.

For example, you can now run queries that compare two field values, execute common functions such as DATEDIFF and DATEADD, use a HAVING clause, sort data with linked tables without having to worry about the order of your joins and much more.

Some example queries that you can run now that you can’t do with FetchXML alone:

Find accounts with more than 10 contacts

This query would not normally be possible as FetchXML does not have an equivalent for the HAVING clause, but SQL 4 CDS 2 will handle it:

SELECT a.name
FROM account AS a
INNER JOIN
contact AS c
ON a.accountid = c.parentcustomerid
GROUP BY a.name
HAVING count(*) > 10

Find accounts that haven’t been modified since being created

FetchXML doesn’t have an option for comparing the values of two fields. SQL 4 CDS 2 handles this again, but watch out when running this on large data sets as it’s got to retrieve all the accounts in order to filter them itself.

SELECT name
FROM account
WHERE createdon = modifiedon

Find contacts created over a year after their account

A variation on the one above, you can now use more complex expressions such as selected functions, mathematical expressions etc. as part of your query too.

Increment the score for all contacts called Mark

These same improvements can also be used as part of assignments in UPDATE statements, e.g.:

UPDATE contact
SET cdi_score = cdi_score + 1
WHERE firstname = 'Mark'

Sort by related tables

FetchXML imposes a restriction on sort orders. They have to be applied at the top-level entity first and then work down through joined tables. You can now apply sorts in whatever order you like:

SELECT a.name,
c.fullname
FROM account AS a
INNER JOIN
contact AS c
ON a.accountid = c.parentcustomerid
ORDER BY
c.fullname,
a.name

Notifications

If you run a query that can’t be executed fully as FetchXML, you’ll see a warning like this:

“Extra processing required” notification

Be sure to not simply copy & paste this FetchXML into your own code. Running this FetchXML directly will give you all the information you need to calculate the results, but you will need to do some post-processing on the results.

Intellisense

This release also includes Intellisense-style suggestions as you type. There are a few caveats to be aware of – as you type a SELECT statement, the tool won’t offer suggestions until you start entering the FROM clause, so if you want suggestions on the SELECT clause itself then just enter SELECT * FROM, enter the FROM clause and then go back.

The metadata used to drive the suggestions is loaded in the background, so you may not get suggestions immediately. Give it a little time after entering the FROM clause and you should start seeing suggestions appear.

I’ve tried to get the suggestions to appear when they feel natural to me. I’m interested in any feedback on this feature and how you’d like to see it work.

Managing many-to-many relationships

You can already use INSERT, UPDATE and DELETE to edit your data, but this release improves on that to cover the intersect entities used in many-to-many relationships. For example, you can use this query to add all your contacts called Mark to a specific marketing list:

I really have no idea what this is trying to say – if you can enlighten me, please do! Given that all the joins it does are based on the attributes being equal, and therefore the two values must be identical, I can’t see what this is trying to describe.

As I can’t tell from the docs what it is trying to do, I’ve tried to work it out by running some queries and having a look at the generated SQL (thank you on-premise!) and the results.

My conclusion: natural is a synonym for inner.

If you substitute natural for inner, you will get exactly the same results, as it generates exactly the same SQL.

Many to many relationships

One thing I did wonder based on the docs for the natural join type was whether it would try to imply the join attributes by name, i.e. you could miss out the from and to attributes on the link-entity if they were both the same and it would imply them from the metadata. This failed, but it did produce this intriguing error message:

No system many-to-many relationship exists between account and contact. If attempting to link through a custom many-to-many relationship ensure that you provide the from and to attributes.

So maybe CDS doesn’t support all 10 join types yet, but it does appear to support some we didn’t have before:

In

MatchFirstRowUsingCrossApply

Exists

I also tried the same thing on an on-premise installation, and that does only support the original 3: inner, outer and natural, so this is online-only for now at least.

So, in the absence of any documentation on these, or the ability to see what SQL gets generated for them, what are we to do? Well, these all seem to have directly equivalent SQL keywords, so starting from a simple query:

MatchFirstRowUsingCrossApply

Exists

SELECT name
FROM account
WHERE EXISTS (
SELECT *
FROM contact
WHERE firstname = 'Mark' AND
parentcustomerid = account.accountid
)

This is just conjecture at this point, but if correct it would mean:

Each of the new join types would only produce one row per account that has a contact called Mark, regardless of whether there was one such contact or 100. The standard inner join type however would produce one row for each contact

The in and exists versions would not be able to include any attributes from the linked entity (lastname in this example), as it is only referenced from the WHERE clause, but the inner and matchfirstrowusingcrossapply would be able to

So, let’s give it a go!

I’ve created two accounts, one with no contacts called Mark and one with two:

In

Executing the exact same query but replacing inner with in gave the error:

An error occured: In doesn't support attribute inside linkentity expression.

Removing the <attribute> from inside the <link-entity> gave the expected result:

name

Join Operator 2

So far, so good!

Exists

Going through the same process with exists, again we get the same error about not being able to use an <attribute> within the <link-entity>. Removing that gives us the same results again!

MatchFirstRowUsingCrossApply

Trying again with matchfirstrowusingcrossapply, and it almost works as I expected. I can run the query with the <attribute name="lastname" /> element inside the <link-entity> without an error, but I don’t actually get that column back in the result set – I get the same results again as I did with in and exists!

Given how close the other predictions were, I’m going to chalk this down overall as a success, with the missing column in the results possibly being due to a CDS bug that will hopefully be fixed at some point. As always though, I’d be very happy to get any firm information on it one way or another!

Why?

So now we have a reasonable idea of what these new join operators are doing, why might we want to use them? We’ve managed quite happily with inner and outer until now, thank you very much.

Uniqueness

The main thing that these operators give us that we couldn’t do before is removing the duplication problem. We saw earlier that we can use distinct to remove duplicate rows in query results that were introduced by joining to multiple child records, but this also removed the automatic primary key from the results. Because these operators do not produce duplicate rows regardless of the number of child records, there is no need to use distinct. This was never that much of a problem though, as we could always include the primary key column explicitly anyway. So what else is there?

Single Child Entity

The matchfirstrowusingcrossapply option looks the most powerful, although as I haven’t managed to actually get any attributes out of the linked entity I can’t realise that yet. It would be most powerful if we could combine that with a sort order, so you could get a list of accounts along with the “best” contact for each account:

That looks like it could give us a unique list of accounts, along with the details of the top contact in the account according to the ClickDimensions score. Unfortunately, running that gives us an error:

Performance

So what else? How about performance? If we compare the four simple examples we looked at before, we get three quite different query plans:

Comparative performance of different join operators

Even when adding in a DISTINCT option to the simple inner join type, this takes effectively no time at all to execute. All three of the other new options take significantly longer, with the matchfirstrowusingcrossapply option taking the vast majority of the time. So there’s no quick performance win here by just turning one of these options on blindly.

My best guess then is that these options have been added following some performance analysis of other, more complex queries that Microsoft have seen from their telemetry. This could possibly be for their 1st party Dynamics 365 apps, AI or other applications that have a particular need for more generating specific queries. This would probably explain the lack of documentation as much as anything else.

As there are still some other join types listed in the documentation which the server doesn’t appear to support yet, it could also be the case of the documentation getting ahead of the code and we’ve got some more exciting platform extensions coming in a future release!

The intent here is to get a list of accounts that have a contact called Mark. However, if an account has two Marks, that account will be in the resulting list twice. Adding the distinct attribute makes sure the user gets what they were probably expecting, which is a list of unique accounts.

Unique Identifiers

One other interesting side effect of using distinct is that you no longer get the primary key of the entity back automatically. If you run the query:

As well as getting the account name back, you’ll also automatically get the accountid as well.

When you apply distinct, this goes away. This makes sense if you think about it. Imagine you have two accounts with the same name. If the accountid was automatically included, that would make the records unique. You would still get two rows back from your query, regardless of whether or not you used distinct.

If the purpose of using distinct is to eliminate duplicate rows introduced by a join onto a child entity as shown above though, you probably do want to get a row back for each of your top-level entity type. In that case you will need to explicitly include the primary key attribute in your query, e.g.

Truncation

One “gotcha” to be aware of. Any text values will be truncated to 2,000 characters before the distinct operation is applied. If you have longer values, e.g. descriptions or the contents of attachments in the annotation entity, any extra data after the first 2,000 characters will be lost.

Quite why this would be is a bit of a mystery. My best guess is that these fields used to be stored in a SQL Server ntext field which couldn’t be used in a SELECT DISTINCT query, so to work around this problem Dynamics CRM would cast these values to an nvarchar(2000) value which could be used instead. You can see this in the generated SQL:

select
DISTINCT top 10 cast( "annotation0".DocumentBody as varchar(2000) ) as "documentbody"
from
AnnotationBase as "annotation0"

As these columns are now stored using an nvarchar(max) field which can be used directly in a SELECT DISTINCT query I guess this behaviour is still in place for backwards compatibility reasons, although it does give about a 35% performance improvement:

SQL Equivalent

Apart from this once caveat of long text fields, this is directly equivalent to the SQL DISTINCT keyword.

When you use top to limit the number of records, you don’t have the option to move on to subsequent pages. You can’t use this to increase the number of records beyond 5,000 – you can only use a number between 1 and 5,000.

In contrast, count and page allow you to move through multiple pages of results and control the size of each page:

If you are only ever interested in the first n records (e.g. you just want the first record that matches your query), use top as you have no need to retrieve multiple result pages.

On the other hand, if you do need to retrieve all the results (or at least, more than “just” the first 5,000), use page to indicate which number page you want. In addition you can optionally use count to control the size of each page – you might set this after some trial-and-error testing of the performance of your query.

SQL Equivalents

In SQL terms, top is equivalent to the SELECT TOP (n) clause. count and page are similar to the OFFSET ... FETCH ... clause.

If you’re using any hierarchies in your data, you’ve probably noticed the icon in your main grid views that you can click on to show the hierarchy view when records have a parent or children. It’s simple to work out if a record has a parent – just check if the parent lookup attribute contains data. But have you ever stopped to think about how it works out if a record has got any children?

Hierarchical records highlighted in grid view

It would be horribly inefficient to execute a separate query for each row to get the number of children. Instead, it will use a FetchXML feature called rowaggregate.

You don’t have any control within your query of what is meant by “children” in this context. This is defined by the hierarchy configuration of the entity being queried.

Hierarchical relationship setting

You can only apply the rowaggregate option to an attribute that is the primary key in a hierarchical relationship (the “Related (One)” side in the screenshot above). You’ll receive an error if you try to apply it to any other attribute, e.g. if you try to apply it on the telephone1 attribute:

I’ve just released SQL 4 CDS 1.0.9. Please update it when XrmToolBox prompts you to take advantage of these latest features:

Open Source!

I’m very pleased to be able to release SQL 4 CDS as open source, thanks to the support of Data8. You can now grab all the code on GitHub, submit your bug reports or suggestions as issues or even open a pull request with your own changes!

XrmToolBox 1.2020.2.36

An error occured when trying to display this tool: Method not found: 'McTools.Xrm.Connection.ConnectionDetail ConnectionUpdatedEventArgs.get_ConnectionDetail()'.

after installing the latest version of XrmToolBox, please update SQL 4 CDS to 1.0.9 as well to fix it.

Cancel running queries

If you need to stop a query while it’s running, just click the 🟥 button in the toolbar to stop it.

Improved results display

The grid view showing the results of a query is improved with:

total record count at the bottom

record numbers on the left

right-click options to copy data with or without headers

click on lookup guid values to open the related record or quickly create a SELECT query to get the details of the record

columns in a SELECT * query are sorted by name to make it easier to find what you’re looking for

option to show lookup values as the name of the related record instead of the GUID

option to show times as your local time zone instead of UTC

Improved query support

You will now get the expected results when running queries that combine * and a field name in the SELECT clause, e.g. SELECT name, * FROM account, and when ordering a SELECT * query, e.g. SELECT * FROM account ORDER BY name

Double quotes

By default, SQL treats double quotes as enclosing the name of a table or column, and single quotes for strings, so you can’t write:

SELECT name
FROM account
WHERE telephone1 = "01513554555"

as the account entity doesn’t contain a field called 01513554555

You can now switch this behaviour so double quotes and single quotes are both used to enclose strings by unticking the new “Quoted Identifiers” option in the settings window.

SQL

This is a common mistake I see, and one you can make easily if you try to build this query in Advanced Find. When the query is run, it tries to find invoices that have an invoice item that has a product that has the product number “A” and “B”. Since it’s impossible for a product to have two different product numbers in the same field, you’ll never get any results for this query.

The key change we need to make is to have two separate links to the invoicedetail entity, one filtered for product A and the other for product B:

With this change the query will now find invoices that have an invoice item that has a product that has the product number “A”, and also an invoice item that has a product that has the product number “B”. It’s more long-winded to write but gets us to the right answer.

Unfortunately it’s not possible to build this query in Advanced Find, so if you need to use a query like this in your app you’ll need to get the query ready in FetchXML Builder and save it from there as a view or marketing list (but see the workaround at the end).

Combining with “Not In” Queries

Taking another example:

Visits that have a page view of the checkout page BUT NOT the “thank you” page?

If you’re using ClickDimensions you get a lot of powerful web tracking information recorded in custom entities. Two of these are Page View, which records an individual web page that someone has viewed, and Visit, which represents a session that can cover multiple pages. Each Page View is linked to a Visit. We can find abandoned shopping carts by querying these to find sessions which include a visit to the shopping cart page but not to the page that it shown after a purchase is complete. This makes use of the “not in” pattern of using a left outer join and null query.

SQL

SELECT *
FROM account AS a
INNER JOIN
contact AS info
ON a.accountid = info.parentcustomerid
AND info.emailaddress1 LIKE 'info@%'
LEFT OUTER JOIN
contact AS noninfo
ON a.accountid = noninfo.parentcustomerid
AND noninfo.emailaddress1 NOT LIKE 'info@%'
WHERE noninfo.contactid IS NULL;

Advanced Find

I mentioned earlier that you couldn’t build these queries in Advanced Find, and that’s mostly true. If you try to create one of these queries exactly as I’ve shown you’ll get an error when you try to add the second instance of the related entity:

The relationship you are adding already exists in the query.

But with a bit of creativity we can switch the query around to something that Advanced Find can handle, although as a database person it’s not as “nice”.

Instead of having our main entity listed only once, we can join back to it from the related entity. Instead of joining from invoice to invoicedetail twice, we join from invoice to invoicedetail, back to invoice and on to invoicedetail again:

Finding invoices with two products – attempt 2. Longer but gets the right answer!

This works fine for this case, but you may not get the results you want if you try this in combination with a “not in” type query. That would also apply to all the other link entities you chain off it, and could have the effect of changing an “in” query to a “not in”.