Your manager is happy because you’re getting historical data to him quickly. Your DBA is happy because she doesn’t have to clean up any performance-killing triggers that replicate a temporal table’s functionality. Everything with temporal tables has made your life better.

Except that time when you accidentally inserted some bad data into your temporal table.

Whoops

The good news is that all of your data is still intact — it’s been copied over to the historical table. Phew!

Now all you need to do is roll back this inadvertent row insertion and make your tables look just like they did before you started breaking them.

This should be easy, right?

Well, not exactly: there’s no automatic way to roll back the data in a temporal table. However, that doesn’t mean we can’t write some clever queries to accomplish the same thing.

Let’s make some data

Don’t mind the details of this next query too much. It uses some non-standard techniques to fake the data into a temporal/historical table with “realistic” timestamps:

You see those two rows in the top temporal table? Those are the ones I just added accidentally. I actually had a bug in my code *ahem* and all of the data inserted after 2017-05-18 is erroneous.

The bug has been fixed, but we want to clean up the incorrect entries and roll back the data in our temporal tables to how it looked on 2017-05-18. Basically, we want the following two rows to appear in our “current” temporal table and the historical table to be cleaned up of any rows inserted after 2017-05-18:

Fortunately, we can query our temporal table using FOR SYSTEM_TIME AS OF to get the two rows highlighted above pretty easily. Let’s do that and insert into a temp table called ##Rollback:

DROP TABLE IF EXISTS ##Rollback
SELECT
*
INTO ##Rollback
FROM
dbo.CarInventory
FOR SYSTEM_TIME AS OF '2017-05-18'

-- Update the SysEndTime to the max value because that's what it's always set to in the temporal table
UPDATE ##Rollback SET SysEndTime = '9999-12-31 23:59:59.9999999'

You’ll notice we also updated the SysEndTime. That’s because the current rows of a temporal table always have their AS ROW END column set to the maximum datetime2 value.

Looking at ##Rollback, we have the data we want to insert into our temporal table:

This is the data we want!

Now, it’d be nice if we could just insert the data from ##Rollback straight into our temporal table, but that would get tracked by the temporal table!

So instead, we need to turn off system versioning, allow identity inserts, delete our existing data, and insert from ##Rollback. Basically:

ALTER TABLE dbo.CarInventory SET ( SYSTEM_VERSIONING = OFF);

SET IDENTITY_INSERT dbo.CarInventory ON;

DELETE FROM dbo.CarInventory WHERE CarId IN (SELECT DISTINCT CarId FROM ##Rollback)
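
The remaining steps (inserting from ##Rollback, cleaning up the history, and turning versioning back on) could look roughly like the following. This is a minimal sketch rather than a drop-in script: the column list (Year, Make, Model, Color, Mileage, SysStartTime) is an assumption about how dbo.CarInventory is defined, so adjust it to match your table. It also assumes the history cleanup should remove every row whose SysEndTime falls after the rollback point, since those rows were either still current on 2017-05-18 or created later.

```sql
-- Period columns remain system-generated while the period exists,
-- so drop it before inserting explicit SysStartTime/SysEndTime values.
ALTER TABLE dbo.CarInventory DROP PERIOD FOR SYSTEM_TIME;
GO

-- Assumed column list: replace with the real columns of dbo.CarInventory.
INSERT INTO dbo.CarInventory
    (CarId, [Year], Make, Model, Color, Mileage, InLot, SysStartTime, SysEndTime)
SELECT
    CarId, [Year], Make, Model, Color, Mileage, InLot, SysStartTime, SysEndTime
FROM ##Rollback;

SET IDENTITY_INSERT dbo.CarInventory OFF;

-- Remove history rows that were still open at, or created after, the rollback point.
DELETE FROM dbo.CarInventoryHistory
WHERE SysEndTime > '2017-05-18';
GO

-- Restore the period definition, then re-enable versioning against the same history table.
ALTER TABLE dbo.CarInventory
    ADD PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime);
GO

ALTER TABLE dbo.CarInventory
    SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CarInventoryHistory));
GO
```

Because these statements rewrite history by design, it is worth rehearsing them on a copy of the tables before touching the real ones.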

Now we see our temporal table at work: we updated the rows in dbo.CarInventory and our historical table was automatically updated with our original values as well as timestamps for how long those rows existed in our table.

It’s totally possible for someone to have driven 73 or 488 miles in a Chevy Malibu in under 4 minutes…ever hear the phrase “drive it like a rental”?

Our temporal table shows the current state of our rental cars: the customers have returned the cars to our lot and each car has accumulated some mileage.

Our historical table meanwhile got a copy of the rows from our temporal table right before our last UPDATE statement. It’s automatically keeping track of all of this history for us!
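
For reference, the automatic tracking comes from how the table is declared. The sketch below shows the general shape of a system-versioned table; dbo.CarInventoryDemo, dbo.CarInventoryDemoHistory, and the column list are hypothetical names chosen for illustration and are not the definition used in this post.

```sql
-- Hypothetical demo table: names and columns are assumptions for illustration only.
CREATE TABLE dbo.CarInventoryDemo
(
    CarId        INT IDENTITY PRIMARY KEY,
    Make         VARCHAR(40),
    Mileage      INT,
    InLot        BIT NOT NULL DEFAULT 1,
    -- The two period columns are maintained by SQL Server, not by our queries.
    SysStartTime DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    SysEndTime   DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CarInventoryDemoHistory));
```

With that in place, every UPDATE or DELETE against the table copies the previous version of the row into the history table and stamps it with the period it was valid for.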

Continuing on, business is going well at the car rental agency. We get another customer to rent our silver Malibu:

UPDATE dbo.CarInventory SET InLot = 0 WHERE CarId = 2

Unfortunately, our second customer gets into a crash and destroys our car:

DELETE FROM dbo.CarInventory WHERE CarId = 2

The customer walked away from the crash unscathed; the same cannot be said for our profits.

With the deletion of our silver Malibu, our test data is complete.

Now that we have all of this great historically tracked data, how can we query it?

If we want to reminisce about better times when both cars were damage-free and we were making money, we can write a query using FOR SYSTEM_TIME AS OF to show us what our table looked like at that point in the past:

SELECT
*
FROM
dbo.CarInventory
FOR SYSTEM_TIME AS OF '2017-05-18 23:49:50'

The good old days.
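
AS OF is only one of the FOR SYSTEM_TIME sub-clauses. As a rough sketch against the same dbo.CarInventory table, FOR SYSTEM_TIME ALL returns current and historical rows together, which makes it easy to list every version of a single car. The BETWEEN window in the second query is an arbitrary range picked for illustration.

```sql
-- Every version of car 2, oldest first (current + historical rows combined).
SELECT
    *
FROM
    dbo.CarInventory
    FOR SYSTEM_TIME ALL
WHERE
    CarId = 2
ORDER BY
    SysEndTime;

-- Rows that were active at any point during an (assumed) time window.
SELECT
    *
FROM
    dbo.CarInventory
    FOR SYSTEM_TIME BETWEEN '2017-05-18 00:00:00' AND '2017-05-19 00:00:00';
```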

And if we want to do some more detailed analysis, like what rows have been deleted, we can query both temporal and historical tables normally as well:

-- Find the CarIds of cars that have been wrecked and deleted
SELECT DISTINCT
h.CarId AS DeletedCarId
FROM
dbo.CarInventory t
RIGHT JOIN dbo.CarInventoryHistory h
ON t.CarId = h.CarId
WHERE
t.CarId IS NULL

C̶o̶l̶l̶i̶s̶i̶o̶n̶ Conclusion

Even with my car rental business not working out, at least we were able to see how SQL Server’s temporal tables helped us keep track of our car inventory data.

I hope you got as excited as I did the first time I saw temporal tables in action, especially when it comes to querying with FOR SYSTEM_TIME AS OF. Long gone are the days of needing complicated queries to rebuild data for a certain point in time.

Starting with the 2016 release, SQL Server offers native JSON support. Although the implementation is not perfect, I am still a huge fan.

Even if a new feature like JSON support is awesome, I am only likely to use it if it is practical and performs better than the alternatives.

Today I want to pit JSON against XML and see which is the better format to use in SQL Server.

Enter XML, SQL’s Bad Hombre

Full disclosure: I don’t love XML and I also don’t love SQL Server’s implementation of it.

XML is too wordy (lots of characters wasted on closing tags), it has elements AND attributes (I don’t like having to program for two different scenarios), and depending on what language you are programming in, sometimes you need schema files and sometimes you don’t.

SQL Server’s implementation of XML does have some nice features like a dedicated datatype that reduces storage space and validates syntax, but I find the querying of XML to be clumsy.

All XML grievances aside, I am still willing to use XML if it outperforms JSON. So let’s run some test queries!

Is JSON SQL Server’s New Sheriff in Town?

Although performance is the final decider in these comparison tests, I think JSON has a head start over XML purely in terms of usability. SQL Server’s JSON function signatures are easier to remember and cleaner to write on screen.
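
To make the usability point concrete, here is a small side-by-side sketch that reads the same property from a JSON string and from an XML value. The sample documents and variable names are made up for illustration and are not the test data used in this post.

```sql
-- Made-up sample documents, not the post's test data.
DECLARE @json NVARCHAR(MAX) = N'{"cars":[{"make":"Chevrolet","model":"Malibu"}]}';
DECLARE @xml  XML           = N'<cars><car><make>Chevrolet</make><model>Malibu</model></car></cars>';

SELECT
    -- JSON: one function, a path, and zero-based array indexes.
    JSON_VALUE(@json, '$.cars[0].model') AS JsonModel,
    -- XML: a method on the value, an XQuery path with a one-based
    -- positional predicate, and an explicit target SQL type.
    @xml.value('(/cars/car/model)[1]', 'NVARCHAR(50)') AS XmlModel;
```

Both columns return the same model name; the difference is mostly in how much syntax you have to remember to get there.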

Data Size

So XML should be larger, right? It’s got all of those repetitive closing tags.

SELECT
DATALENGTH(XmlData)/1024.0/1024.0 AS XmlMB,
DATALENGTH(JsonData)/1024.0/1024.0 AS JsonMB
FROM
dbo.XmlVsJson

Turns out the XML is actually smaller! How can this be? This is the magic behind the SQL Server XML datatype. SQL doesn’t store XML as a giant string; it stores only the XML InfoSet, leading to a reduction in space.

The JSON on the other hand is stored as regular old nvarchar(max) so its full string contents are written to disk. XML wins in this case.
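
A tiny sketch shows why the JSON side adds up quickly: nvarchar stores two bytes per character, so the byte count is simply double the string length. The sample values below are made up; the XML figure is left for you to check on your own instance, since it depends on the internal representation.

```sql
-- Made-up sample values, just to show how DATALENGTH behaves per data type.
DECLARE @json NVARCHAR(MAX) = N'{"a":1}';
DECLARE @xml  XML           = N'<a>1</a>';

SELECT
    LEN(@json)        AS JsonCharacters, -- 7 characters
    DATALENGTH(@json) AS JsonBytes,      -- 14 bytes: 2 bytes per nvarchar character
    DATALENGTH(@xml)  AS XmlBytes;       -- size of the stored internal representation
```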

INSERT Performance

So the XML data type physically stores less data than JSON does in nvarchar(max). Does that mean it will insert faster as well? Here’s our query that tries to insert 100 duplicates of the row from our first query:

SET STATISTICS TIME ON

INSERT INTO dbo.XmlVsJson (XmlData)
SELECT XmlData FROM dbo.XmlVsJson
CROSS APPLY
(
SELECT DISTINCT number
FROM master..spt_values
WHERE number BETWEEN 1 AND 100
)t WHERE Id = 1
GO

INSERT INTO dbo.XmlVsJson (JsonData)
SELECT JsonData FROM dbo.XmlVsJson
CROSS APPLY
(
SELECT DISTINCT number
FROM master..spt_values
WHERE number BETWEEN 1 AND 100
)t WHERE Id = 1
GO

And the results? Inserting the 100 XML rows took 613ms on my machine, while inserting the 100 JSON rows took 1305ms…XML wins again!

JSON ain’t looking too hot. Wait for it…

I’m guessing since the XML data type physically stores less data, it makes sense that it would also write it out to the table faster as well.

If you look at the execution plans for these last two queries, it’s easy to see that XML has a lot more to do behind the scenes to retrieve the data:

XML:

JSON:

Create

We saw above that inserting rows of XML data is faster than inserting rows of JSON, but what if we want to insert new data into the object strings themselves? Here I want to insert the property “mileage” into the first car object:
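
As a minimal sketch of what such an insert can look like in each format, the snippet below adds a mileage value to the first car of a made-up document. The sample data, variable names, and the value 1000 are assumptions for illustration; the timed test ran against the dbo.XmlVsJson table instead.

```sql
-- Made-up sample documents, not the post's test data.
DECLARE @json NVARCHAR(MAX) = N'{"cars":[{"make":"Chevrolet","model":"Malibu"}]}';
DECLARE @xml  XML           = N'<cars><car><make>Chevrolet</make><model>Malibu</model></car></cars>';

-- JSON: JSON_MODIFY returns a new string; in the default lax mode
-- it adds the key when it does not already exist.
SET @json = JSON_MODIFY(@json, '$.cars[0].mileage', 1000);

-- XML: the modify() method changes the value in place using XML DML.
SET @xml.modify('insert <mileage>1000</mileage> into (/cars/car)[1]');

SELECT @json AS ModifiedJson, @xml AS ModifiedXml;
```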

JSON doesn’t take any time to reload and wins against XML again, 50ms to 159ms.

Read Part 2: Indexes

So above we saw that JSON was faster than XML at reading fragments and properties from a single row of serialized data. But our SQL Servers probably have LOTS of rows of data. How well does indexed data parsing do in our matchup?

First let’s expand our data — instead of storing all of our car objects in a single field, let’s build a new table that has each car on its own row:

XML is able to filter out 96 rows in 200ms and JSON accomplishes the same in 9ms. A final win for JSON.
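
For context, a common way to index a JSON property is to expose it as a computed column and index that column. The sketch below uses a hypothetical table (dbo.CarsJsonDemo), a hypothetical property path ($.model), and a made-up filter value; it illustrates the general technique and is not necessarily the exact setup behind the timings above.

```sql
-- Hypothetical table: one JSON document per row.
CREATE TABLE dbo.CarsJsonDemo
(
    Id       INT IDENTITY PRIMARY KEY,
    JsonData NVARCHAR(MAX) NOT NULL
);
GO

-- Expose the property as a computed column. The CAST keeps the index key
-- small, since JSON_VALUE on its own returns NVARCHAR(4000).
ALTER TABLE dbo.CarsJsonDemo
    ADD CarModel AS CAST(JSON_VALUE(JsonData, '$.model') AS NVARCHAR(50));
GO

CREATE NONCLUSTERED INDEX IX_CarsJsonDemo_CarModel
    ON dbo.CarsJsonDemo (CarModel);
GO

-- Filtering on the computed column lets the optimizer seek on the index
-- instead of parsing every JSON document.
SELECT Id
FROM dbo.CarsJsonDemo
WHERE CarModel = N'Golf';
```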

Conclusion

If you need to store and manipulate serialized string data in SQL Server, there’s no question: JSON is the format of choice. Although JSON’s storage size is a little larger than its XML predecessor, SQL Server’s JSON functions outperform XML in speed in nearly all cases.

Is there enough performance difference to rewrite all of your old XML code to JSON? Probably not, but every case is different.

One thing is clear: new development should consider taking advantage of SQL Server’s new JSON functions.