July 24, 2017

IBM holds a monthly math/programming challenge called “Ponder This”.This month’s problem (July 2017) is an interesting one because it can (mostly) be solved using SQL and highschool math:

“Find a way to assign a unique digit to each letter such that the 863,172nd digit after the decimal point of sin(PONDER) and of sin(THIS) are not zeros, but more than a fifth of the first million digits of sin(PONDER)*sin(THIS) are zeros.

“The parameters are in degrees, and we count the digits from 1, so for example, sin(100) = 0.9848077530122… so the fifth digit after the decimal point of sin(100) is zero.”

If you’re up to the challenge, use any technique you like, but you’ve only got a week. Next week (August 1st) after IBM publishes their solution, I’ll publish the SQL based solution that I used.

Update: My Solution

First I realized that PONDER and THIS represent integers greater than 360 degrees. That’s okay. All I need to do is find two numbers A and B between 1 and 360 with the desired properties and then take that solution and add 360 to either number until I get a six digit number and a four digit number with no shared digits.

So my guess is that SIN(A) * SIN(B) is a regular number.
I can look for that using SQL:

Where HasDuplicate is a CLR function I wrote that returns whether a number has repeat digits.
This gives me a few solutions, for example PONDER = 102978 and THIS = 3654.

I get a kick out of these puzzles and as an added bonus, I get to exercise my SQL skills.

Update 2

Stephen pointed out in the comments that I didn’t even check to see whether the 863,172nd digit of SIN(PONDER) or SIN(THIS) is nonzero. How did I do that? Well, I didn’t use SQL. I used google. Searching for “What’s the exact value of SIN(18)” gives me this page Exact Value of sin 18°. The internet’s amazing. So I trust that the exact value is sin(18) = ( -1 + sqrt(5) ) / 4 which has the same 863,172nd digit as sqrt(5) / 4. The same reasoning works for SIN(54) because sin(54) = ( 1 + sqrt(5) ) / 4.

Rather than figure out how to calculate the 863,172nd digit of sqrt(5) / 4, I searched for and found the first million digits of sqrt(5). The internet’s amazing. I know that when you divide a number by four, any digit in that number usually has a limited scope of influence on the digits in the answer. I divided the digits around the 863172 place by 4 to convince myself that the 863172nd digit of sqrt(5) / 4 is not 0.

In 2015, Postgres added support for ON CONFLICT DO UPDATE to their INSERT statements. Their docs say that using this syntax guarantees an atomic INSERT or UPDATE outcome; one of those two outcomes is guaranteed, even under high concurrency.

Now, that is one amazing feature isn’t it? Postgres worked really hard to include that feature and kudos to them. I would trade ten features like stretch db for one feature like this. It makes a very common use case simple. And it takes away the awkwardness from the developer.

But if you work with SQL Server, the awkwardness remains and you have to take care of doing UPSERT correctly under high concurrency.

I wrote a post in 2011 called Mythbusting: Concurrent Update/Insert Solutions. But since then, I learned new things, and people have suggested new UPSERT methods. I wanted to bring all those ideas together on one page. Something definitive that I can link to in the future.

The Setup

I’m going consider multiple definitions of a procedure called s_AccountDetails_Upsert. The procedure will attempt to perform an UPDATE or INSERT to this table:

I have no concurrency concerns here but because of this issue – not to mention performance concerns – it’s no longer my first preference.

Performance and Other Concerns

When describing each pattern so far, I haven’t paid attention to performance, just concurrency. There are many opportunities to improve the performance of these solutions. In particular, pay very close attention to your average use case. If your workload UPDATEs a row 99% of the time and INSERTs a row only 1% of the time, then optimal implementation will look different than if the UPDATE and INSERT frequency is reversed.

July 16, 2017

Today marks my tenth anniversary working for D2L and it also marks ten years since I specialized in databases full time.
I’ve never worked anywhere this long. I always told myself that I’d stick around as long as there were new and interesting challenges to tackle and there’s no end in sight.

So I thought I’d compile a list of things I didn’t know in 2007.

1. Database Performance Improvements Are Often Dramatic

A poorly performing database query can run orders of magnitude slower than a tuned query. Compare that to poorly performing application code. Improvements there are usually more modest.

But this can work out. It makes you look like a genius. You get to brag how that report that once took 32 hours now takes 10 minutes. Even though that missing index suggestion in the query plan turned out to be right this time, take the credit, you deserve it.

3. People Love Pictures With Their Content

I’ve been writing here for nearly ten years. One day, I decided to include an illustration with each blog post. For some reason it makes each post just a little more compelling than a wall of text. Drawing has been a very rewarding hobby for me. I highly recommend it.

4. There Are Dozens of Walkable Lunch Options In Downtown Kitchener

Here are a few of my favorites

Kava Bean: Excellent all day breakfast

City Cafe: Awesome bagels in the morning.

Darlise: You have to hunt for it, but it’s great food

Holy Guacamole: Especially if you’re a fan of cilantro

If you’re in town, send me an email. I’ll introduce you to one of these places.

5. Don’t Overestimate Normal Forms

Knowing normal forms is a lot less useful than I thought it would be. Normal forms are an academic way of describing a data model and it’s ability to avoid redundant data. But knowing the difference between the different normal forms has never helped me avoid trouble.

Just use this rule of thumb: Avoid storing values in more than one place. That will get you most of the way. Do it right and you’ll never hear anyone say things like

We need a trigger to keep these values in sync

We need to write a script to correct bad data

6. “How Do You Know That?”

This is one of my favorite lines. I use it when troubleshooting a technical problem. It’s a line that helps me avoid red-herrings or to maybe find out if that info is coming via telephone game. It’s kind of my version of Missouri’s “you’ve got to show me”. It might go something like this:

Say a colleague says “I/O is slow lately.”

So I’d ask “How do you know?”

They might say something like:

“Hinky McUnreliable thought that’s what it was. He told me in the hallway just now”

7. Reduce Query Frequency

The impact of a particular procedure or query on a server is simply the frequency of a query multiplied by the cost of that query. To reduce the impact on the server it’s often more valuable to focus on reducing the number of calls to that query in addition to tuning it.

Look for duplicate queries. You’d be surprised how often a web page asks for the same information.

Cache the results of queries that are cache-able. I rely on this one heavily.

8. Schema Drift Is Real

If you have multiple instances of a database then schema will drift over time. In other words, the definition of objects like tables, views and procedures will be out of alignment with what’s expected. The drift is inevitable unless you have an automated process to check for misalignment.

We wrote our own but tools like Redgate’s Schema Compare can help with this kind of effort.

9. Get Ahead Of Trouble

Software will change over time and it requires deliberate attention and maintenance to remain stable. Watch production and look for expensive queries before they impact end users.

When I type it out like that, it seems obvious. But the question really then becomes, how much time do you want to invest in preventative maintenance? My advice would be more.

Not enough time means that your team gets really good practice firefighting urgent problems. I work directly with a dream team of troubleshooters. They’re smart, they’re cool during a crisis and they’re awesome at what they do. Fixing issues can be thrilling at times, but it pulls us away from other projects.

And what do they say about people who can’t remember the past? Sometimes we have to relearn this lesson every few years.

10. D2L Is A Great Employer

D2L values learning of course and that includes employee professional development. They have always supported my blogging and speaking activities and they recognize the benefits of it.

The people I work with are great. I couldn’t ask for a better team.

And we’ve got a decent mission. I’m really glad to be part of a team that works so hard to transform the way the world learns. Thanks D2L!

July 6, 2017

A .Net tick is a duration of time lasting 0.1 microseconds. When you look at the Tick property of DateTime, you’ll see that it represents the number of ticks since January 1st 0001.
But why 0.1 microseconds? According to stackoverflow user CodesInChaos “ticks are simply the smallest power-of-ten that doesn’t cause an Int64 to overflow when representing the year 9999”.

Even though it’s an interesting idea, just use one of the datetime data types, that’s what they’re there for. I avoid ticks whenever I can.

But sometimes things are out of your control. If you do want to convert between datetime2 and ticks inside SQL Server instead of letting the application handle it, then you need to do some date math. It can get a bit tricky avoiding arithmetic overflows and keeping things accurate: