First, we identify the user and ‘current’ song to start with (red line)

Next, we identify the other users who have also listened to this song (green line)

Then we find the other songs which those other users have also listened to (blue, dotted line)

Finally, we direct the current user to the top songs from those other songs, prioritized by the number of times they were listened to (this is represented by the thick violet line.)

The algorithm above is quite simple, but as you will see it is quite effective in meeting our requirement. Now, let’s see how to actually implement this in SQL Server 2017.

Click through for animated images as well as an actual execution plan and recommendations for graph query optimization (spoilers: columnstore all the things). They also link to the GitHub project where you can try it out yourself.

Related Posts

John Mount explains the vtreat package that he and Nina Zumel have put together: When attempting predictive modeling with real-world data you quicklyrun into difficulties beyond what is typically emphasized in machine learning coursework: Missing, invalid, or out of range values. Categorical variables with large sets of possible levels. Novel categorical levels discovered during test, cross-validation, or […]

Kenneth Fisher explains a couple of database name functions in SQL Server: I’d never seen ORIGINAL_DB_NAME until recently and I thought it would be interesting to highlight it out, and in particular the difference between it and DB_NAME. I use DB_NAME and DB_ID fairly frequently in support queries (for example what database context is a query running from or what database are […]