Yesterday I wrote article about SQL SERVER – Introduction to Aggregate Functions. I received one email that four of the aggregate functions are statistical function and I should write something about that. VAR, STDEVP, STDEV, VARP are statistical functions as well they absolutely fit in the definition of aggregate function as well. The usage of this function is pretty simple so instead of explaining them I will go to example right away.USE AdventureWorks;GOSELECT VAR(Bonus) 'Variance',STDEVP(Bonus) 'Standard Deviation',STDEV(Bonus) 'Standard Deviation',VARP(Bonus) 'Variance for the Population'FROM Sales.SalesPerson;GO All the functions returns result as datatype float. VAR and VARP can only be applied to numeric well all other can be applied to all numeric data type except INT datatypes.

I just wanted to add the difference between the versions that aren’t “Population” and those that are. The ones that aren’t “Population” (Such as STDev and Var) are called “Sample” functions are estimates of the population based on the sample.

Long story short, Standard deviation and variance are different ways of representing the same thing (variance/deviation – google it for more info). But trying to work out how much the data varies based on only a few pieces of data is difficult, so an error factor is added in that helps out. This error factor gets smaller and smaller with more data so that with very large data sets the population and sample functions will return almost identical results.

When to use which one? Rule of thumb is you rarely or never use the population one because you almost never have, know or able to measure every measurement from all possible measurements.

Let’s say you are mearsuring the height of children in a class. They are “samples” of children of their age/type and therefore use the sample functions. You shouldn’t use the population function unless you can measure and have a record of all children of ages that could be in your class across all time.

In fact, very few times will you ever have the entire population of data so very few times will use use the population functions, and if you did have all the data chances are the sample functions would return the same results.

As for Mode and other statistical functions, I offer an sql server SQL Math library that provides those abilities. sqladmintools.com

Pinal Dave is a SQL Server Performance Tuning Expert and an independent consultant. He has authored 11 SQL Server database books, 21 Pluralsight courses and has written over 4000 articles on the database technology on his blog at a https://blog.sqlauthority.com. Along with 16+ years of hands on experience he holds a Masters of Science degree and a number of database certifications.