What is the Law of Small Numbers? Meaning, Examples & More.

Often we don’t realize it, but investing is closely tied to hardcore psychology. You must have heard many people say: research before you invest.

While that is absolutely true, you must also know what to research. While on a searching spree, you will generally find many online (or offline?!) resources that favor one kind of conclusion over another. Enter: the Law of Small Numbers.

A dataset has multiple dimensions, and each one supports inferences in a different context. Statisticians choose attributes (read: dimensions) very carefully; the choice is not random, of course. Data mining techniques such as decision tree induction even include dedicated attribute selection methods to sharpen inference.

Coming back to the real context, the Law of Small Numbers actually describes a fallacy: the mistaken belief that a small sample is as representative as a large one. Sample size is an important consideration, because how reliable an inference is correlates directly with the size of the sample it is drawn from. Read on to get to know the Law of Small Numbers.

The Loophole in the Probabilistic Inference:

Imagine our whole world running on the rules of probability! However fancy that might seem, it is not actually how things work; otherwise you would have a fair chance at everything, wouldn’t you? The point is that probabilistic results are always relative: you can neither expect nor confirm how closely they match reality.

The source of the data you pick for your hypothesis and inference is of great importance here. Rather than focusing entirely on what the data implies, you might want to shift some attention to where the data was picked from (and possibly how).

Random sampling is one of the best-known probabilistic sampling techniques, and it follows a very basic rule: each element has an equal and fair chance of being picked. In practice, though, nothing guarantees that fairness by itself, so the “equal and fair chance” is very much a working assumption.
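As an illustration, Python’s standard library exposes simple random sampling directly. The population below is a made-up list of 100 labeled elements, purely for demonstration:

```python
import random

# Hypothetical population of 100 labeled elements (illustration only)
population = list(range(1, 101))

# random.sample draws without replacement, giving every element
# an equal chance of being included in the sample
sample = random.sample(population, 10)

print(len(sample))                      # 10
print(set(sample) <= set(population))   # True: every pick came from the population
print(len(set(sample)) == len(sample))  # True: no element was picked twice
```

Every run yields a different sample, but each element’s chance of inclusion is the same, which is exactly the property the rule above demands.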

Causal Explanation/Causal Narrative:

A causal narrative is something derived from general human behavior. You may not have realized it, but we humans have always indulged in pulling inferences out of whatever pieces of information are handed to us. The data, however, is not always foolproof.

To put it more clearly, the data may be the product of random sampling, which makes it even harder to stand behind an analysis built on its inferences. Note that inferences from random samples are sometimes misinterpreted, because the full account of the data is unclear and it was, in fact, collected randomly. Likewise, a causal explanation of data that was collected or sampled randomly does not always yield exact inferences.

Why? Because we are trying to infer a cause for something that, by its nature, has no cause.

Hence, equal attention must be paid to the method which has been used for collecting the data.

Sparse Population & an Example:

In a famous statistical puzzle about kidney cancer rates across the 3,143 counties in the US, the data supported two interesting (and confusing) inferences at the same time.

Inference 1: US Counties with the lowest rates of Kidney Cancer have the following attributes:

Mostly Rural

Sparsely Populated

Located in traditionally Republican states in the Midwest, the South, and the West.

Inference 2: US counties with the highest rates of Kidney Cancer have the following attributes:

Mostly Rural

Sparsely Populated

Located in traditionally Republican states in the Midwest, the South, and the West.

Now, how can two exactly opposite inferences share exactly the same attributes?

The key factor in this example is the sparse population of the counties from which the data was collected in the first place. The data source was misinterpreted because of that sparseness: a small population (generated with random chance) is inclined to show greater extremes, i.e., larger deviations from the true rate, in both directions.
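A quick simulation makes the effect visible. Assume, hypothetically, that every county shares the same underlying incidence rate; the rate, county counts, population sizes, and seed below are all invented for illustration:

```python
import random

random.seed(42)
P_INCIDENCE = 0.001  # same hypothetical underlying rate in every county

def observed_rate(population_size):
    """Observed cases per resident when each resident independently
    falls ill with probability P_INCIDENCE."""
    cases = sum(random.random() < P_INCIDENCE for _ in range(population_size))
    return cases / population_size

# 200 sparsely populated counties vs. 200 densely populated ones
small_counties = [observed_rate(500) for _ in range(200)]
large_counties = [observed_rate(20_000) for _ in range(200)]

spread_small = max(small_counties) - min(small_counties)
spread_large = max(large_counties) - min(large_counties)

# Despite the identical true rate, the sparsely populated counties
# swing to both extremes: highest AND lowest observed rates
print(spread_small, spread_large)
```

Running this, the small counties’ observed rates span a much wider range than the large counties’, even though every county was generated from the same true rate, which is precisely why the same sparse counties can top both the “highest rate” and “lowest rate” lists.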

Let’s understand this context better in terms of another famous example:

Consider a jar containing equal numbers of red and green marbles, from which you pick 4 marbles at random. The possible outcomes are:

2R/2G <- Actual Population Mean

3R/1G or 3G/1R

4R or 4G <- extreme outcome (12.5% chance)

When the sample size is increased, say to 7 marbles instead of 4, the probability of the extreme deviation (picking 7 same-colored marbles) drops to 2 × (1/2)^7, roughly 1.6%.
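The arithmetic behind both figures can be checked directly, assuming the jar is large enough that each draw is effectively an independent 50/50 pick:

```python
# Probability that every one of n draws shows the same colour,
# given equal red and green marbles (each draw ~ a fair coin flip):
# P(all red) + P(all green) = 2 * (1/2)^n
def p_all_same(n):
    return 2 * 0.5 ** n

print(p_all_same(4))  # 0.125    -> the 12.5% extreme for 4 marbles
print(p_all_same(7))  # 0.015625 -> about 1.6% for 7 marbles
```

Doubling the per-colour probability accounts for the two symmetric extremes (all red or all green); increasing n from 4 to 7 cuts the chance of an extreme by a factor of 8.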

This result illustrates exactly what the famous “Law of Small Numbers” warns about: small samples produce extreme outcomes far more often than large ones.

Conclusion

The law of small numbers describes the judgmental bias that occurs when we assume the characteristics of a whole population can be reliably estimated from a small number of observations or a small sample.

Therefore, while studying any survey, the sample size deserves serious consideration, since how reliable the findings are correlates directly with the size of the sample.