The use of charts and historical data is commonplace in private investment but the use of statistical analysis is more typically associated with quantitative investment and active fund management techniques.

Warren Buffett

The so-called "Oracle of Omaha" Warren Buffett is perhaps best known for his "get greedy when everyone else is scared, and get scared when everyone else is being greedy" line.

Warren Buffett: the 'Oracle of Omaha' was a pioneer of statistical analysis for investing: Wikimedia Commons

But he was a pioneer of analysis – starting out with tip sheets on the racetrack.

He graduated to stock trading, and part of his approach is to work out what price is right for him, compared with what profits that company could be expected to be earning in 10 years.

And there are methods of extrapolating this from historical data using one, or a combination, of the techniques listed below.

Buffett's former daughter-in-law, Mary Buffett, wrote in her book on his trading style: "Warren has found, if the company is one of sufficient earning power and earns high rates of return on shareholders' equity, created by some kind of consumer monopoly, chances are good that accurate long-term projections of earnings can be made."

Limitations to data analysis techniques

Statistical analysis has its limitations. There's little room to account for "black swan" events – those sometimes catastrophic occurrences that no amount of number crunching can predict.

And during such periods – as in the volatile market conditions that followed events such as the dotcom bubble, 9/11 and the 2008 financial crisis – statistical analysis becomes something of a blunt tool, its predictive power neutered by unpredictability.

Statistical analysis tools

Below, then, are several techniques in the arsenal of statistical analysis.

We've removed most of the hard sums to leave you with just the ideas that make these tools useful.

Should these ideas prompt the desire for a deeper understanding of these types of investment methods, we've included a further reading list at the end of the article.

Measures of central value

There are three measures of central tendency in statistical analysis: the mean, median and mode. All three are summary measures that attempt to best describe a whole set of data in a single value that represents the core of that data set's distribution.

1. Mode

This is the most commonly occurring value in a data set.

Consider the following data set of the ages of 10 children:

4, 5, 5, 6, 6, 6, 7, 8, 8 and 9

The mode here is 6, as this is the most commonly occurring value. The mode, however, won't necessarily reflect the central value of a data set. Also, it is possible for there to be two or more modes in a data set or, indeed, no mode at all.
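As a minimal sketch, Python's standard `statistics` module can find the mode of the ages above. Its `multimode` function returns every value tied for most frequent, which covers the case of two or more modes. The second data set is made up purely for illustration.

```python
from statistics import multimode

ages = [4, 5, 5, 6, 6, 6, 7, 8, 8, 9]

# multimode returns a list of all values that share the highest count
modes = multimode(ages)
print(modes)  # [6]

# A hypothetical set in which two values are equally common has two modes
two_modes = multimode([4, 5, 5, 6, 6, 7])
print(two_modes)  # [5, 6]
```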

2. Arithmetic mean

The mean is the average value of a data set.

Consider the following data set:

2, 4, 5, 8 and 9

The arithmetic mean is arrived at by adding all the numbers together and then dividing the total by the number of data points in the set.

So, by adding 2+4+5+8+9 = 28, which we then divide by 5 (the number of data points, or numbers, in that set) we arrive at 5.6.
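The same sum can be sketched in a few lines of Python. The variable names are illustrative only.

```python
data = [2, 4, 5, 8, 9]

total = sum(data)         # 2 + 4 + 5 + 8 + 9 = 28
mean = total / len(data)  # 28 divided by 5 data points

print(total)  # 28
print(mean)   # 5.6
```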

Mean values are useful in many circumstances in business.

Internet shopping sites always ask for your age range when you set up an account. This is not only useful to them, but also to other retailers and manufacturers of goods for targeting advertising to certain age groups.

Market research uses statistical analysis in identifying age groups for targeted marketing: Shutterstock

In investment, particularly for institutions, it's becoming increasingly important to know the average buying prices at certain times of day to know whether your institution is arriving at best execution on its asset purchases.

3. Median

The median is the middle number in a data set.

Consider the same data set as above:

2, 4, 5, 8 and 9

The median is simply the number in the middle = 5. This is easily arrived at if the data set has an odd number of data points, as above. But what if the data set were:

1, 2, 4, 5, 8 and 9

In the case of a data set with an even number of data points, we take the average of the middle two numbers.

So, (4+5)/2 gives us a median of 4.5.

Median values are useful in statistical analysis because they are less prone to be skewed by anomalies or other unusual appearances in a data set. Consider the following set:

2, 4, 5, 8 and 798

In reality, such an extraordinary thing isn't likely to happen in such a small set, but the median of 5 is much more representative of the majority of that data set than the arithmetic mean of 163.4.
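The three data sets above can be checked with Python's standard `statistics` module. This is only a sketch to show how the median stays put while the mean is dragged upward by a single outlier.

```python
from statistics import mean, median

odd_set = [2, 4, 5, 8, 9]
even_set = [1, 2, 4, 5, 8, 9]
with_outlier = [2, 4, 5, 8, 798]

print(median(odd_set))       # 5
print(median(even_set))      # 4.5, the average of 4 and 5
print(median(with_outlier))  # 5, unaffected by the outlier
print(mean(with_outlier))    # 163.4, pulled upward by 798
```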

Imagine the example of salaries in a company. Let's say there are three broad ranges of salary: 80% of those salaries are for semi-skilled and unskilled workers, 15% are for skilled workers and supervisors, and just 5% are for senior managers and executives.

Salary breakdowns are often best represented by median earnings within companies: Pixabay

That top 5% skews the average salary upward.

A semi-skilled worker earning £30,000 a year isn't likely to be impressed to learn that the mean salary where he works is £45,000 a year. He knows he earns more than an unskilled worker, but the mean salary makes his route up the corporate ladder seem a terribly long one.

His salary is likely to be more closely related to the median given the percentage of workers in that group of the data set.

Probability theory

4. Mathematical expectation

Also called the expected value (EV), this is the number in probability theory that one may arrive at, on average, when a task with random variable outcomes is performed many times – such as rolling a single dice.

The data set here is 1, 2, 3, 4, 5 and 6, and the probability of any of those numbers turning up on a single throw is 1 in 6, or 1/6 or, expressed as a decimal, approximately 0.1667.

Probability theories like mathematical expectation work out the likelihood of outcomes of random variables: Pixabay

The mathematical expectation or EV is arrived at by multiplying each of the possible outcomes by the probability of it occurring and adding those products together. Hence, with a dice roll:

(1 x 1/6) + (2 x 1/6) + . . . + (6 x 1/6) = 3.5

Because every outcome here is equally likely, the expected value is simply the arithmetic mean of all possible outcomes, so:

(1+2+3+4+5+6)/6 = 3.5

The law of large numbers dictates that the more often the dice is thrown, the nearer the mathematical mean value of those throws approaches EV. This is called convergence.
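The dice example can be sketched in Python. The first part computes the expected value exactly with fractions. The second simulates throws to illustrate convergence. The seed and the numbers of throws are arbitrary choices for this illustration.

```python
import random
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]
probability = Fraction(1, 6)  # each face is equally likely

# Expected value: each outcome multiplied by its probability, then summed
expected_value = sum(x * probability for x in outcomes)
print(float(expected_value))  # 3.5

# Law of large numbers: the average of simulated throws settles towards 3.5
rng = random.Random(42)  # arbitrary fixed seed so the run is repeatable
averages = {}
for throws in (10, 1_000, 100_000):
    total = sum(rng.choice(outcomes) for _ in range(throws))
    averages[throws] = total / throws
    print(throws, round(averages[throws], 3))
```

With only 10 throws the average can sit some way from 3.5, while the 100,000-throw average should land very close to it.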

In business and investment terms, expected value is used by risk managers in scenario analysis when calculating whether an investment is worth the appropriate level of risk the firm is willing to take on.

The quality and depth of statistical analysis now made possible by computing means EV can be calculated on data sets that were previously regarded as unworkably massive.

These can be of enormous value in helping investment professionals to arrive at forecasts for investment returns, particularly when used in conjunction with measures of variance and standard deviation (see below).

Distribution models

5. Normal distribution

Normal distribution is also called the Gaussian distribution model. The standard normal distribution is the special case with a mean of zero and a standard deviation of one.

Normal distribution can be charted along a single horizontal axis that represents the total spectrum of values within a given data set.

Half of that data set will have values that are higher than the mean and half will have values lower than the mean. Most data points will lie close to the mean and the rest will tail off in each direction.

The shape described by plotting this data will be a bell curve, as below.

Normal distribution, when plotted on a chart looks like a regular bell curve: Wikimedia Commons

Normal distribution patterns in historical returns don’t tell an investor that much, other than that the asset is apparently miraculously well-behaved and that its returns mostly reflect the historical average.
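To make the shape of the bell curve concrete, the sketch below uses `NormalDist` from Python's standard `statistics` module. The 6% mean return and 10% standard deviation are hypothetical figures chosen for illustration, not data from any real asset.

```python
from statistics import NormalDist

# Hypothetical asset: mean annual return of 6%, standard deviation of 10%
returns = NormalDist(mu=6.0, sigma=10.0)

# Half of the distribution lies below the mean
below_mean = returns.cdf(6.0)
print(below_mean)  # 0.5

# Share of outcomes within one standard deviation of the mean (-4% to 16%)
within_one_sd = returns.cdf(16.0) - returns.cdf(-4.0)
print(round(within_one_sd, 4))  # 0.6827
```

So under a normal distribution roughly two thirds of outcomes fall within one standard deviation of the mean, which is what "most data points will lie close to the mean" amounts to in numbers.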

6. Skewness

Skewness measures the symmetry, or asymmetry, of a distribution.

In a normal distribution, as above, the skewness will be zero.

Negative skewness stretches the tail of the bell curve out to the left, and positive skewness stretches it out to the right.
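One common way to put a number on skewness is the third central moment divided by the variance raised to the power of 1.5. The sketch below assumes that population formula (other conventions exist) and uses made-up data sets to show the sign in each case.

```python
def skewness(data):
    """Population skewness: third central moment / (second central moment ** 1.5)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical data sets
symmetric = [-2, -1, 0, 1, 2]
right_tail = [1, 2, 2, 3, 12]      # one unusually large value
left_tail = [-12, -3, -2, -2, -1]  # mirror image of right_tail

print(skewness(symmetric))       # 0.0
print(skewness(right_tail) > 0)  # True: the long tail is on the right
print(skewness(left_tail) < 0)   # True: the long tail is on the left
```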

When examining an asset's annual returns over a period of time, the professional investor will look for investments that show positive skewness – a longer tail of returns that are greater than the historical average.

This has, in some circumstances, proved disastrous for investors, however. When market bubbles form, an asset can show positive skewness, prompting investors to buy at the top of the market. Then, when the skew turns negative, they may be tempted to sell at a loss.

Statistical analysis is only as intuitive as the person using it.

7. Kurtosis

Kurtosis is another measure of deviation from normal distribution, but looks at the extremes. This introduces the well-known investment term "tail risk".

A distribution model that is said to have a fat tail is a sign of high kurtosis. Tail risk arises when the probability that an investment could move more than three standard deviations (see below) from the mean is greater than a normal distribution model would imply.
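The sketch below shows both ideas. It first works out how rare a move of more than three standard deviations is under a normal distribution, then measures excess kurtosis (the fourth central moment divided by the squared variance, minus 3, so that a normal distribution scores zero) for a hypothetical set of returns containing two extreme moves. The formula choice and the data are assumptions made for illustration.

```python
from statistics import NormalDist

def excess_kurtosis(data):
    """Population excess kurtosis: m4 / m2**2 - 3 (zero for a normal distribution)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

# Under a normal distribution, moves beyond three standard deviations are rare
normal_tail = 2 * (1 - NormalDist().cdf(3))
print(round(normal_tail, 4))  # 0.0027

# Hypothetical returns: 18 small moves of 1% and two extreme moves of 10%
fat_tailed = [-1, 1] * 9 + [-10, 10]
print(excess_kurtosis(fat_tailed) > 0)  # True: fatter tails than normal
```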

Divergence from the mean

8. Variance

Variance is used as a data analysis tool to examine how each individual value in a set of numbers differs from the arithmetic mean of that data set.

If you take the data set 2, 4, 5, 8 and 9, the arithmetic mean (adding all and dividing by number of data points, i.e. 5) is 5.6. If you simply take the deviation from the mean by subtracting it from each number, i.e.: 2 - 5.6, 4 - 5.6 etc, you get -3.6, -1.6, -0.6, 2.4 and 3.4.

The sum of these deviations, as with the deviations from the mean of any data set, will always be zero. To arrive at the variance, take the difference between each number in the data set and the arithmetic mean and square it. Hence:

-3.6 x -3.6, -1.6 x -1.6 . . . etc, to arrive at the set of squared deviations: 12.96, 2.56, 0.36, 5.76 and 11.56. Then take the arithmetic mean of this new set. The variance is therefore 6.64.
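The calculation can be reproduced step by step in Python, as sketched below. Dividing by the number of data points, as the text does, gives what is known as the population variance, which is what `pvariance` in the standard `statistics` module computes. The sample variance, which divides by one fewer, would give a different figure.

```python
from statistics import pvariance

data = [2, 4, 5, 8, 9]
mean = sum(data) / len(data)  # 5.6

deviations = [x - mean for x in data]   # -3.6, -1.6, -0.6, 2.4, 3.4
squared = [d ** 2 for d in deviations]  # 12.96, 2.56, 0.36, 5.76, 11.56
variance = sum(squared) / len(data)

# The deviations cancel out, apart from tiny floating-point rounding
print(abs(sum(deviations)) < 1e-9)  # True
print(round(variance, 2))           # 6.64
print(round(pvariance(data), 2))    # 6.64
```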

Variance is also used in risk management to help determine the level of risk an investor might take when purchasing a certain asset, but usually by way of its square root, the standard deviation, which we'll examine next.

9. Standard deviation

The standard deviation is simply the square root of variance, but is one of the most important measures in statistical analysis.

When applied to annual returns on an investment, standard deviation can help determine the historical volatility of that investment.

Standard deviation can help determine the historical volatility of your investments: Wikimedia Commons

Once you have worked out the variance, it is simple. The variance of the set 2, 4, 5, 8 and 9 as above is 6.64. The standard deviation of this set is the square root of 6.64, which is 2.577.
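As a quick check of that figure, the sketch below takes the square root directly and also uses `pstdev` (population standard deviation) from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import pstdev

data = [2, 4, 5, 8, 9]
variance = 6.64  # population variance worked out in the text

print(round(sqrt(variance), 3))  # 2.577
print(round(pstdev(data), 3))    # 2.577
```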

Standard deviation is a fundamental risk measure in investment that most professional fund and portfolio managers use to help calculate likely returns from an investment.

Knowing the returns on an investment over several previous years, the mean or average return can be calculated, and from that the standard deviation tells the investment manager the likely volatility on the average return.

If the return each year has stayed within one standard deviation of the mean return, then it has been a relatively stable investment. If the return in some years falls well outside that range, it is more volatile.
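To illustrate, the sketch below compares two made-up investments that share the same average annual return but have very different standard deviations. The return figures are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical annual returns, in percent, over six years
steady = [5, 6, 4, 5, 6, 4]
erratic = [5, 20, -12, 9, -3, 11]

steady_sd = pstdev(steady)
erratic_sd = pstdev(erratic)

# Both investments average 5% a year, but their volatility differs sharply
print(mean(steady) == mean(erratic))  # True
print(round(steady_sd, 2))            # 0.82
print(round(erratic_sd, 2))           # 10.25
```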

Measures of similitude

10. Covariance

Traders use statistical analysis to plot the returns on risky investments in a portfolio. When two or more risk assets move in tandem, they are said to have high, or positive covariance.

Positive covariance isn't particularly welcome in an asset portfolio. One can expect a higher degree of returns from risk assets, but also a higher degree of losses when things go wrong – and you don't want two or more risky assets going wrong at the same time.

Low, or negative covariance provides an asset portfolio with greater diversification, because when one risk asset is not performing well, other risk assets should be offsetting that poor performance.
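Covariance can be sketched as the average product of each pair's deviations from their respective means. The function below assumes the population form (dividing by the number of pairs), and the three return series are hypothetical.

```python
def covariance(xs, ys):
    """Population covariance: the mean product of paired deviations from each mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Hypothetical annual returns, in percent
asset_a = [4, -2, 7, 1, 5]
asset_b = [5, -3, 8, 0, 5]    # tends to move with asset_a
asset_c = [-1, 4, -3, 2, -2]  # tends to move against asset_a

print(covariance(asset_a, asset_b))  # 12.4
print(covariance(asset_a, asset_c))  # -8.2
```

Here asset_a and asset_b have positive covariance, so holding both adds little diversification, while asset_a and asset_c have negative covariance and partly offset each other.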

11. Correlation coefficient

Simple correlations can be seen when comparing two charts side by side. The eye can spot simple matches between peaks and troughs.

Often even spurious events will show correlation, which is why the correlation coefficient is used: Wikimedia Commons

For a more accurate gauge of correlation, however, the correlation coefficient can be worked out by dividing the covariance of the two variables in question by the product of their standard deviations.

The answer should fall in the range between 1 and -1. A positive value means there is a positive correlation between the two variables. The closer to 1, the more highly correlated the two are. A negative coefficient shows the opposite: the closer to -1, the more strongly the two move in opposite directions.
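The standard (Pearson) correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. The sketch below assumes that formula and uses hypothetical fund and benchmark returns.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation: covariance / (std dev of xs * std dev of ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

# Hypothetical annual returns, in percent
fund = [4, -2, 7, 1, 5]
benchmark = [5, -3, 8, 0, 5]

print(round(correlation(fund, benchmark), 3))  # 0.993

# A series that is the exact mirror image of the fund correlates at -1
mirror = [-x for x in fund]
print(round(correlation(fund, mirror), 3))     # -1.0
```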

This type of statistical analysis is used by fund managers to determine how well their fund is performing compared to its benchmark index.

12. Regression

The best-known regression model in finance is the capital asset pricing model (CAPM) which helps investors arrive at asset pricing and cost of capital.

Simply put, regression estimates the degree to which the price of an asset, or other variable, is influenced by another set of variables.

For example, it is possible using a regression formula to work out the probable effect on an Australian gold miner's shares from rising gold prices, rising domestic interest rates and a fall in the US dollar.
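As a simplified sketch of the idea, the code below fits a straight line by ordinary least squares using just one explanatory variable, the gold price, rather than the three factors mentioned above. The monthly percentage changes are invented for illustration and do not describe any real miner.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical monthly changes, in percent
gold_price = [1.0, -2.0, 3.0, 0.0, 2.0]
miner_shares = [2.0, -3.0, 7.0, 0.5, 4.0]

slope, intercept = fit_line(gold_price, miner_shares)
print(round(slope, 2))      # 1.93
print(round(intercept, 2))  # 0.55
```

In this made-up data, each 1% rise in the gold price goes with a rise of a little under 2% in the miner's shares. Handling several factors at once calls for multiple regression, which follows the same principle with more variables.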

13. R-squared

R-squared is the statistical analysis of the relationship between a fund, particular asset or security and its benchmark index.

For example, an equity fund will have a firm relationship with the index it tracks - if the fund is sector based, then it should have a close resemblance to that sector's sub-index on a main stock index.

R-squared values are measured in percentages, so an R-squared relationship of 100% would mean that the movements of that security, asset or fund were explained entirely by movements in its benchmark index, with no other influence.

An R-squared value of less than 70% is usually said to indicate there is little relationship between the security and the index.
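When a fund is compared with a single benchmark using a straight-line fit, R-squared equals the square of the correlation coefficient between the two. The sketch below assumes that simple case and uses hypothetical index and fund returns.

```python
def r_squared(index_returns, fund_returns):
    """Share of the fund's variation explained by a straight-line fit to the index."""
    n = len(index_returns)
    mean_x = sum(index_returns) / n
    mean_y = sum(fund_returns) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(index_returns, fund_returns))
    sxx = sum((x - mean_x) ** 2 for x in index_returns)
    syy = sum((y - mean_y) ** 2 for y in fund_returns)
    return sxy ** 2 / (sxx * syy)

# Hypothetical annual returns, in percent
index = [5, -3, 8, 0, 5]
tracker = [4, -2, 7, 1, 5]

print(f"{r_squared(index, tracker):.1%}")  # 98.6%
print(f"{r_squared(index, index):.1%}")    # 100.0%
```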

Conclusions and further reading

Remember that without some knowledge also of the market conditions in which certain assets and securities thrive, statistical analysis alone is of little reliable use.

If you’re basing your investment decisions only on hunches, you’re just as much in the dark.

But together with analysis of economic factors such as balance sheet and profit and loss, or historical returns, statistical analysis can help reassure investors on those hunches.

Use as much information as is readily available before making your investment decisions.

Please use this article, and any further reading on statistical analysis – of which we’ve provided some examples below – in conjunction with our courses on trading and other related features.

