CORRELATION ANALYSIS

The term "correlation" indicates the degree of interrelation between two or more variables. The procedure of calculating this degree of interrelation quantitatively is called correlation analysis. Correlation analysis can be carried out for both continuous variables and discrete data; the analysis for discrete data, which is the case most often found in engineering practice and digital calculations, is described below.

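Autocorrelation Function

Consider a discrete time-series $x_n$ ($n = 1, 2, \dots, N$) with a finite number of samples $N$ and an average value of

$$\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n. \tag{1}$$

Its autocorrelation function is defined as

$$R_{xx}(k) = \frac{1}{N}\sum_{n=1}^{N-k} (x_n - \bar{x})(x_{n+k} - \bar{x}), \tag{2}$$

which effectively averages all possible products of the time-series and its time-shifted version separated by a time lag $k$. In practice, formula (2) is preferred in its normalized form

$$\rho_{xx}(k) = \frac{R_{xx}(k)}{R_{xx}(0)}. \tag{3}$$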
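The definitions of the average value, the autocorrelation function and its normalized form can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes that the mean is removed before the products are formed and that the sum is divided by N (other conventions divide by N − k), and the function names and the sine-wave test data are hypothetical.

```python
import math


def mean(x):
    """Average value of the series."""
    return sum(x) / len(x)


def autocorrelation(x, k):
    """Autocorrelation at lag k >= 0.

    Assumption: mean-removed products, divided by N (not by N - k).
    """
    n = len(x)
    xbar = mean(x)
    return sum((x[i] - xbar) * (x[i + k] - xbar) for i in range(n - k)) / n


def normalized_autocorrelation(x, k):
    """Autocorrelation at lag k divided by its value at lag 0."""
    return autocorrelation(x, k) / autocorrelation(x, 0)


# Illustrative data: a sine wave with a period of 8 samples, 64 samples long.
x = [math.sin(2 * math.pi * i / 8) for i in range(64)]
rho = [normalized_autocorrelation(x, k) for k in range(9)]
```

Under these assumptions `rho[0]` is 1, `rho[4]` (half a period) is negative and `rho[8]` (a full period) is positive again. The peaks shrink slowly with lag because the fixed 1/N factor divides a sum of only N − k products.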

The value of $\rho_{xx}$ is such that $-1 \le \rho_{xx} \le 1$. The autocorrelation function is an average measure of the time-domain properties of the time-series, and is related to the power spectral density function in the frequency domain by the Fourier transform (see Spectral Analysis). If the magnitude of $\rho_{xx}$ decreases with increasing time lag $k$, there is some degree of randomness in the time-series. If $\rho_{xx}$ changes sign at regular intervals of $k$, the time-series is periodic. A combination of the two may imply that the time-series is quasi-periodic, which is often the case in real engineering problems.
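The Fourier-transform link between the autocorrelation function and the power spectral density can be checked numerically. The sketch below assumes the mean-removed, 1/N-normalized autocorrelation and uses the periodogram (squared magnitude of the discrete Fourier sum divided by N) as a simple spectral estimate; scaling conventions for spectral density vary, and the function names and test signal are hypothetical.

```python
import cmath
import math


def autocorrelation(x, k):
    """Mean-removed, 1/N-normalized autocorrelation; symmetric in the lag k."""
    n = len(x)
    k = abs(k)
    xbar = sum(x) / n
    return sum((x[i] - xbar) * (x[i + k] - xbar) for i in range(n - k)) / n


def spectrum_from_autocorrelation(x, omega):
    """Fourier sum of the autocorrelation over all lags -(N-1)..(N-1)."""
    n = len(x)
    total = sum(autocorrelation(x, k) * cmath.exp(-1j * omega * k)
                for k in range(-(n - 1), n))
    return total.real


def periodogram(x, omega):
    """Squared magnitude of the Fourier sum of the mean-removed series, over N."""
    n = len(x)
    xbar = sum(x) / n
    dft = sum((x[i] - xbar) * cmath.exp(-1j * omega * i) for i in range(n))
    return abs(dft) ** 2 / n


# Illustrative signal: two sinusoids with periods of 8 and 5 samples.
x = [math.sin(2 * math.pi * i / 8) + 0.5 * math.cos(2 * math.pi * i / 5)
     for i in range(40)]
omegas = [2 * math.pi * m / 40 for m in range(21)]
pairs = [(spectrum_from_autocorrelation(x, w), periodogram(x, w)) for w in omegas]
peak = max(range(21), key=lambda m: pairs[m][1])
```

Both routes give the same value at every frequency to within rounding error, and the largest value falls at index 5, that is, at the frequency of the stronger component (40 samples / period of 8).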

Cross-Correlation Function

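The cross-correlation function for two sets of time-series data $x_n$ and $y_n$ ($n = 1, 2, \dots, N$), with average values $\bar{x}$ and $\bar{y}$, is defined as

$$R_{xy}(k) = \frac{1}{N}\sum_{n=1}^{N-k} (x_n - \bar{x})(y_{n+k} - \bar{y}). \tag{4}$$

The cross-correlation function and the cross-spectral density function are equivalent measures in the time and frequency domains, respectively, and are related to each other by the Fourier transform (see Spectral Analysis).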
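As an illustration of the cross-correlation function, the following sketch evaluates it for a pulse and a delayed copy of the same pulse. It assumes mean-removed products divided by N and non-negative lags only; the function name and the data are hypothetical.

```python
def cross_correlation(x, y, k):
    """Cross-correlation of x and y at lag k >= 0.

    Assumption: mean-removed products, divided by N (not by N - k).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    return sum((x[i] - xbar) * (y[i + k] - ybar) for i in range(n - k)) / n


# Illustrative data: y is the pulse in x delayed by 3 samples.
x = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
y = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]
r = [cross_correlation(x, y, k) for k in range(6)]
best_lag = max(range(6), key=lambda k: r[k])
```

The largest value occurs at `best_lag == 3`, the delay built into the data, which is why cross-correlation is commonly used to estimate the time shift between two signals.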

Correlation Coefficient

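The correlation coefficient is defined as the version of formula (4) normalized by the zero-lag autocorrelations $R_{xx}(0)$ and $R_{yy}(0)$ of the two time-series (their variances), and is given by

$$\rho_{xy}(k) = \frac{R_{xy}(k)}{\sqrt{R_{xx}(0)\,R_{yy}(0)}}, \tag{5}$$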

the value of which at a particular time lag $k$ is a measure of the similarity of the strength of the components in $x_n$ and $y_n$ at that lag. The value of $\rho_{xy}$ is such that $-1 \le \rho_{xy} \le 1$, and the larger the magnitude of $\rho_{xy}$, the more strongly correlated are $x_n$ and $y_n$ at that lag; a negative value indicates that one series tends to rise as the other falls.
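The normalization can be illustrated with two sinusoids that differ in amplitude, offset and phase. The sketch assumes the cross-correlation is mean-removed, divided by N, and normalized by the square root of the product of the two zero-lag autocorrelations; the function names and the data are hypothetical.

```python
import math


def _lagged_covariance(x, y, k):
    """Mean-removed lagged product sum of x and y at lag k >= 0, divided by N."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    return sum((x[i] - xbar) * (y[i + k] - ybar) for i in range(n - k)) / n


def correlation_coefficient(x, y, k):
    """Cross-correlation at lag k normalized by the two zero-lag autocorrelations."""
    scale = math.sqrt(_lagged_covariance(x, x, 0) * _lagged_covariance(y, y, 0))
    return _lagged_covariance(x, y, k) / scale


# Illustrative data: y is x delayed by 2 samples, scaled by 3 and offset by 5.
theta = 2 * math.pi / 8
x = [math.sin(theta * i) for i in range(64)]
y = [3 * math.sin(theta * (i - 2)) + 5 for i in range(64)]
rho = [correlation_coefficient(x, y, k) for k in range(8)]
```

Here `rho[2]` is close to +1 (about 0.95) because the delay of 2 samples realigns the two series, while `rho[6]`, half a period later, is close to −1 (about −0.89). The scale factor of 3 and the offset of 5 have no effect on the coefficient.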

It should be emphasized that the concept of correlation is different from that of regression. The procedure of finding a best-fit curve is called regression, whereas how closely the data follow the regression curve is measured by correlation.
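The distinction can be made concrete with a small sketch: a least-squares straight line is fitted to two data sets (regression), and the zero-lag correlation coefficient is then computed for each (correlation). The straight-line model, the function names and the data are assumptions chosen for illustration only.

```python
import math


def fit_line(x, y):
    """Least-squares straight line y = slope * x + intercept (regression)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, ybar - slope * xbar


def correlation(x, y):
    """Zero-lag correlation coefficient between x and y."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Illustrative data: similar trends, very different scatter.
x = [1, 2, 3, 4, 5]
y_tight = [2.1, 3.9, 6.2, 7.8, 10.1]
y_loose = [4.0, 1.5, 8.0, 5.0, 11.5]

slope_tight, _ = fit_line(x, y_tight)
slope_loose, _ = fit_line(x, y_loose)
r_tight = correlation(x, y_tight)
r_loose = correlation(x, y_loose)
```

The two fitted slopes are similar (1.99 and 1.85), but the correlation coefficient is about 0.999 for the tightly clustered data and about 0.76 for the scattered data: regression supplies the line, and correlation indicates how well the data follow it.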