
Statistics in Engineering: Linear Regression (Part 2 of 8)

Sun, 03/30/2014 - 08:38 — will

Preface
This series is aimed at providing tools for an electrical engineer to gain confidence in the performance and reliability of their design. The focus is on applying statistical analysis to empirical results (i.e. measurements, data sets).

Introduction
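This article introduces linear regression on a data set using the R Project software. Regression is useful when your data follows a line (one quantity changing in proportion to another, such as DAC output versus input code) rather than scattering around a single value in a Gaussian distribution.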

If you are not familiar with statistics or need a refresher, I recommend Schaum's Statistics. It gives a good overview of the material with lots of examples and without spending much time on proofs.

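Concepts
Linear Regression: Put simply, it means taking a data set and finding the function that best describes it. For a straight line, the usual method (ordinary least squares, which is what R's lm() uses) picks the slope and intercept that minimize the sum of the squared vertical distances between the data points and the line. Once you have the function you can calculate how closely the data follows it.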

The equation for linear regression is of the form y=m*x+b, where b is the y-intercept (or offset) and m is the slope of the line.
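As a worked sketch of what "best fit" means here, the R snippet below computes the ordinary least-squares slope and intercept from their closed-form expressions and checks them against lm(). The x and y values are invented for illustration; they are not the article's measurements.

```r
# Closed-form ordinary least squares for y = m*x + b, checked against lm().
# The x/y values below are invented for illustration only.
x <- c(0, 64, 128, 192, 255)           # DAC input codes
y <- c(0.15, 1.40, 2.65, 3.88, 5.13)   # hypothetical measured outputs (V)

# Slope m: sum of cross-deviations divided by sum of squared x deviations
m <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Intercept b: the least-squares line passes through (mean(x), mean(y))
b <- mean(y) - m * mean(x)

fit <- lm(y ~ x)
print(c(b = b, m = m))
print(coef(fit))   # should agree with b and m above
```

lm() does the same minimization internally, so in practice you only need the closed form to understand what the coefficients mean.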

Importing Your Data Set
I will use the R software package for statistical analysis. It is cross-platform, free and open source. There are also several good Excel plugins, and if you have access to SAS, by all means use it.

The first row of your data set should contain a title for each column. For building a distribution, a single column with one row per measurement is enough; for linear regression you need at least two columns: the input (here the DAC code) and the measured output.

NOTE: The following assumes we are testing an implementation of a 5V 8-bit DAC on a PCB. The data set contains two samples. In a real product I would probably want many samples to build a distribution.
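The import and fit steps can be sketched in R as follows. The file is written to a temporary location so the snippet runs on its own; the file contents, the column names dac_in and dac_out, and the values are all hypothetical, so substitute your own measurement file. The object names dac_out.lm and coeffs match the names used later in this article.

```r
# Hypothetical measurement file: first row holds the column titles.
# Replace csv_path with the path to your own data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("dac_in,dac_out",
             "0,0.15", "64,1.40", "128,2.65", "192,3.88", "255,5.13"),
           csv_path)

# header = TRUE (the default) uses the first row as column names
dac <- read.csv(csv_path, header = TRUE)

# Fit dac_out = m * dac_in + b by ordinary least squares
dac_out.lm <- lm(dac_out ~ dac_in, data = dac)

# coeffs[1] is the intercept (b), coeffs[2] the slope (m)
coeffs <- coefficients(dac_out.lm)
print(coeffs)
```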

We can see from the coefficients that there is a DC offset of about 0.15V and a slope of 0.0195V per code. A plot of DAC input versus output for a perfect 5V, 8-bit DAC has a slope of 5/2^8 = 0.0195V per code, so the slope error here is nearly zero. The DC offset, however, is considerable: one LSB is 5*(1/2^8) = 0.0195V, so 0.15V is roughly 7.6 LSBs. Maybe the DAC analog rail is high by 0.15V, although a high reference would normally show up as a slope error too, so a fixed offset in the output stage or ground is also worth checking.
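The LSB arithmetic above can be checked in R. This sketch uses only numbers quoted in this article: the intercept 0.1478863 and the code-127 estimate 2.628082, from which the slope is recovered. The 5V full scale and 8 bits are the article's stated assumptions.

```r
# Express the fitted offset and slope in DAC units (LSBs).
# Assumes a 5 V full scale and 8 bits, as in the article.
full_scale <- 5
n_bits <- 8
lsb <- full_scale / 2^n_bits   # ideal volts per code (0.01953125 V)

# Numbers quoted in this article for the first sample
intercept <- 0.1478863                 # estimated output at code 0
slope <- (2.628082 - intercept) / 127  # recovered from the code-127 estimate

offset_lsb <- intercept / lsb                  # offset in LSBs
slope_error_pct <- 100 * (slope - lsb) / lsb   # slope error vs. ideal

cat(sprintf("LSB = %.5f V, offset = %.2f LSB, slope error = %.3f %%\n",
            lsb, offset_lsb, slope_error_pct))
```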

Coefficient of Determination
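We can also check how well our function fits the data set by calculating the coefficient of determination, R-squared: the fraction of the variation in the output that the fitted line explains. Its value ranges from 0 (no fit) to 1 (perfect fit):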
> summary(dac_out.lm)$r.squared
[1] 0.9996259
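R-squared is 1 minus the ratio of the residual sum of squares to the total sum of squares. A short R sketch that computes it by hand and compares it with summary()$r.squared, using invented data rather than the article's:

```r
# R-squared by hand: 1 - (residual sum of squares / total sum of squares).
# Invented illustrative data; compare with summary()$r.squared.
x <- c(0, 64, 128, 192, 255)
y <- c(0.15, 1.40, 2.65, 3.88, 5.13)
fit <- lm(y ~ x)

ss_res <- sum(residuals(fit)^2)   # variation the line does not explain
ss_tot <- sum((y - mean(y))^2)    # total variation around the mean
r2_manual <- 1 - ss_res / ss_tot

print(r2_manual)
print(summary(fit)$r.squared)     # should match r2_manual
```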

This function fits the data exceptionally well. These numbers were cooked; I would never expect to see a fit this good in a real data set.

Calculating Output Values
I find that with a Gaussian distribution I never need the value of the density function itself (only areas under the curve), but with linear data sets I frequently need the value of the fitted function. We can use our function to estimate the DAC output for a given DAC input value:

DAC Level 0:
> coeffs[1]+coeffs[2]*0
(Intercept)
0.1478863

DAC Level 127:
> coeffs[1]+coeffs[2]*127
(Intercept)
2.628082

DAC Level 256 (one step past the top code of 255, i.e. nominal full scale):
> coeffs[1]+coeffs[2]*256
(Intercept)
5.147337
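Typing coeffs[1]+coeffs[2]*code for each level gets tedious. A sketch using R's predict() to evaluate several codes at once; the data and the column names dac_in and dac_out are illustrative assumptions, not the article's data set.

```r
# Estimate the output at several DAC codes at once with predict(),
# instead of typing coeffs[1] + coeffs[2] * code for each code.
# Data and column names (dac_in, dac_out) are illustrative assumptions.
dac <- data.frame(dac_in  = c(0, 64, 128, 192, 255),
                  dac_out = c(0.15, 1.40, 2.65, 3.88, 5.13))
dac_out.lm <- lm(dac_out ~ dac_in, data = dac)
coeffs <- coefficients(dac_out.lm)

codes <- c(0, 127, 255)
est <- predict(dac_out.lm, newdata = data.frame(dac_in = codes))
manual <- coeffs[1] + coeffs[2] * codes   # the same calculation by hand

print(data.frame(code = codes, predicted = est, by_hand = manual))
```

Adding interval = "prediction" to the predict() call also returns lower and upper bounds for an individual new measurement at each code.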

Significance Test
If we print the entire summary we can see the p-values for this fit and check how well the output tracks the input. Hypothesis testing and p-values are covered in a later section.
> summary(dac_out.lm)
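The p-values that summary() prints test whether each coefficient is zero. For a DAC, a more useful question is whether the slope differs from the ideal 5/2^8. A sketch of pulling the coefficient table out of the summary and running that comparison, with invented data and assumed column names:

```r
# Pull the coefficient table out of summary() and test the slope against
# the ideal 5 V / 2^8 rather than against zero. Illustrative data only.
dac <- data.frame(dac_in  = c(0, 64, 128, 192, 255),
                  dac_out = c(0.15, 1.40, 2.65, 3.88, 5.13))
dac_out.lm <- lm(dac_out ~ dac_in, data = dac)

tab <- summary(dac_out.lm)$coefficients
print(tab)   # columns: Estimate, Std. Error, t value, Pr(>|t|)

# t statistic for H0: slope == ideal, two-sided p-value on residual df
ideal <- 5 / 2^8
t_slope <- (tab["dac_in", "Estimate"] - ideal) / tab["dac_in", "Std. Error"]
p_slope <- 2 * pt(-abs(t_slope), df = df.residual(dac_out.lm))
cat(sprintf("slope vs ideal: t = %.2f, p = %.3f\n", t_slope, p_slope))
```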

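If a single bit is worth about 0.02V, then at the 0.99 confidence level this DAC is good enough to produce 8 bits of resolution: the 99% interval around the predicted mid-scale output (about 2.648V) spans only 2.652158-2.643065=0.009093V, less than half an LSB. However, we cannot ignore the offset, which will produce an error in every output value.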
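An interval like this can be produced with predict() at a confidence level of 0.99. This is a sketch, not necessarily the exact command behind the numbers above; the data are invented, so the widths will differ from the article's.

```r
# 99% confidence interval on the fitted output at code 128, plus the
# 99% intervals on intercept and slope. Illustrative data, not the article's.
dac <- data.frame(dac_in  = c(0, 32, 64, 96, 128, 160, 192, 224, 255),
                  dac_out = c(0.15, 0.78, 1.40, 2.02, 2.65, 3.27, 3.88, 4.51, 5.13))
dac_out.lm <- lm(dac_out ~ dac_in, data = dac)

lsb <- 5 / 2^8
ci <- predict(dac_out.lm, newdata = data.frame(dac_in = 128),
              interval = "confidence", level = 0.99)
width <- ci[1, "upr"] - ci[1, "lwr"]
cat(sprintf("99%% CI at code 128: %.4f to %.4f V (width %.4f V, %.2f LSB)\n",
            ci[1, "lwr"], ci[1, "upr"], width, width / lsb))

# Intervals on the coefficients themselves
print(confint(dac_out.lm, level = 0.99))
```

Comparing the interval width with one LSB, as the text does, is a quick check on whether measurement scatter could blur adjacent codes.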

The second sample has some serious problems. There is a large negative offset, indicating a threshold that must be overcome before any output is observed. The slope of 0.0234 cannot be ignored either: compared with the ideal slope of about 0.0195 it is a gain error of roughly 20%, which adds close to 1V of error at the top of the range (255*(0.0234-0.0195)=0.99V) on top of the offset.
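The slope error can be put in numbers using the slope quoted above (0.0234) and the ideal 5/2^8. This sketch covers only the error from the slope; the offset for this sample is not quoted, so it is left out.

```r
# Error contributed by the slope alone: measured 0.0234 V/code (from the
# text) versus the ideal 5 V / 2^8. Offset is not included.
ideal_slope <- 5 / 2^8
measured_slope <- 0.0234
codes <- c(0, 64, 128, 192, 255)

gain_error_pct <- 100 * (measured_slope - ideal_slope) / ideal_slope
slope_error_v <- (measured_slope - ideal_slope) * codes   # grows with code
slope_error_lsb <- slope_error_v / ideal_slope

print(round(gain_error_pct, 1))
print(data.frame(code = codes, error_V = round(slope_error_v, 3),
                 error_LSB = round(slope_error_lsb, 1)))
```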

Next Up
The next article will show how hypothesis testing can provide insight into your debug efforts by asking how much effect a circuit modification really has on your design.