After you create a table of data, and aset of random centroids from which to start your analysis,you need to create a series of formulas thatcalculate the distance between each data point and each centroid.We'll use the CalculateDistance.xlsx file as our sample file for this movie.The formulas we'll create, calculate what'scalled the Euclidian distance between two points.Okay, what's that?Well, you probably remember the Pythagorean Theorem, where the twosides of a triangle tell you the length of the hypotenuse.

That equation is a squared plus b squared equals c squared.So the formulas we use in this movie should look a little familiar.If you're working through chapter two as a series of exercises, as opposed to usingthe sample files, then you might see different values in your centroids table.If that's the case, don't worry about it, everything will stillwork, it's just your numbers will be a little bit different.We have three columns in our main table thatallow us to calculate the distance to the centroids.

Distance to centroid one, centroid two and centroid three.Let's create the formula for the distance to centroid one first.So I click in cell C3, and then I'll start entering my formula.Now remember that c squared equals a squared plus b squared.So that means to find the distance c from a particular point to acentroid, we need to find the square root of a squared plus b squared.And that formula looks like this, it's equal and thensqrt, which is the square root function, then two leftparentheses, A3, which is the x axis value for our firstpoint, minus I3, which is the x axis value for the firstcentroid, and I don't want that changing, so I'll press F4to make it an absolute reference.

Then a right parentheses.And then I'm going to square that difference.So that is caret, which is a shift 6, and then 2, which is squaring.Now I'll type a plus sign, and this is the b squared component of the equation.So I'll type a left parenthesis, then B3 minus J3, and again, makingthat an absolute reference by pressing F4, a right parenthesis, andthen squared, again caret two, and a right parentheses and enter.

And there I have all of my values, and because I'm in anExcel table, Excel filled in the formulas to the rest of the column.Now I need to create similar formulas for D3and E3, representing distances to centroid two and centroid three.So the distance is centroid two isequal, squared root, two left parenthesis, A3, minus,and now this time it's I4 because weare calculating the distance of the second centroid.So that's I4, F4 to make it an absolute reference.

Right parenthesis.Squared.Plus, left parenthesis B3 minus J4.F4 to make it an absolute reference.Right parentheses squared, right parentheses close, and tab.And we'll do the final thing for the distance to C3, and thistime I'll just type it because I have already talked you through it twice.Two left parentheses, A3 minus I5, F4for absolute, right parentheses, excuse me, squared.

Plus, B3 minus J5, absolute, right parentheses,squared, right parentheses, and enter.So there we have our formulas.We have now calculated the Euclidean distance between each point andeach centroid and now we can find which centroid is closest.

Resume Transcript Auto-Scroll

Author

Released

5/28/2014

Learn how to use Excel's built-in data management and computation functions to identify clusters of data points—with little or no VBA! Author Curt Frye shows you how to set up a worksheet for cluster analysis, create formulas that identify the closest focal point (centroid) for each row, and analyze your results in an Excel table or XY scatter chart. Members who don't mind recording macros or writing VBA code can learn how to automate some parts of the procedure.