Bray-Curtis dissimilarity

Objective

The non-metric Bray-Curtis dissimilarity (Bray & Curtis 1957) delivers robust and reliable dissimilarity results for a wide range of applications. It is one of the most commonly applied measures for expressing relationships between objects in ecology, environmental sciences and related fields.

Equation

Bray-Curtis is a modified Manhattan measure: the summed absolute differences between the variables are standardised by the summed variables of both objects. The general equation of the Bray-Curtis dissimilarity is:

$$d^{BCD}(i,j) = \frac{\sum_{k=1}^{n} |y_{i,k} - y_{j,k}|}{\sum_{k=1}^{n} (y_{i,k} + y_{j,k})}$$

In the equation, dBCD is the Bray-Curtis dissimilarity between the objects i and j, k is the index of a variable and n is the total number of variables y. The Bray-Curtis similarity dBCS is obtained by a slight modification and can be calculated directly from the dissimilarity value:

dBCS = 1 - dBCD

In contrast to the dissimilarity, a dBCS value of 0 indicates a complete absence of any relationship between the two objects.
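As a worked illustration of the definitions above, the following minimal Python sketch computes dBCD and dBCS for two objects. The function name `bray_curtis` and the example values are assumptions made for illustration; they are not taken from the original text or from any particular library.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two objects given as equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("both objects need the same number of variables")
    # Summed absolute differences (the Manhattan part of the measure).
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    # Summed variables of both objects (the standardisation).
    denominator = sum(x + y for x, y in zip(a, b))
    if denominator == 0:
        raise ZeroDivisionError("undefined: all variables of both objects are 0")
    return numerator / denominator


# Hypothetical example values (not taken from the text).
obj_i = [6, 7, 4]
obj_j = [10, 0, 6]

d_bcd = bray_curtis(obj_i, obj_j)  # (4 + 7 + 2) / (17 + 16) = 13 / 33
d_bcs = 1 - d_bcd                  # similarity derived from the dissimilarity

print(f"dBCD = {d_bcd:.4f}, dBCS = {d_bcs:.4f}")
```

Two identical objects give a dissimilarity of 0 and therefore a similarity of 1.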

Synonyms

Bray-Curtis similarity and dissimilarity values are often multiplied by 100 and given as percentages. The definition is very similar to that of the Sørensen distance. Sometimes the term Czekanowski’s coefficient is erroneously used for Bray-Curtis indices.

Usage

When the data cover a wide range of values, it may be useful to transform them beforehand. When choosing a statistic for evaluating the output matrix, keep in mind that Bray-Curtis is not metric. When all data are ≥0, the Bray-Curtis similarity lies within the range 0 to 1. A value of 1 indicates a complete match of the two data records in the n-dimensional space. Both dBCD and dBCS are sometimes multiplied by 100 and given as percentages.
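The non-metric character noted above can be checked numerically: a metric has to satisfy the triangle inequality, and Bray-Curtis does not always do so. The sketch below uses three hypothetical objects chosen so that the violation is easy to follow by hand; the names are illustrative assumptions.

```python
def bray_curtis(a, b):
    """Plain Bray-Curtis dissimilarity for two equal-length sequences."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator


# Hypothetical objects with two variables each.
a = [0, 1]
b = [1, 1]
c = [1, 0]

d_ab = bray_curtis(a, b)  # 1 / 3
d_bc = bray_curtis(b, c)  # 1 / 3
d_ac = bray_curtis(a, c)  # 2 / 2 = 1

# A metric would require d_ac <= d_ab + d_bc; here 1 > 2/3.
violates_triangle_inequality = d_ac > d_ab + d_bc
print(d_ab, d_bc, d_ac, violates_triangle_inequality)
```

The direct path from a to c is longer than the detour via b, which is why statistics that assume a metric distance should be applied to Bray-Curtis matrices with care.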

Variables with higher values have a stronger influence on the Bray-Curtis similarity, so these variables are the most likely to discriminate between objects. The measure is not affected by joint zeros (Field et al. 1982), but the result is undefined when all variables of both objects are 0. In this case the denominator becomes 0. Clarke et al. (2006) suggest using a zero-adjusted Bray-Curtis coefficient that includes a virtual dummy variable with the value 1 for all objects. In the numerator this variable cancels to zero and in the denominator it adds 2:

$$d^{BCD}_{adj}(i,j) = \frac{\sum_{k=1}^{n} |y_{i,k} - y_{j,k}|}{2 + \sum_{k=1}^{n} (y_{i,k} + y_{j,k})}$$

The effect is that two objects whose variables are entirely zero now have one variable in common, and a dissimilarity of zero is returned.
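The zero adjustment described above can be sketched in a few lines of Python. The function name `bray_curtis_zero_adjusted` and the example values are assumptions for illustration. The sketch also checks that adding 2 to the denominator gives the same result as appending the dummy variable explicitly.

```python
def bray_curtis_zero_adjusted(a, b):
    """Zero-adjusted Bray-Curtis dissimilarity (dummy variable equal to 1 in both objects)."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))   # dummy adds |1 - 1| = 0
    denominator = 2 + sum(x + y for x, y in zip(a, b))  # dummy adds 1 + 1 = 2
    return numerator / denominator


# Two objects whose variables are entirely zero: the plain formula would divide by 0.
empty_i = [0, 0, 0]
empty_j = [0, 0, 0]
d_empty = bray_curtis_zero_adjusted(empty_i, empty_j)  # 0 / 2

# Hypothetical non-empty objects, compared with an explicitly appended dummy variable.
a = [3, 0, 5]
b = [1, 0, 2]
a_dummy = a + [1]
b_dummy = b + [1]
explicit = (sum(abs(x - y) for x, y in zip(a_dummy, b_dummy))
            / sum(x + y for x, y in zip(a_dummy, b_dummy)))

print(d_empty, bray_curtis_zero_adjusted(a, b), explicit)
```

Note that the adjustment changes every value slightly, not only the undefined ones, because the denominator grows by 2 for every pair of objects.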

Algorithm

The algorithm first checks whether the data input matrix is rectangular. If it is not, the function returns FALSE and a defined but empty output matrix. If the matrix is rectangular, the Bray-Curtis dissimilarity is calculated. For this purpose the dimensions of the respective arrays of the output matrix are set, and the titles for the rows and columns are assigned. As the result is a square matrix that is mirrored along the diagonal, only the values for one triangular part and the diagonal are computed. If errors occur during computation, the function returns FALSE.

To calculate the Bray-Curtis similarity, the Bray-Curtis dissimilarity matrix is computed first and then transformed.

// copy the respective titles
For RunnerY := Low (InputMatrix.RowTitle) to High (InputMatrix.RowTitle) do
Begin
  // names for rows and columns are the same in this triangular matrix
  OutputMatrix.RowTitle [RunnerY] := InputMatrix.RowTitle [RunnerY];
  OutputMatrix.ColTitle [RunnerY] := InputMatrix.RowTitle [RunnerY];
end;

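// compare every object
For RunnerY := Low (OutputMatrix.Cells) to High (OutputMatrix.Cells) do
Begin
  // with every other
  For RunnerX := Low (OutputMatrix.Cells) to RunnerY do
  Begin
    Numerator := 0;
    Denominator := 0;
    // use all variables of each object under comparison
    For i := 0 to High (InputMatrix.Cells [0]) do
    Begin
      FirstVal := InputMatrix.Cells [RunnerX, i];
      SecondVal := InputMatrix.Cells [RunnerY, i];
      // The source listing breaks off here. The remaining lines are an assumed
      // completion derived from the equation, not the original code.
      Numerator := Numerator + Abs (FirstVal - SecondVal);
      Denominator := Denominator + FirstVal + SecondVal;
    end;
    // assumed cell order; the ratio is undefined when Denominator = 0
    OutputMatrix.Cells [RunnerY, RunnerX] := Numerator / Denominator;
  end;
end;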
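The procedure described in this section can also be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the implementation the listing belongs to: the function name `bray_curtis_matrix`, the use of `None` in place of the FALSE return value, and the treatment of an all-zero pair as an error are all choices made for this sketch.

```python
def bray_curtis_matrix(rows, similarity=False):
    """Square Bray-Curtis matrix for a list of objects, or None on invalid input."""
    # Rectangularity check: every object needs the same number of variables.
    if not rows or any(len(r) != len(rows[0]) for r in rows):
        return None
    n = len(rows)
    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(y + 1):  # one triangular part plus the diagonal
            numerator = sum(abs(p - q) for p, q in zip(rows[x], rows[y]))
            denominator = sum(p + q for p, q in zip(rows[x], rows[y]))
            if denominator == 0:
                return None  # undefined pair, treated as an error in this sketch
            out[y][x] = numerator / denominator
            out[x][y] = out[y][x]  # mirror along the diagonal
    if similarity:
        # The dissimilarity matrix is computed first and transformed afterwards.
        out = [[1.0 - value for value in row] for row in out]
    return out


# Hypothetical input: three objects with two variables each.
data = [[1, 2], [2, 1], [10, 11]]
dissimilarity = bray_curtis_matrix(data)
similarity_matrix = bray_curtis_matrix(data, similarity=True)
for row in dissimilarity:
    print(row)
```

Computing only the lower triangle and mirroring it halves the number of pairwise comparisons, which matters because the work grows with the square of the number of objects.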

Although the Euclidean distance between the objects Case1 and Case3 is the same as that between Case4 and Case5, the Bray-Curtis dissimilarity indicates a closer relationship between the objects Case4 and Case5. This is because the measure gives more weight to variables with higher values. It is therefore very useful in analyses where high joint presences are more important than sparse ones. The effect can be weakened by transforming the data beforehand.
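Both statements can be reproduced with a small Python sketch. The values below are hypothetical and are not the Case1 to Case5 data referred to above; the square-root transformation is likewise only one possible choice, assumed here for illustration.

```python
import math


def bray_curtis(a, b):
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two pairs with identical Euclidean distance but different abundance levels.
low_a, low_b = [1, 2], [2, 1]
high_a, high_b = [10, 11], [11, 10]

same_euclidean = euclidean(low_a, low_b) == euclidean(high_a, high_b)
bc_low = bray_curtis(low_a, low_b)     # 2 / 6
bc_high = bray_curtis(high_a, high_b)  # 2 / 42, i.e. the closer relationship


def share_of_first_variable(a, b):
    """Fraction of the summed absolute differences contributed by the first variable."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return diffs[0] / sum(diffs)


# One high-valued and one low-valued variable, before and after a square-root transform.
p, q = [100, 4], [64, 16]
raw_share = share_of_first_variable(p, q)  # 36 / 48
sqrt_share = share_of_first_variable([math.sqrt(v) for v in p],
                                     [math.sqrt(v) for v in q])  # 2 / 4

print(same_euclidean, bc_low, bc_high, raw_share, sqrt_share)
```

In this sketch the high-valued variable accounts for three quarters of the numerator on the raw data and for only half of it after the transformation, which is the weakening effect described above.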