ELKI: "Environment for Developing KDD-Applications Supported by Index-Structures" is a development framework for data mining algorithms written in Java. It includes a large variety of popular data mining algorithms, distance functions and index structures.

It focuses particularly on clustering and outlier detection methods, in contrast to many other data mining toolkits that focus on classification. Additionally, it supports index structures that improve algorithm performance, such as the R*-Tree and the M-Tree.

The modular architecture is meant to allow custom components, such as distance functions or algorithms, to be added while reusing the other parts for evaluation.

This package also includes the source code, since the software is intended for the rapid development of such algorithms rather than for end users.

New beta release, including some new algorithms (ODIN, PINN, full O(n^3) hierarchical clustering, new methods for extracting clusters from hierarchies), new index structures (in-memory k-d tree, LSH, projected indexes, PINN), new visualizations and much more.

This release requires Java 7; the new visualizations also require JOGL.

This is mostly a bug-fix release. Many small issues have been fixed,
for example to improve performance, make error reporting much better,
and ease the use of sparse vectors and external precomputed distances.

This will be the last ELKI release to support Java 6. The next ELKI
release will require Java 7.

Algorithms

Some new LOF variants (LDF, SimpleLOF, SimpleKernelDensityLOF)

Correlation Outlier Probabilities (ICDM 2012)

A naive mean-shift clustering algorithm
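To illustrate what a "naive" mean-shift clustering does, here is a minimal sketch assuming a flat kernel with a fixed radius: each point is repeatedly moved to the mean of all data points within the bandwidth (a full scan per step, hence naive), and points whose converged modes lie within half a bandwidth of each other share a label. The class name `NaiveMeanShift` and the half-bandwidth merge threshold are assumptions made for this example; this is not ELKI's implementation or API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of ELKI.
final class NaiveMeanShift {
    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) { double t = a[d] - b[d]; s += t * t; }
        return s;
    }

    // Returns one cluster label per input point, using a flat kernel of radius `bandwidth`.
    static int[] cluster(double[][] data, double bandwidth, int maxIter) {
        int n = data.length, dim = data[0].length;
        double bw2 = bandwidth * bandwidth;
        List<double[]> modes = new ArrayList<>();
        int[] labels = new int[n];
        for (int i = 0; i < n; i++) {
            double[] x = data[i].clone();
            for (int iter = 0; iter < maxIter; iter++) {
                // Mean of all points inside the kernel window (never empty: see the mean argument).
                double[] mean = new double[dim];
                int count = 0;
                for (double[] p : data) {
                    if (sqDist(p, x) <= bw2) {
                        count++;
                        for (int d = 0; d < dim; d++) mean[d] += p[d];
                    }
                }
                for (int d = 0; d < dim; d++) mean[d] /= count;
                boolean converged = sqDist(mean, x) < 1e-12;
                x = mean;
                if (converged) break;
            }
            // Reuse an existing mode closer than half the bandwidth, otherwise start a new cluster.
            int label = -1;
            for (int m = 0; m < modes.size() && label < 0; m++)
                if (sqDist(modes.get(m), x) < bw2 / 4) label = m;
            if (label < 0) { modes.add(x); label = modes.size() - 1; }
            labels[i] = label;
        }
        return labels;
    }
}
```

The window is never empty: the new mean lies within the bandwidth of at least one of the points it was computed from, so the division is safe.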

Single-link clustering (SLINK algorithm) should be significantly
faster due to optimized data structures
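For reference, Sibson's SLINK algorithm computes single-link clustering in O(n^2) time and O(n) memory using a "pointer representation": for each point i, pi[i] is the point it is merged into and lambda[i] is the height of that merge (infinity for the last point). The following is a minimal textbook sketch with Euclidean distance; the class names `Slink` and `SlinkResult` are hypothetical and do not describe ELKI's optimized data structures.

```java
// Hypothetical result holder: pointer representation of the single-link dendrogram.
final class SlinkResult {
    final int[] pi;        // pi[i]: the point that i is merged into
    final double[] lambda; // lambda[i]: the distance at which that merge happens
    SlinkResult(int[] pi, double[] lambda) { this.pi = pi; this.lambda = lambda; }
}

final class Slink {
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) { double t = a[d] - b[d]; s += t * t; }
        return Math.sqrt(s);
    }

    static SlinkResult run(double[][] data) {
        int n = data.length;
        int[] pi = new int[n];
        double[] lambda = new double[n];
        double[] m = new double[n]; // scratch row of distances to the new point
        for (int k = 0; k < n; k++) {
            // Insert point k: initially it points to itself at infinite height.
            pi[k] = k;
            lambda[k] = Double.POSITIVE_INFINITY;
            for (int i = 0; i < k; i++) m[i] = dist(data[i], data[k]);
            // Update the pointer representation of points 0..k-1.
            for (int i = 0; i < k; i++) {
                if (lambda[i] >= m[i]) {
                    m[pi[i]] = Math.min(m[pi[i]], lambda[i]);
                    lambda[i] = m[i];
                    pi[i] = k;
                } else {
                    m[pi[i]] = Math.min(m[pi[i]], m[i]);
                }
            }
            // Repair pointers that now skip past point k.
            for (int i = 0; i < k; i++)
                if (lambda[i] >= lambda[pi[i]]) pi[i] = k;
        }
        return new SlinkResult(pi, lambda);
    }
}
```

For the 1D points 0, 1 and 3 this yields merge heights 1 and 2, matching the single-link dendrogram.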

"Benchmarking" algorithms for measuring the performance of index structures

Index layer

Bulk loading of R-Trees should be faster; in particular, the
Sort-Tile-Recursive (STR) bulk loader can work very well.
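As a rough illustration of the Sort-Tile-Recursive idea, here is a minimal sketch for the leaf level with 2D points only: sort by x, cut the data into about sqrt(P) vertical slices (P being the number of pages needed), sort each slice by y, and fill pages of a fixed capacity. The class `StrBulkLoad` and its method are hypothetical; ELKI's actual bulk loaders handle arbitrary dimensionality and build the upper tree levels as well.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration class, not ELKI's bulk-loading code.
final class StrBulkLoad {
    // Groups 2D points into leaf pages holding at most `capacity` entries each.
    static List<List<double[]>> packLeaves(List<double[]> points, int capacity) {
        int n = points.size();
        int pages = (int) Math.ceil(n / (double) capacity);
        int slices = (int) Math.ceil(Math.sqrt(pages));
        int perSlice = slices * capacity; // points per vertical slice
        List<double[]> byX = new ArrayList<>(points);
        byX.sort(Comparator.comparingDouble((double[] p) -> p[0]));
        List<List<double[]>> leaves = new ArrayList<>();
        for (int s = 0; s < n; s += perSlice) {
            // Each vertical slice is sorted by y and then cut into full pages.
            List<double[]> slice = new ArrayList<>(byX.subList(s, Math.min(s + perSlice, n)));
            slice.sort(Comparator.comparingDouble((double[] p) -> p[1]));
            for (int i = 0; i < slice.size(); i += capacity) {
                leaves.add(new ArrayList<>(slice.subList(i, Math.min(i + capacity, slice.size()))));
            }
        }
        return leaves;
    }
}
```

On a 4x4 grid with capacity 4, this produces four full pages, each covering a 2x2 block of the grid, which is the kind of low-overlap packing that makes STR attractive.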

The full changelog is not yet up. Here is an excerpt of the new
features in 0.5.0:
- further speed improvements
- R-Tree flexibility: multiple new split strategies, bulk loaders,
insertion strategies, so that ELKI can now do many R-Tree variations,
including the original Guttman R-Tree, not only the R*-Tree.
- K-Means flexibility: MacQueen- and Lloyd-style iterations along with
various seeding strategies, including K-Means++
- VA-File (static only, not dynamic databases)
- Many popular cluster evaluation measures
- Alpha shapes, Voronoi cells, Delaunay triangulations in the
visualization layer (in the projected space, so 2D!)
- Parallel coordinates
- Outlier ensemble code, presented at SDM 2012
- Some new algorithms, such as OUTRES
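To illustrate the K-Means items above, here is a minimal sketch of k-means++ seeding (the first center is drawn uniformly, each further center with probability proportional to its squared distance to the nearest chosen center) followed by Lloyd-style iterations, where all points are reassigned before the means are recomputed. A MacQueen-style variant would instead update a center immediately after each reassignment. The class `KMeansSketch` and its methods are hypothetical and are not ELKI's API.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration class, not part of ELKI.
final class KMeansSketch {
    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) { double t = a[d] - b[d]; s += t * t; }
        return s;
    }

    // k-means++ seeding: uniform first center, then sampling proportional to D(x)^2.
    static double[][] kMeansPlusPlus(double[][] data, int k, Random rnd) {
        int n = data.length;
        double[][] centers = new double[k][];
        centers[0] = data[rnd.nextInt(n)].clone();
        double[] d2 = new double[n];
        for (int i = 0; i < n; i++) d2[i] = sqDist(data[i], centers[0]);
        for (int c = 1; c < k; c++) {
            double total = 0;
            for (double v : d2) total += v;
            double r = rnd.nextDouble() * total;
            int chosen = n - 1;
            for (int i = 0; i < n; i++) {
                r -= d2[i];
                if (r < 0) { chosen = i; break; }
            }
            centers[c] = data[chosen].clone();
            for (int i = 0; i < n; i++) d2[i] = Math.min(d2[i], sqDist(data[i], centers[c]));
        }
        return centers;
    }

    // Lloyd iterations: reassign every point, then recompute every mean; updates `centers` in place.
    static int[] lloyd(double[][] data, double[][] centers, int maxIter) {
        int n = data.length, k = centers.length, dim = data[0].length;
        int[] assign = new int[n];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (sqDist(data[i], centers[c]) < sqDist(data[i], centers[best])) best = c;
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            if (!changed) break; // assignments are stable: converged
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assign[i]]++;
                for (int d = 0; d < dim; d++) sums[assign[i]][d] += data[i][d];
            }
            for (int c = 0; c < k; c++)
                if (counts[c] > 0) // keep the old center if a cluster becomes empty
                    for (int d = 0; d < dim; d++) centers[c][d] = sums[c][d] / counts[c];
        }
        return assign;
    }
}
```

Usage: `double[][] centers = KMeansSketch.kMeansPlusPlus(data, 2, new Random(42));` followed by `int[] labels = KMeansSketch.lloyd(data, centers, 100);`.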

For the final 0.5.0 release, we hope to have some approximate outlier
detection methods for you (aLOCI, HilOut) as well as some subspace
outlier detection methods, including HiCS (ICDE 2012, to be presented
tomorrow).

The full changelog is not yet up. Here is an excerpt of the new
features in 0.5.0:
- further speed improvements
- R-Tree flexibility: multiple new split strategies, bulk loaders,
insertion strategies, so that ELKI can now do many R-Tree variations,
including the original Guttman R-Tree, not only the R*-Tree.
- K-Means flexibility: MacQueen- and Lloyd-style iterations along with
various seeding strategies, including K-Means++
- VA-File (static only, not dynamic databases); partial-VA to come for
0.5.0 final?
- Many popular cluster evaluation measures
- Alpha shapes, Voronoi cells, Delaunay triangulations in the
visualization layer (in the projected space, so 2D!)
- Parallel coordinates (only halfway reviewed in beta1, more to come!)
- Outlier ensemble code, to be presented at SDM 2012 at the end of April

For the final 0.5.0 release, we hope to have some approximate outlier
detection methods for you (aLOCI, HilOut) as well as some subspace
outlier detection methods, including HiCS (ICDE 2012, to be presented
tomorrow).