Preprocessing

Quality Indices

You can realize manually your quality measures with dedicated class for local or distributed collection. Helpers ClustersIndicesAnalysisLocal and ClustersIndicesAnalysisDistributed allow you to test indices on multiple clustering at once.

Clustering benchmarking and analysis

Using classes ClusteringChainingLocal, BigDataClusteringChaining, DistributedClusteringChaining, and ChainingOneAlgorithm descendants you have the possibility to run multiple clustering algorithms respectively locally and parallely, in a sequentially distributed way, and parallely on a distributed system, locally and parallely, generate many different vectorizations of the data whilst keeping active information on each clustering including used vectorization, clustering model, clustering number and clustering arguments.

Classes ClustersIndicesAnalysisLocal and ClustersIndicesAnalysisDistributed are devoted for clustering indices analysis.

Classes ClustersAnalysisLocal and ClustersAnalysisDistributed will be use to describe obtained clustering in term of distributions, proportions of categorical features...

Incoming soon

Citation

If you publish material based on informations obtained from this repository, then, in your acknowledgements, please note the assistance you received by using this community work. This will help others to obtain the same informations and replicate your experiments, because having results is cool but being able to compare to others is better. Citation: @misc{C4E, url = “https://github.com/Clustering4Ever/Clustering4Ever“, institution = “Paris 13 University, LIPN UMR CNRS 7030”}

C4E-Notebook examples

Miscellaneous

Helper functions to generate Clusterizable collections

You can easily generate your collections with basic Clusterizable using helpers in org.clustering4ever.util.{ArrayAndSeqTowardGVectorImplicit, ScalaCollectionImplicits, SparkImplicits} or explore Clusterizable and EasyClusterizable for more advanced usages.