Gaussian Mixture Model Implementation in Pyspark

GMM algorithm models the entire data set as a finite mixture of Gaussian distributions,each parameterized by a mean vector, a covariance matrix and a mixture weights. Here the probability of each point to belong to each cluster is computed along with the cluster statistics. This distributed implementation of GMM in pyspark estimates the parameters using the Expectation-Maximization algorithm and considers only diagonal covariance matrix for each component.

Spark Packages is a community site hosting modules that are not part of Apache Spark.
Your use of and access to this site is subject to the
terms of use.
Apache Spark and the Spark logo are trademarks of the Apache Software Foundation.
This site is maintained as a community service by
Databricks.