The number of clusters to form as well as the number of
centroids to generate.

max_iter : int, default: 300

Maximum number of iterations of the k-means algorithm for a
single run.

n_init : int, default: 10

Number of times the k-means algorithm will be run with different
centroid seeds. The final result will be the best output of
n_init consecutive runs in terms of inertia.
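
The best-of-n_init selection can be sketched in pure NumPy (a simplified illustration, not the library's code; `inertia` is a hypothetical helper, and a real run would iterate each seed to convergence rather than only scoring the initial centers):

```python
import numpy as np

def inertia(X, centers):
    # Sum of squared distances from each sample to its nearest center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))

# Emulate n_init: score several random centroid seeds and keep the
# lowest-inertia one.
results = []
for seed in range(10):
    idx = np.random.default_rng(seed).choice(len(X), size=3, replace=False)
    results.append(inertia(X, X[idx]))
best = min(results)
```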

init : {'k-means++', 'random' or an ndarray}

Method for initialization, defaults to 'k-means++':

'k-means++' : selects initial cluster centers for k-means
clustering in a smart way to speed up convergence. See section
Notes in k_init for more details.

'random': choose k observations (rows) at random from data for
the initial centroids.

If an ndarray is passed, it should be of shape (n_clusters, n_features)
and gives the initial centers.
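
A minimal sketch of the two array-based choices (pure NumPy, not the library's internals): 'random' init draws k rows from the data, while an explicit ndarray must have shape (n_clusters, n_features):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))   # (n_samples, n_features)
n_clusters = 3

# 'random' init: choose k observations (rows) at random as starting centers.
random_init = X[rng.choice(len(X), size=n_clusters, replace=False)]

# Explicit init: any array of shape (n_clusters, n_features) works.
explicit_init = np.array([[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]])
```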

precompute_distances : {'auto', True, False}

Precompute distances (faster but takes more memory).

'auto' : do not precompute distances if n_samples * n_clusters > 12
million. This corresponds to about 100MB overhead per job using
double precision.

True : always precompute distances.

False : never precompute distances.

tol : float, default: 1e-4

Relative tolerance with regard to inertia to declare convergence.

n_jobs : int

The number of jobs to use for the computation. This works by computing
each of the n_init runs in parallel.

If -1 all CPUs are used. If 1 is given, no parallel computing code is
used at all, which is useful for debugging. For n_jobs below -1,
(n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one
are used.
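
The mapping above can be written out as a small helper (`effective_jobs` is a hypothetical name for illustration; the library computes this internally):

```python
import os

def effective_jobs(n_jobs, n_cpus=None):
    # Map the n_jobs convention to an actual worker count:
    #   n_jobs >= 1  -> that many jobs (1 means no parallelism)
    #   n_jobs == -1 -> all CPUs
    #   n_jobs < -1  -> n_cpus + 1 + n_jobs (so -2 leaves one CPU free)
    if n_cpus is None:
        n_cpus = os.cpu_count()
    if n_jobs >= 1:
        return n_jobs
    return n_cpus + 1 + n_jobs
```

For example, on an 8-CPU machine, `effective_jobs(-2, 8)` gives 7 jobs: all CPUs but one.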

random_state : integer or numpy.RandomState, optional

The generator used to initialize the centers. If an integer is
given, it fixes the seed. Defaults to the global numpy random
number generator.

verbose : int, default 0

Verbosity mode.

copy_x : boolean, default True

When pre-computing distances it is more numerically accurate to center
the data first. If copy_x is True, then the original data is not
modified. If False, the original data is modified, and put back before
the function returns, but small numerical differences may be introduced
by subtracting and then adding the data mean.
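
The center-then-restore round trip can be sketched in NumPy (an illustration of the numerical point, not the library's code): subtracting and re-adding the mean recovers the data only up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=100.0, scale=1.0, size=(1000, 3))
original = X.copy()

# Center in place (what copy_x=False permits), then restore afterwards.
mean = X.mean(axis=0)
X -= mean   # centered data is more numerically accurate for distances
X += mean   # put the data back before returning

# The round trip is close to, but not necessarily bit-identical with,
# the original values.
restored_ok = np.allclose(X, original)
```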

See also

MiniBatchKMeans

Alternative online implementation that does incremental updates of the center positions using mini-batches. For large-scale learning (say n_samples > 10k), MiniBatchKMeans is probably much faster than the default batch implementation.
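
The incremental mini-batch update can be sketched in pure NumPy (`minibatch_update` is a hypothetical helper, not the library's implementation): each batch is assigned to the nearest centers, and each center then moves toward its assigned samples with a per-center learning rate of 1/count.

```python
import numpy as np

def minibatch_update(centers, counts, batch):
    # Assign the batch to the nearest centers, then take a gradient-style
    # step: each center moves toward its samples at rate 1/count.
    d2 = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    for x, j in zip(batch, labels):
        counts[j] += 1
        centers[j] += (x - centers[j]) / counts[j]
    return centers, counts

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.5, (100, 2)),
                  rng.normal(5.0, 0.5, (100, 2))])
centers = np.array([[0.5, 0.5], [4.5, 4.5]])
counts = np.zeros(2)
for _ in range(20):
    batch = data[rng.choice(len(data), size=32, replace=False)]
    centers, counts = minibatch_update(centers, counts, batch)
```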

Notes

The k-means problem is solved using Lloyd’s algorithm.

The average complexity is given by O(k n T), where k is the number of
clusters, n is the number of samples, and T is the number of iterations.
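
A minimal NumPy sketch of Lloyd's algorithm (`lloyd` is a hypothetical helper; it assumes no cluster becomes empty): each iteration alternates an assignment step and a mean-update step, costing O(k n) distance evaluations, repeated up to T times.

```python
import numpy as np

def lloyd(X, centers, max_iter=300, tol=1e-4):
    # Lloyd's algorithm: alternate assignment and mean-update steps
    # until the centers stop moving (squared shift <= tol).
    for _ in range(max_iter):
        # Assignment: label each sample with its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update: move each center to the mean of its assigned samples
        # (assumes every cluster keeps at least one sample).
        new = np.array([X[labels == j].mean(axis=0)
                        for j in range(len(centers))])
        shift = ((new - centers) ** 2).sum()
        centers = new
        if shift <= tol:
            break
    return centers, labels

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, labels = lloyd(X, np.array([[0.0, 0.0], [10.0, 10.0]]))
```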

The method works on simple estimators as well as on nested objects
(such as pipelines). The former have parameters of the form
<component>__<parameter> so that it’s possible to update each
component of a nested object.
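
The <component>__<parameter> convention can be illustrated with a small self-contained sketch (`set_nested_params` is a hypothetical helper operating on plain dicts rather than estimators):

```python
def set_nested_params(params, **updates):
    # Keys of the form "<component>__<parameter>" address a parameter
    # of a nested component, e.g. a named step inside a pipeline.
    for key, value in updates.items():
        if "__" in key:
            component, _, sub = key.partition("__")
            params[component][sub] = value
        else:
            params[key] = value
    return params

config = {"kmeans": {"n_clusters": 8}, "verbose": 0}
set_nested_params(config, kmeans__n_clusters=3, verbose=1)
```

Here `kmeans__n_clusters=3` reaches inside the nested "kmeans" component, while the plain key `verbose` updates the top level directly.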