Data clustering or clustering analysis belongs to statistical data analysis methods. It aims at forming groups of objects that are similar in some way. Those groups are named clusters. The word cluster is related to clot, for thick mass of coagulated liquid or of material stuck together

The whole set of objects contains heterogeneous data, that ought to be grouped into subsets possessing a greater inner homogeneity. Such methods rely on similarity criteria or proximity measures. They are related to classification, machine learning, segmentation, pattern recognition, and have applications ranging from image processing to bioinformatics.

The K-means technique is a little older. It was published in French by Hugo Steinhaus
in 1956 in the Bulletin de l’Académie Polonaise des Sciences (Bulletin of the Polish Academy of Sciences). Hugo
Steinhaus (1887-1972) is a Polish mathematician, sometimes known as the discover of Stefan Banach. He contributed to numerous branches of mathematics, and considered a early founder of probability theory.

He also contributed to applied mathematics, working jointly with engineers, biologists, physician, economists, geologists, and "even lawyers". Lacking of trustworthy information during World War II, he invented a statistical tool to estimate German losses, using necrologic news from German soldiers on the front. He notably used the mention that the soldier killed was the first, second or third child from a family. He thus is a precursor of data science.

This paper is called "Sur la division des corps matériels en parties" (On the division of material bodies into parts). The first to explicitely
formulate in finite dimension the principle of k-mean clustering. It is much more constructive that the Banach-Tarski paradoxal theorem
which delas with cutting a ball into two different balls, doubling in volume. This paper is pleasant to read, and evokes practical uses from type classification in anthropology to industrial object normalization.

Being in French, in a journal whose archives cannot be accessed easily, it has not been read as much as it deserved. . Here is it transduced by Maciej Denkowski, transmitted by Jérôme Bolte, and transcribed in LaTeX by myself, with some effort to preserve the original typesetting and composition.