Partitioning

To illustrate how this partitioning scheme produces a balanced cluster
assignment, we took 4450 email addresses from the Enron dataset as stand-ins
for arbitrary email addresses (keys) and calculated how they would be assigned
across our 5 clusters using the Python script below: