
a CLUTO clustering scalability question

Hi everybody

I'm looking at different clustering packages and was wondering about CLUTO's scalability.

I'm looking to cluster several million documents/instances (1-10 million) with several hundred thousand features (100,000-300,000).
Could CLUTO handle this load? If not, roughly what load could it handle?
How important is it for me to do some feature reduction in that respect?

I'm not quite sure how the array of size X applies to clustering, but if the dataset you are trying to cluster contains n objects and the total number of non-zeros over all n objects is m, then the memory requirement is about 4*(n+m) words, where a word is usually 4 bytes, depending on your architecture.
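
To make that estimate concrete, here is a rough Python sketch that plugs numbers into the 4*(n+m) formula above. The function names are made up for illustration and are not part of CLUTO. The average of 100 non-zeros per document is purely an assumption, since the question does not say how sparse the data is, so substitute the value measured on your own corpus. Under this formula the number of features does not appear directly, only the non-zero count does, so feature reduction would matter for memory mainly to the extent that it lowers m.

```python
def estimate_memory_bytes(n_objects, n_nonzeros, bytes_per_word=4):
    """Rough memory estimate from the rule of thumb 4 * (n + m) words.

    n_objects  -- n, the number of objects (documents) to cluster
    n_nonzeros -- m, the total number of non-zero entries over all objects
    bytes_per_word -- 4 is typical; this depends on the architecture
    """
    words = 4 * (n_objects + n_nonzeros)
    return words * bytes_per_word


def estimate_for_corpus(n_objects, avg_nonzeros_per_object, bytes_per_word=4):
    """Same estimate, with m derived from an assumed average sparsity."""
    n_nonzeros = int(n_objects * avg_nonzeros_per_object)
    return estimate_memory_bytes(n_objects, n_nonzeros, bytes_per_word)


if __name__ == "__main__":
    # ASSUMPTION: 100 non-zero features per document on average.
    assumed_avg_nonzeros = 100
    for n in (1_000_000, 10_000_000):
        total_bytes = estimate_for_corpus(n, assumed_avg_nonzeros)
        print(f"n = {n:,}: about {total_bytes / 1e9:.1f} GB")
```

This only evaluates the formula from the reply. It says nothing about running time, and actual memory use may differ depending on the clustering method and options chosen.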