Which clustering algorithm should be used?

By ahmed jabbar | Aug 07, 2016 10:35PM CEST

Dear supporters

I would like to know which algorithm I should use for my high-dimensional datasets (one has 128 dimensions, the other has 82). The attributes were derived from string data, and the matrix entries are tf-idf values. Which algorithm can cluster my instances? Please mention an algorithm that is suitable for clustering high-dimensional data.

Note: these dimensions were produced by string-to-word conversion followed by an attribute selection process. The resulting data also has one class attribute with 10 labels. I would like to group the instances into clusters and validate the result by cross-validation.

By Paulin | Aug 08, 2016 11:38AM CEST | XLSTAT Agent

Dear Ahmed,

AHC (agglomerative hierarchical clustering) seems suitable for your analysis.
However, for a large dataset, you can run k-means clustering first and then apply AHC to the result, as described on the page linked below:
https://help.xlstat.com/customer/en/portal/articles/2062322-clustering-big-datasets-with-xlstat—-using-k-means-clustering-followed-by-an-ahc?b_id=9283
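
For readers who want to see the idea outside XLSTAT, here is a minimal Python sketch of the two-stage approach: k-means first reduces the rows to a small number of centroids, and AHC then merges those centroids into the final clusters. It is an illustration only, not XLSTAT's implementation. The function names, the Euclidean distance, the average linkage, the `k_micro` parameter and the synthetic data are all assumptions made for this example.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iters=50):
    """Lloyd's k-means. Starts from the first k points, a simple deterministic
    choice for this sketch; random or k-means++ seeding is more common."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def ahc(items, n_clusters):
    """Average-linkage agglomerative clustering; returns one label per item."""
    groups = [[i] for i in range(len(items))]

    def linkage(g, h):
        total = sum(math.sqrt(sq_dist(items[a], items[b])) for a in g for b in h)
        return total / (len(g) * len(h))

    while len(groups) > n_clusters:
        pairs = [(i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))]
        i, j = min(pairs, key=lambda ij: linkage(groups[ij[0]], groups[ij[1]]))
        groups[i].extend(groups.pop(j))  # j > i, so index i is unaffected
    labels = [0] * len(items)
    for cluster_id, group in enumerate(groups):
        for idx in group:
            labels[idx] = cluster_id
    return labels


def two_stage_cluster(points, k_micro, n_clusters):
    """k-means into k_micro groups, then AHC on the k-means centroids."""
    centroids, micro = kmeans(points, k_micro)
    merged = ahc(centroids, n_clusters)
    return [merged[m] for m in micro]


# Synthetic stand-in for a tf-idf matrix: 180 rows, 5 columns, 3 separated groups.
rng = random.Random(1)
points = [[10.0 * (i % 3) + rng.gauss(0, 0.5) for _ in range(5)] for i in range(180)]
final_labels = two_stage_cluster(points, k_micro=12, n_clusters=3)
print(sorted(set(final_labels)))
```

With real data, each row of the tf-idf matrix would be passed as a list of floats. Since the dataset in the question has 10 class labels, `n_clusters=10` would be a natural starting point, with `k_micro` set somewhat higher. Both values are judgment calls rather than recommendations.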