Thursday, September 18, 2014

Move the Sequence File to Hadoop Cluster and check the contents of the Sequence File

Plan and Run K-Means clustering algorithm

Export the K-Means output using Cluster Dumper tool

Export the K-Means output as graphml file

Visualize the output of K-Means using graphml in Gephi

Dataset Preparation :

I am going to generate float values (2 dimensions, 5 different ranges) using Java's Random.nextGaussian() function, as given below:

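import java.util.Random;

public final class RandomGaussian {

    public static void main(String... aArgs) {
        RandomGaussian gaussian = new RandomGaussian();
        // Centre of the range to generate; change it to produce another range.
        double MEAN = -0.9;
        // This value scales nextGaussian() directly, so it acts as the
        // standard deviation of the generated values.
        double VARIANCE = 0.1;
        // Print 25 random values spread around MEAN.
        for (int idx = 1; idx <= 25; ++idx) {
            log(gaussian.getGaussian(MEAN, VARIANCE));
        }
    }

    private Random fRandom = new Random();

    // Shift and scale a standard normal sample (mean 0, deviation 1).
    private double getGaussian(double aMean, double aVariance) {
        return aMean + fRandom.nextGaussian() * aVariance;
    }

    private static void log(Object aMsg) {
        System.out.println(String.valueOf(aMsg));
    }

}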
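The program above prints one-dimensional values around a single mean, so it has to be run once per range and the values paired up by hand. As a rough sketch of how the 2-dimensional, 5-range dataset could be produced in one go, the class below draws points around five cluster centres. The class name, the centre coordinates, the fixed seed and the space-separated output format are all assumptions made for illustration; the actual ranges used for this post are not listed here.

```java
import java.util.Random;

public final class ClusteredGaussianPoints {

    // Hypothetical cluster centres (x, y); these are NOT taken from the post.
    public static final double[][] CENTERS = {
        {-0.9, -0.9}, {-0.5, 0.5}, {0.0, 0.0}, {0.5, -0.5}, {0.9, 0.9}
    };

    // Returns pointsPerCenter points around each centre, one {x, y} pair per row.
    public static double[][] generate(double[][] centers, int pointsPerCenter,
                                      double stdDev, long seed) {
        Random random = new Random(seed); // fixed seed makes the run repeatable
        double[][] points = new double[centers.length * pointsPerCenter][2];
        int row = 0;
        for (double[] center : centers) {
            for (int i = 0; i < pointsPerCenter; i++) {
                // Same shift-and-scale idea as getGaussian(), once per dimension.
                points[row][0] = center[0] + random.nextGaussian() * stdDev;
                points[row][1] = center[1] + random.nextGaussian() * stdDev;
                row++;
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // 5 centres x 25 points = 125 lines, each "x y" (assumed format).
        for (double[] p : generate(CENTERS, 25, 0.1, 42L)) {
            System.out.println(p[0] + " " + p[1]);
        }
    }
}
```

Redirecting the output to a text file gives one point per line. If a different layout is needed for the conversion step, only the println in main has to change.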

Generate Sequence File :

If you need to process numerical data, you first have to write a utility that converts it into sequence-vector format. The following Java program converts the numerical data created above into a sequence vector file. A SequenceFile is a Hadoop file format that stores data as key-value pairs.