------------------------------------------------------------------------------
-- |
-- Module       : Data.Datamining.Clustering.Gsom.Input
-- Copyright    : (c) 2009 Stephan Günther
-- License      : BSD3
--
-- Maintainer   : gnn.github@gmail.com
-- Stability    : experimental
-- Portability  : portable
--
-- The GSOM algorithm works on numerical input vectors. These input vectors
-- are internally represented as lists of @'Double'@s and this module contains
-- the functions working on these.
------------------------------------------------------------------------------

module Data.Datamining.Clustering.Gsom.Input(
  Bounds, Input, Inputs
, bounds, dimension, normalize, unnormalize
, distance, (*.), (.*), (<+>), (<->)
) where

------------------------------------------------------------------------------
-- Standard modules
------------------------------------------------------------------------------

import Data.List

------------------------------------------------------------------------------
-- Utility functions on lists of input vectors
------------------------------------------------------------------------------

-- | Input vectors are represented as lists of Doubles.
type Input = [Double]
type Inputs = [Input]

-- | The bounds of a list of inputs. Having the tuple @(a,b)@ at index @i@
-- in @bounds@ means that the value at index @i@ of each of the input vectors
-- from the inputs which were used to calculate @bounds@ is from the
-- interval @[a,b]@.
type Bounds = [(Double, Double)]

-- | Normalizes input vectors.
-- @'normalize' bs inputs@ takes the bounds @bs@ (as calculated by
-- @'bounds'@) and the list of input vectors @inputs@ and returns a list of
-- input vectors where each component is in @[0,1]@. A component whose lower
-- and upper bound coincide is mapped to @0@.
-- If you want to undo the normalization, use @'unnormalize'@ with the
-- same bounds.
normalize :: Bounds -> Inputs -> Inputs
normalize bs is = map (normalizeVector bs) is where
  normalizeVector bs = map normalizeValue . zip bs
  normalizeValue ((a,b),v) = if a == b then 0 else (v - a)/(b - a)

-- | Calculates the bounds of the input vector components.
bounds :: Inputs -> Bounds
bounds [] = []
bounds (i:is) = foldl' f (dz i) is where
  dz x = zip x x
  f ps [] = ps
  f [] xs = dz xs
  f ((a,b):ps) (x:xs) = let
    a' = min a x; b' = max b x; t = f ps xs in
    a' `seq` b' `seq` t `seq` (a',b') : t

-- | Unnormalizes the given input vectors @inputs@, assuming that their
-- bounds previously were @bounds@.
unnormalize :: Bounds -> Inputs -> Inputs
unnormalize bounds inputs = map (map f . zip bounds) inputs where
  f ((min',max'), n) = if min' == max' then min' else n*(max' - min') + min'

-- | Calculating the dimension of a given set of inputs just means finding
-- the length of the longest input vector. The list of inputs has to be
-- non-empty, because @'maximum'@ fails on an empty list.
dimension :: Inputs -> Int
dimension = maximum . map length

------------------------------------------------------------------------------
-- Utility functions working on single input vectors
------------------------------------------------------------------------------

-- | @'distance' i1 i2@ calculates the Euclidean distance between
-- @i1@ and @i2@. If @i1@ and @i2@ have different lengths, the shorter one
-- is treated as if it were padded with zeros, so the excess values of the
-- longer one contribute to the distance (see @'<->'@).
distance :: Input -> Input -> Double
distance i1 i2 = sqrt . sum . map (\x -> x*x) $! (i1 <-> i2)

infixr 7 .*
-- | Multiplication of an input vector with a scalar.
(.*) :: Double -> Input -> Input
(.*) d = (force . map ((d*) $!) $!)

infixl 7 *.
(*.) :: Input -> Double -> Input
(*.) = flip (.*)

infixl 6 <->, <+>
-- | Addition and subtraction of vectors. If the vectors differ in length,
-- the excess components of the longer one are appended to the result
-- (negated in the case of the second operand of @'<->'@).
(<+>) :: Input -> Input -> Input
(<+>) i1 i2 = let
  front = zipWith (+) i1 i2
  l1 = length i1
  l2 = length i2
  in case signum $ l1 - l2 of
    0 -> front
    -1 -> front ++ drop l1 i2
    1 -> front ++ drop l2 i1

(<->) :: Input -> Input -> Input
(<->) i1 i2 = i1 <+> (-1) .* i2

------------------------------------------------------------------------------
-- Processing functions. Not exported.
------------------------------------------------------------------------------

-- | Zips two lists, but instead of truncating the longer one to the length
-- of the shorter one, the shorter one is padded with the elements from the
-- suffix of the longer one that exceeds the length of the shorter one.
padZip :: [a] -> [a] -> [(a,a)]
padZip xs ys = let
  (lx, ly) = (length xs, length ys)
  in uncurry zip $ case compare lx ly of
    EQ -> (xs, ys)
    GT -> (xs, ys ++ drop ly xs)
    LT -> (xs ++ drop lx ys, ys)

-- | Forces a whole list. If it wasn't for this function, @'bounds'@
-- would blow the stack because only the @'head'@ of the bounds would be fully
-- evaluated while the @'tail'@ would consist of huge thunks of @'min'@ and
-- @'max'@ applications.
force :: [a] -> [a]
force [] = []
force (x:xs) = let rest = force xs in x `seq` rest `seq` x : rest
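
------------------------------------------------------------------------------
-- Usage sketch. Illustrative only, not part of the original module.
------------------------------------------------------------------------------

-- The following GHCi session is a minimal sketch of how the exported
-- functions might be combined. It assumes that this module is loaded and
-- that the sample data @is@ (a hypothetical name) is made up of three
-- two-dimensional input vectors. The bounds are calculated first and then
-- used both to normalize the inputs and to undo the normalization:
--
-- > ghci> let is = [[0,10],[5,20],[10,30]] :: Inputs
-- > ghci> let bs = bounds is
-- > ghci> bs
-- > [(0.0,10.0),(10.0,30.0)]
-- > ghci> normalize bs is
-- > [[0.0,0.0],[0.5,0.5],[1.0,1.0]]
-- > ghci> unnormalize bs (normalize bs is) == is
-- > True
-- > ghci> dimension is
-- > 2
--
-- The operators on single input vectors work component-wise. The vectors
-- used here have equal lengths:
--
-- > ghci> [1,2] <+> [3,4]
-- > [4.0,6.0]
-- > ghci> 2 .* [1,2]
-- > [2.0,4.0]
-- > ghci> distance [3,4] [0,0]
-- > 5.0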