SICE-ICASE International Joint Conference 2006, Oct. 18-21, 2006, Bexco, Busan, Korea

Fuzzy Decision-making SVM with an Offset for Real-world Lopsided Data Classification

Boyang LI, Jinglu HU and Kotaro HIRASAWA
Graduate School of Information, Production and Systems, Waseda University, Hibikino 2-7, Wakamatsu-ku, Kitakyushu-shi, Fukuoka-ken, Japan
(Tel/fax: (+81)93-692-5271; E-mail: liboyang@akane.waseda.jp, jinglu@waseda.jp, hirasawa@waseda.jp)

Abstract: An improved support vector machine (SVM) classifier model for classifying real-world lopsided data is proposed. The most obvious differences between the proposed model and conventional SVM classifiers are the design of the decision-making function and the introduction of an offset parameter. To account for the vagueness of real-world data sets, a fuzzy decision-making function is designed to take the place of the traditional sign function in the prediction part of the SVM classifier. Because interaction and noise influence the region around the boundary between different clusters, this flexible decision-making model, which is closer to real-world situations, can deliver better performance. In addition, this paper mainly discusses an offset parameter introduced to correct the boundary excursion caused by the imbalance of real-world datasets. Because real-world noise can also influence the separation boundary, a weighted harmonic mean (WHM) method is used to compute the offset parameter. Due to these improvements, more robust performance is observed in our simulations.

Keywords: SVM, Fuzzy decision-making function, WHM offset, Real-world lopsided dataset, Classification

1. INTRODUCTION

SVM is an algorithm based on statistical learning theory, so it generally performs well on unseen data. To date, SVM has been applied to many practical problems, especially various classification problems, but classification and SVM itself still have some problems to be resolved [2][3]. For most real-world classification
problems, databases are usually affected by interaction and noise between different classes, so real-world classification problems are mostly non-separable. To deal with these non-separable cases, the SVM algorithm uses a regularization parameter C in the training part, which weighs the tolerance of the SVM. Because this parameter is the only adjustable parameter in SVM controlling the choice of support vectors, changing C always noticeably affects the performance of the SVM. To reduce the influence of an improper choice of C, and to deal with the misclassifications caused by interaction and noise, a fuzzy decision-making model is proposed to replace the traditional one in the prediction part of SVM classifiers. In this way, the hard-shell boundary between neighboring classes is transformed into a flexible one, and misclassifications caused by interaction and noise can be reduced [1].

In addition, the number of samples in one of the classes of a real-world dataset is usually much larger than in the others. This imbalance is the reason for the excursion of the boundary, another frequent problem in practical classification. An offset parameter was introduced to correct this excursion in our model [1]. In SVM, the support vectors are the samples nearest to the separation boundary, so their prediction values can be used to calculate this offset. However, experiment results show that noise in real-world cases not only turns the separation boundary into a gray zone but also increases the difficulty of computing a proper offset parameter. In other words, because SVM admits violations in non-separable cases, the support vectors are determined by the bounding planes of the different subsets; if these support vectors are strongly disturbed by interaction or noise, it is difficult to obtain a correct separation boundary. Based on this consideration, in this paper a series of weights β1, ..., βn is introduced to build a Weighted Harmonic Mean (WHM) offset, which balances the influences of all support vectors and invalidates deviant decision values of support vectors. Using this WHM offset we obtain a new separation boundary, and the performance of the model is studied in simulations with different kinds of real-world datasets.

This paper is organized as follows: the next section provides an overview of SVM and its application to nonlinear non-separable classification problems. Section 3 discusses the decision-making part of SVM, and a fuzzy decision-making method is proposed to fit real-world datasets. In Section 4, an offset parameter is introduced to correct the excursion of the boundary, calculated as a weighted harmonic mean of the decision values of the support vectors. Then, results comparing different kernels and different classification decision-making functions, with details on classification accuracy, are presented in Section 5. Finally, concluding remarks are given in Section 6.

2. SVM FOR CLASSIFICATION

In recent years, SVM has revealed its prominent capability in many practical applications, especially classification problems. In the elementary design of an SVM classifier, the bounding planes of the subsets are considered, and the distance between these bounding planes is defined as the margin (Fig. 1, support vectors in non-separable classification). Maximizing the margin is equivalent to finding the optimal separation boundary. Real-world classification problems are usually non-separable and nonlinear, so violations are accepted in non-separable cases. For nonlinear cases, the input vectors are first mapped into a high-dimensional feature space, in which a separating hyperplane is found by solving a quadratic programming (QP) problem in its dual form.

2.1 Basic Problem of SVM Classifier

Any complex classification problem can be divided into several binary ones, so we only discuss the binary case in
our paper. Let the training data set be denoted {x_i, y_i}, where x_i ∈ R^n, i = 1, 2, ..., N; x_i is the i-th input vector and y_i is its class label (+1 or -1). The training data set can be divided into two sets A and B with labels +1 and -1, respectively. As discussed above, the distance between the bounding planes of the two sets is called the margin. Maximizing this margin generally improves the generalization ability of the classifier model [4].

2.2 SVM for Non-separable Classification

In the case where the training data are non-separable, one should attempt to minimize the separation error and to maximize the margin simultaneously. The support vector machine classifier is obtained by solving an optimization problem whose objective function balances a term forcing separation between A and B against a term maximizing the margin of separation, so tolerance is accepted in these cases [5].

As shown in Fig. 1, the support vectors from A are those points A_i in the halfspace {x ∈ R^n | w^T x ≤ b + 1} (i.e., those points of A 'on' or 'below' the bounding plane w^T x = b + 1), where w and b are the weight and bias. The support vectors from B are those points B_i in the halfspace {x ∈ R^n | w^T x ≥ b - 1} (i.e., those points 'on' or 'above' the plane w^T x = b - 1). These points are the only data points relevant to determining the optimal separating plane. The number of support vectors is usually small and is also proportional to a bound on the generalization error of the classifier. But there is a problem in many practical applications: if the support vectors are influenced by noise, some of them may have large absolute decision values, and because noise is usually uncertain, we cannot obtain the correct separation hyperplane.

2.3 SVM for Nonlinear Classification

Because the common cases in real-world classification are nonlinear and non-separable, in the primal space we transform the low-dimensional input data into a high-dimensional feature space by using a mapping function φ(x).
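As an aside on what such a mapping looks like (a standard textbook identity, not taken from this paper; the map phi and the test points below are chosen purely for illustration), an explicit quadratic feature map for 2-D inputs reproduces the degree-2 polynomial kernel (x^T z + 1)^2:

```python
import math

def phi(x):
    """Explicit quadratic feature map for a 2-D input: phi(x) has 6 components
    and satisfies phi(x).phi(z) = (x.z + 1)^2 (the d = 2 polynomial kernel)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, r2 * x1 * x2, x2 * x2]

def poly_kernel(x, z):
    """Degree-2 polynomial kernel evaluated directly in input space."""
    return (x[0] * z[0] + x[1] * z[1] + 1.0) ** 2

if __name__ == "__main__":
    x, z = [0.5, -1.0], [2.0, 0.25]
    lhs = sum(a * b for a, b in zip(phi(x), phi(z)))   # inner product in feature space
    rhs = poly_kernel(x, z)                             # kernel in input space
    print(abs(lhs - rhs) < 1e-9)  # True: the kernel equals the feature-space inner product
```

This is why the QP below never needs phi explicitly: every occurrence of feature vectors appears inside an inner product, which the kernel computes in input space.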
For non-separable cases in the feature space, the boundary function has nonnegative slack variables ξ_i that let the margin accept violations, giving the separating plane condition

    y_i [w^T φ(x_i) + b] ≥ 1 - ξ_i,  ∀i                                   (1)

The optimal hyperplane is then found by solving the following optimization problem:

    min_{w,b,ξ} J(w, ξ) = (1/2) w^T w + C Σ_{i=1}^{N} ξ_i                  (2)
    s.t.  y_i [w^T φ(x_i) + b] ≥ 1 - ξ_i,  ξ_i ≥ 0,  i = 1, ..., N,

where parameter C controls the degree of tolerance and is the only adjustable parameter in SVM. By introducing the vector of Lagrange multipliers α = (α_1, ..., α_N), the problem (Eq. 2) can be rebuilt as a QP problem in dual space [6]:

    max_α Q(α) = -(1/2) Σ_{i,j=1}^{N} y_i y_j K(x_i, x_j) α_i α_j + Σ_{j=1}^{N} α_j    (3)
    s.t.  Σ_{i=1}^{N} α_i y_i = 0,  0 ≤ α_i ≤ C,  ∀i,

where K(x_i, x_j) = φ(x_i)^T φ(x_j) is the kernel function [7]. In our experiments, the Polynomial kernel and the Gaussian RBF kernel are taken into account. Polynomial mapping is a common method for nonlinear modeling:

    K(x_i, x_j) = (x_i^T x_j + 1)^d                                        (4)

where d is the degree of the polynomial. The RBF kernel also has outstanding performance in applications:

    K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2))                           (5)

where σ^2 is the common width. The decision-making function can then be obtained:

    y(x) = sign[Σ_{i=1}^{N} α_i y_i K(x, x_i) + b]                         (6)

where y(x) is the predicted output label of input vector x. Although the sign function can divide the test data into two classes by detecting the sign of the decision value Σ_{i=1}^{N} α_i y_i K(x, x_i) + b, it is so rigid ("hard-shelled") that it makes mistakes when the decision value is close to zero.

3. FUZZY DECISION-MAKING SVM MODEL

3.1 Fuzzy Decision-making SVM Process

We propose a fuzzy decision-making SVM process, a model based on the fuzzy method, the SVM algorithm and the analysis of a large amount of data. As an extension of traditional methods, the proposed model is more suitable for practical applications.

In the training part, we use the same method as the traditional one to train the SVM classifier [8]. As discussed before, the support vectors, a subset extracted from the training data that describes the separation boundary, can be found. Using the trained SVM we can calculate the decision value of an input point. But, differently from the traditional method, the decision value that a conventional model would feed to the sign function is instead used as the independent variable of a fuzzy decision-making function that measures the belief degree of each input point. The general structure of the whole process (Fig. 2, fuzzy decision-making SVM process) can be divided into three main stages: SVM training, decision value prediction, and fuzzy decision-making.

In the decision-making part, a fuzzy model is constructed to replace the sign function of the conventional model. This is because interaction usually exists in many real conditions, especially around the boundary between classes, so misclassifications occur in the neighborhood of the threshold. Differently from conventional methods, with the fuzzy boundary in our model zero is no longer the only important threshold value; instead, the clusters are treated as fuzzy sets.

3.2 Fuzzy Decision-making Functions

Assume these two fuzzy sets are called A (the values in this set are deemed to be predicted as -1) and B (the values in this set are deemed to be predicted as +1). A gray zone is built to take the place of the hard boundary. The values of the membership functions indicate reliability, expressing a belief degree between 1 (completely believable) and 0 (completely false).
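The prediction stage that feeds this fuzzy step — the decision value of Eq. (6) under the RBF kernel of Eq. (5) — can be sketched as follows. This is not the authors' implementation: the helper names are invented, and the coefficients α_i y_i, support vectors and bias b below would in practice come from solving the QP of Eq. (3).

```python
import math

def rbf_kernel(x, z, sigma2=4.0):
    """Gaussian RBF kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2)), Eq. (5)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma2))

def decision_value(x, svs, alpha_y, b, kernel=rbf_kernel):
    """Decision value sum_i alpha_i y_i K(x, x_i) + b from Eq. (6).
    svs are the support vectors; alpha_y[i] holds the product alpha_i * y_i."""
    return sum(ay * kernel(x, sv) for ay, sv in zip(alpha_y, svs)) + b

def sign_predict(x, svs, alpha_y, b):
    """Conventional hard decision-making function: the sign in Eq. (6)."""
    return 1 if decision_value(x, svs, alpha_y, b) >= 0 else -1
```

The fuzzy decision-making function of this section replaces sign_predict while reusing decision_value unchanged, which is exactly the three-stage split of Fig. 2.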
Through several experiments, the arc-tangent function was confirmed as the basis of the curvilinear fuzzy functions. The boundary functions of set A and set B can be written as:

    f_A(v) = arctan(-v·s - d·s) / π + 0.5                                  (7)
    f_B(v) = arctan(v·s - d·s) / π + 0.5                                   (8)

where d indicates the discerption degree, s is the scale factor, and v denotes the decision value. The cross-section of the fuzzy sets is shown in Fig. 3 (curvilinear fuzzy decision model: believable value versus decision value, with the boundary functions of fuzzy sets A and B and the parameters ±d and s).

4. WEIGHTED HARMONIC MEAN OFFSET

4.1 Introduction of WHM Offset

Because most real cases are lopsided, the borderline cannot be described by formulas (7) and (8) accurately. As a result of dataset imbalance, the midpoint of the gray boundary zone generally does not equal zero, so an offset constant δ is introduced to denote the distance between the real borderline and the theoretical one.

Because the support vectors are the samples nearest to the boundary, one way to calculate the offset δ is to compute the mean of the decision values of the support vectors, as proposed in our former models [1]:

    δ = (1/n) Σ_{i=1}^{n} S_i                                              (9)

where S_1, ..., S_n are the decision values of the support vectors. This mean value moves the separation boundary to a better position than the conventional one, but if some support vectors are themselves influenced by noise in the real-world dataset, this offset becomes unreliable.

To find a more proper offset, these false support vectors should be ignored, so we introduce a weighted harmonic mean of the decision values of the support vectors (SVs). As with our previously proposed offset parameter, the SVs are used as test data to obtain their decision values S_1, ..., S_n, where n is the number of support vectors. Suppose the corresponding weights are β_1, ..., β_n; then the offset has the following configuration:

    δ = Σ_{i=1}^{n} β_i / Σ_{i=1}^{n} (β_i / S_i)                          (10)

As noted in Subsection 2.2, the support vectors from subset A are the points of A 'on' or 'below' the bounding plane w^T x = b + 1, and similarly the support vectors from B are those points 'on' or 'above' the plane w^T x = b - 1. If some of the support vectors are strongly influenced by interaction and noise, these samples may lie far from the separation boundary. Therefore, we give such support vectors smaller weights so that they become invalid in the calculation of the offset. Based on this consideration, we employ the Blackman equation to calculate the weights of the support vectors:

    β_i = 0.42 - 0.5 cos(π(S_i + S_max)/S_max) + 0.08 cos(2π(S_i + S_max)/S_max)    (11)

where S_max is the largest absolute value of all the support vectors' decision values, S_i is a certain support vector's decision value, and β_i denotes its corresponding weight.

4.2 Decision-making Model with WHM Offset

Introducing the proposed WHM offset into the fuzzy-boundary SVM classifier model discussed above, a modified fuzzy boundary is obtained, and formulas (7) and (8) can be rewritten as:

    f_A(v) = arctan(-v·s - d·s + δ·s) / π + 0.5                            (12)
    f_B(v) = arctan(v·s - d·s - δ·s) / π + 0.5                             (13)

where d and s are chosen from a great number of experiments. We can use the values f_A(v) and f_B(v) to estimate whether an input vector should be labeled +1 or -1. Based on formulas (12) and (13), the bounding curves of the model with WHM offset are drawn in Fig. 4 (fuzzy decision model with an offset).

The whole process of the fuzzy decision-making SVM with WHM offset is shown in Fig. 5 (model with weighted harmonic mean offset); the procedure consists of four steps:

Step 1: SVM training. Find the support vectors and compute the parameters of the SVM classifier model.
Step 2: Prediction for the support vectors. Calculate the decision values of the support vectors and their corresponding weights.
Step 3: Prediction for the test dataset. Obtain the decision values of the test input vectors.
Step 4: Final decision-making. Use the decision values of the support vectors and the weights from Step 2 to calculate the offset, then use the fuzzy decision-making method to predict the final output labels of the input vectors.

5. SIMULATION RESULTS

5.1 Simulation 1: Heart Disease Detection Problem

5.1.1 Description of problem

The first problem we used to test our models is heart disease detection, from the Statlog datasets. The database consists of two classes, an absence class and a presence class. The total number of examples is 200, of which 150 samples are in the absence class and 50 in the presence class. Each sample has 13 main attributes, extracted from a larger set of 75. Denote each pair of input vector and output label as {X(n), Y(n)}, where X(n) is the input vector with 13 attributes and Y(n) is the output label with two poles, -1 (absence) or +1 (presence), n = 1, 2, ..., 200.

    Table 1 Comparison of two offsets in simulation 1
      C        1         10       100
      model A  94.615%   95.38%   96.15%
      model B  88.46%    91.54%   92.3%
      model C  95.38%    96.15%   96.15%
      model D  90%       92.3%    93.08%

Assume the vector X(n) = (x1(n), x2(n), ..., x13(n)); the meanings of these 13 elements are: age, sex, chest pain type, resting blood
pressure, serum cholesterol, fasting blood sugar, resting electrocardiographic results, maximum heart rate, exercise-induced angina, oldpeak, the slope of the peak exercise ST segment, number of major vessels, and thal (3 = normal; 6 = fixed defect; 7 = reversible defect). In brief, our purpose is to predict the bipolar output Y(n) with as few sign errors as possible from the given input vectors X(n).

5.1.2 Results of Classification

Because the database used in our simulations comes from the real world, it is apparently non-separable. In a non-separable SVM classifier, not only the kind of kernel but also the regularization parameter C affects the accuracy used to evaluate classifiers. C denotes the tolerance of the SVM; as it changes, both the training results and the predictions change. To confirm the wide applicability of our method, we let C equal 1, 10 and 100 in turn, as in common situations, and use the two kinds of kernels presented in the previous section, the RBF kernel and the Polynomial kernel. Using the fuzzy decision-making function and the WHM offset in the prediction part of SVM classifiers based on these two kernels, we form two classification models: a fuzzy decision-making RBF-SVM model with WHM offset and a fuzzy decision-making Polynomial-SVM model with WHM offset.

Using the first 70 samples as training data and the remainder as test data, the accuracies of the two proposed models are obtained. The experiment results, compared with our previous models with mean offsets, are shown in Table 1. Comparing model A (fuzzy decision-making RBF SVM with mean offset) with model C (fuzzy decision-making RBF SVM with WHM offset), and model B (fuzzy decision-making Poly SVM with mean offset) with model D (fuzzy decision-making Poly SVM with WHM offset), we find that the new models with WHM offsets perform better. Compared with the traditional RBF-SVM and Polynomial-SVM classifiers, the accuracies (y-axis) for the three
different values of parameter C (x-axis) are shown in Fig. 6 (accuracy curves for heart disease detection: fuzzy decision RBF SVM with WHM offset, fuzzy decision Poly SVM with WHM offset, RBF SVM, and Poly SVM).

Regarding parameter selection, we choose a standard RBF kernel with common width σ^2 = 4 and a Polynomial kernel with d = 3. Using different combinations of kernel and C value to train the SVM classifiers, we obtain several models. The boundary offset δ is then worked out from the decision values of the support vectors and their weights. We set the scale coefficient s = 256 for the classifiers using the Polynomial kernel, and s = 512 for the classifiers using the RBF kernel.

5.2 Simulation 2: Misfire Detection Problem

5.2.1 Description of problem

The second database also comes from the real world: a misfire detection problem in internal combustion engines. The database is divided into two subsets, one used as training data and the other as test data. The data contain time series of 50000 samples produced by a physical system (a 10-cylinder internal combustion engine). As in simulation 1, each sample k of the time series consists of four input elements and one output label, and each pair can be written as {X(k), Y(k)}, where X(k) is the input vector and Y(k) is the output label, k = 1, 2, ..., 50000.

The four elements of the input vector, x1(k), x2(k), x3(k) and x4(k), represent the cylinder identifier, the engine crankshaft speed in revolutions per minute (RPM), the load, and the crankshaft acceleration, respectively. Y(k) may take two values: -1 (normal firing) or +1 (misfire). The numbers of normal cases in the training and test data are 45093 and 45395, and the numbers of misfire samples in the training and test datasets are 4907 and 4605, so this database is also an imbalanced one. In this problem, our purpose is again to detect the output label Y(k) for a given input vector X(k).
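The degree of imbalance that motivates the offset can be checked directly from the class counts just given (a trivial sketch; imbalance_ratio is an illustrative helper, and the numbers are the sample counts reported above):

```python
# Sketch: quantify the class imbalance of the misfire dataset from the
# reported sample counts (normal vs. misfire, training and test subsets).

def imbalance_ratio(n_majority, n_minority):
    """Ratio of majority-class to minority-class sample counts."""
    return n_majority / n_minority

if __name__ == "__main__":
    train_ratio = imbalance_ratio(45093, 4907)   # normal vs. misfire, training data
    test_ratio = imbalance_ratio(45395, 4605)    # normal vs. misfire, test data
    print(round(train_ratio, 2), round(test_ratio, 2))  # 9.19 9.86
```

With roughly nine to ten normal samples per misfire sample, an uncorrected boundary drifts toward the minority class, which is exactly the excursion the offset δ of Section 4 is meant to compensate.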
5.2.2 Results of Classification

As in simulation 1, we let C equal 1, 10 and 100, and use the RBF kernel and the Polynomial kernel respectively. The comparison of the models with different offsets is shown in Table 2; models A, B, C and D have the same definitions as in Table 1. In this problem we again obtain better performance from the new model with the WHM offset.

    Table 2 Comparison of two offsets in simulation 2
      C        1         10        100
      model A  95.34%    95.654%   95.42%
      model B  92.656%   93.858%   95.008%
      model C  95.74%    96.02%    95.82%
      model D  92.96%    94.27%    95.01%

The accuracies of the traditional classifiers and of the proposed models with WHM offset are shown in Fig. 7 (accuracy curves for the misfire detection problem: fuzzy decision RBF SVM with WHM offset, fuzzy decision Poly SVM with WHM offset, RBF SVM, and Poly SVM).

Regarding parameter selection, we again set σ^2 = 4 for the RBF kernel and d = 3 for the Polynomial kernel. As in simulation 1, we set s = 256 for the classifiers using the Polynomial kernel and s = 512 for the classifiers using the RBF kernel.

5.3 Comparison among Different Classifiers

As shown in Fig. 6 and Fig. 7, for different values of parameter C the accuracies of the models using the RBF kernel and of those using the Polynomial kernel differ. A classifier may perform differently on different problems, so how to choose a proper kernel is still an open problem in the SVM method. In other words, for the dataset in simulation 1 the RBF kernel is better than the Polynomial kernel, but for the second problem the Polynomial kernel is more proper.

In particular, the figures also show that the method proposed in this paper presents better capability than the traditional SVM classifier and than our former models for the different regularization parameters C, and the performance of this model is also more robust. As discussed, a more proper decision-making function, a more valid offset, a more appropriate width of the gray zone and a more suitable kernel would make the classifier more effective.
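Before turning to the conclusions, the decision stage compared above — the Blackman weights of Eq. (11), the WHM offset of Eq. (10) and the offset fuzzy boundaries of Eqs. (12)-(13) — can be sketched as follows. This is an illustrative reading of those formulas, not the authors' code: Eq. (10) is implemented literally as a weighted harmonic mean, the function names are invented, and it assumes no support-vector decision value S_i is exactly zero.

```python
import math

def blackman_weight(s, s_max):
    """Blackman weight of a support vector's decision value, Eq. (11).
    Values near the boundary (s ~ 0) get weight ~1; the most deviant
    values (|s| = s_max) get weight ~0 and so drop out of the offset."""
    t = math.pi * (s + s_max) / s_max
    return 0.42 - 0.5 * math.cos(t) + 0.08 * math.cos(2.0 * t)

def whm_offset(sv_values):
    """Weighted harmonic mean offset delta over SV decision values, Eq. (10)."""
    s_max = max(abs(s) for s in sv_values)
    betas = [blackman_weight(s, s_max) for s in sv_values]
    return sum(betas) / sum(b / s for b, s in zip(betas, sv_values))

def fuzzy_labels(v, s, d, delta):
    """Belief degrees f_A(v), f_B(v) of the offset fuzzy boundaries, Eqs. (12)-(13)."""
    f_a = math.atan(-v * s - d * s + delta * s) / math.pi + 0.5
    f_b = math.atan(v * s - d * s - delta * s) / math.pi + 0.5
    return f_a, f_b
```

In this reading, Steps 2 and 4 of Section 4.2 amount to calling whm_offset on the support vectors' decision values and then comparing fuzzy_labels(v, s, d, delta) for each test point's decision value v.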
6. CONCLUSIONS

We have presented a fuzzy decision-making algorithm for building SVM classifiers and introduced a WHM offset to correct the excursion of the boundary in real-world datasets.

The model proposed in this paper improves the performance of SVM in classification problems. By using a fuzzy decision-making function, the prediction boundary is transformed from a straitlaced configuration into a more flexible structure, so the influence between data sets is reduced and many points misclassified in the prediction part of a traditional SVM classifier have a chance to be relabeled.

In addition, we constructed a WHM offset δ. This offset moves the separation boundary between each pair of sets to a more appropriate position, so the error caused by dataset imbalance can be overcome. In particular, with the weighted harmonic mean method an anti-jamming offset can be calculated without being misled by support vectors influenced by noise, since the distribution of the weight values controls the effect of each SV on the offset parameter.

Although the proposed model performs well in simulations, some problems remain for the future. How to choose a proper kernel for a certain database, how to build a more robust kernel function, and how to determine the parameters of the fuzzy decision-making part automatically will be the main directions of our future research.

REFERENCES

[1] Boyang Li, Jinglu Hu, Kotaro Hirasawa, Pu Sun and Kenneth Marko. "Support Vector Machine with Fuzzy Decision-Making for Real-world Data Classification". In IEEE World Congress on Computational Intelligence 2006, International Joint Conference on Neural Networks, Canada, 2006.
[2] N. Cristianini and J. Shawe-Taylor. "An Introduction to Support Vector Machines". Cambridge, UK: Cambridge Univ. Press, 2000.
[3] P. Bartlett and J. Shawe-Taylor. "Generalization performance of support vector machines and other pattern classifiers". In Advances in Kernel Methods - Support Vector Learning, Cambridge, MA: MIT Press, 1998.
[4] O. Chapelle and V. Vapnik. "Model selection for Support Vector Machines". In S. Solla, T. Leen, and K.-R. Müller, editors, Adv. Neural Inf. Proc. Syst. 12, Cambridge, MA: MIT Press, 2000.
[5] Steve R. Gunn. "Support Vector Machines for Classification and Regression". Technical Report, Faculty of Engineering, Science and Mathematics, School of Electronics and Computer Science, 10 May 1998.
[6] Johan Suykens. "Least Squares Support Vector Machines". Tutorial, IJCNN, 2003.
[7] Alex J. Smola and Bernhard Schölkopf. "On a kernel-based method for pattern recognition, regression, approximation and operator inversion". Algorithmica, 22:211-231, 1998. Also as Technical Report 1064, GMD FIRST, April 1997.
[8] E. Osuna, R. Freund and F. Girosi. "Support Vector Machines: Training and Applications". A.I. Memo 1602, MIT A.I. Lab., 1997.