On the Convergence of Some Possibilistic Clustering Algorithms

Jian Zhou
School of Management, Shanghai University, Shanghai 200444, China

Longbing Cao
Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia

Nan Yang‡
School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China

‡Corresponding author. Tel.: +86-13816247965. E-mail address: yangnan@mail.shufe.edu.cn (N. Yang).

Abstract

In this paper, an analysis of the convergence performance is conducted for a class of possibilistic clustering algorithms utilizing the Zangwill convergence theorem. It is shown that under certain conditions the iterative sequence generated by a possibilistic clustering algorithm converges, at least along a subsequence, to either a local minimizer or a saddle point of the objective function of the algorithm. The convergence performance of more general possibilistic clustering algorithms is also discussed.

Keywords: Fuzzy clustering, possibilistic clustering, convergence

1 Introduction

Possibilistic clustering, initiated by Krishnapuram and Keller [7], is an approach to fuzzy clustering based on possibilistic memberships representing degrees of typicality, and it has been extensively studied and successfully applied in many areas (see, e.g., [3][6][9][13]). The process of fuzzy clustering partitions a data set $X = \{x_1, x_2, \cdots, x_n\} \subset \Re^p$ into $c$ ($1 < c < n$) clusters, and each datum $x_j$ may belong to several clusters simultaneously with different degrees $\mu_{ij}$. The possibilistic clustering algorithm (PCA) in [7], denoted by PCA93, performs clustering by minimizing the objective function

$$J_{PCA93}(U,A)=\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^{m}\|x_j-a_i\|^{2}+\sum_{i=1}^{c}\eta_i\sum_{j=1}^{n}(1-\mu_{ij})^{m} \qquad (1)$$

subject to

$$0\le\mu_{ij}\le 1,\quad 1\le i\le c,\ 1\le j\le n, \qquad (2a)$$
$$\textstyle\sum_{i=1}^{c}\mu_{ij}>0,\quad 1\le j\le n, \qquad (2b)$$
$$\textstyle\sum_{j=1}^{n}\mu_{ij}>0,\quad 1\le i\le c, \qquad (2c)$$

where $A=(a_1,a_2,\cdots,a_c)\in\Re^{cp}$ is the cluster center matrix, $m\ge 1$ is a weighting exponent called the fuzzifier, $\|\cdot\|$ is a norm induced by any inner product, and the coefficients $\eta_i$ ($1\le i\le c$) are positive. The
constraint (2b) guarantees that each feature point belongs to at least one cluster with nonzero membership, and (2c) ensures that none of the clusters is empty, so that we really have a partition into no fewer than $c$ clusters. It should be noted that throughout this paper we take the $l_2$ norm for $\|\cdot\|$, i.e., $\|x_j-a_i\|=\sqrt{\sum_{k=1}^{p}(x_{jk}-a_{ik})^2}$. Let $U_X$ denote the set of all matrices $U=(\mu_{ij})_{c\times n}$ satisfying the constraints (2a)–(2c). In order to solve the optimization problem above, Krishnapuram and Keller [7] suggested an iterative algorithm, i.e., PCA93, through update equations for $U$ and $A$, both obtained from the necessary conditions for a minimizer of $J_{PCA93}$:

$$\mu_{ij}=\frac{1}{1+\left(\|x_j-a_i\|^2/\eta_i\right)^{1/(m-1)}},\quad 1\le i\le c,\ 1\le j\le n, \qquad (3)$$

and

$$a_i=\frac{\sum_{j=1}^{n}\mu_{ij}^{m}x_j}{\sum_{j=1}^{n}\mu_{ij}^{m}},\quad 1\le i\le c, \qquad (4)$$

respectively. Subsequently, three other PCAs were presented in [8], [5], and [10], denoted PCA96, PCA03, and PCA06, respectively, and listed as follows.

(PCA96, Krishnapuram and Keller [8]) The optimization problem

$$J_{PCA96}(U,A)=\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}\|x_j-a_i\|^{2}+\sum_{i=1}^{c}\eta_i\sum_{j=1}^{n}(\mu_{ij}\ln\mu_{ij}-\mu_{ij}) \quad\text{subject to } 0<\mu_{ij}\le 1,\ 1\le i\le c,\ 1\le j\le n, \qquad (5)$$

with the update equations for $U$

$$\mu_{ij}=\exp\left\{-\frac{1}{\eta_i}\|x_j-a_i\|^{2}\right\},\quad 1\le i\le c,\ 1\le j\le n, \qquad (6)$$

and the update equations for $A$

$$a_i=\frac{\sum_{j=1}^{n}\mu_{ij}x_j}{\sum_{j=1}^{n}\mu_{ij}},\quad 1\le i\le c. \qquad (7)$$

(PCA03, Höppner and Klawonn [5]) The optimization problem

$$J_{PCA03}(U,A)=\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^{m}\|x_j-a_i\|^{2}+\sum_{i=1}^{c}\eta_i\sum_{j=1}^{n}(\mu_{ij}^{m}-m\mu_{ij}) \quad\text{subject to } U\in U_X, \qquad (8)$$

with the update equations for $U$

$$\mu_{ij}=\frac{1}{\left(1+\frac{\|x_j-a_i\|^2}{\eta_i}\right)^{1/(m-1)}},\quad 1\le i\le c,\ 1\le j\le n, \qquad (9)$$

and the update equations (4) for $A$.

(PCA06, Yang and Wu [10]) The optimization problem

$$J_{PCA06}(U,A)=\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^{m}\|x_j-a_i\|^{2}+\frac{\beta}{m^{2}\sqrt{c}}\sum_{i=1}^{c}\sum_{j=1}^{n}(\mu_{ij}^{m}\ln\mu_{ij}^{m}-\mu_{ij}^{m}) \quad\text{subject to } 0<\mu_{ij}\le 1,\ 1\le i\le c,\ 1\le j\le n, \qquad (10)$$

with the update equations for $U$

$$\mu_{ij}=\exp\left\{-\frac{m\sqrt{c}}{\beta}\|x_j-a_i\|^{2}\right\},\quad 1\le i\le c,\ 1\le j\le n, \qquad (11)$$

and the update equations (4) for $A$, where

$$\beta=\frac{1}{n}\sum_{j=1}^{n}\|x_j-\bar{x}\|^{2} \quad\text{with}\quad \bar{x}=\frac{1}{n}\sum_{j=1}^{n}x_j. \qquad (12)$$

Furthermore, it was claimed in [2][5] that for different choices of the second term in the objective functions of the PCAs, different algorithms can be obtained
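As a concrete illustration, the membership update (3) and the center update (4) of PCA93 can be sketched in NumPy. This is our own hypothetical sketch, not code from the cited papers; the function names and the vectorized array layout are assumptions.

```python
import numpy as np

def pca93_memberships(X, A, eta, m=2.0):
    """PCA93 membership update (3): mu_ij = 1 / (1 + (||x_j - a_i||^2 / eta_i)^(1/(m-1)))."""
    # d2[i, j] = ||x_j - a_i||^2, with X of shape (n, p) and A of shape (c, p)
    d2 = ((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (m - 1.0)))

def pca93_centers(U, X, m=2.0):
    """PCA93 center update (4): a_i = sum_j mu_ij^m x_j / sum_j mu_ij^m."""
    W = U ** m  # weights mu_ij^m, shape (c, n)
    return (W @ X) / W.sum(axis=1, keepdims=True)
```

Each update is evaluated independently for every pair $(i,j)$, which is why both maps vectorize cleanly; note also that (4) makes each $a_i$ a convex combination of the data, a fact used later in the compactness argument.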
with different membership functions. Subsequently, a general framework for the PCAs was provided in [12] by examining the characteristics of these membership functions. However, apart from the aforementioned four classes of functions, no other objective functions have been suggested for possibilistic clustering in the literature.

Although extensive numerical experiments with these PCAs on data sets from a wide range of applications have established the applicability and practicality of such techniques, the convergence of the PCAs has not been rigorously established. In [5], the convergence characteristics of the fuzzy c-means (FCM) algorithm and the PCAs were discussed from a unified viewpoint, and the algorithm PCA03 was shown to be convergent through a reformulation of the original objective function $J_{PCA03}$. It was also stated that the proof can be generalized to other similar algorithms. However, this generalization is not straightforward because of the complexity and diversity of the membership functions of the PCAs, and the convergence issue for the PCAs has not been resolved explicitly.

In this paper, we investigate the convergence performance of the PCAs. Bezdek [1] and Hathaway et al. [4] established the convergence of FCM utilizing a restated form of the Zangwill convergence theorem, and it is shown in this paper that this approach works for the PCAs as well. We first show, by means of Zangwill's theorem, that the iterative sequence generated by PCA93 converges globally, at worst along a subsequence, to a minimizer or a saddle point of the objective function $J_{PCA93}$, where "globally" means that convergence occurs from any initialization. The result is also applicable to PCA96 and PCA03.

The rest of this paper is organized as follows. In Section 2, the convergence problem for the PCAs is stated, and the restated Zangwill convergence theorem to be used is reviewed briefly. The convergence of PCA93 is then proven in Section 3 utilizing Zangwill's theorem. In Section 4, we demonstrate that the proof
can be extended to PCA96 and PCA03 with some slight modifications. Section 5 contains a short summary of the proof strategy.

2 Convergence of the PCAs

Numerical experiments with real data have verified the usefulness of the possibilistic clustering algorithms. Our goal below is to prove that they are also theoretically sound. As a preliminary, this section first describes the problem by defining some notation, and then explains the general proof strategy to be used.

2.1 Problem description

In general, the procedure of the PCAs can be summarized as follows:

Possibilistic Clustering Algorithms
Step 0. One of the optimization problems (1), (5), and (8) is given; in other words, the objective function and the constraints are predetermined.
Step 1. Initialize $U^{(0)}$ in the feasible domain, and set a small number $\epsilon>0$ and the iteration counter $l=0$.
Step 2. Compute $A^{(l+1)}$ using the update equations (4) or (7) for $A$.
Step 3. Compute $U^{(l+1)}$ using the update equations (3), (6), or (9) for $U$.
Step 4. If $\max_{i,j}|\mu_{ij}^{(l+1)}-\mu_{ij}^{(l)}|<\epsilon$, stop; otherwise increase $l$ by one and return to Step 2.

By this procedure, an iterative sequence $\{(U^{(l)},A^{(l)})\}$ is generated, and the problem we seek to resolve is whether or not $\{(U^{(l)},A^{(l)})\}$ converges. The following notation further describes the iteration. Let

$$F:\Re^{cp}\mapsto U_X,\quad F(A)=F(a_1,a_2,\cdots,a_c)=U \qquad (13)$$

where the entries of $U=(\mu_{ij})_{c\times n}$ are calculated by (3), (6), (9), or (11). Let

$$G:U_X\mapsto\Re^{cp},\quad G(U)=A=(a_1,a_2,\cdots,a_c) \qquad (14)$$

where the vectors $a_i\in\Re^p$ ($1\le i\le c$) are calculated via (4) or (7). Using $F$ and $G$, we define the PCA operator $T_p:U_X\times\Re^{cp}\mapsto U_X\times\Re^{cp}$ by

$$T_p=T_2\circ T_1 \qquad (15)$$

where

$$T_1:U_X\times\Re^{cp}\mapsto\Re^{cp},\quad T_1(U,A)=G(U), \qquad (16)$$
$$T_2:\Re^{cp}\mapsto U_X\times\Re^{cp},\quad T_2(A)=(F(A),A). \qquad (17)$$

Then we have

$$T_p(U,A)=(T_2\circ T_1)(U,A)=(F\circ G(U),\,G(U)). \qquad (18)$$

By (18), the iterative sequence can be rewritten as

$$(U^{(l)},A^{(l)})=T_p^{l}(U^{(0)},A^{(0)})=\big((F\circ G)^{l}(U^{(0)}),\,G\circ(F\circ G)^{l-1}(U^{(0)})\big),\quad l=1,2,\cdots \qquad (19)$$

One of the most critical issues for the PCAs is to determine whether or not $\{T_p^{l}(U^{(0)},A^{(0)})\}$ defined in (19) is convergent.

2.2 Proof strategy

The Zangwill convergence theorem [11] provides a useful approach to analyze the
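Steps 0–4 of the generic procedure amount to alternating the maps $G$ (Step 2) and $F$ (Step 3) until the memberships stabilize. A minimal sketch of this loop for PCA93, assuming fixed coefficients $\eta_i$ and the $l_2$ norm (all names here are our own, not from the cited papers):

```python
import numpy as np

def pca93_iterate(X, U0, eta, m=2.0, eps=1e-6, max_iter=500):
    """Alternate the center update G (Step 2) and the membership update F
    (Step 3) until max_ij |mu^(l+1)_ij - mu^(l)_ij| < eps (Step 4)."""
    U = U0
    for _ in range(max_iter):
        W = U ** m
        A = (W @ X) / W.sum(axis=1, keepdims=True)  # G: update (4)
        d2 = ((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        U_next = 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (m - 1.0)))  # F: update (3)
        if np.abs(U_next - U).max() < eps:  # Step 4 stopping rule
            return U_next, A
        U = U_next
    return U, A
```

Each pass through the loop body is exactly one application of the operator $T_p=T_2\circ T_1$ defined in (15)–(18): it maps the current pair to $(F\circ G(U),\,G(U))$.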
convergence of sequences, and it has been utilized to establish the convergence of FCM in [1][4]. Motivated by the similarity between the FCM algorithm and the PCAs, our proof strategy applies Zangwill's theorem to the PCA operator $T_p$.

Let $f:\Re^p\mapsto\Re$ be a real function with domain $D_f$, and let $S$ be the solution set of the optimization problem $\min_{D_f}f(x)$. Zangwill defined an iterative algorithm for solving this problem as any point-to-set mapping $Z:D_f\mapsto P(D_f)$, where $P(D_f)$ is the power set of $D_f$. The algorithm of interest here is a point-to-point map $Z=T_p$, so we are interested in the special case $Z:D_f\mapsto D_f$. Consequently, we replace the closedness condition on $Z$ in [11] by ordinary continuity, and restate the convergence theorem for our particular case as follows.

Theorem 1. Let the point-to-point map $Z:D_f\mapsto D_f$ determine an algorithm that generates the sequence $\{z^{(l)}\}_{1}^{\infty}$ for a given point $z^{(0)}\in D_f$. Also let a solution set $S\subset D_f$ be given. Suppose

C1. (Descent Constraint) there is a continuous function $g:D_f\mapsto\Re$ such that: (a) if $z$ is not a solution, then $g(Z(z))<g(z)$; (b) if $z$ is a solution, then $g(Z(z))\le g(z)$;
C2. (Continuity Constraint) $Z$ is continuous on $D_f\setminus S$;
C3. (Compactness Constraint) all points $z^{(l)}$ are contained in a compact set $K\subset D_f$.

Then either the algorithm stops at a solution, or the limit of any convergent subsequence is a solution.

For the convergence of the PCAs, we have $Z=T_p$, $z^{(l)}=(U^{(l)},A^{(l)})$, and $g$ corresponds to the objective functions of the PCAs. What we need to do is verify that the objective function (e.g., $J_{PCA93}$) satisfies the descent constraint for a proper solution set $S$, that $T_p$ satisfies the continuity constraint, and that $\{(U^{(l)},A^{(l)})\}_{1}^{\infty}$ satisfies the compactness constraint. Following this strategy, Section 3 gives the detailed proof for PCA93.

3 Convergence of PCA93

In this section, we assume that the fuzzifier satisfies $m>1$. In order to establish the convergence of PCA93, the three constraints in Theorem 1 are verified in turn.

3.1 Descent constraint

First we show that the descent constraint
holds for $J_p=J_{PCA93}$, which is the first requirement of Theorem 1.

Lemma 1. Let $\varphi:U_X\mapsto\Re$, $\varphi(U)=J_p(U,A)$, where $A$ is fixed. Then $U^*\in U_X$ is a global minimum solution of $\varphi$ if and only if $U^*=F(A)$, where $F$ is defined by (13) and (3).

Proof: Minimization of $\varphi$ over $U_X$ is an optimization problem with $2cn+n+c$ linear inequality constraints (2a)–(2c). By letting

$$y_{ij}(U)=\mu_{ij}-1,\quad 1\le i\le c,\ 1\le j\le n, \qquad (20)$$
$$z_{ij}(U)=-\mu_{ij},\quad 1\le i\le c,\ 1\le j\le n, \qquad (21)$$
$$\zeta_{j}(U)=-\textstyle\sum_{i=1}^{c}\mu_{ij},\quad 1\le j\le n, \qquad (22)$$
$$\varsigma_{i}(U)=-\textstyle\sum_{j=1}^{n}\mu_{ij},\quad 1\le i\le c, \qquad (23)$$

the original optimization problem is rewritten as

$$\min\ \varphi(U)\quad\text{subject to}\quad y_{ij}(U)\le 0,\ z_{ij}(U)\le 0,\ 1\le i\le c,\ 1\le j\le n;\quad \zeta_j(U)<0,\ 1\le j\le n;\quad \varsigma_i(U)<0,\ 1\le i\le c. \qquad (24)$$

Suppose that $U^*$ is a minimizer of (24). Then it must satisfy the following KKT conditions:

(1) $U^*$ is feasible, i.e.,
$$y_{ij}(U^*)\le 0,\ z_{ij}(U^*)\le 0,\quad 1\le i\le c,\ 1\le j\le n;\qquad \zeta_j(U^*)<0,\ 1\le j\le n;\qquad \varsigma_i(U^*)<0,\ 1\le i\le c; \qquad (25)$$

(2) there exist $2cn$ nonnegative multipliers $\lambda_{ij}\ge 0$ and $\tau_{ij}\ge 0$ such that
$$\lambda_{ij}y_{ij}(U^*)=0,\quad 1\le i\le c,\ 1\le j\le n, \qquad (26)$$
$$\tau_{ij}z_{ij}(U^*)=0,\quad 1\le i\le c,\ 1\le j\le n; \qquad (27)$$

(3)
$$\frac{\partial\varphi}{\partial\mu_{ij}}(U^*)+\sum_{i=1}^{c}\sum_{j=1}^{n}\lambda_{ij}\frac{\partial y_{ij}}{\partial\mu_{ij}}(U^*)+\sum_{i=1}^{c}\sum_{j=1}^{n}\tau_{ij}\frac{\partial z_{ij}}{\partial\mu_{ij}}(U^*)=0,\quad 1\le i\le c,\ 1\le j\le n. \qquad (28)$$

Substituting (20) and (21) into (25)–(28), we have

$$0\le\mu_{ij}^*\le 1,\qquad \lambda_{ij}(\mu_{ij}^*-1)=0,\qquad \tau_{ij}\mu_{ij}^*=0,\qquad m(\mu_{ij}^*)^{m-1}d_{ij}^{2}-m\eta_i(1-\mu_{ij}^*)^{m-1}+\lambda_{ij}-\tau_{ij}=0 \qquad (29)$$

for all $1\le i\le c$ and $1\le j\le n$, with $\sum_{i=1}^{c}\mu_{ij}^*>0$ for $1\le j\le n$ and $\sum_{j=1}^{n}\mu_{ij}^*>0$ for $1\le i\le c$, where $d_{ij}=\|x_j-a_i\|$. Below we show by contradiction that all the multipliers $\lambda_{ij}$ and $\tau_{ij}$ are zero. If there exists a multiplier $\lambda_{st}>0$ for some $(s,t)$, it follows from (29) that

$$\mu_{st}^*=1,\qquad \tau_{st}=0,\qquad md_{st}^{2}+\lambda_{st}=0. \qquad (30)$$

Then we have $\lambda_{st}=-md_{st}^{2}\le 0$, which contradicts the assumption $\lambda_{st}>0$. Similarly, if there exists a multiplier $\tau_{st}>0$ for some $(s,t)$, it follows from (29) that

$$\mu_{st}^*=0,\qquad \lambda_{st}=0,\qquad -m\eta_s-\tau_{st}=0. \qquad (31)$$

Then we have $\tau_{st}=-m\eta_s<0$, which contradicts the assumption $\tau_{st}>0$. Substituting $\lambda_{ij}=0$ and $\tau_{ij}=0$ into (28), we obtain

$$\frac{\partial\varphi}{\partial\mu_{ij}}(U^*)=m(\mu_{ij}^*)^{m-1}d_{ij}^{2}-m\eta_i(1-\mu_{ij}^*)^{m-1}=0 \iff \mu_{ij}^*=\frac{1}{1+\left(d_{ij}^{2}/\eta_i\right)^{1/(m-1)}},\quad 1\le i\le c,\ 1\le j\le n. \qquad (32)$$

It is clear that
$\mu_{ij}^*>0$ for all $(i,j)$, and thus $U^*$ is a feasible solution satisfying (25). The necessity is proved.

To show the sufficiency, we examine $H_\varphi(U)$, the $(cn\times cn)$ Hessian matrix of $\varphi$ evaluated at $U\in U_X$. It is easy to deduce that

$$\frac{\partial^{2}\varphi}{\partial\mu_{ij}\partial\mu_{i'j'}}(U)=\begin{cases}\lambda_{ij} & \text{if } i=i' \text{ and } j=j',\\ 0 & \text{otherwise},\end{cases} \qquad (33)$$

where

$$\lambda_{ij}=m(m-1)(\mu_{ij})^{m-2}d_{ij}^{2}+m(m-1)\eta_i(1-\mu_{ij})^{m-2},\quad 1\le i\le c,\ 1\le j\le n. \qquad (34)$$

Since we assume $m>1$ in this section, $H_\varphi(U)$ is a diagonal matrix with all diagonal elements $\lambda_{ij}$ positive, i.e., a positive definite matrix. Since $U_X$ is a convex set determined by linear constraints, minimizing $\varphi$ subject to $U\in U_X$ is a convex program with a strictly convex function $\varphi$ over the convex set $U_X$. Moreover, it follows from the necessity part and (32) that $U^*=F(A)$ is the one and only KKT point and

$$\frac{\partial\varphi}{\partial\mu_{ij}}(U^*)=\frac{\partial\varphi}{\partial\mu_{ij}}(F(A))=0,\quad 1\le i\le c,\ 1\le j\le n. \qquad (35)$$

As a result, $U^*=F(A)$ is a strict global minimum solution of $\varphi$. □

Next, we fix $U\in U_X$ and consider the minimization of $J_p(U,A)$ with respect to $A$.

Lemma 2. Let $\psi:\Re^{cp}\mapsto\Re$, $\psi(A)=J_p(U,A)$, where $U\in U_X$ is fixed. Then $A^*$ is a strict global minimum solution of $\psi$ if and only if $A^*=G(U)$, where $G$ is defined by (14) and (4).

Proof: Let us examine the Hessian matrix $H_\psi(A)$ of $\psi$. It is easy to deduce that $H_\psi(A)$ is a diagonal matrix defined by

$$\frac{\partial^{2}\psi}{\partial a_{ik}\partial a_{i'k'}}(A)=\begin{cases}2\sum_{j=1}^{n}\mu_{ij}^{m} & \text{if } i=i' \text{ and } k=k',\\ 0 & \text{otherwise}.\end{cases} \qquad (36)$$

Since $U\in U_X$, we have $\sum_{j=1}^{n}\mu_{ij}>0$, which implies $\sum_{j=1}^{n}\mu_{ij}^{m}>0$ for any $1\le i\le c$. This implies that $H_\psi(A)$ is a positive definite matrix for all $A\in\Re^{cp}$, and hence $\psi(A)$ is a strictly convex function on $\Re^{cp}$. As a result, $A^*$ is a strict global minimum solution if and only if

$$\frac{\partial\psi}{\partial a_{ik}}(A^*)=-2\sum_{j=1}^{n}\mu_{ij}^{m}(x_{jk}-a_{ik}^*)=0 \iff a_{ik}^*=\frac{\sum_{j=1}^{n}\mu_{ij}^{m}x_{jk}}{\sum_{j=1}^{n}\mu_{ij}^{m}},\quad 1\le i\le c,\ 1\le k\le p, \qquad (37)$$

which is equivalent to $A^*=G(U)$. □

Based on Lemmas 1 and 2, the first requirement of Theorem 1 – that $J_p$ satisfies the descent constraint – is obtained as follows.

Lemma 3. Let

$$S_p=\{(U^*,A^*)\in U_X\times\Re^{cp}:\ J_p(U^*,A^*)<J_p(U,A^*)\ \forall\,U\ne U^* \qquad (38)$$

and

$$J_p(U^*,A^*)<J_p(U^*,A)\ \forall\,A\ne A^*\} \qquad (39)$$

be the solution set, and let $(U,A)\in U_X\times\Re^{cp}$. Then $J_p$ is continuous and $J_p(T_p(U,A))\le J_p(U,A)$, with strict inequality if $(U,A)\notin S_p$, where $T_p$ is the algorithm operator of PCA93 in (15).

Proof: First, since $\{y\mapsto\|y\|^{2}\}$, $\{y\mapsto 1-y\}$, and $\{y\mapsto y^{m}\}$ are continuous, and $J_p$ is a sum of products of such functions, $J_p$ is continuous on $U_X\times\Re^{cp}$.

Next, suppose $(U,A)\in U_X\times\Re^{cp}$. Then it follows from (18) that

$$J_p(T_p(U,A))=J_p(F\circ G(U),G(U))\le J_p(U,G(U))\ \text{(by Lemma 1)}\le J_p(U,A)\ \text{(by Lemma 2)}. \qquad (40)$$

If equality prevails throughout the above argument, then we have

$$U=F\circ G(U)\quad\text{and}\quad A=G(U). \qquad (41)$$

By Lemmas 1 and 2, it follows that

$$J_p(U,A)=J_p(F\circ G(U),G(U))<J_p(U',G(U))=J_p(U',A)\quad\forall\,U'\ne U\,(=F\circ G(U))\ \text{(by Lemma 1)} \qquad (42)$$

and

$$J_p(U,A)=J_p(U,G(U))<J_p(U,A')\quad\forall\,A'\ne A\,(=G(U))\ \text{(by Lemma 2)}. \qquad (43)$$

(42) and (43) imply that $(U,A)\in S_p$. □

3.2 Continuity constraint

The second requirement of Theorem 1 is that $T_p$ be continuous on the domain of $J_p$ with $S_p$ deleted. In fact, $T_p$ is continuous on all of $U_X\times\Re^{cp}$, as we show in the following.

Lemma 4. $T_p$ is continuous on $U_X\times\Re^{cp}$.

Proof: Since $T_p=T_2\circ T_1$, and the composition of continuous functions is again continuous, it suffices to show that $T_1$ and $T_2$ are each continuous. Since $T_1(U,A)=G(U)$, $T_1$ is continuous if $G$ is. To see that $G$ is continuous in the variable $U$, note that $G$ is a vector field which resolves into $cp$ scalar fields as

$$G=(G_{11},G_{12},\cdots,G_{cp}):\Re^{cn}\mapsto\Re^{cp} \qquad (44)$$

where $G_{ik}:\Re^{cn}\mapsto\Re$ is defined via (4) as

$$G_{ik}(U)=\frac{\sum_{j=1}^{n}\mu_{ij}^{m}x_{jk}}{\sum_{j=1}^{n}\mu_{ij}^{m}}=a_{ik},\quad 1\le i\le c,\ 1\le k\le p. \qquad (45)$$

Now $\{\mu_{ij}\mapsto\mu_{ij}^{m}\}$ is continuous, $\{\mu_{ij}^{m}\mapsto\mu_{ij}^{m}x_{jk}\}$ is continuous, and the sum of continuous functions is again continuous, so $G_{ik}$ is the quotient of two continuous functions. In view of constraint (2c), the denominator $\sum_{j=1}^{n}\mu_{ij}^{m}$ never vanishes, so $G_{ik}$ is continuous for all $(i,k)$. Therefore $G$, and in turn $T_1$, are continuous on their entire domains.

Similarly, since $T_2(A)=(F(A),A)$, it suffices to show that $F$ is continuous in the variable $A$. $F$ is a vector field which resolves into $cn$
scalar fields as

$$F=(F_{11},F_{12},\cdots,F_{cn}):\Re^{cp}\mapsto\Re^{cn} \qquad (46)$$

where $F_{ij}:\Re^{cp}\mapsto\Re$ is defined via (3) as

$$F_{ij}(A)=\frac{1}{1+\left(\|x_j-a_i\|^{2}/\eta_i\right)^{1/(m-1)}}. \qquad (47)$$

Since $\{a_i\mapsto\|x_j-a_i\|\}$ is continuous, $\{\|x_j-a_i\|\mapsto\|x_j-a_i\|^{2/(m-1)}\}$ is continuous, and the sum of continuous functions is again continuous, $F_{ij}$ is the quotient of two continuous functions. It follows from $d_{ij}=\|x_j-a_i\|\ge 0$ that the denominator $1+(\|x_j-a_i\|^{2}/\eta_i)^{1/(m-1)}$ never vanishes, so $F_{ij}$ is continuous for all $1\le i\le c$ and $1\le j\le n$. Therefore $F$, as well as $T_2$, are continuous on their entire domains. □

3.3 Compactness constraint

The final condition required for Theorem 1 is the compactness of a subset of $U_X\times\Re^{cp}$ which contains all possible iterative sequences generated by $T_p$. Some notation is needed first. Let $\mathrm{conv}(X)$ denote the convex hull of the data set $X$, i.e., the minimal closed convex set containing $X$. Since $X$ is finite, i.e., each $x_k\in X$ has finite components, the diameter of $X$ is finite:

$$d_X=\max_{1\le s,t\le n}\|x_s-x_t\|<\infty. \qquad (48)$$

The coefficients $\eta_i$ ($1\le i\le c$) in $J_{PCA93}$ are calculated by

$$\eta_i=K\,\frac{\sum_{j=1}^{n}\mu_{ij}^{m}\|x_j-a_i\|^{2}}{\sum_{j=1}^{n}\mu_{ij}^{m}},\quad 1\le i\le c, \qquad (49)$$

where the constant $K>0$, or alternatively by

$$\eta_i=\frac{\sum_{\mu_{ij}\ge\alpha}\|x_j-a_i\|^{2}}{\sum_{\mu_{ij}\ge\alpha}1},\quad 1\le i\le c, \qquad (50)$$

where $0<\alpha<1$ is predetermined. In [7], the value of $\eta_i$ is suggested to be fixed over all iterations for the sake of stability, so the parameters $\eta_i$, $1\le i\le c$, are in this case positive constants. Let

$$\eta=\min\{\eta_1,\eta_2,\cdots,\eta_c\} \qquad (51)$$

and let

$$D=\frac{1}{1+\left(d_X^{2}/\eta\right)^{1/(m-1)}}, \qquad (52)$$

which is a positive constant to be used in the following lemma.

Lemma 5. Let $[\mathrm{conv}(X)]^{c}$ be the $c$-fold Cartesian product of the convex hull of $X$, $[D,1]^{cn}$ be the $cn$-fold Cartesian product of the closed interval $[D,1]$, and $(U^{(0)},A^{(0)})$ be the starting point of the iteration with $J_p$. Then

$$(U^{(l)},A^{(l)})=T_p^{l}(U^{(0)},A^{(0)})\in[D,1]^{cn}\times[\mathrm{conv}(X)]^{c},\quad l=1,2,\cdots \qquad (53)$$

and $[D,1]^{cn}\times[\mathrm{conv}(X)]^{c}$ is compact in $U_X\times\Re^{cp}$.

Proof: Let $U^{(0)}\in U_X$ be chosen, which is possibly not in $[D,1]^{cn}$. Then $A^{(0)}=G(U^{(0)})$ is calculated using (4) so that

$$a_i^{(0)}=\frac{\sum_{j=1}^{n}(\mu_{ij}^{(0)})^{m}x_j}{\sum_{j=1}^{n}(\mu_{ij}^{(0)})^{m}},\quad 1\le i\le c. \qquad (54)$$

By letting

$$\rho_{ik}=\frac{(\mu_{ik}^{(0)})^{m}}{\sum_{j=1}^{n}(\mu_{ij}^{(0)})^{m}},\quad 1\le i\le c,\ 1\le k\le n, \qquad (55)$$

we rewrite (54) as

$$a_i^{(0)}=\sum_{k=1}^{n}\rho_{ik}x_k,\quad 1\le i\le c, \qquad (56)$$

with

$$\sum_{k=1}^{n}\rho_{ik}=\sum_{k=1}^{n}\frac{(\mu_{ik}^{(0)})^{m}}{\sum_{j=1}^{n}(\mu_{ij}^{(0)})^{m}}=\frac{\sum_{k=1}^{n}(\mu_{ik}^{(0)})^{m}}{\sum_{j=1}^{n}(\mu_{ij}^{(0)})^{m}}=1. \qquad (57)$$

Furthermore, it follows from constraints (2a) and (2c) that $0\le\rho_{ik}\le 1$ for all $1\le i\le c$ and $1\le k\le n$, which implies that $a_i^{(0)}$ is a convex combination of $X$. Therefore $a_i^{(0)}\in\mathrm{conv}(X)$, and hence $A^{(0)}\in[\mathrm{conv}(X)]^{c}$. Continuing recursively, $U^{(1)}$ is calculated via (3) so that

$$\mu_{ij}^{(1)}=\frac{1}{1+\left(\|x_j-a_i^{(0)}\|^{2}/\eta_i\right)^{1/(m-1)}},\quad 1\le i\le c,\ 1\le j\le n. \qquad (58)$$

It follows from (56) and (57) that for any $(i,j)$,

$$\|x_j-a_i^{(0)}\|=\Big\|x_j-\sum_{k=1}^{n}\rho_{ik}x_k\Big\|=\Big\|\sum_{k=1}^{n}\rho_{ik}(x_j-x_k)\Big\|\le\sum_{k=1}^{n}\rho_{ik}\|x_j-x_k\|\le\sum_{k=1}^{n}\rho_{ik}d_X=d_X. \qquad (59)$$

Substituting (59) into (58), we have

$$\mu_{ij}^{(1)}\ge\frac{1}{1+\left(d_X^{2}/\eta_i\right)^{1/(m-1)}}\ge\frac{1}{1+\left(d_X^{2}/\eta\right)^{1/(m-1)}}=D,\quad 1\le i\le c,\ 1\le j\le n. \qquad (60)$$

Therefore $\mu_{ij}^{(1)}\in[D,1]$, and hence $U^{(1)}\in[D,1]^{cn}$. After that, $A^{(1)}=G(U^{(1)})\in[\mathrm{conv}(X)]^{c}$ by the same argument as above. Thus every iterative sequence $(U^{(l)},A^{(l)})$ of $T_p$ belongs to $[D,1]^{cn}\times[\mathrm{conv}(X)]^{c}$ for any $l\ge 1$. Furthermore, it is clear that $[D,1]^{cn}\times[\mathrm{conv}(X)]^{c}$ is a compact set. □

3.4 Convergence theorem for PCA93

We now assemble the hypotheses and results above into a formal statement of the convergence of the algorithm PCA93.

Theorem 2 (Convergence Theorem for PCA93). Suppose $X=\{x_1,x_2,\cdots,x_n\}\subset\Re^{p}$ is given. Let

$$J_p(U,A)=\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^{m}\|x_j-a_i\|^{2}+\sum_{i=1}^{c}\eta_i\sum_{j=1}^{n}(1-\mu_{ij})^{m},\quad 1<m<\infty, \qquad (61)$$

where $U\in U_X$ and $A=(a_1,a_2,\cdots,a_c)$ with $a_i\in\Re^{p}$ for all $i$. If $T_p:U_X\times\Re^{cp}\mapsto U_X\times\Re^{cp}$ is the algorithm operator of PCA93, then for any $(U^{(0)},A^{(0)})\in U_X\times\Re^{cp}$, either

(1) $\{T_p^{l}(U^{(0)},A^{(0)})\}$ terminates at a local minimum solution or a saddle point of $J_p$; or
(2) the limit of any convergent subsequence $\{T_p^{l_k}(U^{(0)},A^{(0)})\}$ is a local minimum solution or a saddle point of $J_p$.

Proof: Taking $J_p$ as $g$ in Theorem 1, Lemma 3 shows that $J_p$ satisfies the descent constraint for the solution set $S_p$, Lemma 4 asserts that the iterative operator $T_p$ is continuous on $U_X\times\Re^{cp}$, and by Lemma 5 the iterative sequences of the operator $T_p$ always remain in a compact subset of the domain of $J_p$. The result follows immediately from Theorem 1. □

4
Extensions to PCA96 and PCA03

It is conceivable that PCA96 and PCA03 can be proved convergent through a procedure similar to the above via Theorem 1. Below we show this by presenting the results directly and providing only the necessary details.

4.1 Descent constraints

Lemma 6. Let $\varphi:U'_X\mapsto\Re$, $\varphi(U)=J_{PCA96}(U,A)$, where $A$ is fixed, $U'_X$ is the domain of $U$ with

$$U'_X=\{U:\ 0<\mu_{ij}\le 1,\ 1\le i\le c,\ 1\le j\le n\}, \qquad (62)$$

and $J_{PCA96}$ is the objective function of PCA96 defined in (5). Then $U^*\in U'_X$ is a strict global minimum solution of $\varphi$ if and only if

$$\mu_{ij}^*=\exp\left\{-\frac{1}{\eta_i}\|x_j-a_i\|^{2}\right\},\quad 1\le i\le c,\ 1\le j\le n. \qquad (63)$$

Proof: First examine the Hessian matrix $H_\varphi(U)$. It is easy to deduce that the entries of $H_\varphi$ are calculated by

$$\frac{\partial^{2}\varphi}{\partial\mu_{ij}\partial\mu_{i'j'}}(U)=\begin{cases}\eta_i/\mu_{ij} & \text{if } i=i' \text{ and } j=j',\\ 0 & \text{otherwise}.\end{cases} \qquad (64)$$

Since $\eta_i$ ($1\le i\le c$) are positive constants and the denominators $\mu_{ij}>0$ for all $(i,j)$, the diagonal elements satisfy $\eta_i/\mu_{ij}>0$ for any $U\in U'_X$. Thus $H_\varphi(U)$ is positive definite, which implies that $\varphi$ is a strictly convex function of $U$. Since $U'_X$ is a convex set, the minimization of $\varphi(U)$ over $U'_X$ is a convex program. Furthermore, the KKT conditions can be used to show that the point $U^*$ calculated by (63) is the one and only KKT point, via a procedure similar to the proof of Lemma 1. Hence $U^*$ is a strict global minimum solution of $\varphi$ if and only if $U^*$ is calculated via (63). □

Lemma 7. Let $\varphi:U_X\mapsto\Re$, $\varphi(U)=J_{PCA03}(U,A)$, where $A$ is fixed and $J_{PCA03}$ is the objective function of PCA03 defined in (8). Also suppose that $m>1$. Then $U^*\in U_X$ is a strict global minimum solution of $\varphi$ if and only if

$$\mu_{ij}^*=\frac{1}{\left(1+\frac{\|x_j-a_i\|^{2}}{\eta_i}\right)^{1/(m-1)}},\quad 1\le i\le c,\ 1\le j\le n. \qquad (65)$$

Proof: First examine the Hessian matrix $H_\varphi(U)$. It is easy to deduce that the entries of $H_\varphi$ are calculated by

$$\frac{\partial^{2}\varphi}{\partial\mu_{ij}\partial\mu_{i'j'}}(U)=\begin{cases}\tau_{ij} & \text{if } i=i' \text{ and } j=j',\\ 0 & \text{otherwise},\end{cases} \qquad (66)$$

where $\tau_{ij}=m(m-1)(d_{ij}^{2}+\eta_i)\mu_{ij}^{m-2}$. Since $m>1$ and $\eta_i>0$, the diagonal elements $\tau_{ij}>0$ for any $U\in U_X$. Thus $H_\varphi(U)$ is positive definite, which implies that $\varphi$ is a strictly convex function of $U$, and hence minimizing $\varphi$ over $U_X$ is a convex program. Furthermore, the KKT conditions can be used to show that the point $U^*$ calculated by (65) is the one and only KKT point, via a procedure similar to the proof of Lemma 1. Hence $U^*$ is a strict global minimum solution of $\varphi$ if and only if $U^*$ is calculated via (65). □

4.2 Compactness constraints

It is easy to deduce that Lemmas 2–4 also hold for PCA96 and PCA03 by similar derivations. Now we investigate the compactness of a subset
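For reference, the membership updates (63) and (65) of Lemmas 6 and 7 can be sketched as follows, together with a numerical check of the kind of lower bound used in the compactness arguments: whenever the centers lie in $\mathrm{conv}(X)$, every membership is at least $\exp\{-d_X^2/\eta\}$ for PCA96 and $(1+d_X^2/\eta)^{-1/(m-1)}$ for PCA03. This is a sketch under our own naming, not the authors' code.

```python
import numpy as np

def pca96_memberships(X, A, eta):
    """PCA96 membership update (63): mu_ij = exp(-||x_j - a_i||^2 / eta_i)."""
    d2 = ((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / eta[:, None])

def pca03_memberships(X, A, eta, m=2.0):
    """PCA03 membership update (65): mu_ij = (1 + ||x_j - a_i||^2 / eta_i)^(-1/(m-1))."""
    d2 = ((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return (1.0 + d2 / eta[:, None]) ** (-1.0 / (m - 1.0))
```

Both maps send any centers in the convex hull of the data to memberships bounded away from zero, which is exactly what confines the iterates to a compact box of the form $[D_1,1]^{cn}$ or $[D_2,1]^{cn}$.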
which contains all possible iterative sequences generated by PCA96 and PCA03. The results are as follows.

Lemma 8. Let $[\mathrm{conv}(X)]^{c}$ be the $c$-fold Cartesian product of the convex hull of $X$, $[D_1,1]^{cn}$ be the $cn$-fold Cartesian product of the closed interval $[D_1,1]$ with $D_1=\exp\{-d_X^{2}/\eta\}$, $(U^{(0)},A^{(0)})$ be the starting point of the iteration with $J_{PCA96}$, and $T_p$ be the algorithm operator of PCA96. Then

$$(U^{(l)},A^{(l)})=T_p^{l}(U^{(0)},A^{(0)})\in[D_1,1]^{cn}\times[\mathrm{conv}(X)]^{c},\quad l=1,2,\cdots \qquad (67)$$

and $[D_1,1]^{cn}\times[\mathrm{conv}(X)]^{c}$ is compact in $U'_X\times\Re^{cp}$.

Proof: It follows from the proof of Lemma 5 that for any $U^{(0)}\in U'_X$, we have $A^{(0)}=G(U^{(0)})\in[\mathrm{conv}(X)]^{c}$. Subsequently, $U^{(1)}$ is calculated via (63) so that

$$\mu_{ij}^{(1)}=\exp\left\{-\frac{1}{\eta_i}\|x_j-a_i^{(0)}\|^{2}\right\},\quad 1\le i\le c,\ 1\le j\le n. \qquad (68)$$

Substituting (59) into (68), we have

$$\mu_{ij}^{(1)}\ge\exp\{-d_X^{2}/\eta_i\}\ge\exp\{-d_X^{2}/\eta\}=D_1,\quad 1\le i\le c,\ 1\le j\le n. \qquad (69)$$

Therefore $\mu_{ij}^{(1)}\in[D_1,1]$, and hence $U^{(1)}\in[D_1,1]^{cn}$. Consequently, it follows from Lemma 5 that every iterative sequence $(U^{(l)},A^{(l)})$ of $T_p$ belongs to $[D_1,1]^{cn}\times[\mathrm{conv}(X)]^{c}$ for any $l\ge 1$. Furthermore, it is clear that $[D_1,1]^{cn}\times[\mathrm{conv}(X)]^{c}$ is a compact set. □

Lemma 9. Let $[\mathrm{conv}(X)]^{c}$ be the $c$-fold Cartesian product of the convex hull of $X$, $[D_2,1]^{cn}$ be the $cn$-fold Cartesian product of the closed interval $[D_2,1]$ with $D_2=(1+d_X^{2}/\eta)^{-\frac{1}{m-1}}$, $(U^{(0)},A^{(0)})$ be the starting point of the iteration with $J_{PCA03}$, and $T_p$ be the algorithm operator of PCA03. Then

$$(U^{(l)},A^{(l)})=T_p^{l}(U^{(0)},A^{(0)})\in[D_2,1]^{cn}\times[\mathrm{conv}(X)]^{c},\quad l=1,2,\cdots \qquad (70)$$

and $[D_2,1]^{cn}\times[\mathrm{conv}(X)]^{c}$ is compact in $U_X\times\Re^{cp}$.

Proof: It follows from the proof of Lemma 5 that for any $U^{(0)}\in U_X$, we have $A^{(0)}=G(U^{(0)})\in[\mathrm{conv}(X)]^{c}$. Subsequently, $U^{(1)}$ is calculated via (65) so that

$$\mu_{ij}^{(1)}=\frac{1}{\left(1+\frac{\|x_j-a_i^{(0)}\|^{2}}{\eta_i}\right)^{1/(m-1)}},\quad 1\le i\le c,\ 1\le j\le n. \qquad (71)$$

Substituting (59) into (71), we have

$$\mu_{ij}^{(1)}\ge\left(1+d_X^{2}/\eta_i\right)^{-\frac{1}{m-1}}\ge\left(1+d_X^{2}/\eta\right)^{-\frac{1}{m-1}}=D_2,\quad 1\le i\le c,\ 1\le j\le n. \qquad (72)$$

Therefore $\mu_{ij}^{(1)}\in[D_2,1]$, and hence $U^{(1)}\in[D_2,1]^{cn}$. Consequently, it follows from Lemma 5 that every iterative sequence $(U^{(l)},A^{(l)})$ of $T_p$ belongs to $[D_2,1]^{cn}\times[\mathrm{conv}(X)]^{c}$ for any $l\ge 1$. Furthermore, it is clear that $[D_2,1]^{cn}\times[\mathrm{conv}(X)]^{c}$ is a compact set. □

4.3
Convergence theorems for PCA96 and PCA03

Finally, we conclude the convergence theorems for the two PCAs by assembling the hypotheses and results above.

Theorem 3 (Convergence Theorem for PCA96). Suppose $X=\{x_1,x_2,\cdots,x_n\}\subset\Re^{p}$ is given. Let

$$J_{PCA96}(U,A)=\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}\|x_j-a_i\|^{2}+\sum_{i=1}^{c}\eta_i\sum_{j=1}^{n}(\mu_{ij}\ln\mu_{ij}-\mu_{ij}) \qquad (73)$$

where $U\in U'_X$ and $A=(a_1,a_2,\cdots,a_c)$ with $a_i\in\Re^{p}$ for all $i$. If $T_p:(U'_X\times\Re^{cp})\mapsto(U'_X\times\Re^{cp})$ is the algorithm operator of PCA96, then for any $(U^{(0)},A^{(0)})\in U'_X\times\Re^{cp}$, either

(1) $\{T_p^{l}(U^{(0)},A^{(0)})\}$ terminates at a local minimum solution or a saddle point of $J_{PCA96}$; or
(2) the limit of any convergent subsequence $\{T_p^{l_k}(U^{(0)},A^{(0)})\}$ is a local minimum solution or a saddle point of $J_{PCA96}$.

Theorem 4 (Convergence Theorem for PCA03). Suppose $X=\{x_1,x_2,\cdots,x_n\}\subset\Re^{p}$ is given. Let

$$J_{PCA03}(U,A)=\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^{m}\|x_j-a_i\|^{2}+\sum_{i=1}^{c}\eta_i\sum_{j=1}^{n}(\mu_{ij}^{m}-m\mu_{ij}),\quad 1<m<\infty, \qquad (74)$$

where $U\in U_X$ and $A=(a_1,a_2,\cdots,a_c)$ with $a_i\in\Re^{p}$ for all $i$. If $T_p:(U_X\times\Re^{cp})\mapsto(U_X\times\Re^{cp})$ is the algorithm operator of PCA03, then for any $(U^{(0)},A^{(0)})\in U_X\times\Re^{cp}$, either

(1) $\{T_p^{l}(U^{(0)},A^{(0)})\}$ terminates at a local minimum solution or a saddle point of $J_{PCA03}$; or
(2) the limit of any convergent subsequence $\{T_p^{l_k}(U^{(0)},A^{(0)})\}$ is a local minimum solution or a saddle point of $J_{PCA03}$.

5 Conclusion

Unlike the FCM algorithm, the possibilistic clustering algorithms form a family of PCAs with different objective functions and different membership functions. This fact makes the theoretical convergence analysis of the PCAs more complex. Owing to the similarity between FCM and the PCAs, this paper establishes the convergence of the PCAs through a special case of the Zangwill convergence theorem. The proof procedure can be summarized in the following four critical steps.

S1. (Strict Convexity of $\varphi(U)$) For any fixed $A\in\Re^{cp}$, the function $\varphi(U)=J_p(U,A)$ is a strictly convex function of $U$ and the domain of $U$ is convex, which is established by examining the Hessian matrix of $\varphi$. This step depends on the objective function and the membership function used.

S2. (Strict Convexity of $\psi(A)$) For any fixed $U$ in
the domain, the function $\psi(A)=J_p(U,A)$ is a strictly convex function of $A$. This holds for all the PCAs since $U$ is treated as a constant in this step.

S3. (Continuity of the Objective Function) The objective function $J_p(U,A)$ is continuous on the domain, which follows directly from the continuity of the membership function used.

S4. (Compactness of the Iterative Sequence) The iterative sequence $\{(U^{(l)},A^{(l)})\}$ generated by the PCAs is contained in a compact set. In this step we only need to show that $U^{(l)}$ has a positive lower bound.

The above proof strategy can be applied to establish convergence in more general situations. However, it is not applicable to PCA06, since the objective function $J_{PCA06}$ is not strictly convex in $U$; this does not imply that the algorithm PCA06 fails to converge. The convergence performance of PCA06 requires further investigation.

Acknowledgments

This work was supported in part by the Shanghai Philosophy and Social Science Planning Project grant (2012XAL022), Australian Research Council Discovery grants (DP1096218 and DP130102691), and Linkage grants (LP100200774 and LP120100566).

References

[1] Bezdek, J.C., A convergence theorem for the fuzzy ISODATA clustering algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-2, No. 1, 1-8, 1980.
[2] Dave, R.N., and Krishnapuram, R., Robust clustering methods: a unified view, IEEE Transactions on Fuzzy Systems, Vol. 5, No. 2, 270-293, 1997.
[3] Dey, V., Pratihar, D.K., and Datta, G.L., Genetic algorithm-tuned entropy-based fuzzy C-means algorithm for obtaining distinct and compact clusters, Fuzzy Optimization and Decision Making, Vol. 10, No. 2, 153-166, 2011.
[4] Hathaway, R.J., Bezdek, J.C., and Tucker, W.T., An improved convergence theory for the fuzzy ISODATA clustering algorithms, The Analysis of Fuzzy Information, Vol. 3, Boca Raton: CRC Press, 123-132, 1987.
[5] Höppner, F., and Klawonn, F., A contribution to convergence theory of fuzzy c-means and derivatives, IEEE Transactions on Fuzzy Systems, Vol. 11, No. 5, 682-694, 2003.
[6] Krishnapuram, R., Frigui, H., and Nasraoui, O., Fuzzy
and possibilistic shell clustering algorithms and their application to boundary detection and surface approximation, IEEE Transactions on Fuzzy Systems, Vol. 3, 29-60, 1995.
[7] Krishnapuram, R., and Keller, J.M., A possibilistic approach to clustering, IEEE Transactions on Fuzzy Systems, Vol. 1, No. 2, 98-110, 1993.
[8] Krishnapuram, R., and Keller, J.M., The possibilistic c-means algorithm: insights and recommendations, IEEE Transactions on Fuzzy Systems, Vol. 4, No. 3, 385-393, 1996.
[9] Oussalah, M., and Nefti, S., On the use of divergence distance in fuzzy clustering, Fuzzy Optimization and Decision Making, Vol. 7, No. 2, 147-167, 2008.
[10] Yang, M.-S., and Wu, K.-L., Unsupervised possibilistic clustering, Pattern Recognition, Vol. 39, No. 1, 5-21, 2006.
[11] Zangwill, W., Nonlinear Programming: A Unified Approach, Englewood Cliffs, NJ: Prentice-Hall, 1969.
[12] Zhou, J., and Hung, C.C., A generalized approach to possibilistic clustering algorithms, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Vol. 15, No. 2 Suppl., 117-138, 2007.
[13] Zhang, Y., and Chi, Z.-X., A fuzzy support vector classifier based on Bayesian optimization, Fuzzy Optimization and Decision Making, Vol. 7, No. 1, 75-86, 2008.