SUBMITTED FOR PUBLICATION TO IEEE TRANSACTIONS ON SIGNAL PROCESSING, SEPTEMBER 2012

Analysis of Sum-Weight-like algorithms for averaging in Wireless Sensor Networks

Franck Iutzeler, Philippe Ciblat and Walid Hachem

Abstract

Distributed estimation of the average value over a Wireless Sensor Network has recently received a lot of attention. Most papers consider single-variable sensors and communications with feedback (e.g. peer-to-peer communications). However, in order to efficiently use the broadcast nature of the wireless channel, communications without feedback are advocated. To ensure convergence in this feedback-free case, the recently introduced Sum-Weight-like algorithms, which rely on two variables at each sensor, are a promising solution. In this paper, the convergence towards the consensus over the average of the initial values is analyzed in depth. Furthermore, it is shown that the squared error decreases exponentially with time. In addition, a powerful algorithm relying on the Sum-Weight structure and taking into account the broadcast nature of the channel is proposed.

The authors are with Institut Mines-Télécom/Télécom ParisTech; CNRS LTCI. This work was partially supported by the French Defense Agency (DGA) and by the Télécom/Eurecom Carnot Institute. Some parts of the work introduced in this paper were presented at ICASSP 2012 [1].

I. INTRODUCTION

The recent years have seen a surge of signal processing and estimation technologies operating in stressful environments. These environments do not make possible the use of a fusion center, so the units/sensors have to behave in a distributed fashion. In various applications, sensors have to communicate through wireless channels because of the lack of infrastructure. Hence, along with distributed computation, the problem of communicating between the different sensors to estimate a global value is a key issue; it was pioneered by Tsitsiklis [2].

One of the most studied problems in Wireless Sensor Networks is the computation of the average of the initial measurements of the sensors. More precisely, each sensor wants to reach consensus over the mean of the initial values. A basic technique to address this problem, called Random Gossip, is to make the sensors exchange their estimates in pairs and average them. This technique has been widely analyzed in terms of convergence and convergence speed in [3], [4], [5].

Finding more efficient exchange protocols has been a hot topic for the past few years; the proposed improvements were essentially twofold: i) exploiting the geometry of the network to obtain a more efficient mixing between the values (e.g. [6], [7], [8]) and ii) taking advantage of the broadcast nature of the wireless channels (e.g. [9] without feedback link, and [10] with feedback link). Whereas the use of network geometry has received a lot of attention, the use of the broadcast nature of the wireless channel is less studied albeit promising. Therefore, in this paper, we focus on averaging algorithms taking into account the broadcast nature of the channel. In order to keep the number of communications as low as possible, we forbid the use of feedback links.

In the feedback-free context, one can mention [9]. However, even if the algorithm described in [9] converges quickly to a consensus, the reached value is incorrect. This can be explained by the fact that the sum of the sensor estimates is not constant over time. To overcome this problem, Franceschelli et al. [11] proposed to use well-chosen updates on two local variables per sensor while using the broadcast nature of the channel without feedback link. A more promising alternative is to use the Sum-Weight scheme proposed by Kempe [12] and studied more generally by Bénézit [13]. In this setup, two local variables are also used: one representing the sum of the received values and the other representing the weight of the sensor (namely, the proportion of the sensor activity compared to the others). The two variables
are transmitted at each iteration and both are updated in the same manner. The wanted estimate is then the quotient of these two values. The convergence of this class of algorithms (without necessarily sum-conservation) has been proven in [12], [13]. In contrast, their convergence speed has never been theoretically evaluated, except in [12] for a very specific case.

The goal of this paper is to theoretically analyze the convergence speed of any Sum-Weight-like algorithm. As a by-product, we obtain a necessary and sufficient condition for convergence. In addition, we propose a new Sum-Weight-like algorithm based on broadcasting which outperforms existing ones.

This paper is organized as follows: the notations and assumptions on the network model and on the Sum-Weight-like algorithms are provided in Section II. Section III is dedicated to the theoretical analysis of the squared error of the algorithms and provides the main contributions of the paper. In Section IV, we propose new Sum-Weight-like algorithms. In Section V, we compare our results with previous derivations done in the literature for the Sum-Weight-like algorithms as well as for the algorithms based on the exchange of a single variable between the nodes. Section VI is devoted to numerical illustrations. Concluding remarks are drawn in Section VII.

II. MODEL AND ASSUMPTIONS

A. Network model

The sensor network will be modeled by a directed graph G = (V, E), V being the set of vertices/sensors and E being the set of edges which models the possible links between the sensors. We also define the adjacency matrix A of G as the N × N matrix such that (A)_{ij} equals 1 if there is an edge from i to j and 0 otherwise. We define the neighborhood of each sensor i as N_i = {j ∈ V : (i, j) ∈ E}. Let d_i = |N_i| denote the degree of the sensor i, where |A| represents the cardinality of the set A. Let d_max = max_i d_i be the maximum degree. Let D = diag(d_1, ..., d_N) and L = D − A be the degree matrix
and the Laplacian matrix, respectively [14].

Every sensor i has an initial value x_i(0) and we define x(0) = [x_1(0), ..., x_N(0)]^T, where the superscript T stands for transposition. The goal of the network is to communicate through the edges of the underlying graph to reach consensus over the mean of the initial values of the sensors. A communication and estimation step will be referred to as an update.

We will assume that the network follows a discrete-time model such that the time t is the time of the t-th update. As an example, every sensor could be activated by an independent Poisson clock. The time would then be counted as the total number of clock ticks across the network. We will denote by x_i(t) the i-th sensor estimate at time t and x(t) = [x_1(t), ..., x_N(t)]^T.

B. Averaging Algorithms

The goal of averaging algorithms is to make the vector of estimates x(t) converge to x_ave 1, also known as the consensus vector, where 1 is the length-N vector of ones and x_ave = (1/N) 1^T x(0) is the average of the initial values of the sensors. In the present state of the art, two classes of algorithms exist; they are described below.

1) Class of Random Gossip algorithms: In standard gossip algorithms (e.g. [4]), sensors update their estimates according to the equation x(t+1)^T = x(t)^T K(t), where the K(t) are doubly-stochastic matrices. We recall that a matrix K is said to be row-stochastic (resp. column-stochastic) when all its elements are non-negative and when K1 = 1 (resp. K^T 1 = 1). A matrix which is both row- and column-stochastic is said to be doubly-stochastic. Since two sensors can exchange information only across the edges of the graph, for any i ≠ j, (K(t))_{ij} cannot be positive if (A)_{ij} = 0. From an algorithmic point of view, the row-stochasticity implies that the sum of the values is unchanged, x(t+1)^T 1 = x(t)^T K(t) 1 = x(t)^T 1, whereas the column-stochasticity implies that the consensus is stable: if x(t) = c1, then x(t+1)^T = x(t)^T K(t) = c 1^T K(t) = c 1^T. For these reasons, double-stochasticity is desirable. However, using doubly-stochastic matrices implies a feedback which is not always possible. In particular, if a sensor sends information to multiple neighbors, the feedback message might raise multiple-access problems. Similarly, if the message is sent through a long route within the network, the same route may not exist anymore for the feedback in the context of mobile wireless networks. As these algorithms only rely on the exchange of one variable per sensor, they will be called single-variate algorithms in the rest of the paper.

2) Class of Sum-Weight algorithms: To overcome this drawback, a possible method is to use two variables: one representing the sum of the received values and another representing the relative weight of the sensor. For the sensor i at time t, they will be respectively written s_i(t) and w_i(t). Writing s(t) = [s_1(t), ..., s_N(t)]^T and w(t) = [w_1(t), ..., w_N(t)]^T, both variables will be modified by the same update matrix, s(t+1)^T = s(t)^T K(t) and w(t+1)^T = w(t)^T K(t). Finally, the estimate of sensor i at time t will be the quotient of the two variables, x_i(t) = s_i(t)/w_i(t). The initialization is done as follows:

    s(0) = x(0),    w(0) = 1.    (1)

For the sake of convergence we will need an important property:

Mass conservation:

    Σ_{i=1}^N s_i(t) = Σ_{i=1}^N x_i(0) = N x_ave,    Σ_{i=1}^N w_i(t) = N.    (2)

This clearly rewrites as: for all t > 0, K(t)1 = 1, which corresponds to the sum-conservation of classic gossip algorithms and leads to row-stochastic update matrices.

C. Notations for the Sum-Weight scheme

Let us now introduce some useful notations along with some fundamental assumptions for convergence in the Sum-Weight scheme. Given two vectors a and b of the same size, we denote by a/b the vector of the elementwise quotients. The Sum-Weight algorithm is described by the following equations:

    x(t) = s(t)/w(t) = [s_1(t)/w_1(t), ..., s_N(t)/w_N(t)]^T
    s^T(t+1) = s^T(t) K(t+1) = x^T(0) P(t+1)
    w^T(t+1) = w^T(t) K(t+1) = 1^T P(t+1)

with P(t) = K(1) K(2) ... K(t).

In the following, the matrix inequalities will be taken elementwise, so that M > 0 (resp. M ≥ 0) means that the matrix M is (elementwise) positive (resp. non-negative). We recall that a non-negative matrix M is said to be primitive if M^m > 0 for some m ≥ 1 (see [15, Chap. 8.5] for details). We will denote the Kronecker product by '⊗'.

We can notice that reaching consensus is equivalent, for x(t), to converging to the consensus line c1, where c is the consensus value. For this reason, it is useful to define J = (1/N) 1 1^T, the orthogonal projection matrix onto the subspace spanned by 1, and (I − J), the orthogonal projection matrix onto the complementary subspace, which can be seen as the error hyperplane. The matrix I is the identity matrix of appropriate size. In order to intuitively understand the algorithm behavior, let us decompose x^T(t) as follows:

    x^T(t) = s^T(t) / w^T(t)
           = x^T(0) P(t) / w^T(t)
           = x^T(0) J P(t) / w^T(t) + x^T(0)(I − J) P(t) / w^T(t)
           = x_ave 1^T P(t) / 1^T P(t) + x^T(0)(I − J) P(t) / w^T(t)
           = x_ave 1^T + x^T(0)(I − J) P(t) / w^T(t)    (3)

Obviously, the algorithm will converge to the right consensus if the second term of the right-hand side vanishes. Actually, under some mild assumptions related to the connectedness of the network, we expect that the numerator, which corresponds to a projection on the error hyperplane, will converge to zero at an exponential rate while all the elements of w(t) are of order one. Proving these results will be the core of the paper.

D. Assumptions on the update matrices K(t)

First, we will always assume that both following conditions are satisfied by any update matrix associated with a Sum-Weight-like algorithm.

(A1) The matrices (K(t))_{t>0} are independent and identically distributed (i.i.d.), and row-stochastic. The matrix K(t) is valued in a set K = {K_i}_{i=1,...,M} of size M < ∞. Also, p_i = P[K(t) = K_i] > 0.

(A2) Any matrix in K has a strictly positive diagonal.

The first assumption is
just a reformulation of the mass conservation property introduced in Section II-B2, along with the assumption of a finite number of actions across the network. This assumption is reasonable when one assumes that each sensor performs a finite number of actions. The second assumption forces every sensor to keep a part of the information it had previously. We also define

    m_K = min_{i,j,k} { (K_k)_{ij} : (K_k)_{ij} > 0 },
    p_K = min_k P[K(t) = K_k] = min_k p_k > 0.    (4)

In addition to both previous assumptions, we will see that the next assumption plays a central role in the convergence analysis of any Sum-Weight-like algorithm.

(B) E[K] = Σ_{i=1}^M p_i K_i is a primitive matrix.

In terms of graph theory, the matrix E[K] represents a weighted directed graph (see [15, Def. 6.2.11]). Since it is primitive, this graph is strongly connected (see [15, Cor. 6.2.18] and [20]). Observe that this graph contains a self-loop at every node due to Assumption (A2). In fact, the matrix A + I coincides with the so-called indicator matrix ([15, Def. 6.2.10]) of E[K].

III. MATHEMATICAL RESULTS

A. Preliminary results

The assumption (B) can be rewritten in different ways thanks to the next Lemma.

Lemma 1. Under Assumptions (A1) and (A2), the following propositions are equivalent to (B):
(B1) for all (i, j) ∈ {1, ..., N}², there exist L_ij < N and a realization of P(L_ij) verifying (P(L_ij))_{i,j} > 0.
(B2) there exist L < 2N² and a realization of P(L) which is a positive matrix.
(B3) E[K ⊗ K] = Σ_{i=1}^M p_i K_i ⊗ K_i is a primitive matrix.

The proof is reported in Appendix A. This Lemma will be very useful in the sequel since it enables us to interpret Assumption (B) in various manners.

Our approach for analyzing the convergence of Sum-Weight algorithms is inspired by [12] (with a number of important differences explained below) and so relies on the analysis of the Squared Error (SE). Actually, the Squared Error can be upper-bounded by a product of two terms as follows:

    ||x(t) − x_ave 1||_2² = Σ_{i=1}^N |x_i(t) − x_ave|²
                         = Σ_{i=1}^N (1/w_i(t)²) |s_i(t) − x_ave w_i(t)|²    (5)
                         = Σ_{i=1}^N (1/w_i(t)²) |(x^T(0)(I − J)P(t))_i|²    (6)
                         ≤ Ψ1(t) Ψ2(t)

with

    Ψ1(t) = 1 / (min_i w_i(t))²    (7)
    Ψ2(t) = ||x^T(0)(I − J)P(t)||_2².    (8)

Notice that the decomposition done in Eq. (6) mimics Eq. (3) for the Squared Error. From now on, our main contributions will be to understand the behavior of both terms Ψ1(t) and Ψ2(t) when t is large. In Section III-B, we will prove that Ψ1(t) can be upper-bounded infinitely often. The term Ψ2(t) represents the projection of the current sensor values on the orthogonal space to the consensus line. The analysis of this term is drawn in Section III-C.

B. Analysis of Ψ1(t)

This term depends on the inverse of the minimum of the sensors' weights (see Eq. (7)) and thus can increase quickly. However, the sensors frequently exchange information and hence spread their weights, so the probability that a node weight keeps decreasing for a long time is very small. We will work on this probability and show that it can be made as small as one wants by considering a sufficiently long amount of time. This will enable us to prove that Ψ1(t) is infinitely often lower than a finite constant. To obtain these results, some preliminary lemmas are needed.

First, we will focus on the behavior of the nodes' weights and especially on their minimum. One can remark that at every time t there is at least one node whose weight is greater than or equal to 1 (as the weights are non-negative and, for all t > 0, Σ_i w_i(t) = N because of the mass conservation exhibited in Eq. (2)). As w(t_0 + t)^T = w(t_0)^T P(t_0, t_0 + t), where P(t_0, t_0 + t) = K(t_0) ... K(t_0 + t), it is interesting to focus i) on the minimum non-null value of P(t_0, t_0 + t) and ii) on the instants where P(t_0, t_0 + t) is positive.

Lemma 2. For all t, t_0 > 0, all the non-null coefficients of P(t_0, t_0 + t) are greater than or equal to (m_K)^t.

Proof: Let us recall that m_K is the smallest non-null entry of all the matrices belonging to the set K as defined in Eq. (4). Let us consider the random matrix P(t) (as the matrix choice is i.i.d., we drop the offset t_0). We will then prove this result by induction. It is trivial to see that every non-null coefficient of P(1) = K(1) is greater than m_K, and as

    (P(t))_{i,j} = Σ_{k=1}^N (P(t−1))_{i,k} (K(t))_{k,j},

it is obvious that if (P(t))_{i,j} > 0, then there is a term in the sum that is positive (we remind that all the coefficients here are non-negative). This term is the product of a positive coefficient of P(t−1) and a positive coefficient of K(t). Hence, if all the non-null coefficients of P(t−1) are greater than (m_K)^{t−1}, then any non-null coefficient of P(t) is greater than (m_K)^{t−1} · m_K = (m_K)^t. So, by induction, we have that for all t > 0, every non-null coefficient of P(t) is greater than (m_K)^t.

Thanks to Item (B2) of Lemma 1, there is a finite L such that there exists a realization of P(L) which is a positive matrix. Considering the time at multiples of L, we know that for any n, if P(nL+1, (n+1)L) > 0 then, for all i, w_i((n+1)L) ≥ m_K^L. Let us define the following stopping times:

    τ_0 = 0,    τ_n = L × min{ j : Σ_{k=1}^j 1{ P((k−1)L+1, kL) > 0 } = n }

so that τ_n marks the end of the n-th window of length L over which the product of the update matrices is (elementwise) positive.

Proposition 1. Under Assumptions (A1), (A2), and (B), the stopping times (τ_n)_{n>0} are finite almost surely, the increments Δ_n = τ_n − τ_{n−1} (so that τ_n = Σ_{i=1}^n Δ_i) are i.i.d. random variables, and there exists a finite constant C such that, for all n > 0, Ψ1(τ_n) ≤ C.
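The weight dynamics discussed above can be observed on a toy simulation. The sketch below is our own illustration (not the paper's code; the "push half to a random neighbor" update rule, the network size and the seed are arbitrary assumptions): it runs the recursion w(t+1)^T = w(t)^T K(t) with row-stochastic matrices having a strictly positive diagonal, and checks the mass-conservation property Σ_i w_i(t) = N of Eq. (2).

```python
import random

# Illustrative sketch (not the paper's code): simulate w(t+1)^T = w(t)^T K(t)
# where K(t) is a row-stochastic "push" matrix: a randomly chosen sensor i
# keeps half of its weight and sends the other half to a random neighbor j.
# Every such matrix has a strictly positive diagonal (Assumption (A2)).
random.seed(0)
N = 5
w = [1.0] * N                      # w(0) = 1
min_weights = []
for _ in range(500):
    i = random.randrange(N)
    j = random.choice([k for k in range(N) if k != i])
    w[i] /= 2.0                    # row i of K(t): 1/2 on the diagonal...
    w[j] += w[i]                   # ...and 1/2 in column j
    min_weights.append(min(w))
    # Mass conservation (Eq. (2)): K(t)1 = 1 implies sum_i w_i(t) = N
    assert abs(sum(w) - N) < 1e-9

assert min(min_weights) > 0        # weights stay strictly positive
assert max(w) >= 1.0 - 1e-9        # at least one node has weight >= 1
```

The minimum weight can become very small after a stretch in which a node keeps halving its weight without receiving anything, which is precisely why the analysis bounds Ψ1(t) only infinitely often, at the stopping times τ_n, rather than for every t.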

C. Analysis of Ψ2(t)

A first approach is to upper-bound E[Ψ2(t)] thanks to Eq. (9) and to find a linear recursion between E[Ψ2(t) | Ψ2(t−1)] and Ψ2(t−1). However, this technique does not work in the most general case.

Therefore, as proposed alternatively in [4] (though not essential in [4]) in the context of Random Gossip algorithms (see Section II-B1), we write Ψ2(t) with respect to a more complicated matrix for which the recursion property is easier to analyze. Indeed, recalling that for any matrix M, ||M||_F² = Trace(M^T M), let us define

    Ξ(t) = ((I − J)P(t)) ⊗ ((I − J)P(t)).

Taking the expectation, one can exhibit the linear recursion

    E[Ξ(t)] = E[Ξ(t−1)] R    (10)

and hence

    E[Ξ(t)] = Ξ(0) R^t    (11)

with

    R = ((I − J) ⊗ (I − J)) E[K ⊗ K].    (12)

Moreover,

    E[ ||(I − J)P(t)||_F² ] = Σ_{i,j=1}^N (E[Ξ(t)])_{i+(i−1)N, j+(j−1)N},

so that E[Ψ2(t)] ≤ ||x(0)||_2² Σ_{i,j=1}^N (E[Ξ(t)])_{i+(i−1)N, j+(j−1)N}. As a consequence, the behavior of the entries of E[Ξ(t)] drives the behavior of E[Ψ2(t)].

Let us define the L∞ vector norm on N × N matrices as |||M|||_∞ = N max_{1≤i,j≤N} |m_{ij}|. The norm |||·|||_∞ is a matrix norm (see [15, Chap. 5.6]) and hence is submultiplicative. Now, using the Jordan normal form of R (see [15, Chap. 3.1 and 3.2]), we get that there is an invertible matrix S such that

    |||R^t|||_∞ = |||S Λ^t S^{-1}|||_∞ ≤ |||S|||_∞ |||S^{-1}|||_∞ |||Λ^t|||_∞    (13)

where Λ is the Jordan matrix associated with R. After some computations, it is easy to see that the absolute values of all the entries of Λ^t are bounded in the following way:

    max_{i,j} |(Λ^t)_{ij}| ≤ max_{0≤j≤J−1} binom(t, t−j) ρ(R)^{t−j}

with ρ(R) the spectral radius of R and J the maximum size of the associated Jordan blocks. Hence, for all t > 0,

    max_{i,j} |(Λ^t)_{ij}| ≤ t^{J−1} ρ(R)^{t−J+1}.    (14)

When R is diagonalizable, J = 1, and we get that

    max_{i,j} |(Λ^t)_{ij}| ≤ ρ(R)^t    (when R is diagonalizable).    (15)

Putting together Eqs. (11), (13), (14), (15), and remarking that the subspace spanned by 1_{N²} = 1 ⊗ 1 is in the kernel of R, we get that the size of the greatest Jordan block is at most N − 1; hence the following lemma:

Lemma 3. We have

    E[Ψ2(t)] = O( t^{N−2} ρ(R)^t )

where R is defined in Eq. (12) and where ρ(R) is the spectral radius of the matrix R.

The next step of our analysis is to prove that the spectral radius ρ(R) is strictly less than 1 when Assumptions (A1), (A2), and (B) hold. Applying Theorem 5.6.12 of [15] to Eq. (11) proves that ρ(R) < 1 if and only if E[Ξ(t)] converges to zero as t goes to infinity. Therefore, our next objective is to prove that E[Ξ(t)] converges to zero by another way than the study of the spectral radius of R. Actually, one can find another linear recursion on Ξ(t) (different from the one exhibited in Eq. (10)). We get

    Ξ(t) = Ξ(t−1) · (K(t) ⊗ K(t))

and, by taking the mathematical expectation given the past, we obtain

    E[Ξ(t) | F_{t−1}] = Ξ(t−1) · E[K ⊗ K].

Remarking that Ξ(t) 1_{N²} = 0, we have, for any vector v,

    E[Ξ(t) | F_{t−1}] = Ξ(t−1) ·

( E[K ⊗ K] − 1_{N²} v^T )

and then, for any vector v,

    E[Ξ(t)] = Ξ(0) S_v^t    (16)

with S_v = E[K ⊗ K] − 1_{N²} v^T and Ξ(0) = (I − J) ⊗ (I − J).

By considering Eq. (16), it is straightforward to see that E[Ξ(t)] converges to zero as t goes to infinity if there is a vector v such that ρ(S_v) < 1. Notice that the recursion given in Eq. (16) is less strong than the one in Eq. (11) since it only leads to a sufficient condition instead of a necessary and sufficient condition. As ρ(S_v) < 1 implies the convergence of E[Ξ(t)], and as the convergence of E[Ξ(t)] implies that ρ(R) < 1, one thus can state the following Lemma:

Lemma 4. If there is a vector v such that ρ( E[K ⊗ K] − 1_{N²} v^T ) < 1, then ρ(R) < 1.

One of the most important results of the paper lies in the following Lemma, in which we ensure that, under Assumptions (A1), (A2), and (B), there is a vector v such that ρ(S_v)

< 1.

Proof: Assumptions (A1), (A2), and (B) imply that:
i) E[K ⊗ K] is a non-negative matrix with a constant row sum equal to one (because of the row-stochasticity). According to Lemma 8.1.21 in [15], we have ρ(E[K ⊗ K]) = 1.
ii) E[K ⊗ K] is a primitive matrix (see (B3) in Lemma 1), which implies that there is only one eigenvalue of maximum modulus. This eigenvalue is thus equal to 1 and associated with the eigenvector 1_{N²}.

By using the Jordan normal form and the simple multiplicity of the maximum eigenvalue (equal to 1), we know that i) there exists a vector v_1 equal to the left eigenvector corresponding to the eigenvalue 1, and ii) the set of the eigenvalues of E[K ⊗ K] − 1_{N²} v_1^T = S_{v_1} is exactly the set of the eigenvalues of E[K ⊗ K] without the maximum one equal to 1. Indeed, the maximum eigenvalue of E[K ⊗ K] has been removed by the matrix 1_{N²} v_1^T and the associated eigenvector now belongs to the kernel of S_{v_1}. As a consequence, the moduli of the eigenvalues of S_{v_1} are strictly less than 1, i.e., ρ(S_{v_1}) < 1.

Aggregating successively the results provided in Lemmas 5, 4, and 3 leads to the main result of this Section devoted to the analysis of Ψ2(t). Indeed, Lemma 5 ensures that there is a vector v such that ρ(S_v) < 1; then Lemma 4 states that ρ(R) < 1. Then, Lemma 3 concludes the proof of the next result.

Proposition 2. Under Assumptions (A1), (A2), and (B),

    E[Ψ2(t)] = O( t^{N−2} e^{−κt} )

with κ = −log(ρ(R)) > 0.

D. Final results

Thanks to the various intermediate Lemmas and Propositions provided above, we are now able to state the main Theorems of the paper. The first one deals with the determination of the necessary and sufficient conditions for Sum-Weight-like algorithms to converge. The second one gives us an insight into the decrease speed of the Squared Error (defined in Eq. (5)). In the meanwhile, we need the following lemma:

Lemma 6. ||x(t) − x_ave 1||_∞ = max_i |x_i(t) − x_ave| is a non-increasing sequence with respect to t.

Proof: One can remark that, at time t + 1, we have, for all j,

    x_j(t+1) = ( Σ_{i=1}^N (K)_{ij} s_i(t) ) / ( Σ_{i=1}^N (K)_{ij} w_i(t) )
             = Σ_{i=1}^N [ (K)_{ij} w_i(t) / Σ_{ℓ=1}^N (K)_{ℓj} w_ℓ(t) ] x_i(t)

where K corresponds to any matrix in K. So x_j(t+1) is a center of mass of (x_i(t))_{i=1,...,N}. Therefore, for all j ∈ {1, ..., N},

    |x_j(t+1) − x_ave| ≤ Σ_{i=1}^N [ (K)_{ij} w_i(t) / Σ_{ℓ=1}^N (K)_{ℓj} w_ℓ(t) ] |x_i(t) − x_ave|
                       ≤ max_i |x_i(t) − x_ave|.

1) Result on the convergence: Let us consider that Assumption (B) does not hold. Thanks to (B1) in Lemma 1, this is equivalent to the existence of (k, l) ∈ N² such that, for all T, (P(T))_{k,l} = 0. Let us take x(0) equal to the canonical vector composed of a 1 at the k-th position and 0 elsewhere. Then for any t > 0, x_l(t) = 0, which is different from x_ave = 1/N. Consequently, the algorithm does not converge to the true consensus for this initial measurement. So if the Sum-Weight algorithm converges almost surely to the true consensus for any initial vector x(0), then Assumption (B) holds.

Let us now assume that Assumption (B) holds. Using Markov's inequality along with Proposition 2, we have a finite constant K such that for any δ > 0,

    Σ_{t>0} P[|Ψ2(t)| > δ] ≤ (1/δ) Σ_{t>0} E[|Ψ2(t)|] ≤ (K/δ) Σ_{t>0} t^{N−2} e^{−κt} < ∞.

Consequently, the Borel-Cantelli Lemma leads to the almost sure convergence of Ψ2(t) to zero. In addition, the random variables (τ_n)_{n>0} provided in the statement of Proposition 1 converge to infinity with probability one, hence Ψ2(τ_n) → 0 almost surely. Since Ψ1(τ_n) is bounded, Ψ1(τ_n)Ψ2(τ_n) → 0 almost surely as n → ∞. According to Lemma 6, ||x(t) − x_ave 1||_∞ is a non-increasing non-negative sequence verifying ||x(t) − x_ave 1||_∞² ≤ ||x(t) − x_ave 1||_2² ≤ Ψ1(t)Ψ2(t); as there is a converging subsequence with limit 0, the sequence itself converges to the same limit, which implies the following theorem.

Theorem 1. Under Assumptions (A1) and (A2), x(t) converges almost surely to the average consensus x_ave 1 for any x(0) if and only if Assumption (B) holds.

We have an additional result on another type of convergence for x(t). As ||x(t) − x_ave 1||_∞ is a non-increasing sequence, we have, for any t, ||x(t) − x_ave 1||_∞ ≤ ||x(0) − x_ave 1||_∞, which implies that x(t) is bounded for any t > 0. As a consequence, according to [16], since x(t) also converges almost surely to x_ave 1, we know that x(t) converges to x_ave 1 in L^p for any positive integer p. The convergence of the mean squared error of x(t) thus corresponds to the case p = 2.
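The almost-sure convergence guaranteed by Theorem 1 is easy to observe numerically. The following sketch is our own illustration (not the paper's code; the complete communication graph, the seed, and the "push half" update rule are arbitrary choices): a uniformly chosen sensor keeps half of its pair (s_i, w_i) and pushes the other half to one random other sensor, with no feedback link, and x(t) = s(t)/w(t) reaches the average consensus.

```python
import random

# Toy Sum-Weight run (illustrative assumptions: complete graph, push-half
# updates). Every update matrix is row-stochastic with a strictly positive
# diagonal (Assumptions (A1)-(A2)), and E[K] is primitive (Assumption (B)).
random.seed(1)
N = 8
x0 = [random.uniform(0.0, 10.0) for _ in range(N)]
x_ave = sum(x0) / N

s = list(x0)       # s(0) = x(0)
w = [1.0] * N      # w(0) = 1

for t in range(5000):
    i = random.randrange(N)
    j = random.choice([k for k in range(N) if k != i])
    s[i], w[i] = s[i] / 2.0, w[i] / 2.0   # sensor i keeps half...
    s[j] += s[i]                           # ...and pushes half to j,
    w[j] += w[i]                           # with no feedback link
    # Mass conservation (Eq. (2)) holds at every t
    assert abs(sum(w) - N) < 1e-9
    assert abs(sum(s) - N * x_ave) < 1e-6

x = [si / wi for si, wi in zip(s, w)]
assert max(abs(xi - x_ave) for xi in x) < 1e-6   # consensus on the average
```

Replacing the single random recipient by a broadcast to the whole neighborhood gives the flavor of the BWGossip algorithm proposed in Section IV.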
Corollary 1. If x(t) converges almost surely to the average consensus x_ave 1, then the mean squared error (MSE) converges to zero.

2) Result on the convergence speed: The next result, on the convergence speed, corresponds to the main challenge and novelty of the paper. Except in [12] for a very specific case (cf. Section V-A for more details), our paper provides the first general results about the theoretical convergence speed for the squared error of the Sum-Weight-like algorithms. For the sake of this theorem, we introduce the following notation: given two sequences of random variables (X_n)_{n>0} and (Y_n)_{n>0}, we will say that X_n = o_a.s.(Y_n) if X_n/Y_n → 0 almost surely.

Theorem 2. Under Assumptions (A1), (A2), and (B), the squared error (SE) is non-increasing. Furthermore, it is bounded by an exponentially decreasing function as follows:

    SE(τ_n) = ||x(τ_n) − x_ave 1||_2² = o_a.s.( τ_n^N e^{−κτ_n} )

with κ = −log( ρ( ((I − J) ⊗ (I − J)) E[K ⊗ K] ) ) > 0 and τ_n = Σ_{i=1}^n Δ_i as defined in Proposition 1.

This result tells us that the slope of log(SE(t)) is lower-bounded by κ infinitely often, which provides us with a good insight into the asymptotic behavior of x(t). Indeed, the squared error will vanish exponentially and we have derived a lower bound for this speed. We believe this result is new as it may foretell the speed of any such algorithm. The particular behavior of the weight variables in this very general setting does not enable us to provide a clearer result about the mean squared error; however, for some particular algorithms (e.g. single-variate ones) this derivation is possible (see Section V for more details). The authors would like to draw the reader's attention to the fact that the main contribution of the paper lies in the exponential decrease constant κ.

Proof: To prove this result we will once more use the decomposition of the squared error introduced in Eq. (6). We know from Proposition 2 that E[t^{−N} e^{κt} Ψ2(t)] = O(t^{−2}). By Markov's inequality and the Borel-Cantelli lemma, t^{−N} e^{κt} Ψ2(t) → 0 almost surely as t → ∞. Composing with the (τ_n)_{n>0}, we get τ_n^{−N} e^{κτ_n} Ψ2(τ_n) → 0 almost surely as n → ∞. Since there exists C such that, for all n > 0, Ψ1(τ_n) ≤ C, we get the claimed result.

IV. PROPOSED ALGORITHMS

In Subsection IV-A, we propose a new Sum-Weight-like algorithm using the broadcast nature of the wireless channel which converges and offers remarkable performance. This algorithm is hereafter called Broadcast-Weighted Gossip (BWGossip). In Subsection IV-B, a new distributed management of the nodes' clocks which can improve averaging algorithms is proposed. Finally, Subsection IV-C provides an extension of this work to the distributed sum computation.

A. BWGossip algorithm

Remarking i) that the broadcast nature of the wireless channel was often not taken into account in the distributed estimation algorithms (apart from [9], but this algorithm does not converge to the average) and ii)
that information propagation is much faster with broadcasting than with pairwise exchanges [17], we propose an algorithm taking into account the broadcast nature of the wireless channel. At each global clock tick, it simply consists in uniformly choosing a sensor that broadcasts its pair of values in an appropriate way; then, the receiving sensors add the received pair of values to their current one. A more algorithmic formulation is presented below.

Algorithm 1 BWGossip
When the sensor i wakes up (at global time t):
◮ The sensor i broadcasts ( s_i(t)/(|N_i|+1) ; w_i(t)/(|N_i|+1) )
◮ The sensors of the neighborhood N_i update: for all j ∈ N_i,
    s_j(t+1) = s_j(t) + s_i(t)/(|N_i|+1)
    w_j(t+1) = w_j(t) + w_i(t)/(|N_i|+1)
◮ The sensor i updates:
    s_i(t+1) = s_i(t)/(|N_i|+1)
    w_i(t+1) = w_i(t)/(|N_i|+1)

According to this formulation, the update matrix K_i associated with the action of the i-th sensor takes the following form:

    K_i = I − e_i e_i^T + e_i e_i^T (I + D)^{-1} (A + I) = I − e_i e_i^T (I + D)^{-1} L    (17)

with e_i the i-th canonical vector. Clearly, the update matrices satisfy Assumptions (A1) and (A2).

Thanks to Eq. (17) and recalling that L = D − A, we obtain that

    E[K] = I − (1/N) (I + D)^{-1} L = ((N−1)/N) I + (1/N) (I + D)^{-1} (I + A).

As all the involved matrices are non-negative, we have (1/N)(I + D)^{-1}(I + A) ≥ (I + A)/((d_max + 1)N). As a consequence, we have

    E[K] ≥ (1/((d_max + 1)N)) (I + A) ≥ 0.

Since A is the adjacency matrix of a connected graph, there exists m > 0 such that (I + A)^m > 0. Hence, for the same m, E[K]^m ≥ (1/((d_max + 1)N))^m (I + A)^m > 0, which implies that E[K] is a primitive matrix. Applying Lemma 1 enables us to prove that Assumption (B) also holds.

Hence, Theorem 1 states that the BWGossip algorithm converges almost surely to the average consensus and Theorem 2 gives us an insight into the decrease speed of the squared error.

B. Adaptation to smart clock management

So far, all the Poisson coefficients of the clocks were identical. This means that all the sensors were waking up uniformly and independently from their past actions. Intuitively, it would be more logical that a sensor talking a lot becomes less active during a long period.

Another advantage of the Sum-Weight algorithms is the knowledge of how much a sensor talks compared to the others, which is a useful piece of information. Actually, each sensor knows whether it talks frequently or not (without additional cost) through its own weight value, because when a sensor talks, its weight decreases, and conversely, when it receives information, its weight increases. Therefore, our idea is to control the Poisson coefficient of each sensor with respect to its weight.

We thus propose to consider the following rule for each Poisson coefficient:

    for all i ∈ V,    λ_i(t) = α + (1 − α) w_i(t)    (18)

where α ∈ (0, 1) is a tuning coefficient.

Notice that the global clock remains unchanged since, for all t > 0, Σ_{i=1}^N λ_i(t) = N. Keeping the global message exchange rate unchanged, the clock rates of the sensors are improved. The complexity of the algorithm is the
same because the sensor whose weight changes has just to launch a Poisson clock.Even if the convergence and the convergence speed with clock improvement have not been formallyestablished,our simulations with the BWGossip algorithm (see Fig.2) show that it seems to also convergeexponentially to the average more quickly if α is well chosen.DRAFTSUBMITTED FOR PUBLICATION TO IEEE TRANSACTIONS ON SIGNAL PROCESSING,SEPTEMBER 2012 17C.Distributed estimation of the sumIn some cases,distributively computing the sum of the initial values is very interesting.For example,in the case of signal detection,the Log Likelihood Ratio (LLR) of a set of sensors is separable intothe sum of the LLRs of the sensors.Hence,in order to perform a signal detection test based on theinformation of the whole network (using a Generalized LLR Test for instance),every sensor needs toestimate the sum of the LLRs computed by the sensors.An estimate of the sum can be trivially obtained by multiplying the average estimate by the numberof sensors which might not be available at any sensor.Another interest of the Sum-Weight scheme isthat the initialization of the weights of the sensors enables us to compute different functions related tothe average.Intuitively,as the sum of the s(t) and w(t) vectors are conserved through time and theconvergence to a consensus is guaranteed by the assumptions on the update matrices,we get that thesensors will converge toPisi(0)/Piwi(0).This is obviously equal to the average 1/NPixi(0) withthe initialisation of Eq.(1).Now,if a sensor wants to trigger a estimation of the sum through the network,it simply sets its weightto 1 and sends a starting signal to the other nodes which set their weights to 0.Mathematically,we thenhave the following initialization after sensor i triggers the algorithms(0) = x(0)w(0) = eiwhere eiis the i-th canonical vector.In this setting,all Sum-Weight like algorithms converge exponentiallyto the sum of the initial value as all the theorems of the 
V. COMPARISON WITH EXISTING WORKS

In this section, we show that our results extend previous works in the literature. In Subsections V-A and V-B, we compare our results with existing papers dealing with the design and the analysis of Sum-Weight-like algorithms. In Subsection V-C, we observe that our results can even be applied to the traditional framework of single-variate gossip algorithms.

A. Comparison with Kempe's work

In Kempe's work [12], the setup is quite different since the sensors' updates are synchronous, that is, at each time t, all the sensors send and update their values. Another important difference lies in the fact that the communication graph is assumed to be complete and to offer self-loops, i.e., each sensor can communicate with any other one, including itself. The algorithm introduced in [12] is described in Algorithm 2.

Algorithm 2 Push-Sum Algorithm [12]
At each time t, every sensor i activates:
◮ The sensor i chooses uniformly a node j_i(t) belonging to its neighborhood (including itself)
◮ The sensor i sends the pair (s_i(t)/2; w_i(t)/2) to j_i(t)
◮ Let R be the set of sensors that sent information to i. The sensor i updates:
   s_i(t+1) = s_i(t)/2 + Σ_{r∈R} s_r(t)/2
   w_i(t+1) = w_i(t)/2 + Σ_{r∈R} w_r(t)/2

Consequently, at time t, the update matrix takes the following form

K(t) = (1/2) I + (1/2) Σ_{i=1}^N e_i e^T_{j_i(t)}    (19)

where the index j_i(t) is defined in Algorithm 2. Notice that the first term of the right-hand side corresponds to the information kept by the sensor, while the second term corresponds to the information sent to the chosen sensor. Moreover, as each sensor selects its neighbor uniformly2 (including itself), we obtain that E[K] = (1/2) I + (1/2) J.
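Algorithm 2 is compact enough to simulate directly. The following minimal synchronous sketch runs Push-Sum over the complete graph with self-loops, as assumed in [12]; the network size and the number of rounds are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)

N = 20
x = rng.normal(size=N)
s = x.copy()                 # s(0) = x(0)
w = np.ones(N)               # w(0) = 1: average computation

for t in range(100):
    # Every sensor picks a target uniformly (complete graph, self-loops allowed),
    # keeps half of its (s, w) pair and sends the other half to its target.
    j = rng.integers(N, size=N)
    s_new = s / 2
    w_new = w / 2
    np.add.at(s_new, j, s / 2)   # accumulate the halves received by each target
    np.add.at(w_new, j, w / 2)
    s, w = s_new, w_new

# All ratios s_i / w_i reach consensus on the average of the initial values.
ratios = s / w
```

With w(0) set to a canonical vector e_i instead, the very same loop estimates the sum, as discussed in the previous subsection.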
It is then easy to check that:
- the (instantaneous) update matrices are non-negative and row-stochastic; in addition, they are chosen uniformly in a set of size N^N;
- the (instantaneous) update matrices have a strictly positive diagonal;
- E[K] > 0, thus E[K] is a primitive matrix.

This proves that Kempe's algorithm satisfies Assumptions (A1), (A2) and (B), and so it converges almost surely to the average consensus (which was also proven in [12]).

Let us now focus on the convergence speed of Kempe's algorithm. We recall that the convergence speed is driven by Ψ2(t) (denoted by Φ_t in [12]). As this algorithm is synchronous and only applies on a complete communication graph, it is simple to obtain a recursion between E[Ψ2(t)|Ψ2(t−1)] and Ψ2(t−1). Indeed, the approach given in the footnote of Section III-C can be applied. More precisely, the corresponding matrix M = (I − J) E[KK^T] (I − J) is given in closed form as (see Appendix B-A for details)

M = (I − J) E[KK^T] (I − J) = (1/2 − 1/(4N)) (I − J),    (20)

and then one can easily check3 that

E[Ψ2(t)|Ψ2(t−1)] = (1/2 − 1/(4N)) Ψ2(t−1).    (21)

2 As the graph is complete, this means choosing one node uniformly in the graph.
3 Note that there is a typo in Lemma 2.3 of [12]: the coefficient there is (1/2 − 1/(2N)) instead of (1/2 − 1/(4N)).

Moreover, thanks to Eq. (20), we have ρ(M) = (1/2 − 1/(4N)) < 1; thus the inequality in the above-mentioned footnote is replaced with an equality and the spectral radius of M is less than 1. Therefore, the true convergence speed is provided by ρ(M). Comparing this convergence speed (obtained very easily in [12]) with the convergence speed bounds obtained in our paper is of great interest and is done below. First of all, we recall (see the footnote in Section III-C) that in the general case treated in our paper, it is impossible to find a recursion similar to Eq. (21), which justifies our alternative approach. Secondly, following the general alternative approach developed in this paper, we know that the matrix of interest is R = ((I − J) ⊗ (I − J)) . E[K ⊗ K] (see Proposition 2). After some computations (a detailed proof is available in Appendix B-B), we have

R = (1/4) (I − J) ⊗ (I − J) + ((N − 1)/(4N)) v v^T    (22)

with v = (1/√(N−1)) (u − (1/N) 1_{N²}) and u = Σ_{i=1}^N e_i ⊗ e_i. Consequently, R is a linear combination of the two following orthogonal projections:
• the first projection, generated by (I − J) ⊗ (I − J), is of rank N² − 2N + 1;
• the second projection, generated by v v^T, is of rank 1.

As (I − J) ⊗ (I − J) and v v^T are orthogonal projections, the vector space R^{N²} (on which the matrix R operates) can be decomposed into a direct sum of four subspaces:
• S0 = Im(v v^T) ∩ Ker((I − J) ⊗ (I − J))
• S1 = Im(v v^T) ∩ Im((I − J) ⊗ (I − J))
• S2 = Ker(v v^T) ∩ Im((I − J) ⊗ (I − J))
• S3 = Ker(v v^T) ∩ Ker((I − J) ⊗ (I − J))

As ((I − J) ⊗ (I − J)) v = v (see Appendix B-B), we have S0 = {0}. Moreover, according to Eq. (22), we obtain that

R x = (1/2 − 1/(4N)) x   for all x ∈ S1,
R x = (1/4) x            for all x ∈ S2,
R x = 0                  for all x ∈ S3.

As a consequence, the non-null eigenvalues of R are 1/4 and (1/2 − 1/(4N)), which implies that ρ(R) = 1/2 − 1/(4N). Hence, the convergence speed bound obtained by the general alternative approach developed in this paper provides the true convergence speed for Kempe's algorithm [12].

B. Comparison with Bénézit's algorithm

In [7], it has been shown that performing multi-hop communications between sensors provides a significant performance gain. However, the proposed algorithm relied on a single-variate scheme. To ensure the convergence of this algorithm, the double-stochasticity of the update matrix is necessary, which implies a feedback along the route. This feedback can suffer from link failures (due, e.g., to high mobility in wireless networks). To counteract this issue, Bénézit et al. propose to get rid of the feedback by using the Sum-Weight approach [13]. In [13], the authors established a general convergence theorem close to ours. In contrast, they did not provide any result about convergence speed. It is worth noting that our convergence speed results apply to Bénézit's algorithm.

C. Comparison with the single-variate algorithms

If the following additional assumption holds,

(A3) The matrices of K are column-stochastic,

one can easily show that all the weights w(t) remain constant and equal to 1, i.e., for all t > 0,

w(t)^T = w(0)^T P(t) = 1^T P(t) = 1^T

and x(t) = s(t) = K(t)^T x(t−1). Therefore, the single-variate algorithms ([18]) with doubly-stochastic update matrices, such as the Random Gossip [3], [4] and the Geographic Gossip [6], can surprisingly be cast into the Sum-Weight framework. Moreover, as Ψ1(t) = ||x(0)||²₂ because all the weights stay equal to 1, the proposed results about Ψ2(t) (that is, Section III-C) can be applied directly to the squared error for these algorithms.
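As a quick numerical check of this reduction, the sketch below runs pairwise-averaging updates (as in Random Gossip, hence doubly-stochastic matrices, so (A3) holds) inside the Sum-Weight recursions and verifies that the weights never move; the schedule and sizes are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

N = 12
x0 = rng.normal(size=N)
s = x0.copy()            # s(0) = x(0)
w = np.ones(N)           # w(0) = 1

for _ in range(300):
    # Random Gossip-style update: two distinct sensors average their values.
    # The corresponding K is symmetric and doubly stochastic.
    i, j = rng.choice(N, size=2, replace=False)
    K = np.eye(N)
    K[[i, j], :] = 0.0
    K[np.ix_([i, j], [i, j])] = 0.5
    s = K.T @ s
    w = K.T @ w          # column-stochasticity keeps w equal to the ones vector

# Since w(t) = 1 for all t, x(t) = s(t): the bivariate scheme collapses to the
# usual single-variate recursion and s converges to the plain average.
```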
Let us re-interpret the work of Boyd et al. [4] (especially their Section 2) in the light of our results. In [4], it is stated that, under doubly-stochastic update matrices K(t), the mean squared error at time t is dominated by ρ(E[KK^T] − (1/N) 11^T)^t and converges to 0 when t goes to infinity if

ρ(E[K] − (1/N) 11^T) < 1.    (23)

Since K(t) is doubly-stochastic, one can remark that (I − J) E[KK^T] (I − J) = E[KK^T] − (1/N) 11^T. By following the approach developed in the footnote of Section III-C, we directly obtain the domination proven in [4]. Moreover, the condition corresponding to Eq. (23) actually implies Assumption (B). Indeed, due to Eq. (23) and the double-stochasticity of K(t), one can remark that the maximum eigenvalue of E[K] is unique and equal to 1. Consequently, E[K] is primitive, and thus Assumption (B) holds (see Lemma 1). Furthermore, in [4] (see Section II-B), it is stated that the condition corresponding to Eq. (23) is only a sufficient condition and that the necessary and sufficient condition is the following one

ρ(E[K ⊗ K] − (1/N²) 1_{N²} 1_{N²}^T) < 1    (24)

which is exactly the same expression as that in Lemmas 4 and 5.4 Along with the reasoning detailed in Section III-D1, these two lemmas prove that under Assumptions (A1) and (A2), the condition corresponding to Eq. (24) is eventually necessary and sufficient when Assumption (A3) is also satisfied.

Moreover, according to Eq. (19) in [4] and Eq. (16) in our paper, we know that the logarithm of the mean squared error at time t is upper bounded by −κ′ t with κ′ = −log ρ(E[K ⊗ K] − (1/N²) 1_{N²} 1_{N²}^T) > 0. However, as stated in Proposition 2, the logarithm of the squared error scales with −κ t. Though these two spectral radii are less than 1 and thus both ensure convergence, ρ(((I − J) ⊗ (I − J)) . E[K ⊗ K]) (i.e., e^{−κ}) exhibited in our paper is in general smaller than ρ(E[K ⊗ K] − (1/N²) 1_{N²} 1_{N²}^T) (i.e., e^{−κ′}) introduced in [4]. Hence, thanks to our approach, a tighter convergence speed bound has been derived. Numerical illustrations related to this statement are displayed in Fig. 4.

4 Indeed, the vector v used in our formulation can be replaced with the left eigenvector corresponding to the eigenvalue 1 (see the proof of Lemma 5 for more details), which is proportional to 1 here due to the double-stochasticity of the update matrices.

VI. NUMERICAL RESULTS

In order to investigate the performance of distributed averaging algorithms over Wireless Sensor Networks, the use of Random Geometric Graphs (RGG) is commonly advocated. These graphs consist in uniformly placing N points in the unit square (representing the vertices of the future graph) and then connecting those which are closer than a predefined distance r. A choice of r of the form √(r0 log(N)/N) with r0 ∈ [1, .., 10] ensures connectedness with high probability when N becomes large and avoids complete graphs (see [19] for more details).

In Fig. 1, we plot the empirical mean squared error versus time for different gossip algorithms: i) the Random Gossip [3], which is the reference algorithm in the literature; ii) the Broadcast Gossip introduced in [9], which uses the broadcasting abilities of the wireless channel but does not converge to the average; iii) the algorithm introduced by Franceschelli in [11], which uses a bivariate scheme and seems to converge (no convergence proof is provided in the paper); and iv) the proposed BWGossip algorithm. A Random Geometric Graph with N = 100 sensors and r0 = 4 has been considered. We remark that the BWGossip algorithm outperforms the existing algorithms without adding routing or any other kind of complexity.

In Fig. 2, we plot the empirical mean squared error for the BWGossip algorithm versus time with different clock tuning coefficients (see Section IV-B and Eq. (18) for more details). Compared to the algorithm without clock management (α =
1), the convergence is much faster at the beginning with α = 0 but the asymptotic rate is lower; with α = 0.5, the performance is better than that of the plain BWGossip at any time.

In Fig. 3, we display the empirical convergence slope5 and the associated lower bound κ derived in Theorem 2 for the BWGossip algorithm versus the number of sensors N. Different Random Geometric Graphs with r0 = 4 have been considered. We observe a very good agreement between the empirical slope and the proposed lower bound. Consequently, our bound is very tight.

In Fig. 4, we display the empirical convergence slope, the associated lower bound κ, and the bound given in [4] for the Random Gossip algorithm versus the number of sensors N. The proposed bound κ fits much better than the one proposed in [4]. Actually, the proposed bound matches the empirical slope very well (see Section V-C for more details).

In Fig. 5, we inspect the influence of link failures in the underlying communication graph on the BWGossip algorithm. We consider a Random Geometric Graph with 10 sensors and r0 = 1 onto which i.i.d. link failure events appear with probability p_e. In Fig. 5a, we plot the empirical mean squared error of the BWGossip versus time for different values of the edge failure probability p_e. As expected, we observe that the higher p_e, the slower the convergence, but the MSE still decreases exponentially. Then, in Fig. 5b, we plot the empirical convergence slope and the associated bound κ for different link failure probabilities. Here, κ is computed according to a modified matrix set taking into account the link failures through different update matrices. We remark a very good fit between our lower bound and the simulated results. Consequently, computing κ on the matrix set including the link failures enables us to predict the convergence speed very well in this context.

5 This slope has been obtained by linear regression on the logarithm of the empirical mean squared error. This regression makes sense since, for the inspected algorithms, the mean squared error in
log scale is almost linear for t large enough, as seen in Fig. 1.

VII. CONCLUSION

In this paper, we have analyzed the convergence of Sum-Weight-like algorithms (relying on two variables rather than one) for distributed averaging in a Wireless Sensor Network. We especially gave a very precise insight into the convergence speed of the squared error for such algorithms. In addition, we proposed a particular Sum-Weight-like algorithm taking full advantage of the broadcast nature of the wireless channel. We observed that this algorithm significantly outperforms the existing ones.

APPENDIX A
PROOF OF LEMMA 1

(B) ⇒ (B1): Denote by K^{(u,v)} a matrix of K whose (u,v)-th coefficient is positive. As the graph associated with E[K] is connected, for every couple of nodes (i,j) there is a path of finite length L_{ij} < N from i to j: (i = u_1, .., u_{L_{ij}} = j). Consequently, the matrix K_{i→j} = K^{(u_1,u_2)} K^{(u_2,u_3)} .. K^{(u_{L_{ij}−1},u_{L_{ij}})} verifies (K_{i→j})_{i,j} > 0, which gives us a realization of P(L_{ij}) verifying (P(L_{ij}))_{i,j} > 0.

(B1) ⇒ (B2): Let us take L = Σ_{i,j=1}^N L_{ij} < 2N². Since each matrix has a positive diagonal according to Assumption (A2), the product Π_{i,j=1}^N K_{i→j} is a possible realization of P(L) with strictly positive probability, and it is a positive matrix.

(B2) ⇒ (B3): If there are L < 2N² and a realization p of P(L) such that P[P(L) = p] > 0 and p > 0, then p ⊗ p is also positive. Since (A ⊗ B).(C ⊗ D) = (AC) ⊗ (BD) for any matrices A, B, C, D with appropriate dimensions,

(E[K ⊗ K])^L = (Σ_{i=1}^M p_i K_i ⊗ K_i)^L ≥ P[P(L) = p] . p ⊗ p > 0.

Hence, E[K ⊗ K] is a primitive matrix.

(B3) ⇒ (B): First, we lower-bound E[K] ⊗ E[K] with respect to E[K ⊗ K]:

E[K] ⊗ E[K] = Σ_{i=1}^M Σ_{j=1}^M p_i p_j K_i ⊗ K_j ≥ Σ_{i=1}^M p_i² K_i ⊗ K_i ≥ (min_j p_j) . Σ_{i=1}^M p_i K_i ⊗ K_i = (min_j p_j) . E[K ⊗ K].

Hence, as there exists k such that (E[K ⊗ K])^k > 0, we get (E[K] ⊗ E[K])^k = (E[K])^k ⊗ (E[K])^k > 0, so (E[K])^k > 0 and the primitivity of E[K] is proven.

APPENDIX B
DERIVATIONS RELATED TO SECTION V

A. Derivations for Eq. (20)

According to Eq. (19), we easily have

K(t) K(t)^T = (1/4) I + (1/4) Σ_{i=1}^N e_i e^T_{j_i(t)} + (1/4) Σ_{i=1}^N e_{j_i(t)} e^T_i + (1/4) Σ_{i=1}^N Σ_{i′=1}^N e_i e^T_{j_i(t)} e_{j_{i′}(t)} e^T_{i′}.

By remarking that e^T_j e_j = 1, we have

K(t) K(t)^T = (1/2) I + (1/4) Σ_{i=1}^N e_i e^T_{j_i(t)} + (1/4) Σ_{i=1}^N e_{j_i(t)} e^T_i + (1/4) Σ_{i=1}^N Σ_{i′≠i} e_i e^T_{j_i(t)} e_{j_{i′}(t)} e^T_{i′}.

The randomness in K(t) K(t)^T is only due to the choice of the nodes j_i(t) for i ∈ {1, .., N}. Therefore, each j_i(t) will be modeled by a random variable ℓ(i) (independent of t). The random variables {ℓ(i)}_{i=1,..,N} are i.i.d. and uniformly distributed over {1, .., N}. As a consequence, since E[e_{ℓ(i)}] = (1/N) 1 and P[ℓ(i) = ℓ(i′)] = 1/N for i ≠ i′, we obtain

E[KK^T] = (1/2) I + (1/(4N)) 11^T + (1/(4N)) 11^T + (1/(4N)) (11^T − I) = (1/2 − 1/(4N)) I + (3/(4N)) 11^T,

and thus M = (I − J) E[KK^T] (I − J) = (1/2 − 1/(4N)) (I − J), which is exactly Eq. (20).
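Since N^N is small for toy sizes, this expectation can be checked by exhaustively enumerating the equally likely target choices (j_1, .., j_N) of Eq. (19); a minimal sketch with N = 3 (the closed form below is the one derived above):

```python
import itertools
import numpy as np

N = 3
I, ones = np.eye(N), np.ones((N, N))

# Enumerate all N**N equally likely choices (j_1, .., j_N) and average K K^T,
# with K = (1/2) I + (1/2) sum_i e_i e_{j_i}^T as in Eq. (19).
acc = np.zeros((N, N))
for picks in itertools.product(range(N), repeat=N):
    A = np.zeros((N, N))
    for i, j in enumerate(picks):
        A[i, j] += 1.0
    K = 0.5 * I + 0.5 * A
    acc += K @ K.T
EKKt = acc / N**N

# Closed form: E[KK^T] = (1/2 - 1/(4N)) I + (3/(4N)) 11^T,
# whose projection (I - J) E[KK^T] (I - J) recovers Eq. (20).
closed_form = (0.5 - 1 / (4 * N)) * I + (3 / (4 * N)) * ones
```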

[Fig. 5: (a) Mean squared error versus time for different link failure probabilities p_e ∈ {0, 0.1, .., 0.9}. (b) Asymptotic slope of the logarithm of the MSE versus the edge failure probability.]