On the Fourier Spectrum of Symmetric Boolean Functions*

Mihail N. Kolountzakis†, Richard J. Lipton‡, Evangelos Markakis§, Aranyak Mehta¶, Nisheeth K. Vishnoi‖

Abstract

We study the following question: What is the smallest $t$ such that every symmetric boolean function on $k$ variables (which is not a constant or a parity function) has a non-zero Fourier coefficient of order at least 1 and at most $t$? We exclude the constant functions, for which there is no such $t$, and the parity functions, for which $t$ has to be $k$. Let $\tau(k)$ be the smallest such $t$. Our main result is that for large $k$, $\tau(k) \le 4k/\log k$.

The motivation for our work is to understand the complexity of learning symmetric juntas. A $k$-junta is a boolean function of $n$ variables that depends only on an unknown subset of $k$ variables. A symmetric $k$-junta is a junta that is symmetric in the variables it depends on. Our result implies an algorithm to learn the class of symmetric $k$-juntas, in the uniform PAC learning model, in time $n^{o(k)}$. This improves on a result of Mossel, O'Donnell and Servedio in [16], who show that symmetric $k$-juntas can be learned in time $n^{2k/3}$.

1 Introduction

Problem statement

The study of the Fourier representation of boolean functions has proved to be extremely useful in computational complexity and learning theory. In this paper we focus on the Fourier spectrum of symmetric boolean functions and we study the following question:

What is the smallest $t$ such that every symmetric boolean function on $k$ variables (which is not a constant or a parity function) has a non-zero Fourier coefficient of order at least 1 and at most $t$?

*This work was done when all authors were at the Georgia Institute of Technology and it is based on the preliminary versions [14] and [11].
†Department of Mathematics, Univ. of Crete, GR-71409 Iraklio, Greece. E-mail: kolount@gmail.com. Partially supported by European Commission IHP Network HARP (Harmonic Analysis and Related Problems), Contract Number: HPRN-CT-2001-00273 - HARP, and by grant INTAS 03-51-5070 (2004) (Analytical and
Combinatorial Methods in Number Theory and Geometry).
‡Georgia Tech, College of Computing, Atlanta, GA 30332, USA, and Telcordia Research, Morristown, NJ 07960, USA. E-mail: rjl@cc.gatech.edu. Research supported by NSF grant CCF-0431023.
§Corresponding author: Centre for Math and Computer Science (CWI), Kruislaan 413, Amsterdam, the Netherlands. E-mail: vangelis@cwi.nl
¶IBM Almaden Research Center, 650 Harry Rd, San Jose, CA 95120, USA. E-mail: mehtaa@us.ibm.com
‖College of Computing, Georgia Institute of Technology, Atlanta GA 30332, USA, and IBM India Research Lab, Block-1, IIT Delhi, New Delhi, 110016, India. E-mail: nkv@cc.gatech.edu

We exclude the two constant functions, for which there is no such $t$, and the two parity functions, for which $t$ has to be $k$. Let $\tau(k)$ be the smallest such $t$. While the above question is interesting in its own right, there is also an important learning theory application behind it, which we outline next.

Motivation

The motivation to study $\tau(k)$ comes from the following fundamental problem in computational learning theory: learning in the presence of irrelevant information. One formalization of the problem is as follows: we want to learn an unknown boolean function of $n$ variables, which depends only on $k \ll n$ variables. Typically, $k$ is $O(\log n)$. Such a function is referred to as a $k$-junta. The input is a set of labeled examples $\langle x, f(x) \rangle$, where the $x$'s are picked uniformly and independently at random from the domain $\{0,1\}^n$. The goal is to identify the $k$ relevant variables and the truth table of the function.

The problem was first posed by Blum [3] and Blum and Langley [6], and it is considered [4, 16] to be one of the most important open problems in the theory of uniform distribution learning. It has connections with learning DNF formulas and decision trees of super-constant size; see [7, 10, 15, 20, 21] for more details. The general case is believed to be hard and has even been used in the construction of a cryptosystem [5]. A trivial algorithm runs in time roughly $n^k$ by doing an exhaustive search over all possible
sets of relevant variables. Two important classes of juntas are learnable in polynomial time: parity and monotone functions. Learning parity functions can be reduced to solving a system of linear equations over $\mathbb{F}_2$ [9]. Monotone functions have non-zero singleton Fourier coefficients (see [16]). For the general case, the first significant breakthrough was given in [16]: learning with confidence $1-\delta$ in time $n^{0.7k}\,\mathrm{poly}(2^k, n, \log 1/\delta)$. Note that we allow the running time to be polynomial in $2^k$, since this is the size of the truth table which is output. In the typical setting of $k = O(\log n)$, this becomes polynomial in $n$.

Fourier based techniques in learning were introduced in [13] and have proved to be very successful in several problems. Fourier coefficients are easy to compute in the uniform distribution learning model and furthermore, if a Fourier coefficient is non-zero then its entire support is contained in the set of relevant variables. Hence, it is interesting to ask: what are the sub-classes of juntas for which Fourier based techniques yield fast learning algorithms? An important and natural subclass is the class of symmetric juntas. While this subclass contains only $2^{k+1}$ functions, the problem is not known to be significantly easier than the general case. The bound before our work was $n^{2k/3}$ [16], which is not much better than the best bound for general juntas (also obtained in [16]). Our results imply an improved bound for learning symmetric juntas via the Fourier based algorithm.

We believe that the case of symmetric juntas constitutes a good "challenge problem" towards the goal of learning general juntas. One motivation for this is a consideration of the following well-known challenge problem [4]:

Let $f(x_1, \dots, x_n) := \mathrm{MAJORITY}(x_1, \dots, x_{2k/3}) \oplus x_{2k/3+1} \oplus \cdots \oplus x_k$, where $x_1, \dots, x_k$ are some unknown variables among $x_1, \dots, x_n$.

This subclass has been identified as a candidate hard-to-learn class [4]. The current bound for learning this subclass of juntas is $n^{k/3}$, and it is asked in [4] if a faster algorithm exists. Note that $f$ is
invariant under permutations of $\{x_1, \dots, x_{2k/3}\}$ and under permutations of $\{x_{2k/3+1}, \dots, x_k\}$, i.e., it is invariant under a large group of symmetries. This suggests that it is interesting to begin with the case of symmetric juntas.

2 Our Results

There are two main results in this paper:

2.1 The Self-Similarity Theorem

Theorem 2.1. Let $1 \le s \le l$ be fixed integers such that $\tau(l) \le s$. Then there exists $k_0 := k_0(s,l)$ such that for every $k \ge k_0$, $\tau(k) \le \frac{s+1}{l+1}k + o(k)$.

It was observed in [14], via a computer search, that $\tau(30) = 2$. This implies that $\tau(k) \le 3k/31$.

Proof Technique. Not surprisingly, the study of $\tau(k)$ is equivalent to the study of 0/1 solutions of a system of Diophantine equations involving binomial coefficients. As a first step, we simplify these Diophantine equations by moving to a representation which is equivalent to the Fourier representation, but seems much simpler for the application of number theoretic tools. Once this is done, we reduce these Diophantine equations modulo carefully chosen prime numbers to get a simpler system of equations which we can analyze. Finally, we combine the information about the equations over the finite fields in a combinatorial manner to deduce the nature of the 0/1 solutions. The following well-known self-similarity property of Pascal's Triangle (known as Lucas' Theorem) plays an important role: if $m = lp$ for some integer $l$ and some prime $p$, then the values obtained by reducing the $m$-th row of Pascal's Triangle modulo $p$ can be read off directly from the $l$-th row of Pascal's Triangle.

2.2 The $O(k/\log k)$ Theorem

Theorem 2.2. There is an absolute constant $k_0 > 0$ such that for $k \ge k_0$, $\tau(k) \le 4k/\log k$.

Proof Technique.

We start again by looking at the 0/1 solutions of the system of Diophantine equations, as in the proof of Theorem 2.1. We then take a departure from this approach by further reducing this to the problem of showing that a certain integer-valued polynomial $P$ is constant over the set $\{0, 1, \dots, k\}$. We manage to prove this in two steps:

First, we show that $P$ is constant over
the union of two small intervals $\{0, \dots, t\} \cup \{k-t, \dots, k\}$. This is obtained by looking at $P$ modulo carefully chosen prime numbers. One way to prove this (at least infinitely often) would be to assume the twin primes conjecture (that there are an infinite number of pairs of primes whose difference is 2). We manage to replace the use of the twin primes conjecture (and get a result which works for all large enough $k$) by choosing four different primes in a more involved manner. To choose these prime numbers we use the Siegel-Walfisz theorem on the density of primes in arithmetic progressions with modulus of moderate growth. This is a generalization of Dirichlet's Theorem, and is stated precisely in Section 6.

In the second step, we extend the constant nature of $P$ to the whole interval $\{0, \dots, k\}$ by repeated applications of Lucas' Theorem. One additional interesting aspect of our proof is the use of an equivalence between (a) the vanishing of Fourier coefficients, and (b) the equality of moments of certain random variables under the uniform measure on the hypercube and under the measure defined by the function itself. This equivalence helps in the proof by eliminating the need for a large amount of case analysis.

Our results imply a bound of $n^{o(k)}$ for the Fourier based learning algorithm for the class of symmetric $k$-juntas. To our knowledge, this is the best known upper bound for learning symmetric juntas under the uniform distribution. Independent of the learning problem, the fact that symmetric boolean functions have non-zero Fourier coefficients of relatively small order provides new insight into the structure of these functions.

2.3 Related Work

Previously, the idea of reducing binomial coefficients modulo a prime number has been used in [22] to prove lower bounds on the degree of polynomials representing symmetric boolean functions. In [22], their problem reduces to showing that a certain sum of binomial coefficients is non-zero, which is done by reducing the sum modulo a prime number. Our problem involves a collection
of sums which we have to prove are unequal. For this we need to consider reductions modulo many different primes which have to be carefully chosen so as to satisfy certain properties. Combining the information obtained by these reductions is also more involved in our case.

The result of [22] has in fact been used in the proof of the previous best $n^{2k/3}$ bound for learning symmetric juntas [16]. Using [22], it is shown in [16] that if a symmetric function $f$ is balanced, i.e., $\Pr[f(x) = 1] = 1/2$, then it has a non-zero Fourier coefficient of order $o(k)$. The $2k/3$ bottleneck comes in the case of unbalanced symmetric functions, which are analyzed through a different argument. As noted in [16] and as we also note in Section 6, the result of [22] does not seem to be applicable to learning unbalanced functions.

3 Notation

We consider boolean functions from $\{0,1\}^k \to \{0,1\}$. For a set $S \subseteq [k]$, define $\chi_S : \{0,1\}^k \to \{-1,1\}$ to be the function $\chi_S(\mathbf{x}) := (-1)^{\sum_{i \in S} x_i}$. By convention, the boldface $\mathbf{x}$ denotes a vector, in this case $(x_1, \dots, x_k)$. For a function $f : \{0,1\}^k \to \{0,1\}$ and $S \subseteq [k]$, define the Fourier coefficient corresponding to $S$ as
$$\hat{f}(S) := \frac{1}{2^k} \sum_{\mathbf{x} \in \{0,1\}^k} f(\mathbf{x}) \chi_S(\mathbf{x}).$$
The order of a Fourier coefficient $\hat{f}(S)$ is $|S|$. The Fourier expansion of $f$ is
$$f(\mathbf{x}) = \sum_{S \subseteq [k]} \hat{f}(S) \chi_S(\mathbf{x}).$$

If $f$ is symmetric, $f$ is completely determined by its value on any $k+1$ vectors of distinct weights, where the weight of a boolean vector is the number of 1's in it. We will use the following vector representation of $f$: $(f) := (f_0, f_1, \dots, f_k)$. Here $f_i$ is the value of $f$ on a vector of weight $i$. Further, $f$ has precisely $k+1$ (non-equivalent) Fourier coefficients, $(\hat{f}_0, \dots, \hat{f}_k)$. Here $\hat{f}_t$ is defined as $\hat{f}(S)$, for some $S \subseteq [k]$ with cardinality $t$. Since $f$ is symmetric, this does not depend on the choice of $S$. The following four special symmetric functions on $k$ variables will appear often: the two constant functions 0 and 1, the parity function $\chi$, and its complement $\bar{\chi}$.

4 An Equivalent Formulation as a Diophantine Problem

In this section we give an equivalent condition for the existence of a non-zero Fourier
coefficient of a boolean function $f$. While we prove the equivalence for all boolean functions, we use it only for the special case of symmetric functions.

Let $f : \{0,1\}^k \to \{0,1\}$ be a boolean function. For a vector $\mathbf{x} = (x_1, \dots, x_k)$ and a set $S \subseteq [k]$, $\mathbf{x}_S$ is the projection of $\mathbf{x}$ on the indices of $S$. Let $\alpha \in \{0,1\}^{|S|}$. Define the following probabilities:
$$p_{S,\alpha}(f) := \Pr[f(\mathbf{x}) = 1 \mid \mathbf{x}_S = \alpha]. \tag{1}$$
Unless mentioned, all probabilities are over the uniform distribution.

Definition 4.1. For $t \ge 1$, call a boolean function $f$ on $k$ variables $t$-null if, for all sets $S \subseteq [k]$ with $|S| = t$ and for all $\alpha \in \{0,1\}^t$, the probabilities $p_{S,\alpha}(f)$, as defined in (1), are all equal to each other.

The notion of $t$-nullity has been introduced in different contexts and under different names in other areas including, among others, cryptographic applications [18]. In particular, $t$-nullity is equivalent to the notion of $t$-th order correlation immunity [18], strongly balancedness up to size $t$ [2] and $t$-wise independence of the corresponding probability distribution [1]. The following lemma reveals the connection with the Fourier coefficients of $f$.

Lemma 4.1. Let $f$ be a boolean function on $k$ variables. $f$ is $t$-null for some $1 \le t \le k$ if and only if, for all $\emptyset \ne S \subseteq [k]$ with cardinality at most $t$, $\hat{f}(S) = 0$.

Proof. It can be easily verified that if $f$ is $t$-null, then for all $\emptyset \ne S \subseteq [k]$ with cardinality at most $t$, $\hat{f}(S) = 0$. This follows from the fact that the Fourier coefficients of order at most $t$ can be expressed as $\pm 1$ combinations of $p_{S,\alpha}(f)$ with $\alpha \in \{0,1\}^t$ and $S \subseteq [k]$, $|S| = t$. When $f$ is $t$-null, the terms cancel out. The proof of the other direction is by induction and we omit it here.

The following is an immediate corollary of this lemma.

Corollary 4.2. Let $f$ be a boolean function on $k$ variables. If $f$ is $t$-null for some $1 \le t \le k$ then $f$ is $s$-null for $1 \le s \le t$.

When we consider the case of symmetric functions, $p_{S,\alpha}(f)$ just depends on $s := |S|$ and the weight $w$ of $\alpha$. We denote this by $p_{s,w}(f)$. It is clear that
$$p_{s,w}(f) = \frac{1}{2^{k-s}} \sum_{i=0}^{k} f_i \binom{k-s}{i-w},$$
where $\binom{l}{m}$ is 0 if $m < 0$ or $m > l$, and $\binom{0}{0}$ is
1. By definition, $f$ is $s$-null if for $0 \le w \le s$, the $p_{s,w}(f)$ are all equal. Hence, $f$ is $s$-null iff there exists $c := c(f,s,k)$ such that
$$\sum_{i=0}^{k} \binom{k-s}{i-w} f_i = c, \quad \forall\, 0 \le w \le s. \tag{2}$$
Thus, we have

Lemma 4.3. For $1 \le s \le k$, let $A_{k,s}$ be the $(s+1) \times (k+1)$ matrix
$$A_{k,s}(i,j) := \binom{k-s}{j-i}.$$
A symmetric function $f$ is $s$-null if and only if there exists a non-negative integer $c := c(f,s,k)$ such that
$$A_{k,s} \cdot (f) = c\mathbf{1}.$$

It is easy to see that the constant boolean functions $\{0, 1\}$ satisfy this system of equations for all $s$, i.e., they are $s$-null for all $s$ s.t. $1 \le s \le k$. One can also see that the boolean functions $\{\chi, \bar{\chi}\}$ are $s$-null for all $s$ s.t. $1 \le s < k$. From Lemma 4.1 and Lemma 4.3 we get:

Corollary 4.4. All symmetric boolean functions $f \notin \{0, 1, \chi, \bar{\chi}\}$ have a non-zero Fourier coefficient of order at most $s_0$ (and at least 1) iff there exists $s$, $1 \le s \le s_0$, s.t. $\{0, 1, \chi, \bar{\chi}\}$ are the only 0/1 solutions to:
$$\sum_{i=0}^{k-s} f_i \binom{k-s}{i} = \sum_{i=1}^{k-s+1} f_i \binom{k-s}{i-1} = \cdots = \sum_{i=s}^{k} f_i \binom{k-s}{i-s}. \tag{3}$$

5 The Self-Similarity Theorem

In this section we prove Theorem 2.1. First we recall a few results from number theory that we will use repeatedly. The following result is a special case of Lucas' Theorem [8, Ch. 3] and illustrates the self-similar nature of Pascal's Triangle modulo primes.

Lemma 5.1. For a prime $p$, an integer $m \ge 0$ and $0 \le i \le mp$, $\binom{mp}{i} \equiv \binom{m}{j} \bmod p$ if $i = jp$ for some $0 \le j \le m$, and $\binom{mp}{i} \equiv 0 \bmod p$ otherwise.

On numerous occasions, we will use the following result about the density of primes. This follows from the Prime Number Theorem.

Lemma 5.2. For large enough $n$, there is a prime $p \le n$ such that $p = n - o(n)$.

5.1 A Simple Bound of $k/2$

In this section we give a self-contained proof of the following weaker result. The aim of this subsection is merely to illustrate the key ideas behind the proof of Theorem 2.1.

Theorem 5.3. For any symmetric boolean function $f$ on $k$ variables ($f \notin \{0, 1, \chi, \bar{\chi}\}$), there exists $1 \le t \le \frac{k}{2} + o(k)$ such that $\hat{f}_t \ne 0$.

We need the following combinatorial lemma. For positive integers $k, p, q$ s.t. $p \ne q$, let $G_{k,p,q}$ be the graph with vertex set $\{0, 1, 2, \dots, k\}$ and the edge set $\{(i,j) : |i-j| =$
$p$ or $q\}$.

Lemma 5.4. For positive integers $k, p, q$ such that $(p,q) = 1$ and $p + q \le k$, $G_{k,p,q}$ is connected.

Proof. We proceed by induction on $p + q$. Without loss of generality, let $p > q$. Clearly, the lemma holds for the base case. Let $i, j$ be s.t. $0 \le i < j \le k$ and $j - i = p - q$. Since $p + q \le k$, either $i + p \le k$ or $i - q \ge 0$. In either case, there is a path of length 2 between $i$ and $j$. Hence, replacing the edges $\{(u,v) : |u - v| = p\}$ by the new edges $\{(u',v') : |u' - v'| = p - q\}$ does not increase the connectivity of the graph. It suffices to show that $G_{k,p-q,q}$ is connected, which follows by the induction hypothesis.

Proof of Theorem 5.3: Let $f$ be a symmetric function such that for every $1 \le t \le \frac{k}{2} + o(k)$, $\hat{f}_t = 0$. We will show that $f \in \{0, 1, \chi, \bar{\chi}\}$.

By Lemma 5.2, we can pick primes $p, q$ s.t. $\frac{k}{2} - o(k) = p < q \le \frac{k}{2}$. Since $k - p$ and $k - q$ are both at most $\frac{k}{2} + o(k)$, we get from Lemma 4.1 that $f$ is $(k-p)$-null and $(k-q)$-null. Hence, by Lemma 4.3, there are constants $c_1, c_2$ such that
$$A_{k,k-p} \cdot (f) = c_1 \mathbf{1} \quad \text{and} \quad A_{k,k-q} \cdot (f) = c_2 \mathbf{1}.$$
Consider these two systems of equations modulo $p$ and $q$ respectively. Let $0 \le c_p < p$ and $0 \le c_q < q$ be s.t. $c_p \equiv c_1 \bmod p$ and $c_q \equiv c_2 \bmod q$. We will use $\equiv_p$ to denote congruences mod $p$ (and similarly for $q$). The systems become:
$$A_{k,k-p} \cdot (f) \equiv_p c_p \mathbf{1} \quad \text{and} \quad A_{k,k-q} \cdot (f) \equiv_q c_q \mathbf{1}.$$
Now, from Lemma 5.1, we see that $\binom{p}{i} \equiv_p 1$ if $i = 0$ or $i = p$, and $\binom{p}{i} \equiv_p 0$ otherwise (and similarly for $q$). Hence, we see that the equations are of the form
$$f_i + f_{i+p} \equiv_p c_p \quad \text{for } 0 \le i \le k - p$$
and
$$f_i + f_{i+q} \equiv_q c_q \quad \text{for } 0 \le i \le k - q.$$
Since $f_i \in \{0,1\}$ and $p > 2$, these modular equations are in fact exact equalities and $c_p, c_q \in \{0, 1, 2\}$. If $c_p = 0$, then it follows that $c_q = 0$ and $f = 0$. If $c_p = 2$, then $c_q = 2$ and $f = 1$. The only remaining case is $c_p = c_q = 1$. This gives
$$f_i = 1 - f_{i+p} \ \text{ for } 0 \le i \le k - p \quad \text{and} \quad f_i = 1 - f_{i+q} \ \text{ for } 0 \le i \le k - q.$$
In other words, $|i - j| = p$ or $q$ implies that $f_i = 1 - f_j$. Since $G_{k,p,q}$ is connected (Lemma 5.4), it follows that fixing the value of any one $f_i$ uniquely determines $f$, and hence there are at most 2 possible choices for $f$. We can see that $\{\chi, \bar{\chi}\}$ are solutions to these equations, and hence they are the only solutions in
this case. □

5.2 Proof of Theorem 2.1

Recall that the hypothesis of the Theorem is that $\tau(l) \le s$. Let $f$ be a symmetric boolean function on $k$ variables. Suppose that $f$ is $t$-null for all $t \le \frac{s+1}{l+1}k + o(k)$. We will show that $f \in \{0, 1, \chi, \bar{\chi}\}$.

Let $m = l - s$. As of now, assume that there is a prime $p$ such that $k = (m+s+1)p - 1$. We handle the case when there is no such prime $p$ later. Set $t := k - mp = (s+1)p - 1$. Since $p = \frac{k+1}{l+1}$,
$$t = \frac{s+1}{l+1}k + \frac{s+1}{l+1} - 1 < \frac{s+1}{l+1}k.$$
Hence, $f$ being $t$-null implies that there is an integer $c$ such that
$$A_{k,t} \cdot (f) = c\mathbf{1}. \tag{4}$$
We remark that the role of the $o(k)$ term is redundant in this case. It will play a role when we cannot choose $p$ such that $k - t = mp$.

Reducing to a smaller problem

Note that, by definition of $t$, $k - t = mp$. For $0 \le i \le p - 1$, let $F_i := (f_i, f_{i+p}, f_{i+2p}, \dots, f_{i+lp})$. Hence, reducing Equations (4) modulo $p$ and using Lemma 5.1, one obtains the following systems of equations:
$$\begin{aligned} A_{l,s} F_0 &\equiv c'\mathbf{1} \bmod p \\ A_{l,s} F_1 &\equiv c'\mathbf{1} \bmod p \\ &\ \,\vdots \\ A_{l,s} F_{p-1} &\equiv c'\mathbf{1} \bmod p. \end{aligned}$$
Here $c' \equiv c \bmod p$. If $k$ is greater than $(l+1)2^{l-s}$, then it follows that $p > 2^{l-s}$. Therefore, for such a $k$, these modular equations are in fact exact. That is, there is an integer $d \ge 0$ such that the following set of equations holds:
$$\begin{aligned} A_{l,s} F_0 &= d\mathbf{1} \\ A_{l,s} F_1 &= d\mathbf{1} \\ &\ \,\vdots \\ A_{l,s} F_{p-1} &= d\mathbf{1}. \end{aligned} \tag{5}$$
Using the fact that $\tau(l) \le s$, we deduce that for any $i$, the system of equations $A_{l,s} F_i = d\mathbf{1}$ has at most 4 solutions. Hence, fixing any two variables in $F_i$ fixes all its variables. This implies that there are at most $4^p$ choices for $f$. Now we show how to narrow down these choices to 4.

Combining the smaller instances

Let $q$ with $\frac{k}{2} < mp \le q \le (m+s)p$ be a prime. Since $f$ is $t$-null and $t = k - mp \ge k - q$, by Corollary 4.2, $f$ is $(k-q)$-null. Now, consider the system of equations $A_{k,k-q} \cdot (f) = c\mathbf{1}$ modulo the prime $q$. Since $q > 2$, we get, for some $e \ge 0$, exact equations of the following form:
$$\begin{aligned} f_0 + f_q &= e \\ f_1 + f_{q+1} &= e \\ &\ \,\vdots \\ f_{k-q} + f_k &= e. \end{aligned} \tag{6}$$
The idea is that these equations, along with Equations (5), are sufficient to restrict $f$ to one of the four functions, as desired. First, we need a simple fact. For an integer $r \ge 0$, let $(r)_p := r \bmod p$. Also, for $0 \le i$
$\le p - 1$, let $[iq]_p := \{(iq)_p, (iq)_p + p, \dots, (iq)_p + (m+s)p\}$.

Fact 5.5. Let $p, q$ be distinct primes. Then, for $0 \le i < j \le p - 1$, $[iq]_p \cap [jq]_p = \emptyset$ and $[i+q]_p \cap [j+q]_p = \emptyset$.

Now, fix $f_0, f_p \in F_0$. As noticed before, this fixes all the variables in $F_0$. Using Equations (6), in particular, we get that $f_q$ and $f_{q+p}$ are fixed. Notice that $f_q, f_{q+p} \in F_{(q)_p}$. Now Equations (5) imply that all the indices in $F_{(q)_p}$ get fixed. Note that for any $0 \le i' < p$, we have that $i' + q \le k$ by the choice of $q$. Now applying this argument to $f_{(q)_p}$ and $f_{(q)_p + p}$ (which are in $F_{(q)_p}$), we get that $f_{(q)_p + q}$ and $f_{(q)_p + p + q}$ are fixed. Note that these variables are in $F_{(q+1)_p}$. By Fact 5.5, $F_{(q+1)_p}$ is disjoint from $F_{(q)_p}$.

Iterating the alternate use of these two systems of equations, along with Fact 5.5, one obtains that all the variables in $F_i$, for every $i$, are fixed once $f_0$ and $f_p$ are fixed. Hence, $f$ has at most four choices, $\{0, 1, \chi, \bar{\chi}\}$, one for every possible fixing of $\{f_0, f_p\}$. Thus, since $p > 2^{l-s}$ and $k = (l+1)p - 1$, we can choose $k_0 := k_0(l)$ such that for all $k \ge k_0$,
$$\tau(k) \le t = \frac{s+1}{l+1}k + \frac{s+1}{l+1} - 1 \le \frac{s+1}{l+1}k.$$

Handling the residual class of variables

Now we consider the case when there is no prime $p$ such that $k = (m+s+1)p - 1$. In this case, we pick a prime $p$ in the interval $\left[\frac{k}{m+s+1} - o(k), \frac{k}{m+s+1}\right]$. We are guaranteed the existence of such a prime by Lemma 5.2. Let $t = k - mp$. Hence, $(s+1)p + o(p) \ge t \ge (s+1)p$. Since we think of $m$ as a constant, $p = \Theta(k)$. Hence, there is a small number ($o(k)$) of variables, say $R$, which remain to be dealt with in the previous argument. In particular, these are the variables starting from position $(m+s+1)p$ all the way to $k$, and $\{f_0, \dots, f_k\} = \bigcup_{i=0}^{p-1} F_i \cup R$. By the argument in the previous case, fixing $f_0$ and $f_p$ fixes all the variables in $\bigcup_{i=0}^{p-1} F_i$. Further, since $|R| = o(k)$ and $q > k/2$, every variable in $R$ will appear in one of the Equations (6) along with a variable in $\bigcup_{i=0}^{p-1} F_i$, and hence get fixed. Thus, since $p > 2^{l-s}$ and $k = (l+1)p - 1$, we can choose $k_0 := k_0(l,s)$ such that for all $k \ge k_0$, $\tau(k) \le \frac{s+1}{l+1}k + o(k)$. This completes the proof of Theorem 2.1.

6 A bound of $O(k/\log k)$

This section is
devoted to the proof of Theorem 2.2. We start with some general discussion about the proof. The preliminary setup is the following. Suppose $f$ is a boolean function on $G = \mathbb{Z}_2^k$, such that all its non-constant Fourier coefficients of order up to $k - N$ are 0. Then the values $f_j$ of $f$ satisfy (3) with $s = k - N$, which, changing indices, can be rewritten as:
$$\sum_j \binom{N}{j} f_{\lambda + j} = c_N, \quad \text{for all } \lambda = 0, \dots, k - N. \tag{7}$$
It is easy to show by induction on $N$, starting with $N = k$ and going down, that
$$c_N = 2^N \operatorname{Avg} f = 2^{N-k} \sum_{x \in \{0,1\}^k} f(x). \tag{8}$$
We want to show that if $k - N = 4k/\log k$, then $f_j$ is either constant or alternates between 0 and 1. We prove this for all $k$ sufficiently large.

Define $D_j = f_{j+1} - f_j$, for $j = 0, \dots, k - 1$, and observe that the sequence $D_j$ satisfies the homogeneous version of (7):
$$\sum_j \binom{N}{j} D_{\lambda + j} = 0, \quad \text{for all } \lambda = 0, \dots, k - N - 1. \tag{9}$$
Recall that in (9) the number $N$ can be replaced by any other integer $N_1$ in the interval $[N, k]$ by Corollary 4.2 and Lemma 4.3.

From (9) the sequence $D_j$ may be defined for all $j \in \mathbb{Z}$, and $D_j \in \mathbb{Z}$ for all $j$. From the theory of recurrence relations we know then that the sequence $D_j$ may be written as a linear combination of the following sequences:
$$(-1)^j,\ (-1)^j j,\ (-1)^j j^2,\ \dots,\ (-1)^j j^{N-1}.$$
The reason for this is that $-1$ is the only root of the characteristic polynomial of the recurrence, $\sum_j \binom{N}{j} z^j = (1+z)^N$. Therefore there is a polynomial $P(x)$, of degree at most $N - 1$, such that
$$D_j = (-1)^j P(j), \quad \text{for all } j \in \mathbb{Z}.$$
Clearly $P(x)$ takes integer values on integers and in particular $P(j) \in \{-1, 0, 1\}$ for $j = 0, \dots, k - 1$. From the well known characterization of integer-valued polynomials [17, p. 129, Problem 85] it follows that we may write
$$P(x) = \sum_{j=0}^{N-1} a_j \binom{x}{j}, \quad \text{with } a_j \in \mathbb{Z}. \tag{10}$$

At this point it is instructive to give a proof, in this framework, of a result of [16]. This proof will also serve to clarify the relation of our method to that of [22]. A boolean function is called balanced if it takes the value 1 as often as it takes the value 0.

Theorem 6.1. (Mossel, O'Donnell and Servedio, 2003) If $f : \{0,1\}^k \to \{0,1\}$ is a balanced symmetric function which is not
constant or a parity function then some of its Fourier coefficients of order at most $O(k^{0.548})$ are non-zero.

Proof. Subtracting $c_N$ from both sides of (7) and using (8) we obtain that the sequence $f_n - c_N 2^{-N} = f_n - \operatorname{Avg} f = f_n - \frac{1}{2}$ satisfies the homogeneous recurrence relation (9) in place of $D_n$. By the same reasoning as above, $(-1)^n (f_n - \frac{1}{2})$ is then a polynomial of degree at most $N - 1$. But it only takes the values $\pm\frac{1}{2}$ for $n = 0, 1, \dots, N, \dots, k - 1$. Von zur Gathen and Roche [22] have shown that any polynomial $Q(n)$ which takes only two values for $n = 0, 1, \dots, k$ must have degree $d \ge k - O(k^{0.548})$, hence $k - N = O(k^{0.548})$, which is what we wanted to prove.

Remark. The method of [22] says nothing about polynomials which may take 3 or 4 values. If one omits the assumption that $f$ is balanced then the sequence $(-1)^n (f_n - \operatorname{Avg} f)$ may take up to 4 possible values.

Plan of proof. We assume that $f$ has all non-constant Fourier coefficients of order up to $k - N$ equal to 0 and we want to show that $f \in \{0, 1, \chi, \bar{\chi}\}$. Since $D_j = f_{j+1} - f_j$ it is enough to show that either $D_j$ is identically 0 or that $D_j = (-1)^j$ or $D_j = (-1)^{j+1}$. This is equivalent to showing that $P(j) = (-1)^j D_j$ is a constant polynomial, constantly equal to $-1$, 0 or 1.

We will first show that the polynomial $P$ is constant in two "small" intervals at the endpoints of the interval $[0, k]$ (Lemma 6.3). To achieve this we will first show that $P$ has period 2 in each of these intervals (Lemma 6.2). For this we use some elaborate number-theoretic results (Theorem A) on the distribution of primes. Many of the technicalities in that part would not be needed if one knew that there are plenty of twin primes, that is, integers $p$ such that $p$ and $p + 2$ are both primes.

Once we have that $P$ is constant in these two intervals near the endpoints of $[0, k]$, we show using the modular approach that $P$ is also constant on a similar interval around the midpoint of $[0, k]$ (Lemma 6.4). At this point a significant element of our method is to eliminate the possibility that $P$ is 0 (we are assuming of course that $f$ is not constant). To show this
we interpret $f$ as a probability measure on the discrete cube, and the vanishing of Fourier coefficients up to order $r$ becomes equivalent with $r$-wise independence of the marginals of that measure (Theorem 6.5). It follows that if $P$ vanishes in the middle interval in question then the second moment of a certain random variable would be larger than we know it is (Corollary 6.6). This elimination of 0 as a possible value is what makes the method work. We repeatedly obtain that $P$ is constant in more and more intervals of the same length, each in the middle of the existing gaps, until the whole interval $[0, k]$ is covered (Lemma 6.8).

Notation. In what follows we repeatedly use the letter $C$ to denote a positive constant which depends on no parameter (unless we say otherwise). As is customary, this constant $C$ need not be the same in all its occurrences.

Definition 6.1. $\Delta$ denotes the maximum difference between successive primes in the interval $[0, k]$.

From Theorem A it follows, for instance, that $\Delta = O(k/\log^{10} k)$, which is $o(k - N)$.

Lemma 6.2. The polynomial $P$ satisfies the 2-periodicity condition
$$P(j) = P(j + 2), \quad \text{whenever } j, j + 2 \in A = [0, k - N - \Delta] \cup [N + \Delta, k - 1].$$

Proof. If $p \ge N$ is a prime, then since all the factors that appear in denominators in (10) are strictly less than $p$ (hence invertible mod $p$), it follows that the sequence $P(j) \bmod p$, $j \in \mathbb{Z}$, may be viewed as a polynomial with coefficients in $\mathbb{Z}_p$ and therefore is a $p$-periodic sequence mod $p$, i.e.
$$P(j + p) = P(j) \bmod p, \quad \text{for all } j \in \mathbb{Z} \text{ and } p \ge N. \tag{11}$$
If, in addition, $0 \le j < j + p < k$, when all $P$-values that appear in (11) are in $\{-1, 0, 1\}$, it follows that we have the non-modular equality
$$P(j + p) = P(j), \quad (N \le p \le p + j < k). \tag{12}$$

We shall need various primes in intervals from now on. The version of the prime number theorem that we will be using is the Siegel-Walfisz theorem (see [12, Theorem 2]). Define the logarithmic integral
$$\operatorname{Li} x = \int_2^x \frac{dt}{\log t} \sim \frac{x}{\log x}, \quad (x \to \infty).$$
The Euler function $\varphi(q)$ below denotes the number of residues mod $q$ which are coprime to $q$.

Theorem A (Siegel-Walfisz) Let $\pi(x; M, a)$ be the number of
primes $\le x$ which are equal to $a \bmod M$, and assume that $(M, a) = 1$. Then if $M \le (\log x)^A$, $A$ a constant, we have
$$\pi(x; M, a) = \frac{\operatorname{Li} x}{\varphi(M)} + O\!\left(x \exp(-c\sqrt{\log x})\right), \quad (\text{as } x \to \infty), \tag{13}$$
where $c$ depends on $A$ only (the constant in the $O(\cdot)$ term is absolute). For $\pi(x)$, the number of primes up to $x$ without any restriction, we thus have $\pi(x) = \operatorname{Li}(x) + O(x \exp(-c\sqrt{\log x}))$, for some absolute constant $c$.

These theorems guarantee that, for $x \to \infty$, the interval $[x, x + y]$ has the "expected" number of primes whenever $y \ge C x (\log x)^{-A}$, whatever the constant $A$, even if we impose the condition that these primes are equal to $a \bmod M$, as long as $M \le (\log x)^B$, for any constant $B$.

We use the above theorems along with the $p$-periodicity of $P$ to deduce that $P$ is in fact 2-periodic on the union of 2 small sub-intervals of $[0, k - 1]$.

Assume $q < r$ are two primes in $[N, N + h]$, where $h = (k - N)/3$. (The length of the interval $[N, N + h]$ is large enough to guarantee the existence of many primes in it.) From (12) it follows that the finite sequences
$$P(0), \dots, P(k - q) \quad \text{and} \quad P(q), \dots, P(k)$$
are identical. Applying (12) again with $r$ we get that the finite sequences
$$P(0), \dots, P(k - r) \quad \text{and} \quad P(r), \dots, P(k)$$
are identical. It follows that
$$P(j + r - q) = P(j), \quad \text{for all } j \text{ with } N + h \le j \le N + 2h \text{ and } r > q \text{ primes in } [N, N + h]. \tag{14}$$

We now assume, as we may, that the difference $M = r - q$ is the smallest difference between two primes in $[N, N + h]$. By the prime number theorem $M \le C \log k$. Hence, we can apply Theorem A with modulus $M$. Since $\varphi(M) \le M \le C \log k$ in that case, Theorem A guarantees that the number of primes equal to $a \bmod M$ in $[N, N + h]$ is at least
$$C \frac{h}{\log^2 k} \ge C \frac{k}{\log^3 k},$$
whenever $(M, a) = 1$. All that matters here is that this number is positive for large $k$.

Let $t \in [N, N + h]$ be the smallest prime which is equal to $-1 \bmod M$. By Theorem A, applied to modulus $M$ and residue $-1$, its existence is guaranteed and furthermore $t \sim N$. The same theorem guarantees that we can find a prime $s \in (t, N + h]$ such that $s = 1 \bmod M$. Then $s - t = 2 \bmod M$, or $s - t = \ell M + 2$ for some nonnegative integer $\ell$. Therefore, for $N + h \le j \le N + 2h$ we have $P(j) =$
$P(j + s - t)$ (applying (14) for the primes $s, t$) $= P(j + \ell M + 2) = P(j + (\ell - 1)M + 2)$ (applying (14) for the primes $r, q$) $= \cdots = P(j + 2)$.

This 2-periodicity
$$P(j) = P(j + 2) \tag{15}$$
is now transferred to all $j, j + 2 \in A$ by using (12) repeatedly for appropriate primes $p$.

We use the following observation: if $P(j)$ is 2-periodic in an interval $[a, b] \subseteq [0, k]$ and $j \in [0, k]$ is such that there exists a prime $p \ge N$ for which $j + p, j + 2 + p \in [a, b]$ or $j - p, j + 2 - p \in [a, b]$, then $P(j) = P(j + 2)$.

Since we know that $P$ is 2-periodic in the interval $[N + h, N + 2h]$, we first apply the observation to obtain the 2-periodicity in the interval $[0, 2h]$, since for any $j$ in that interval we can find an appropriate prime to apply the observation. Using this new interval we now get the 2-periodicity in the interval $[N + \Delta, k]$. Next we deduce the 2-periodicity in the interval $[0, k - N - \Delta]$.

Notice that in the sequence $D_j$, if one erases the 0's, one sees an alternation of 1 and $-1$ (this follows from the fact that $f_j \in \{0, 1\}$). This property greatly reduces the number of allowed patterns in $D_j$ and in fact it implies that $P$ is constant in $A$.

Lemma 6.3. The polynomial $P$ is constant in $A$ (defined in Lemma 6.2).

Proof. From Lemma 6.2 the values of $P$ in $[N + \Delta, k - 1]$ must be a 2-periodic sequence. The only essentially different non-constant 2-periodic patterns for the values of $P$ in $[N + \Delta, k - 1]$ are $010101\ldots$ and $(-1)1(-1)1\ldots$, and they both violate the property that $D_j = (-1)^j P(j)$ must satisfy, namely that if one erases the 0's then one must see an alternation of 1 and $-1$. Therefore $P$ is constant in each of the two intervals of $A$. From the $p$-periodicity (12), applied, say, for some $p \sim (k + N)/2$, it follows that the constant is the same in both intervals.

We now extend the set on which $P$ is constant to a superset of $A$ that contains a small interval around $k/2$.

Lemma 6.4. Let $a = \frac{N}{2} + \frac{3}{2}\Delta$ and $b = \frac{N}{2} + (k - N) - \frac{5}{2}\Delta$. Then $P(l) = P(0)$ for $a \le l \le b$.

Proof. We shall apply Lemma 5.1 with $m = 2$ and with a prime $r$ such that $2r$ is the least possible such number larger than $N + \Delta$. It follows that $2r \le$
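$\le (N + \Delta) + 2\Delta = N + 3\Delta$. And it follows from the remark after (9) that

$$\sum_j (-1)^j \binom{2r}{j} P(j + \nu) = 0, \quad (\nu \in \mathbb{Z}). \qquad (16)$$

Taking residues mod $r$ and using Lemma 5.1 for $m = 2$ we obtain

$$P(\nu) - 2P(\nu + r) + P(\nu + 2r) = 0 \bmod r, \quad (\nu \in \mathbb{Z}).$$

By our particular choice of $r$ we have $P(\nu) = P(\nu + 2r) = P(0)$ whenever $\nu \in [0, k - N - 3\Delta]$. It follows that $P(\nu + r) = P(0)$ for all such $\nu$, so we get $P(l) = P(0)$ for all $l$ in the interval

$$\left[ \frac{N}{2} + \frac{3\Delta}{2},\ \frac{N}{2} + (k - N) - \frac{5\Delta}{2} \right].$$

So far we have proved $P(l) = P(0)$ on the set ($a, b$ are defined in Lemma 6.4)

$$A_2 = [0, k - N - \Delta] \cup [a, b] \cup [N + \Delta, k - 1],$$

which consists of three asymptotically equispaced intervals of asymptotic size $\varepsilon k$. We consider two cases for $P$. The first is when $P$ is 0 on $A_2$ and the second is when $P$ is $1$ or $-1$.

To eliminate the case that $P$ is 0 on $A_2$, we shall need the following theorem, which already gives a lot of significant information about the function $f$. It should be thought of as analogous to the fact that the moments of a vector random variable can be read off the Fourier Transform of its distribution (the characteristic function) by looking at partial derivatives at 0.

Theorem 6.5. Suppose $f : G = \mathbb{Z}_2^k = \{0, 1\}^k \to \mathbb{R}$ is nonnegative and not identically 0 and has all its Fourier coefficients of order at most $r$ (and at least 1) equal to 0. Let $\mu$ denote the uniform probability measure on the cube $G$ and $\sigma$ denote the probability measure on $G$ defined by

$$\sigma(A) = \sum_{x \in A} f(x) \Big/ \sum_{x \in G} f(x), \quad (A \subseteq G).$$

Let also $X_1, \ldots, X_k$ denote the coordinate functions on $G$, which we view as random variables. Then for all $i_1 < i_2 < \cdots < i_s$, $0 \le s \le r$, we have

$$E_\sigma(X_{i_1} \cdots X_{i_s}) = E_\mu(X_{i_1} \cdots X_{i_s}).$$

Proof. Let $F = \sum_{x \in G} f(x)$. We assume for simplicity that $i_1 = 1, \ldots, i_s = s$. Then, writing $x = (x_1, x_2, \ldots, x_k)$ and $[s] = \{1, \ldots, s\}$, we have

$$\begin{aligned}
E_\sigma(X_1 \cdots X_s) &= \frac{1}{F} \sum_{x \in G} f(x)\, x_1 \cdots x_s \\
&= \frac{1}{F} \sum_{x \in G} f(x)\, \frac{1 + (-1)^{x_1 + 1}}{2} \cdots \frac{1 + (-1)^{x_s + 1}}{2} \\
&= \frac{1}{2^s F} \sum_{x \in G} f(x) \sum_{S \subseteq [s]} (-1)^{|S| + \sum_{i \in S} x_i} \\
&= \frac{|G|}{2^s F} \sum_{S \subseteq [s]} (-1)^{|S|}\, \frac{1}{|G|} \sum_{x \in G} f(x) (-1)^{\sum_{i \in S} x_i} \\
&= \frac{|G|}{2^s F} \sum_{S \subseteq [s]} (-1)^{|S|}\, \widehat{f}(S) \\
&= \frac{|G|}{2^s F}\, \widehat{f}(\emptyset) \quad \text{(by the vanishing of } \widehat{f}(S) \text{ for } \emptyset \ne S \subseteq [s]\text{)} \\
&= 2^{-s} = E_\mu(X_1 \cdots X_s).
\end{aligned}$$

Remarks.
1. For functions $f : \{0, 1\}^k \to \{0, 1\}$, which is all we shall need here, the above theorem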
also follows directly from the definition of $t$-nullity in Section 4.
2. If the nonnegative function $f$ is symmetric then the identity of moments up to order $r$ with those of the uniform distribution ($r$-wise independence) and the vanishing of the non-constant Fourier coefficients of weight up to $r$ are equivalent (see also [1] for a discussion on this connection). This can be proved by induction on $r$. We do not use this here.

Corollary 6.6. Under the assumptions and definitions of Theorem 6.5 the random variable $S = X_1 + \cdots + X_k$ has the same power moments $E(S^s)$ under the probability measures $\mu$ and $\sigma$, up to order $s \le r$.

Proof. The power $S^s$, $s \le r$, can be written as a sum of terms of the type $X_{i_1} \cdots X_{i_t}$, for $t \le s$. One uses the fact that $X_j^2 = X_j$.

Lemma 6.7. If $P$ is 0 on $A_2$, then $f$ is constant.

Proof. Suppose the polynomial $P$ is constantly equal to 0 on the set $A_2$ and that $f$ is not constant. The sequence $f_j$ is then constant in each of the three intervals of $A_2$. By possibly considering $1 - f$ (whose Fourier coefficients vanish exactly where those of $f$ do, if $f$ is not a constant function), we may assume that $f_j = 0$ on the middle interval $(a, b)$. Let $\sigma$ be the distribution of the random variable $S = X_1 + \cdots + X_k$ under the measure induced by $f$ on $G$ (each vertex $x \in G$ has probability proportional to $f(x)$), where $X_1, \ldots, X_k$ are the coordinate functions on $G$. Note that this is a well defined probability distribution since we assumed that $f$ is not the 0 function.

The $s$-th moment with respect to the measure $\sigma$ of the variable $S$ in Corollary 6.6 is the expression

$$M(\sigma, s) = \frac{1}{F} \sum_j f_j \binom{k}{j} j^s,$$

where again $F = \sum_j f_j \binom{k}{j}$. By Corollary 6.6, if $s \le k - N$ this moment must equal the $s$-th moment with respect to the binomial measure $\mu$, which is the quantity

$$M(\mu, s) = 2^{-k} \sum_j \binom{k}{j} j^s.$$

But the variance of $S$ under $\mu$ is

$$M(\mu, 2) - M(\mu, 1)^2 = \frac{k}{4}, \qquad (17)$$

since under $\mu$ the random variables $X_1, \ldots, X_k$ are independent, each with variance $1/4$, while the variance of $S$ under $\sigma$ is

$$E_\sigma(S - E_\sigma S)^2 = E_\sigma(S - E_\mu S)^2 = E_\sigma\left(S - \frac{k}{2}\right)^2 \ge C \varepsilon^2 k^2 \qquad (18)$$

as the mass of $\sigma$ sits to the left of $a \approx \frac{k}{2} - \frac{\varepsilon k}{2}$ and to the right of $b \approx \frac{k}{2} +$
k2.Theorders of magnitude in (17) and (18) are dierent whenever   Cpk,which is true in our caseas  = 4log k.This contradiction proves that P cannot equal 0 on A2.14Extending A2to [0;k 1].For 2l= m= 2;4;:::,we dene the setsBm=m[j=0jmN +(m);jmN +k (m);where (m) = (m=2) +m,for m 4,and (2) = 3 (these intervals will be overlapping whenm is large).Lemma 6.8.There is a constant k0> 0 such that if k  k0and  = 4log k then(a) the polynomial P is equal to 1 on Bm\[0;k 1],for m= 2;4;8;:::with m12log k,and(b) if m takes the highest value allowed in (a) then Bmcovers [0;k 1],hence P = 1 on [0;k 1].Proof.To prove (a) we work by induction on m= 2;4;:::.The base case m= 2 is settled since wehave B2 A2(that's why we chose (2) large enough).Assume now that we have proved P = 1 on Bm=2\[0;k 1].We apply Theorem 5.1 for m andwe choose a prime r such that mr is the least possible larger than N.ThusNm r  Nm+:(19)Lemma 5.1 together with relation (16) gives for all  2 ZP() mP( +r) +m2P( +2r)    +(1)mP( +mr) = 0 mod r:(20)We would like,for j even,the number  + jr to belong to Bm=2,for most values of  in theinterval [0;k].That is we wantjmN +(m=2)   +jr jmN +k (m=2);for 0  j  m,j even.Given (19) this follows from(m=2)    k (m=2) m:(21)For  satisfying (21) the range of the expression  +jr (j xed) contains the interval[jr +(m=2);jr +k (m=2) m];which,using (19) again,contains the intervaljmN +m +(m=2);jmN +k (m=2) m:From the relation (m) = (m=2) +m it follows that this last interval is the j-th interval of Bm.We have shown that whenever  satises (21) the numbers  +jr,0  j  m,j even,are all inBm=2so,by the induction hypothesis,the polynomial P takes the value 1 on them.In the left hand side of (20) the sum of the absolute values of the coecients is at most 2mandas long as 2m< r it follows that (mod r) can be dropped from (20).If (21) is satised it is clearthat the sum of the terms of (20) 
corresponding to even $j$ is $2^{m-1}$, since these $P$ terms are all 1. If, in addition, $2^m < r$, we obtain that the terms corresponding to odd $j$ must all have their $P$ term equal to 1. The reason for this is that the sum of absolute values of the odd terms is at most $2^{m-1}$ and is equal to that only in case all $P$'s are equal to 1.

Letting $\nu$ run through all terms allowed by (21) we obtain that $P$ has the value of 1 on all intervals of $B_m$ corresponding to odd $j$. Since the intervals corresponding to even $j$ are already contained in $B_{m/2}$ we obtain the desired conclusion, that $P$ is equal to 1 on $B_m$, as long as $2^m < r$, which is clearly satisfied if $2^m < N/m$ or

$$m \le \frac{1}{2} \log k. \qquad (22)$$

This concludes the proof of (a).

To prove (b) observe that $\lambda(m) \le 2m\Delta$. Letting $\varepsilon = 4/\log k$, we observe that if we let $m$ be as large as part (a) allows then each of the intervals of $B_m$ overlaps with the next one, thus covering all of the interval $[0, k - 1]$, which proves (b) and that $P$ is constantly equal to 1, as we had to prove.

7 Learning symmetric juntas

In this section we apply Theorem 2.2 to obtain faster learning algorithms for the class of symmetric $k$-juntas on $n$ variables. First we need some preliminaries and well known tools from computational learning theory.

7.1 Preliminaries

We consider the PAC learning model [19]. The learning problem at hand is a Concept Class $C = \bigcup_n C_n$, where each $C_n$ is a collection of boolean functions from $\{0, 1\}^n \to \{0, 1\}$. Let $\epsilon$ be an accuracy parameter and $\delta$ a confidence parameter. A learning algorithm $\mathcal{A}$ for $C$ has access to an oracle $I(f)$ for $f \in C_n$. A query to $I(f)$ outputs a labeled example $\langle x, f(x) \rangle$, where $x$ is drawn from $\{0, 1\}^n$ according to some probability distribution. $\mathcal{A}$ is said to be a learning algorithm for the class $C$ if for all $f \in C$, when $\mathcal{A}$ is run with oracle $I(f)$, it outputs, with probability at least $1 - \delta$, a hypothesis $h$ such that $\Pr_x[h(x) = f(x)] \ge 1 - \epsilon$. Although Valiant's PAC model is defined for general distributions, in this paper we will be concerned only with the uniform distribution.

We recall the definition of a $k$-junta. Let
$f : \{0, 1\}^n \to \{0, 1\}$ be a boolean function. We say that $f$ depends on the variable $i$ if there are vectors $x$ and $y$ that differ only on the $i$'th coordinate and $f(x) \ne f(y)$. A function that depends only on an (unknown) subset of $k \le n$ variables is called a $k$-junta. The variables on which $f$ depends are called the relevant variables of $f$. Typically $k = O(\log n)$. Hence, a running time that is polynomial in $2^k$, $n$ and $\log(1/\delta)$ is considered efficient. A symmetric $k$-junta is a boolean function which is symmetric in the variables it depends on. The class of all such functions defined on $n$ variables is the class of symmetric $k$-juntas. In this section, we present an algorithm for learning this class in the uniform PAC model.

7.2 Analysis of the Fourier based algorithm

We will use the following facts about learning in the PAC model which are well known.

(i) We can exactly calculate the Fourier coefficients of the target function with confidence $1 - \delta$ in time $\mathrm{poly}(\log(1/\delta), 2^k, n)$ using standard Chernoff-Hoeffding bounds (see [13, 16]).
(ii) We can decide whether the target function $f$ is constant or not in time $\mathrm{poly}(\log(1/\delta), 2^k)$.
(iii) We can learn a parity function in time $n^{\omega}\, \mathrm{poly}(\log(1/\delta), 2^k)$ [9]. Here $\omega$ is the exponent for matrix multiplication, $\omega < 2.376$.

We state the standard Fourier based algorithm below. Throughout the algorithm, we maintain a set of relevant variables, $R$.

• Check if the function is constant or parity.
• If not, set $R := \emptyset$, $t := 1$.
1. For every subset of $t$ variables, say $S = \{x_{i_1}, \ldots, x_{i_t}\}$, do:
   (a) Compute $\hat{f}(S)$.
   (b) If $\hat{f}(S) \ne 0$, then $R := R \cup S$.
2. If for all sets $S$ of size $t$, $\hat{f}(S) = 0$, then $t := t + 1$ and go to step 1.
3. Else, $R$ now contains all the relevant variables. Draw enough samples to build $f$'s truth table and halt.

If $x_i$ is an irrelevant variable for $f$, then it is easy to see that for any $S$ containing $x_i$, $\hat{f}(S) = 0$. Hence, if $\hat{f}(S) \ne 0$ for some $S$, then $S$ contains only relevant variables. Since the function is symmetric, for any two sets $S, T$ of relevant variables such that $|S| = |T|$, we have $\hat{f}(S) = \hat{f}(T)$. Hence, the first time that we will
identify some relevant variables in the algorithm ($\hat{f}(S) \ne 0$ for some $S$, $|S| = s$), we will actually be able to identify all the relevant variables, and the running time will be roughly $n^s$. Hence, as a direct consequence of Theorem 2.2, we obtain a bound of $n^{o(k)}$ for learning symmetric juntas.

Theorem 7.1. The class of symmetric $k$-juntas can be learned exactly under the uniform distribution with confidence $1 - \delta$ in time $n^{O(k/\log k)} \cdot \mathrm{poly}(2^k, n, \log(1/\delta))$.

8 Discussion

The main open question is to obtain tight upper and lower bounds on the running time of the Fourier-based algorithm for symmetric juntas. It may even be that for large $k$, every symmetric function (other than the constant and parity functions) has a non-zero Fourier coefficient of constant order.

It should also be noted that in the case of balanced symmetric functions, i.e., symmetric functions with $\Pr[f(x) = 1] = 1/2$, a bound of $O(k^{0.548})$ follows from [22] (see [16]). Hence, to improve our result, one may focus on finding new techniques for unbalanced functions.

References

[1] N. Alon, A. Andoni, T. Kaufman, K. Matulef, R. Rubinfeld, and N. Xie. Testing k-wise and almost k-wise independence. In STOC, pages 496-505, 2007.
[2] A. Bernasconi. Mathematical Techniques for the Analysis of Boolean Functions. PhD thesis, Università degli Studi di Pisa, Dipartimento di Informatica, 1998.
[3] A. Blum. Relevant examples and relevant features: Thoughts from computational learning theory. In AAAI Symposium on Relevance, 1994.
[4] A. Blum. Open problems. COLT, 2003.
[5] A. Blum, M. Furst, M. Kearns, and R. J. Lipton. Cryptographic primitives based on hard learning problems. In CRYPTO, pages 278-291, 1993.
[6] A. Blum and P. Langley. Selection of relevant features and examples in machine learning. Artificial Intelligence, 97:245-271, 1997.
[7] N. Bshouty, J. Jackson, and C. Tamon. More efficient PAC learning of DNF with membership queries under the uniform distribution. In Annual Conference on Computational Learning Theory, pages 286-295, 1999.
[8] P. Cameron. Combinatorics: topics, techniques, algorithms. Cambridge Univ. Press, 1994.
[9] D. Helmbold, R. Sloan, and
M. Warmuth. Learning integer lattices. SIAM Journal on Computing, 21(2):240-266, 1992.
[10] J. Jackson. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. Journal of Computer and System Sciences, 55:414-440, 1997.
[11] M. Kolountzakis, E. Markakis, and A. Mehta. Learning symmetric juntas in time $n^{o(k)}$. In Proceedings of the conference Interface entre l'analyse harmonique et la théorie des nombres, CIRM, Luminy, 2005.
[12] A. Kumchev. The distribution of prime numbers. Manuscript, 2005.
[13] N. Linial, Y. Mansour, and N. Nisan. Constant depth circuits, Fourier transform and learnability. Journal of the ACM, 40(3):607-620, 1993.
[14] R. Lipton, E. Markakis, A. Mehta, and N. Vishnoi. On the Fourier spectrum of symmetric boolean functions with applications to learning symmetric juntas. In IEEE Conference on Computational Complexity (CCC), pages 112-119, 2005.
[15] Y. Mansour. An $O(n^{\log \log n})$ learning algorithm for DNF under the uniform distribution. Journal of Computer and System Sciences, 50:543-550, 1995.
[16] E. Mossel, R. O'Donnell, and R. Servedio. Learning juntas. In STOC, pages 206-212, 2003.
[17] G. Pólya and G. Szegő. Problems and Theorems in Analysis, II. Springer, 1976.
[18] T. Siegenthaler. Correlation-immunity of nonlinear combining functions for cryptographic applications. IEEE Transactions on Information Theory, 30(5):776-780, 1984.
[19] L. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, 1984.
[20] K. Verbeurgt. Learning DNF under the uniform distribution in quasi-polynomial time. In Annual Workshop on Computational Learning Theory, pages 314-326, 1990.
[21] K. Verbeurgt. Learning sub-classes of monotone DNF on the uniform distribution. In Michael M. Richter, Carl H. Smith, Rolf Wiehagen, and Thomas Zeugmann, editors, Algorithmic Learning Theory, 9th International Conference, pages 385-399, 1998.
[22] J. von zur Gathen and J. Roche. Polynomials with two values. Combinatorica, 17(3):345-362, 1997.
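
Illustrative computation. The quantity that governs the running time $n^s$ of the Fourier based algorithm of Section 7.2 is the smallest order $s \ge 1$ at which a symmetric boolean function has a non-zero Fourier coefficient. For small $k$ it can be computed exactly, and the following Python sketch does so. It is not part of the paper: the function names `fourier_numerator` and `smallest_nonzero_order` are hypothetical, and the code assumes the normalization $\hat{f}(S) = 2^{-k} \sum_x f(x) (-1)^{\sum_{i \in S} x_i}$ used in the proof of Theorem 6.5. A symmetric function is represented by the list of its values $f_0, \ldots, f_k$ on inputs of Hamming weight $0, \ldots, k$.

```python
from math import comb


def fourier_numerator(f_weights, s):
    """Return 2^k * f_hat(S) for any set S with |S| = s (assumed convention).

    f_weights[j] is the value of the symmetric function on inputs of
    Hamming weight j, so k = len(f_weights) - 1.
    """
    k = len(f_weights) - 1
    total = 0
    for j, fj in enumerate(f_weights):
        # An input of weight j with i ones inside S contributes (-1)^i,
        # and there are C(s, i) * C(k - s, j - i) such inputs.
        for i in range(min(s, j) + 1):
            total += fj * (-1) ** i * comb(s, i) * comb(k - s, j - i)
    return total


def smallest_nonzero_order(f_weights):
    """Smallest s >= 1 with a non-zero order-s coefficient, or None if constant."""
    k = len(f_weights) - 1
    for s in range(1, k + 1):
        if fourier_numerator(f_weights, s) != 0:
            return s
    return None


if __name__ == "__main__":
    print(smallest_nonzero_order([0, 0, 1, 1]))     # majority on 3 variables
    print(smallest_nonzero_order([0, 1, 1, 0]))     # "not all equal" on 3 variables
    print(smallest_nonzero_order([0, 1, 0, 1, 0]))  # parity on 4 variables
```

The arithmetic is done on integers (the coefficient multiplied by $2^k$), so the test for a non-zero coefficient is exact. Parity returns $k$ and a constant function returns `None`, which are the two cases excluded from the definition of $\tau(k)$.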