Grouping of residues based on their contact interactions

Jun Wang and Wei Wang*
National Laboratory of Solid State Microstructure and Department of Physics, Nanjing University, Nanjing 210093, China
(Received 5 December 2001; published 28 March 2002)

Based on the concept of energy landscape, a grouping method of residues for reducing the sequence complexity in proteins is presented. For the Miyazawa and Jernigan matrix, rational groupings of 20 kinds of residues with minimal mismatches, under the consideration of local minima and statistics on correlation between the residues, are studied. A hierarchical tree of groupings relating to different numbers of groups $N$ is obtained, and a plateau around $N = 8$–$10$ is found, which may represent the basic degree of freedom of the sequence complexity in proteins.

DOI: 10.1103/PhysRevE.65.041911    PACS number(s): 87.10.+e

Using a small set of amino acid residues to reduce the sequence complexity in proteins, i.e., reducing the naturally occurring 20 kinds of residues into several kinds, has been studied [1–3]. Some patterns of residues were discovered in the reconstruction of secondary structures, such as binary patterns in α helices and helix bundles [2] (see review [4], and references therein). These imply that the hydrophobic cores, the native structures, and the rapid folding behaviors of proteins can be realized by some simplified alphabets of residues. Theoretically, the simplest reduction, the so-called H-P model including an H group with hydrophobic residues and a P group with polar residues, has been extensively used. Yet, the relation between different forms or levels of these reductions (such as the five-letter palette [3], or different H-P groupings [5,6]) and the original sequences is not generally established. Finding the physical origin of these reductions is important for protein representation.

Based on the Miyazawa and Jernigan (MJ) matrix of contact potentials of residues [7], reductions by dividing residues into different groups were made in our previous
paper [8]. Several simplified schemes from minimized mismatches between the reduced interaction matrix and the original MJ one were found. However, the physical picture of the mismatch was not well clarified, and the physical reasons for the grouping of residues need further study. It is also important to compare the grouping results of different interaction matrices, and to study the generality of our simplification method. These are the goals of this paper.

In this paper, a general picture and a simplified formula of the mismatch, based on the concept of energy landscape, are presented. Some rational groupings are obtained. Statistics on correlation between the residues reveal that some residues tend to aggregate together, or are "friends" that live in the same group. A plateau of mismatch around group number $N = 8$–$10$ for three different interaction matrices is found, implying that groupings with $N = 8$–$10$ may provide a rational reduction for the complexity of protein sequences. This coincides with the fact that proteins generally include more than seven types of residues [4].

To divide 20 types of residues into a number of groups, the basic principle may be that the residues in a group should be similar in their physical aspects, mainly the interactions. After grouping, the residues in a group can be represented by one of the residues from the group, and thus the complexity of protein sequences is reduced. When a residue is replaced by another, the energy landscape of a protein [9] should not change its main feature (the shape), or the folding features should remain basically the same. This is the case especially when the system is near the bottom of the funnel, where a protein has the most compact conformations. The energy difference between two nearby conformations $(c_1)$ and $(c_2)$ is defined as

$$\Delta E = \sum_n \left[ e_n^{(c_1)}(s_i, s_j) - e_n^{(c_2)}(s_k, s_l) \right], \tag{1}$$

where $e_n^{(c_1)}(s_i, s_j)$ [or $e_n^{(c_2)}(s_k, s_l)$] is the contact energy of the $n$th contact between two residues $s_i$ and $s_j$ (or $s_k$ and $s_l$) in $c_1$ (or in $c_2$), $s_i$ defines the residue type of the $i$th element in
the protein sequence, and the numbers of contacts in the two conformations are assumed to be the same. To keep the main feature of the energy landscape means that $\Delta E$ should not change its sign, i.e.,

$$\mathrm{sgn}[\Delta E_{\mathrm{new}}] = \mathrm{sgn}[\Delta E_{\mathrm{old}}], \tag{2}$$

when a residue $s_g$ ($g = i$, $j$, $k$, or $l$) is substituted by one of its "friends" $s_g'$ in the same group. Here $\Delta E_{\mathrm{old}}$ and $\Delta E_{\mathrm{new}}$ are the energy differences of the original sequence and its substitute, and $\mathrm{sgn}[X] = 1$, $0$, or $-1$ for $X > 0$, $X = 0$, or $X < 0$. Any violation of Eq. (2) may change the energy landscape, and a quantity called the "mismatch" is used to characterize the discrepancy. Thus, the mismatch acts as a quantitative measure of the nonfitness of substitutions of residues.

*Email address: wangwei@nju.edu.cn
PHYSICAL REVIEW E, VOLUME 65, 041911. 1063-651X/2002/65(4)/041911(5)/\$20.00. ©2002 The American Physical Society.

In detail, 20 kinds of residues are partitioned into $N$ groups $G_1, \ldots, G_N$, with $n_1$ residues in group $G_1$, $n_2$ in $G_2$, and so on, where $n_1 + n_2 + \cdots + n_N = 20$. For a given group number $N$, different values of $n_i$ give different "sets" $(n_1, n_2, \ldots, n_N)$ of the partition, e.g., two sets $(8,3,2,2,5)$ and $(8,3,2,1,6)$ for $N = 5$. [Actually, the "sets" relate to the partition of the number 20 into $N$ groups, and the number of sets $L_N$ is 1, 10, 33, 64, 84, 90, 82, 70, 54, 42, 30, 22, 15, 11, 7, 5, 3, 2, 1, 1 for $N = 1$ to 20, respectively.] The group assembly for a certain value of $N$ can be represented as $\mathcal{G}_N = \{\{G_K^{(l)}(N),\ K = 1, \ldots, N\},\ l = 1, \ldots, L_N\}$, where $G_K^{(l)}(N)$ means the $K$th group in the $l$th set among the $L_N$ sets. For a given set, different arrangements of residues in the groups represent different "distributions" of the residues, such as residue E in $G_1$ or in $G_2$. The mismatch will be minimized if the intragroup residues are friends for each group. [Residues that are not finally aggregated together in a group are not friends.] Due to the arbitrariness of the contact index in $\Delta E$ and the various possible distributions of residues, we define a strong requirement for a successful grouping: no change of the sign of each term in $\Delta E$, i.e., $\lambda(s_i s_j s_k s_l) \equiv \mathrm{sgn}[e(s_i, s_j) - e(s_k, s_l)]$ equals
$\lambda(s_i' s_j s_k s_l) \equiv \mathrm{sgn}[e(s_i', s_j) - e(s_k, s_l)]$ when $s_i$ is substituted by one of its friends $s_i'$. Here $s_i$, $s_j$, $s_k$, or $s_l$ belongs to group $G_\alpha$, $G_\beta$, $G_\gamma$, or $G_\nu$ with $\alpha, \beta, \gamma, \nu \in \{1, 2, \ldots, N\}$, respectively. Generally, when a residue is substituted by another residue (friend or nonfriend) from the same group, one always has $\lambda(s_i' s_j s_k s_l) = 1$, $0$, or $-1$. Then, all possible substitutions give a sum of the related values of $\lambda$, i.e., $\Lambda_{\alpha\beta\gamma\nu} = \sum_{ijkl} \lambda(s_i s_j s_k s_l)$, which describes the total effect of substitutions of the residues from the four groups $G_\alpha$, $G_\beta$, $G_\gamma$, and $G_\nu$. If $\lambda(s_i' s_j s_k s_l)$ is not the same as $\mathrm{sgn}[\Lambda]$, the substitution $s_i \to s_i'$ is not favorable, or the grouping of $s_i$ and $s_i'$ in one group is a mismatched one. The average over all groups and residues gives the total mismatch of this distribution,

$$M_{ab} = \frac{\sum_{\alpha\beta\gamma\nu} \sum_{ijkl} \left\{ 1 - \delta\big(\lambda(s_i s_j s_k s_l),\ \mathrm{sgn}[\Lambda_{\alpha\beta\gamma\nu}]\big) \right\}}{\sum_{\alpha\beta\gamma\nu} \sum_{ijkl} 1}, \tag{3}$$

where the summation runs over all possible combinations of $\alpha$, $\beta$, $\gamma$, and $\nu$, the index $i$ runs over all residues in group $G_\alpha$ and so on, and the $\delta$ function is defined as $\delta(U, V) = 1$ when $U = V$, and 0 otherwise. For $\mathrm{sgn}[\Lambda] = 0$, only the cases $\lambda(s_i s_j s_k s_l) > 0$ are counted to avoid double counting.

Among all distributions of a fixed set $(n_1, n_2, \ldots, n_N)$, the best distribution (or the best arrangement of the residues) gives the minimal mismatch among all $M_{ab}$, i.e., $M_{ab}^{\min}$. Thus, for this set, one obtains $M_{ab}^{\min}$ and the related distribution of residues in every group. To find $M_{ab}^{\min}$, a Monte Carlo minimization procedure is used, in which every random exchange of two residues between two groups is accepted with the Metropolis probability $\min[1, \exp(-\Delta M_{ab}/T)]$, so that lower values of $M_{ab}$ are progressively reached. Here $\Delta M_{ab}$ is the change of the mismatch and $T = 0.1$ is an artificial "temperature." An enumeration over all possible distributions of residues can also be made for small $N$. For each $N$, all minimal mismatches $M_{ab}^{\min}$ of the $L_N$ sets can then be obtained. In principle, for each $N$ we could choose the lowest $M_{ab}^{\min}$ and the related grouping as the final result among all $L_N$ sets. However, this is difficult for those sets with MGWSE or groups with singlets. For example, as shown in Fig. 1,
the mismatch of set $(1,19)$ is the lowest one among all ten sets [also the set $(1,1,1,1,16)$ for $N = 5$, and so on; see Fig. 5]. Obviously, this kind of mismatch does not relate to the best or rational groupings of the residues. Therefore, we must consider a local minimum (or a plateau) among all sets as the rational global minimum $M_g$ (see Fig. 1). Such a "locality" is motivated by the similarity between two groupings. Two groupings are regarded as neighbors when they can be transformed into each other just by exchanging two residues between two groups or by moving one residue from one group to another. With this, all local minima (or plateaus) are identified. Figure 1 shows such a local minimum (or a plateau) besides those with MGWSE. These local minima and plateaus represent better groupings, and reflect some intrinsic affinity between the residues. As a result, they are taken as the corresponding rational groupings with mismatches $M_g$.

FIG. 1. $M_{ab}^{\min}$ of different sets for $N = 2$ (a) and $N = 3$ (b). The set index represents the sets marked in the figure.

The aggregation of some friendly residues into a group results from the correlation between these residues. Let us consider a two-residue correlation by counting the number of groups that include both residues $s_i$ and $s_j$, i.e.,

$$C(s_i, s_j) = \sum_{K=1}^{N} \sum_{l=1}^{L_N} I\big(s_i, G_K^{(l)}(N)\big)\, I\big(s_j, G_K^{(l)}(N)\big), \tag{4}$$

where $I(s, G) = 1$ when $s \in G$, and zero when $s \notin G$. Clearly, $C(s_i, s_j)$ is a quantitative scale of the affinity between two residues, or a probability of two residues being in the same group. It is worth noting that a weighted average for groups with different mismatches is possible. For example, a probability with a Boltzmann-like distribution biased toward small mismatches could be used. This might change the preference of the residues to some degree, but not largely. As we discuss the differences between different groups, the various definitions will not change the picture. Here we only discuss the simple average with an equal weight. For all groups $\mathcal{G}_N$ with minimal
mismatch $M_{ab}^{\min}$, it is found that the counts of some residue pairs are much larger than those of other pairs (see Fig. 2). This means that some residues are friends and some are not, reflecting an effective "attraction" between the residues in a group and a "repulsion" between residues in different groups. Note that for the groupings with different $N$, we have similar patterns. The probability of finding a certain group $G$ with specified residues among all minimal-mismatch groups $\mathcal{G}_N$ can also be obtained by a count

$$C'(G) = \sum_{K=1}^{N} \sum_{l}^{L_N} \delta\big[G, G_K^{(l)}(N)\big], \tag{5}$$

where $\delta(G, G')$ is a $\delta$ function. As expected, different groups have different chances to appear (see Fig. 3). These differences result not only from the grouping affinity between residues but also from the preference for groups of a certain size. For comparison, the count $C'(G)$ is normalized by the total number of groups with the same size as group $G$ in the group assembly $\mathcal{G}_N$. This normalized count is taken as the probability of the occurrence of group $G$, i.e.,

$$P(G) = \frac{C'(G)}{\sum_{K=1}^{N} \sum_{l}^{L_N} \delta\big(S(G), S[G_K^{(l)}(N)]\big)}, \tag{6}$$

where $S(G)$ denotes the number of residues in group $G$, and $\delta(S_1, S_2)$ is also a $\delta$ function. From Fig. 3, it is found that some groups have large probabilities $P(G)$ and appear many times, with large counts $C'(G)$, implying that the residues in these groups have more chances to be in one group, or that these groups have a strong preference to appear in the grouping. Thus, a grouping with these groups shows a better settlement of the 20 kinds of residues than others. Note that some groups with large probabilities $P(G)$ but small counts $C'(G)$ are removed from our analysis because they lack statistical reliability. These correlation statistics are used in the grouping, especially in the selection of the best grouping among competitive candidates.

With the method and requirements mentioned above, the reduction can be settled. For the MJ matrix, the groupings follow a hierarchical treelike structure (see Fig. 4). That is, the 20 kinds of residues are first divided into two groups, i.e., an H group with
residues (C, M, F, I, L, V, W, Y) and a P group with residues (A, G, T, S, N, Q, D, E, H, R, K, P). Then these two groups are alternately divided into two or more groups relating to different $N$, reflecting the detailed differences between the interactions of the H and P groups. In the case of $N = 3$, to divide the P group (on the basis of $N = 2$) is obviously more rational than to divide the H group, suggesting a priority for dividing the P group first. In contrast, for $N = 4$ we should divide the H group first, and for $N = 5$ divide the P group again. For example, in the case of $N = 5$, the H group is divided into (F, I, L) and (C, M, V, W, Y), and the P group is divided into (A, H, T), (D, E, K), and (G, S, N, Q, R, P). Similar results are obtained for $N$ up to 9, with a sequential order of hydrophobicity and without any overlap between the hydrophobic branch and the hydrophilic one following the H/P division. This gives a clear picture of the rational groupings. The difference between the present study and the previous one in Ref. [8] is that there are alternating divisions of the H and P groups in the new groupings, which gives a slight decrease in the mismatches, and also slightly different representative residues. The former results, obtained under some restrictions such as keeping the H group (with eight residues) unchanged, may correspond to a somewhat rough division, with a somewhat limited grouping space for searching the local minima.

FIG. 2. A two-residue correlation $C(s_i, s_j)$ for the MJ matrix. Different shades of gray represent different values of the count $C(s_i, s_j)$ among all $84 \times 5$ groups for $N = 5$.

FIG. 3. Probabilities $P(G)$ and the counts $C'(G)$ for $N = 5$ of the MJ matrix. The group index is arranged following the magnitude of the probability of the groups. Some groups are labeled.

FIG. 4. The rational groupings of a hierarchical treelike structure for the MJ matrix for $N$ up to 9.

Figure 5 shows a decrease in the mismatch as the group number $N$ increases, implying, in general, the more
groups the better. However, there is a plateau near $N = 8$ (case A), which characterizes the saturation of the grouping. This means that more groups will not further decrease the mismatch, or that more groups might not greatly enhance the efficiency of the complexity reduction. Thus, the number $N = 8$ may indicate the minimal number of residue types needed to reconstruct natural proteins, or a basic degree of freedom of the complexity for protein representation. This, in a sense, relates well to the argument in Ref. [4]. Note that the former plateau at $N = 5$ disappears once the grouping restriction is removed. Interestingly, in Fig. 5 we also plot all the lowest mismatches relating to the groupings with MGWSE, which generally are not local minima. An example is the grouping with groups $(1,1,1,1,16)$, which has the lowest mismatch among all sets of $N = 5$. However, it is noted that even when all these trivial groups are included, the curve still shows a plateau around $N = 9$, with eight single-residue groups of C, M, F, I, L, V, W, and Y and one group with the remaining twelve residues [see case C in Fig. 5(a)]. Clearly, this plateau relates again to the saturation of the H or P grouping, or to the detailed differences between the interactions of the residues, and also supports the discussion of the $N = 8$ plateau above. In addition, similar results for two other interaction matrices [6,10] are also obtained [see Fig. 5(b)].

To see the plateaus more clearly, we derive the gradient of the mismatch $M_g$ from $N$ groups to $N + 1$ groups for the above rational cases. Here, the gradient $g_{N,N+1}$ is defined as $g_{N,N+1} = |M_g(N+1) - M_g(N)|$. It is obvious that there are minima of the gradient $g_{N,N+1}$ vs $N$, implying a small variation of the mismatch as the group number $N$ increases. These minima may correspond to plateaus or shoulders of the curve of the mismatch vs group number. For our results, the values of the gradient $g_{N,N+1}$ for the different datasets of contact potentials are basically minimal around $N = 5$ (gray region I in Fig. 6), which corresponds to shoulders around $N = 5$, and also are
minimal around $N = 8$ (gray region II in Fig. 6), which relates to plateaus at about $N = 8$ (see Fig. 5). That is to say, the contact potentials from different sources all favor the eight-type grouping. Such an independence of the detailed forms of the interactions suggests that the grouping with eight types of residues might be a common feature of residues in protein systems.

It is worth noting that for each $N$ the representative residues have been found for the MJ matrix [11], e.g., (I, A, D) for $N = 3$, (I, A, C, D) for $N = 4$, and (I, A, G, E, C) for $N = 5$. These residues are selected based on the rational groupings by minimizing the mismatch among all other choices. The foldability of the reduced sequences and the effectiveness of the reduced alphabet have also been studied. All these details will be reported elsewhere.

FIG. 5. $M_g$ vs $N$: (a) for the MJ matrix; (b) for contact potentials in Ref. [6] (TD case) and in Ref. [10] (SW case). The plateaus are shown for different cases.

FIG. 6. The gradient $g_{N,N+1}$ vs group number $N$ for (a) MJ case, (b) SW case, and (c) TD case related to the rational considerations in Fig. 5, respectively. The gray regions highlight the common minima of $g_{N,N+1}$.

Finally, as a remark, we note that we use the pairwise contact potentials as the starting point of our approach. Actually, the effective interactions between residues in folding processes are of many-body nature due to their complicated interplay with the solvent. The pairwise interactions between the residues are average ones under some approximations, and are generally believed to possess the basic ingredients of the driving forces in folding [5–8]. Recently, it has been pointed out that many-body effects may play important roles in the recognition of the correct folds and in the thermodynamics and kinetics of the folding processes [12–19]. Considering many-body effects would be appealing for the grouping problem. Generally, the preferences between certain residues may be enhanced, while some fragile
connections between residues might be broken due to the competition of the many-body perturbation. However, the basic pattern of residue grouping will be maintained, though the relation between some residues may become vague and complex. The detailed schemes deserve further investigation.

In conclusion, we have shown a grouping method of residues based on the requirement that the energy landscape should be basically preserved under reduction. A quantity, the mismatch, is taken as the measure of the reduction. Our results imply that the residues do have some similarities in their interaction properties and can be put together into groups. By choosing a representative residue for each group, the complexity of proteins can be reduced, or the proteins can be represented with reduced compositions. In particular, a basic degree of freedom of the complexity with 8–10 types of residues is found.

This work was supported by the Foundation of NNSF (Nos. 10074030, 90103031, and 10021001) and the Nonlinear Project (973) of the NSM. J.W. thanks the Ke-Li Research Foundation. We thank C. Tang, C. H. Lee, and H. S. Chan for comments and suggestions.

[1] K. A. Dill, Biochemistry 29, 7133 (1990); H. S. Chan and K. A. Dill, Macromolecules 22, 4559 (1989); H. Li et al., Science 273, 666 (1996); E. D. Nelson and J. N. Onuchic, Proc. Natl. Acad. Sci. U.S.A. 95, 10 682 (1998); G. Tiana, R. A. Broglia, and E. I. Shakhnovich, Proteins: Struct., Funct., Genet. 39, 244 (2000); H. S. Chan and K. A. Dill, ibid. 30, 2 (1998); R. A. Goldstein et al., Proc. Natl. Acad. Sci. U.S.A. 89, 9029 (1992); P. G. Wolynes, Nat. Struct. Biol. 4, 871 (1997); L. R. Murphy et al., Protein Eng. 13, 149 (2000).
[2] L. Regan and W. F. Degrado, Science 241, 976 (1988); S. Kamteker et al., Science 262, 1680 (1993); A. R. Davidson et al., Nat. Struct. Biol. 2, 856 (1995).
[3] D. S. Riddle et al., Nat. Struct. Biol. 4, 805 (1997).
[4] K. W. Plaxco et al., Curr. Opin. Struct. Biol. 8, 80 (1998).
[5] H. Li et al., Phys. Rev. Lett. 79, 765 (1997).
[6] P. D. Thomas and K. A. Dill, Proc. Natl. Acad. Sci. U.S.A. 93, 11 628 (1996).
[7] S. Miyazawa and R. J. Jernigan, J. Mol. Biol. 256, 623 (1996).
[8] J. Wang and
W. Wang, Nat. Struct. Biol. 6, 1033 (1999).
[9] P. G. Wolynes et al., Science 267, 1619 (1995).
[10] B. Shoemaker and P. G. Wolynes, J. Mol. Biol. 287, 657 (1999).
[11] It is found that the percentage of overlap of the representative residues is larger than 75% for $N > 3$ for the three interaction matrices used in this paper.
[12] K. A. Dill, J. Biol. Chem. 272, 701 (1997).
[13] M. Vendruscolo, R. Najmanovich, and E. Domany, Proteins: Struct., Funct., Genet. 38, 134 (2000).
[14] H. S. Chan, Proteins: Struct., Funct., Genet. 40, 543 (2000).
[15] H. Kaya and H. S. Chan, Proteins: Struct., Funct., Genet. 40, 637 (2000).
[16] H. Kaya and H. S. Chan, Phys. Rev. Lett. 85, 4823 (2000).
[17] S. Takada, Z. Luthey-Schulten, and P. G. Wolynes, J. Chem. Phys. 110, 11 616 (2000).
[18] C. J. Camacho and D. Thirumalai, Proc. Natl. Acad. Sci. U.S.A. 90, 6369 (1993).
[19] K. Fan, J. Wang, and W. Wang, Phys. Rev. E 64, 041907 (2001).
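
Supplementary sketch: counting the sets. The text states that the "sets" correspond to partitions of the number 20 into N groups and lists the number of sets for N = 1 to 20. The short Python sketch below recomputes these numbers with the standard recurrence for partitions into exactly k positive parts. The function name `count_sets` and the variable `L` are our own illustrative choices, not notation from the paper.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_sets(total: int, groups: int) -> int:
    """Number of ways to write `total` as a sum of exactly `groups`
    positive integers, ignoring order (i.e., the number of 'sets')."""
    if groups == 0:
        return 1 if total == 0 else 0
    if total < groups:
        return 0
    # Either the smallest part is 1 (drop it), or all parts are >= 2
    # (subtract 1 from each of the `groups` parts).
    return count_sets(total - 1, groups - 1) + count_sets(total - groups, groups)


# Number of sets for 20 residues and N = 1, ..., 20 groups.
L = [count_sets(20, n) for n in range(1, 21)]
print(L)
```

The resulting list can be compared directly with the sequence quoted in the text. For N = 5 there are 84 sets, so the group assembly contains 84 × 5 = 420 groups.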
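
Supplementary sketch: evaluating the mismatch. To make the definition in Eq. (3) concrete, the following Python sketch evaluates the mismatch of one distribution of residues for a given contact matrix. It is a minimal illustration under stated assumptions, not the authors' code: the function name `mismatch`, the reading of the sums as running over all ordered index quadruples, and the 4 × 4 matrix `e_toy` (hypothetical values, not MJ energies) are all ours.

```python
import itertools

import numpy as np


def mismatch(e, groups):
    """Mismatch in the spirit of Eq. (3) for a symmetric contact matrix `e`
    and a partition `groups` given as a list of lists of residue indices."""
    # One flattened block per ordered pair of groups: all e(s_i, s_j)
    # with s_i in the first group and s_j in the second.
    blocks = [e[np.ix_(ga, gb)].ravel() for ga in groups for gb in groups]
    mismatched = 0
    total = 0
    # Each pair of blocks corresponds to one quadruple of groups.
    for left, right in itertools.product(blocks, repeat=2):
        lam = np.sign(left[:, None] - right[None, :])  # lambda for every (i, j, k, l)
        sgn_big_lambda = np.sign(lam.sum())            # sgn of the summed lambda
        if sgn_big_lambda == 0:
            # Count only lambda > 0 to avoid double counting.
            mismatched += int((lam > 0).sum())
        else:
            mismatched += int((lam != sgn_big_lambda).sum())
        total += lam.size
    return mismatched / total


# Hypothetical toy matrix: residues 0, 1 are "H"-like, residues 2, 3 are "P"-like.
e_toy = np.array([
    [-2.0, -2.0, -1.0, -1.0],
    [-2.0, -2.0, -1.0, -1.0],
    [-1.0, -1.0,  0.0,  0.0],
    [-1.0, -1.0,  0.0,  0.0],
])

m_good = mismatch(e_toy, [[0, 1], [2, 3]])          # groups follow the H/P split
m_bad = mismatch(e_toy, [[0, 2], [1, 3]])           # groups mix H-like and P-like
m_single = mismatch(e_toy, [[0], [1], [2], [3]])    # every residue is its own group
print(m_good, m_bad, m_single)
```

In this toy case the grouping that follows the block structure of the matrix has zero mismatch, while the mixed grouping does not. A Metropolis search such as the one described in the text could call a function of this kind after each trial exchange of two residues.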