Background: Protein-protein interaction data do not provide any direct information regarding the domains within the proteins that mediate the interactions. The majority of proteins are multi-domain proteins, and the interaction between two proteins is often defined by pairs of their domains. Most previous studies focus only on interacting domain pairs; they do not consider interactions that require the presence of a third domain. Objectives: In this manuscript, we define the concepts of necessary and sufficient triplets of domains and of the mediator domain. Materials and Methods: We approximate these conditions by pragmatic statistical definitions on a set of gold-standard interacting protein pairs and a set of gold-standard non-interacting protein pairs. Results: We introduce a new method for predicting the interaction between two domains that uses a third domain as a mediator, and we show that the mediator domain plays an effective role in the interaction between proteins. Conclusions: Using these concepts, we introduce a method for predicting the interaction between two domains. By evaluating the performance of our method on the yeast protein interaction data set, we show that the mediator domain plays an effective role in the interaction between proteins.

1. Background
More than half of eukaryotic proteins are multi-domain proteins (1). It is often assumed that the interaction between two proteins involves the binding of two or more specific domains (2) or the binding of a domain in one protein to short regions (approximately three to eight residues) of the other protein (3). For example, multiple domains of Nkx3.1 are involved in contacting SRF (4). While more than two domains could be involved in mediating the interaction of two proteins, most previous works have been developed to identify interacting domain pairs, either for the purpose of predicting or of explaining protein interactions (5-7). In particular, they have neglected the identification of interactions that require the presence of a third domain. A few exceptions exist (8-10). However, while these works have considered domain combinations for predicting protein interactions, they have not evaluated whether these domain combinations are required to mediate protein interactions. Recently, Hou et al. used the concept of a mediate domain in the yeast proteome to construct a mediate protein-protein interaction network.

2. Objectives
In this study we find triplets of domains (A, B, C) such that domain C has an effective role in the interaction of proteins X and Y containing domains A and B, respectively. For this purpose, we emphasize two related issues: firstly, identifying those domain pairs that occur frequently in interacting proteins but may not be necessary or sufficient for mediating the interactions of these proteins; and secondly, characterizing those domain triplets that are necessary and sufficient for mediating the interactions of these proteins. A domain combination (a triplet of domains) is sufficient for mediating the interaction of a pair of co-localized proteins if the proteins interact whenever the domain combination is observed in them.
A domain combination is necessary for mediating the interaction of a pair of interacting proteins if deleting any domain of the combination abolishes the interaction of those proteins.

3. Materials and Methods
We now characterize domain combinations that are necessary and sufficient for mediating protein interactions. The conditions of being "necessary" and "sufficient" cannot always be determined without additional laboratory experiments. For instance, to determine necessity, one has to delete a domain from a pair of interacting proteins and then test in the laboratory whether the two proteins still interact. We approximate these conditions by pragmatic statistical definitions.

Let D(X) be the set of domains of protein X. For proteins X and Y and domains A, B and C, we write (A,B) ∈ (X,Y) if A ∈ D(X) and B ∈ D(Y) (Figure 1). We define (A,B,C) ∈ (X,Y) if (A,B) ∈ (X,Y) and, in addition, C ∈ D(X) ∪ D(Y) (Figure 2).
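The two containment relations can be sketched in Python as below. This is a minimal illustration, not the author's implementation; it assumes a hypothetical dictionary `domains` mapping each protein identifier to the frozenset D(X) of its domain identifiers, and the domain names in the example are placeholders.

```python
from typing import Dict, FrozenSet

# Hypothetical representation of D(X): protein identifier -> set of domain identifiers.
DomainMap = Dict[str, FrozenSet[str]]


def contains_pair(domains: DomainMap, x: str, y: str, a: str, b: str) -> bool:
    """(A,B) ∈ (X,Y): A is a domain of X and B is a domain of Y (order matters)."""
    return a in domains[x] and b in domains[y]


def contains_triplet(domains: DomainMap, x: str, y: str,
                     a: str, b: str, c: str) -> bool:
    """(A,B,C) ∈ (X,Y): (A,B) ∈ (X,Y) and C ∈ D(X) ∪ D(Y)."""
    return contains_pair(domains, x, y, a, b) and (c in domains[x] or c in domains[y])


if __name__ == "__main__":
    # Placeholder domain identifiers, for illustration only.
    example = {"X": frozenset({"A", "C"}), "Y": frozenset({"B"})}
    print(contains_triplet(example, "X", "Y", "A", "B", "C"))  # True
    print(contains_triplet(example, "X", "Y", "B", "A", "C"))  # False: B is not in D(X)
```

As in the definitions, (A,B) ∈ (X,Y) is ordered; since interactions are undirected, a caller may wish to test both (X,Y) and (Y,X).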

We say that (X,Y) contains (A,B) or (X,Y) contains (A,B,C) if (A,B) ∈ (X,Y) or (A,B,C) ∈ (X,Y), respectively. Let I be a set of gold-standard interacting protein pairs and NI be a set of gold-standard non-interacting protein pairs. Let O be a domain combination (a triplet of domains). We define I_O = {(X,Y) ∈ I | O ∈ (X,Y)} and NI_O = {(X,Y) ∈ NI | O ∈ (X,Y)}.

A domain combination that is observed in only one pair of interacting proteins is not easily justified as a real mediator of the interaction. Therefore it is reasonable to restrict our attention to domain combinations that are observed in at least k interacting protein pairs (|I_O| ≥ k). In this manuscript we consider k ∈ {2,3,4}.

Let O be a domain combination that is observed in at least k interacting protein pairs. We define the odds of O by Equation 1, which can be thought of as the odds that O mediates the interaction of the proteins containing it. Since |NI|/|I| is independent of O, in our calculations we denote it by m (in the Results section it is shown that m = 797.09).

$$\mathrm{Odds}(O) = \frac{|I_O| / |I|}{|NI_O| / |NI|} = \frac{|I_O|}{|NI_O|} \cdot \frac{|NI|}{|I|} = m \, \frac{|I_O|}{|NI_O|}$$

Equation 1. The odds that O mediates the interaction of the containing proteins

Let μP and σP be the mean and standard deviation of the odds of all domain combinations. We regard μP as the odds expected of a random domain combination. Moreover, for a pair of proteins (X,Y) containing O, let

$$\mathrm{Prob}(X,Y \mid O) = \mathrm{Precision}(O) = \frac{|I_O|}{|I_O \cup NI_O|},$$

which is the probability that a pair of proteins (X,Y) containing O interact. Let μR and σR be the mean and standard deviation of the precision of all domain combinations. We regard μR as the precision expected of a random domain combination. Consider two thresholds t1 and t2, and let U be the set of domain combinations that:
(i) are observed in at least k pairs of interacting proteins;
(ii) have odds of at least t1;
(iii) have precision of at least t2.
In the next section we discuss how to obtain the thresholds t1 and t2.

A domain combination O is defined to be "sufficient" for mediating the interaction of a pair of proteins (X,Y) if O is contained in (X,Y) and O is in U. We consider a domain combination O "necessary" for mediating the interaction of a pair of proteins (X,Y) if there does not exist a domain combination O' that is "more sufficient" (i.e., has higher precision) for mediating (X,Y) than O. Note that O and O' need not have any domain in common. Thus the domain combination O is necessary and sufficient for mediating the interaction of a pair of proteins (X,Y) if and only if

$$O = \arg\max_{O' \in U,\; O' \in (X,Y)} \mathrm{Prob}(X,Y \mid O').$$

Such a domain combination O is also the best explanation for the interaction of (X,Y); we describe it as a necessary and sufficient domain combination ("ns domain combination") of (X,Y). We denote the set of all ns domain combinations by U'.

3.1. Odds and Precision Thresholds (t1 and t2)
In this section we use statistical approaches to obtain the thresholds t1 and t2. These evaluations were performed on the data set described in the Results section. All of the above quantities depend on k. Figures 3, 4, 5, 6, 7 and 8 present the distributions of the odds and precision of domain combinations for different k ∈ {2,3,4}.
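The definitions of Section 3 (Equation 1, the precision, the set U and the arg max that selects the ns domain combination of a pair) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: protein pairs are ordered tuples, the candidate triplets are supplied by the caller, and two behaviours the text does not specify are our own choices: a combination never observed in NI gets infinite odds, and ties in the arg max go to the first maximum found.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

DomainMap = Dict[str, FrozenSet[str]]  # D(X) for each protein X
Pair = Tuple[str, str]                 # ordered protein pair (X, Y)
Triplet = Tuple[str, str, str]         # domain combination O = (A, B, C)


def contains(domains: DomainMap, pair: Pair, o: Triplet) -> bool:
    """O = (A,B,C) ∈ (X,Y)."""
    x, y = pair
    a, b, c = o
    return a in domains[x] and b in domains[y] and (c in domains[x] or c in domains[y])


def odds_and_precision(domains: DomainMap, o: Triplet, interacting: List[Pair],
                       non_interacting: List[Pair]) -> Tuple[int, float, float]:
    """Return (|I_O|, Odds(O), Precision(O))."""
    i_o = sum(1 for p in interacting if contains(domains, p, o))
    ni_o = sum(1 for p in non_interacting if contains(domains, p, o))
    m = len(non_interacting) / len(interacting)  # m = |NI| / |I|
    # Assumption: a combination never seen in NI gets infinite odds.
    odds = m * i_o / ni_o if ni_o > 0 else float("inf")
    # I and NI are disjoint, so |I_O ∪ NI_O| = |I_O| + |NI_O|.
    precision = i_o / (i_o + ni_o) if i_o + ni_o > 0 else 0.0
    return i_o, odds, precision


def build_u(domains: DomainMap, candidates: Iterable[Triplet], interacting: List[Pair],
            non_interacting: List[Pair], k: int, t1: float, t2: float) -> Dict[Triplet, float]:
    """U: combinations with |I_O| >= k, odds >= t1 and precision >= t2 (maps O -> precision)."""
    u: Dict[Triplet, float] = {}
    for o in candidates:
        i_o, odds, precision = odds_and_precision(domains, o, interacting, non_interacting)
        if i_o >= k and odds >= t1 and precision >= t2:
            u[o] = precision
    return u


def ns_combination(domains: DomainMap, pair: Pair,
                   u: Dict[Triplet, float]) -> Optional[Triplet]:
    """arg max over O' ∈ U with O' ∈ (X,Y) of Prob(X,Y | O'); None if no O' qualifies."""
    contained = [o for o in u if contains(domains, pair, o)]
    return max(contained, key=lambda o: u[o]) if contained else None
```

Collecting the non-None results of `ns_combination` over the protein pairs of interest yields the set U' of ns domain combinations.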

Table 1 lists summary statistics of the odds and precision distributions. As Figures 3, 4, 5, 6, 7 and 8 show, the odds and precision are not normally distributed. To determine the thresholds t1 and t2, let H0 (the null hypothesis) be the assumption that two proteins containing at least one ns domain combination interact, and let H1 (the alternative hypothesis) be the assumption that two proteins containing at least one ns domain combination do not interact. We find t1 and t2 such that the type-I error of this test of H0 does not exceed γ. We obtain t1 and t2 by solving P(Precision(O) ≥ t2 | H0) = γ and P(Odds(O) ≥ t1 | H0) = γ for each data set. In the present study, we consider γ = 0.05, 0.1 and 0.2. A hypothesis test is considered statistically significant if its P-value is less than or equal to the significance level; in this case the null hypothesis is rejected.

Typically the significance level is set to 0.05 or 0.1; to test more domain combinations, we also consider a significance level of 0.2. The number of ns domain combinations with respect to γ and k is shown in Table 2. In the next section, we define three laws for predicting interactions between proteins with respect to ns domain combinations.
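Because the odds and precision distributions are not normal, one way to solve P(Odds(O) ≥ t1 | H0) = γ and P(Precision(O) ≥ t2 | H0) = γ numerically is an empirical upper-tail quantile. The sketch below assumes (this is not stated in the text) that the distribution under H0 is approximated by the observed odds or precision values of all domain combinations for a given k; the function name and the example values are hypothetical.

```python
from bisect import bisect_left
from typing import Iterable


def empirical_upper_threshold(values: Iterable[float], gamma: float) -> float:
    """Smallest observed value t whose empirical tail P(V >= t) is at most gamma.

    Approximates solving P(V >= t | H0) = gamma when the H0 distribution is
    replaced by the empirical distribution of `values` (an assumption).
    Returns +inf if no observed value satisfies the bound.
    """
    vals = sorted(values)
    n = len(vals)
    if n == 0:
        raise ValueError("need at least one value")
    for v in sorted(set(vals)):
        tail = (n - bisect_left(vals, v)) / n  # fraction of values >= v
        if tail <= gamma:
            return v
    return float("inf")


if __name__ == "__main__":
    placeholder_odds = list(range(1, 21))  # placeholder data, not from the paper
    for gamma in (0.05, 0.1, 0.2):
        print(gamma, empirical_upper_threshold(placeholder_odds, gamma))
```

With tied values an exact equality with γ is generally impossible, so the sketch keeps the empirical tail at or below γ, matching the requirement that the type-I error not exceed γ.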

3.2. Prediction
Let O = (A,B,C) be an ns domain combination and let X, Y and Z be three proteins. We predict that X and Y interact according to the following laws:
Law I) X and Y are predicted to interact if:
1. X and Y have a common localization; and
2. O = (A,B,C) ∈ (X,Y) (see Figure 9).

Law II) X and Y are predicted to interact if:
1. X, Y and Z have a common localization;
2. (A,B) ∈ (X,Y) and C ∈ D(Z); and
3. X and Z interact, and Y and Z interact (see Figure 10).
Law III) X and Y are predicted to interact if Law I or Law II holds.

4. Results
4.1. Dataset
The DIP dataset of yeast protein interactions (http://dip.doe-mbi.ucla.edu/dip/Downlod.cgi) has been used. This dataset contains 4928 proteins and 17451 interactions, and there are 3593 distinct domains in these proteins. To find the domains, the following address has been used: http://dip.doe-mbi.ucla.edu/dip/servises.cgi. In this data set there are two different types of domains:
1) Domains that are obtained experimentally. The number of these domains in this data set is 1077 (the prefix of the Pfam codes of these domains is PF).
2) Domains that are obtained by automatic methods. The number of these domains in this data set is 2516 (the prefix of the Pfam codes of these domains is PB).
We derive a reliable subset I from this dataset by including only those interactions in which the two interacting proteins (i) have a common localization and (ii) have a common partner. The localization of a protein is its location in the cell; this information can be obtained from the Gene Ontology database, available at www.genontology.org. Each of these conditions is highly associated with reliable interactions (12).

Therefore we consider the set I as the gold standard of interacting protein pairs. The resulting subset I has 6955 interactions. Subsequently, a set NI of protein pairs that are non-interacting with high probability was constructed from the protein pairs that (i) do not have a common localization and (ii) do not have a common partner. As these protein pairs violate all the key conditions associated with reliable interactions (12), we believe that NI is a gold standard of non-interacting protein pairs. The constructed set NI has 5543821 protein pairs. Therefore, in the above calculations, m = |NI|/|I| = 797.09.

4.2. Evaluation of Prediction
To evaluate the performance of our prediction, two measures are used, precision and recall, defined by:
Recall = TP/|W|
Precision = TP/(TP+FP)
With respect to the refined data set (sets I and NI), we set:
W = I
TP = the number of predicted edges that are in I.
FP = the number of predicted edges that are in NI.
With respect to the primary data set, we define:
W = the primary data set.
TP = the number of predicted edges that exist in the data set.
FP = the number of predicted edges that do not exist in the data set.
Tables 3 and 4 describe the results of prediction with respect to the refined and primary data sets using the three different laws.
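The three prediction laws of Section 3.2 and the recall and precision measures above can be sketched as follows. This is an illustrative sketch, not the author's implementation, and several details are assumptions: `loc` is a hypothetical map from each protein to a set of location labels (for example Gene Ontology cellular-component terms); "common localization" means the sets share at least one label (for Law II, one label shared by X, Y and Z); `partners` is a hypothetical map from each protein to its interaction partners; Z must differ from X and Y; and because interactions are undirected, both orientations of a pair are tried.

```python
from typing import Dict, FrozenSet, Optional, Sequence, Set, Tuple

DomainMap = Dict[str, FrozenSet[str]]  # D(X)
LocMap = Dict[str, FrozenSet[str]]     # hypothetical localization labels per protein
PartnerMap = Dict[str, Set[str]]       # hypothetical interaction partners per protein
Triplet = Tuple[str, str, str]
EMPTY: FrozenSet[str] = frozenset()


def law_i(x: str, y: str, ns: Sequence[Triplet], domains: DomainMap, loc: LocMap) -> bool:
    """Law I: X and Y share a localization and (A,B,C) ∈ (X,Y) for some ns combination."""
    if not (loc[x] & loc[y]):
        return False
    return any(a in domains[x] and b in domains[y] and (c in domains[x] or c in domains[y])
               for a, b, c in ns)


def law_ii(x: str, y: str, ns: Sequence[Triplet], domains: DomainMap,
           loc: LocMap, partners: PartnerMap) -> bool:
    """Law II: (A,B) ∈ (X,Y) and a common partner Z of X and Y has C ∈ D(Z),
    with one localization label shared by X, Y and Z."""
    shared_xy = loc[x] & loc[y]
    if not shared_xy:
        return False
    common = (partners.get(x, set()) & partners.get(y, set())) - {x, y}
    for a, b, c in ns:
        if a in domains[x] and b in domains[y]:
            if any(c in domains.get(z, EMPTY) and shared_xy & loc.get(z, EMPTY)
                   for z in common):
                return True
    return False


def law_iii(x: str, y: str, ns: Sequence[Triplet], domains: DomainMap,
            loc: LocMap, partners: PartnerMap) -> bool:
    """Law III: Law I or Law II holds."""
    return law_i(x, y, ns, domains, loc) or law_ii(x, y, ns, domains, loc, partners)


def predict(x: str, y: str, ns: Sequence[Triplet], domains: DomainMap,
            loc: LocMap, partners: PartnerMap) -> bool:
    """Apply Law III to both orientations of the undirected pair {X, Y} (assumption)."""
    return (law_iii(x, y, ns, domains, loc, partners)
            or law_iii(y, x, ns, domains, loc, partners))


def recall_precision(predicted: Set[FrozenSet[str]], w: Set[FrozenSet[str]],
                     positives: Set[FrozenSet[str]],
                     negatives: Optional[Set[FrozenSet[str]]] = None) -> Tuple[float, float]:
    """Recall = TP/|W|, Precision = TP/(TP+FP); pairs are unordered frozensets.

    Refined data set: pass negatives=NI. Primary data set: leave negatives=None,
    so FP counts predicted edges that do not exist in the data set.
    """
    tp = len(predicted & positives)
    fp = len(predicted & negatives) if negatives is not None else len(predicted - positives)
    recall = tp / len(w) if w else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision
```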

On the other hand, under our laws we predict an interaction between a pair of proteins (X,Y) only if they contain at least one pair of domains (A,B) that is contained in at least one ns domain combination. Many interactions in the data set do not contain any such pair of domains; therefore recall is not expected to be high. The best recall is obtained with Law III, γ = 0.1 and k = 3, with respect to both data sets. In the next section we show the effectiveness of the mediator domain in the interaction between two proteins.

4.3. Mediator Domains
We estimate the effectiveness of the mediator domain C in the interaction of two proteins X and Y that contain (A,B), for each ns domain combination O = (A,B,C). Considering protein pairs that contain (A,B), recall and precision are obtained by:
Recall = TP/|W|
Precision = TP/(TP+FP)
With respect to the refined data set (sets I and NI), we define:
W = {(X,Y) ∈ I | ∃ O = (A,B,C) ∈ U' s.t. (A,B) ∈ (X,Y)}
NW = {(X,Y) ∈ NI | ∃ O = (A,B,C) ∈ U' s.t. (A,B) ∈ (X,Y)}
TP = the number of predicted edges that are in I.
FP = the number of predicted edges that are in NI.
With respect to the primary data set, we define:
W = {(X,Y) | ∃ O = (A,B,C) ∈ U' s.t. (A,B) ∈ (X,Y) and X and Y interact}
NW = {(X,Y) | ∃ O = (A,B,C) ∈ U' s.t. (A,B) ∈ (X,Y) and X and Y do not interact}
TP = the number of predicted edges that are in the data set.
FP = the number of predicted edges that are not in the data set.

Tables 5 and 6 show that the mediator domain C has an effective role in the interaction between the two proteins X and Y that contain (A,B), for each ns domain combination O = (A,B,C). For example, according to Table 5, for Law III with γ = 0.05 and k = 4, the precision is 0.9691 while the ratio |W|/(|W|+|NW|) = 0.1297. This means that incorporating C into Law III improves the precision by 0.9691/0.1297 ≈ 7.4719 fold. That is, assuming the absence of errors in the data sets, a pair of proteins exhibiting an ns domain combination (A,B,C) is, on average, 7.4719 times more likely to interact than a pair of proteins exhibiting the domain pair (A,B). It means that the domains A and B, with (A,B) ∈ (X,Y), can mediate the interaction between proteins X and Y if:
- C is in D(X) ∪ D(Y), or
- there is a protein Z such that C ∈ D(Z) and (X,Z) and (Y,Z) interact.
Therefore C has been named the "mediator domain" for A and B. According to the above results, we predicted the interaction between some pairs of proteins with high precision using ns domain combinations and mediator domains.
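The fold improvement above compares the precision obtained with the mediator domain against the baseline fraction |W|/(|W|+|NW|) of (A,B)-containing pairs that interact. A minimal sketch, assuming ordered protein pairs and a hypothetical `domains` map for D(X); the final check reuses the Table 5 values quoted in the text.

```python
from typing import Dict, FrozenSet, Iterable, Sequence, Set, Tuple

DomainMap = Dict[str, FrozenSet[str]]  # D(X) for each protein X
Pair = Tuple[str, str]                 # ordered protein pair (X, Y)
Triplet = Tuple[str, str, str]         # ns domain combination (A, B, C) in U'


def w_and_nw(domains: DomainMap, ns: Sequence[Triplet], positives: Iterable[Pair],
             negatives: Iterable[Pair]) -> Tuple[Set[Pair], Set[Pair]]:
    """W / NW: positive / negative pairs with (A,B) ∈ (X,Y) for some (A,B,C) ∈ U'."""
    def has_ab(pair: Pair) -> bool:
        x, y = pair
        return any(a in domains[x] and b in domains[y] for a, b, _ in ns)
    return {p for p in positives if has_ab(p)}, {p for p in negatives if has_ab(p)}


def baseline_ratio(n_w: int, n_nw: int) -> float:
    """|W| / (|W| + |NW|): precision of predicting from the domain pair (A,B) alone."""
    return n_w / (n_w + n_nw)


def fold_improvement(precision_with_mediator: float, baseline: float) -> float:
    """How many times the mediator-based precision exceeds the (A,B)-only baseline."""
    return precision_with_mediator / baseline


if __name__ == "__main__":
    # Values quoted for Law III, γ = 0.05, k = 4 (Table 5).
    print(round(fold_improvement(0.9691, 0.1297), 4))  # 7.4719
```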

5. Discussion
In the present manuscript, a method for predicting protein interactions using ns domain combinations and mediator domains is presented. We show that mediator domains have an effective role in predicting protein interactions. Using ns domain combinations and mediator domains, we have predicted highly reliable interactions; that is, a pair of proteins exhibiting an ns domain combination (A,B,C) is more likely to interact than a pair of proteins exhibiting the domain pair (A,B).

Acknowledgements
I would like to thank the department of research affairs of Shahid Beheshti University.

Authors' Contribution
The whole manuscript was prepared by C. Eslahchi.

Financial Disclosure
None declared.

Funding/Support
University of Shahid Beheshti and IPM.