tol_MARG=1E-3; -- convergence tolerance used for regType=3,4,5 (the schemes involving pseudo-counts). Unlike the adjustable regType=1,2 (L1,L2) regularization of h & J, the target probabilities are regularized with pseudo-counts once at the start of the run; the regularized P1 and P2 targets are then iteratively renormalized until their marginals are mutually consistent, providing a viable convergence goal

cntr_max_MARG=1000; -- maximum number of iterations of the P2 marginal renormalization that enforces consistency with P1 under regularization
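A minimal sketch of the marginal renormalization loop these two settings control, assuming an iterative-proportional-fitting-style per-pair rescaling (the function name renormalize_P2 and the per-pair treatment are illustrative assumptions, not necessarily the code's actual scheme):

```python
import numpy as np

def renormalize_P2(P2, P1_i, P1_j, tol=1e-3, max_iter=1000):
    """Iteratively rescale a regularized pair table P2_ij so that its
    row and column marginals match the regularized single-site targets
    P1_i and P1_j, stopping once the worst marginal mismatch falls
    below tol (cf. tol_MARG) or max_iter (cf. cntr_max_MARG) passes."""
    P2 = P2.copy()
    for _ in range(max_iter):
        # scale rows so that sum_b P2[a, b] -> P1_i[a]
        P2 *= (P1_i / P2.sum(axis=1))[:, None]
        # scale columns so that sum_a P2[a, b] -> P1_j[b]
        P2 *= (P1_j / P2.sum(axis=0))[None, :]
        if np.abs(P2.sum(axis=1) - P1_i).max() < tol:
            break
    return P2
```

The column step always leaves the column marginals exact, so convergence is monitored on the row marginals only.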

Each position i in the protein is associated with a 21-element vector h_i specifying the contribution of each of the 20 amino acids, plus the unknown residue (used when the residue identity cannot be unambiguously determined by sequencing), to the fictitious Potts model energy; each pair of positions i and j is associated with a 21-by-21-element matrix J_ij specifying the contributions of pairwise epistatic interactions. To reduce computational and memory overhead, we allow the number of residues accessible at each position, nRes(i), to reflect only those actually observed in the nSeq sequences comprising the MSA.

The one-body amino acid frequencies at each position are stored in a nRes(i)-element vector P1_i, and the two-body frequencies at each pair of positions in a nRes(i)-by-nRes(j)-element matrix P2_ij. The fitting procedure seeks to find the corresponding nRes(i)-element h_i vectors at each position, and nRes(i)-by-nRes(j)-element J_ij matrices at each pair of positions.
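The construction of these reduced-alphabet frequency tables can be sketched as follows (a minimal illustration; the function name msa_frequencies and the dictionary-of-pairs layout for P2 are assumptions for the example, not the code's actual data structures):

```python
import itertools
import numpy as np

def msa_frequencies(msa):
    """Compute per-position one-body (P1) and pairwise (P2) frequency
    tables from an MSA, using a reduced alphabet at each position i
    that contains only the nRes(i) residues actually observed there."""
    nSeq, L = len(msa), len(msa[0])
    # observed alphabet at each position, in a fixed (sorted) order
    alphabets = [sorted({seq[i] for seq in msa}) for i in range(L)]
    index = [{a: k for k, a in enumerate(alph)} for alph in alphabets]
    P1 = [np.zeros(len(alph)) for alph in alphabets]
    P2 = {(i, j): np.zeros((len(alphabets[i]), len(alphabets[j])))
          for i, j in itertools.combinations(range(L), 2)}
    for seq in msa:
        for i in range(L):
            P1[i][index[i][seq[i]]] += 1.0
        for i, j in itertools.combinations(range(L), 2):
            P2[(i, j)][index[i][seq[i]], index[j][seq[j]]] += 1.0
    for i in range(L):
        P1[i] /= nSeq
    for key in P2:
        P2[key] /= nSeq
    return alphabets, P1, P2
```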

Gauge fixing

Due to the gauge invariance of the Potts Hamiltonian, we are at liberty to fix one element of each h_i vector and one row and column of each J_ij matrix. We implement this gauge fixing in the code by setting h_i[1] = 0 and J_ij[:,1]=J_ij[1,:]=0.
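The transform that carries arbitrary parameters into this gauge can be sketched as below (a hypothetical helper, written with 0-based indices where the text uses 1-based; all sequence energy differences are preserved, only an overall constant shifts):

```python
import numpy as np

def fix_gauge(h, J):
    """Shift arbitrary Potts parameters {h_i, J_ij} into the gauge
    h_i[0] = 0, J_ij[0, :] = J_ij[:, 0] = 0.  h is a list of field
    vectors; J is a dict mapping (i, j) with i < j to coupling
    matrices."""
    h = [hi.copy() for hi in h]
    J = {k: v.copy() for k, v in J.items()}
    for (i, j), Jij in J.items():
        # absorb the first row/column of J_ij into the fields
        h[i] += Jij[:, 0] - Jij[0, 0]
        h[j] += Jij[0, :] - Jij[0, 0]
        J[(i, j)] = Jij - Jij[:, :1] - Jij[:1, :] + Jij[0, 0]
    for i in range(len(h)):
        h[i] -= h[i][0]          # zero the reference element
    return h, J
```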

Newton descent

We perform multidimensional iterative Newton descent in {h_i,J_ij} to find the maximum likelihood (ML) / maximum a posteriori (MAP) estimate for the model parameters given the sequences constituting the MSA. The ML estimate is obtained in the absence of an informative Bayesian prior (i.e., under a maximum entropy or uniform prior), whereas the MAP estimate is obtained when an informative prior is selected (see below). Because the one- and two-body amino acid probabilities {P1_i,P2_ij} are analytic functions of {h_i,J_ij}, the Potts Hamiltonian admits closed-form expressions for the gradients required in the Newton search, obviating the need for their numerical estimation.
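To illustrate these analytic derivatives in the simplest setting, the sketch below performs Newton iterations for the fields of a single independent position (J = 0), where the gradient and Hessian of the log-likelihood have closed forms; the full {h_i,J_ij} search uses the same ingredients in higher dimension (fit_single_site is an illustrative name, not the code's):

```python
import numpy as np

def fit_single_site(P1, n_iter=50):
    """Newton iterations for the fields h of one independent position
    with model probabilities p(a) = exp(h[a]) / Z.  The analytic
    gradient of the log-likelihood is P1 - p, and the Hessian of
    log Z is diag(p) - p p^T; we work in the gauge h[0] = 0 so the
    reduced Hessian is invertible."""
    q = len(P1)
    h = np.zeros(q)
    for _ in range(n_iter):
        p = np.exp(h - h.max())
        p /= p.sum()
        grad = P1 - p                       # analytic gradient
        H = np.diag(p) - np.outer(p, p)     # analytic Hessian of log Z
        if np.abs(grad).max() < 1e-12:
            break
        # Newton step in the reduced (gauge-fixed) coordinates h[1:]
        h[1:] += np.linalg.solve(H[1:, 1:], grad[1:])
    return h
```

At convergence the model marginals reproduce P1 exactly, confirming that the analytic gradient vanishes at the ML solution.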

Bayesian regularization

We stabilize the model fitting using two varieties of regularization: (i) offline, modifying the target probabilities computed from the MSA before fitting begins, and (ii) online, stabilizing the algorithm by adjusting the target probabilities as a function of the current h_i or J_ij. By introducing some bias into our model, we seek to reduce variance and accelerate convergence. (As such, the converged model may not precisely reproduce the {P1_i,P2_ij} computed from the MSA due to the introduction of this bias.) In practice, the strength of the regularization can be reduced to zero over a series of fitting runs, each of which passes the terminal {h_i,J_ij} as the initial guess for a new run with reduced regularization coefficients.

Offline. The {P1_i} values computed from the MSA may be augmented with pseudo-counts lambda_P1/nRes(i), and the {P2_ij} values with lambda_P2/[nRes(i)*nRes(j)]. Conceptually, this process adds some (fractional) extra observations to our MSA, reflecting our belief that particular mutations or pairs of mutations may be more likely than the data would suggest. This is particularly useful for moving values in {P1_i,P2_ij} away from zero, reflecting our understanding that these mutations may not really be infinitely improbable, merely sufficiently rare not to be observed among the finite number of sequences nSeq in the MSA.
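A sketch of this pseudo-count augmentation (the renormalization by 1 + lambda, which keeps each table summing to one, is an assumption about how the fractional counts are folded in, as is the helper name):

```python
import numpy as np

def add_pseudocounts(P1, P2, lambda_P1, lambda_P2):
    """Offline regularization: mix pseudo-counts into the MSA target
    frequencies, P1_i <- (P1_i + lambda_P1/nRes(i)) / (1 + lambda_P1)
    and P2_ij <- (P2_ij + lambda_P2/[nRes(i)*nRes(j)]) / (1 + lambda_P2),
    so every table still sums to 1 and no entry is exactly zero."""
    P1r = [(p + lambda_P1 / p.size) / (1.0 + lambda_P1) for p in P1]
    P2r = {k: (m + lambda_P2 / m.size) / (1.0 + lambda_P2)
           for k, m in P2.items()}
    return P1r, P2r
```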

Online. We can further stabilize the fitting procedure by applying a Laplacian (L1) or Gaussian (L2) prior to the Bayesian inference procedure to penalize large magnitude values of {h_i,J_ij} and prevent them from exploding during the numerical fitting trajectory. In practice, this has the effect of modifying the target probabilities {P1_i,P2_ij} as a function of {h_i,J_ij} such that P1_i <-- P1_i + sign(h_i)*1/nSeq*lambda_h and/or P2_ij <-- P2_ij + sign(J_ij)*1/nSeq*lambda_J (L1 regularization), or P1_i <-- P1_i + 2*h_i*1/nSeq*lambda_h and/or P2_ij <-- P2_ij + 2*J_ij*1/nSeq*lambda_J (L2 regularization).
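The target-shift rules above can be written out directly as follows (a sketch; the function name and the dict/list containers are illustrative, and the update follows the L1/L2 formulas in the text verbatim):

```python
import numpy as np

def regularized_targets(P1, P2, h, J, nSeq, lam_h, lam_J, kind="l2"):
    """Online regularization: shift the fitting targets as a function
    of the current parameters.
    L1: P <- P + sign(theta) * lam / nSeq
    L2: P <- P + 2 * theta * lam / nSeq"""
    if kind == "l1":
        f_h = lambda x: np.sign(x) * lam_h / nSeq
        f_J = lambda x: np.sign(x) * lam_J / nSeq
    else:
        f_h = lambda x: 2.0 * x * lam_h / nSeq
        f_J = lambda x: 2.0 * x * lam_J / nSeq
    P1r = [p + f_h(hi) for p, hi in zip(P1, h)]
    P2r = {k: P2[k] + f_J(J[k]) for k in P2}
    return P1r, P2r
```

Because the shift opposes the sign (L1) or magnitude (L2) of the current parameters, large {h_i,J_ij} values see their effective targets pulled back toward the data, which is what damps runaway growth during the fitting trajectory.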