We study the performance of stochastic local search algorithms for random instances of the K-satisfiability (K-SAT) problem. We present a stochastic local search algorithm, ChainSAT, which moves in the energy landscape of a problem instance by never going upwards in energy. ChainSAT is a focused algorithm in the sense that it focuses on variables occurring in unsatisfied clauses. We show by extensive numerical investigations that ChainSAT and other focused algorithms solve large K-SAT instances almost surely in linear time, up to high clause-to-variable ratios α; for example, for K = 4 we observe linear-time performance well beyond the recently postulated clustering and condensation transitions in the solution space. The performance of ChainSAT is a surprise given that by design the algorithm gets trapped in the first local energy minimum it encounters, yet no such minima are encountered. We also study the geometry of the solution space as accessed by stochastic local search algorithms.

Generally, there is a trade-off between methods of gene expression analysis that are precise but labor-intensive, e.g. RT-PCR, and methods that scale up to global coverage but are not quite as quantitative, e.g. microarrays. In the present paper, we show how a known method of gene expression profiling (K. Kato, Nucleic Acids Res. 23, 3685-3690 (1995)), which relies on a fairly small number of steps, can be turned into a global gene expression measurement by advanced data post-processing, with potentially little loss of accuracy. Post-processing here entails solving an ancillary combinatorial optimization problem. Validation is performed on in silico experiments generated from the FANTOM database of full-length mouse cDNA. We present two variants of the method. One uses state-of-the-art commercial software for solving problems of this kind, the other a code developed by us specifically for this purpose, released under the GPL license.

We study the behavior of a heuristic for solving random satisfiability problems by stochastic local search near the satisfiability threshold. The heuristic for average satisfiability (ASAT) is similar to the Focused Metropolis Search heuristic, and shares the property of being focused, i.e., only variables in unsatisfied clauses are updated in each step. It is significantly simpler than the benchmark WalkSAT heuristic. We show that ASAT solves instances as large as N = 10^6 in linear time, on average, up to a ratio of 4.21 clauses per variable in random three-satisfiability. For K higher than 3, ASAT appears to solve instances of K-satisfiability in linear time up to the Montanari-Ricci-Tersenghi-Parisi full replica symmetry breaking (FRSB) threshold, denoted α_s(K).
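The ASAT flip rule is simple enough to sketch. The following minimal Python illustration is not the authors' code: the focusing step and the fixed uphill-acceptance probability p follow the description above, while all other details (the clause encoding, the restart-free loop, the parameter values) are assumptions.

```python
import random

def asat(clauses, n_vars, p=0.2, max_flips=100_000, seed=0):
    """Minimal ASAT-style focused search for CNF formulas.

    clauses: list of clauses; each clause is a list of nonzero ints,
    where literal v > 0 means variable v must be True, v < 0 means False.
    Returns a satisfying assignment (dict var -> bool) or None.
    """
    rng = random.Random(seed)
    assign = {v: rng.choice([False, True]) for v in range(1, n_vars + 1)}
    sat = lambda lit: assign[abs(lit)] == (lit > 0)
    unsat = lambda: [c for c in clauses if not any(sat(l) for l in c)]
    for _ in range(max_flips):
        u = unsat()
        if not u:
            return assign
        # Focus: pick a random variable from a random unsatisfied clause.
        var = abs(rng.choice(rng.choice(u)))
        e_before = len(u)
        assign[var] = not assign[var]
        e_after = len(unsat())
        # ASAT rule: keep the flip if the energy (number of unsatisfied
        # clauses) does not increase; otherwise keep it with probability p.
        if e_after > e_before and rng.random() >= p:
            assign[var] = not assign[var]  # undo the flip
    return None
```

The single noise parameter p, independent of the size of the energy increase, is what distinguishes this rule from a Metropolis acceptance step.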

We study numerically the solution space structure of random 3-SAT problems close to the SAT/UNSAT transition. This is done by considering chains of satisfiability problems, where clauses are added sequentially to a problem instance. Using the overlap measure of similarity between different solutions found on the same problem instance, we examine geometrical changes as a function of α. In each chain, the overlap distribution is first smooth, but then develops a tiered structure, indicating that the solutions are found in well separated clusters. On chains of not too large instances, all remaining solutions are eventually observed to be found in only one small cluster before vanishing. This condensation transition point is estimated by finite size scaling to be αc = 4.26 with an apparent critical exponent of about 1.7. The average overlap value is also observed to increase with α up to the transition, indicating a reduction in solution space size, in accordance with theoretical predictions. The solutions are generated by a local heuristic, ASAT, and compared to those found by the Survey Propagation algorithm up to αc.
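The overlap measure used above can be sketched in a few lines; the ±1 encoding of Boolean variables and the helper names are illustrative, not the paper's notation.

```python
def overlap(s1, s2):
    """Overlap q between two ±1 spin configurations of equal length N:
    q = (1/N) * sum_i s1[i] * s2[i]; q = 1 for identical configurations."""
    assert len(s1) == len(s2)
    return sum(a * b for a, b in zip(s1, s2)) / len(s1)

def overlap_distribution(solutions):
    """All pairwise overlaps among a list of solutions (as ±1 lists);
    a histogram of these values is the overlap distribution."""
    return [overlap(solutions[i], solutions[j])
            for i in range(len(solutions))
            for j in range(i + 1, len(solutions))]
```

A tiered (multi-peaked) histogram of these pairwise overlaps is the signature of clustered solutions described in the abstract.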

This paper is about quantum heat defined as the change in energy of a bath during a process. The presentation takes into account recent developments in classical strong-coupling thermodynamics and addresses a version of quantum heat that satisfies quantum-classical correspondence. The characteristic function and the full counting statistics of quantum heat are shown to be formally similar. The paper further shows that the method can be extended to more than one bath, e.g., two baths at different temperatures, which opens up the prospect of studying correlations and heat flow. The paper extends earlier results on the expected quantum heat in the setting of one bath [E. Aurell and R. Eichhorn, New J. Phys. 17, 065007 (2015); E. Aurell, Entropy 19, 595 (2017)].

The operation of a quantum computer is considered as a general quantum operation on a mixed state on many qubits, followed by a measurement. The general quantum operation is further represented as a Feynman-Vernon double path integral over the histories of the qubits and of an environment, with the environment subsequently traced out. The qubit histories are taken to be paths on the two-sphere, as in Klauder's coherent-state path integral of spin, and the environment is assumed to consist of harmonic oscillators, initially in thermal equilibrium and linearly coupled to qubit operators. The environment can then be integrated out to give a Feynman-Vernon influence action coupling the forward and backward histories of the qubits. This representation allows one to derive, in a simple way, estimates showing that the total error of operation of a quantum computer without error correction scales linearly with the number of qubits and the time of operation. It also allows one to discuss Kitaev's toric code interacting with an environment in the same manner.

This paper aims to introduce Distributed Systems as a field where the ideas and methods of physics can potentially be applied, and to provide entry points to a wide literature. The contributions of Leslie Lamport, inspired by Relativity Theory, and of Edsger Dijkstra, which have the flavor of a growth process, are discussed at some length. The intent of the author is primarily to stimulate interest in the statistical physics community, and the discussions are therefore framed in non-technical language; the author apologizes in advance to readers from the computer science side for the unavoidable imprecision and ambiguities.

Strong-coupling statistical thermodynamics is formulated as the Hamiltonian dynamics of an observed system interacting with another unobserved system (a bath). It is shown that the entropy production functional of stochastic thermodynamics, defined as the log ratio of forward and backward system path probabilities, is in a one-to-one relation with the log ratios of the joint initial conditions of the system and the bath. A version of strong-coupling statistical thermodynamics where the system-bath interaction vanishes at the beginning and at the end of a process is related, as is also weak-coupling stochastic thermodynamics, to a bath initially in equilibrium by itself. The heat is then the change of bath energy over the process, and it is discussed under which conditions this heat is a functional of the system history alone. The version of strong-coupling statistical thermodynamics introduced by Seifert and Jarzynski is related to a bath initially in conditional equilibrium with respect to the system. This leads to heat as another functional of the system history, which needs to be determined by thermodynamic integration. The log ratio of forward and backward system path probabilities in a stochastic process is finally related to log ratios of the initial conditions of a combined system and bath. It is shown that the entropy production formulas of stochastic processes under a general class of time reversals are given by the differences of bath energies in a larger underlying Hamiltonian system. The paper highlights the centrality of time reversal in stochastic thermodynamics, also in the case of strong coupling.

We introduce a criterion for how to price derivatives in incomplete markets, based on the theory of growth-optimal strategies in repeated multiplicative games. We present reasons why these growth-optimal strategies should be particularly relevant to the problem of pricing derivatives. Under the assumptions of no trading costs and no restrictions on lending, we find an appropriate equivalent martingale measure that prices the underlying and the derivative security. We compare our result with other alternative pricing procedures in the literature, and discuss the limits of validity of the lognormal approximation. We also generalize the pricing method to a market with correlated stocks. The expected estimation error of the optimal investment fraction is derived in closed form, and its validity is checked with a small-scale empirical test.
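As a toy illustration of growth-optimal strategies in a repeated multiplicative game (not the pricing method of the paper itself), the classical Kelly fraction maximizes the expected logarithmic growth rate of wealth in a binary bet:

```python
import math

def kelly_fraction(p, b):
    """Growth-optimal bet fraction for a binary multiplicative game:
    win b per unit staked with probability p, lose the stake otherwise.
    Maximizing E[log wealth] gives f* = (p*(b+1) - 1) / b."""
    return (p * (b + 1) - 1) / b

def expected_log_growth(f, p, b):
    """Expected log-growth rate per round when betting fraction f."""
    return p * math.log(1 + f * b) + (1 - p) * math.log(1 - f)
```

For p = 0.6 and even odds (b = 1), the optimal fraction is 0.2, and any other fraction gives a strictly smaller expected log growth.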

A particle with internal unobserved states diffusing in a force field will generally display effective advection-diffusion. The drift velocity is proportional to the mobility averaged over the internal states, or effective mobility, while the effective diffusion has two terms. One is of the equilibrium type and satisfies an Einstein relation with the effective mobility, while the other is quadratic in the applied force. In this contribution we present two new methods to obtain these results, on the one hand using large deviation techniques and on the other by a multiple-scale analysis, and compare the two. We consider both systems with discrete internal states and continuous internal states. We show that the auxiliary equations in the multiple-scale analysis can also be derived in second-order perturbation theory in a large deviation theory of a generating function (discrete internal states) or generating functional (continuous internal states). We discuss how measuring the two components of the effective diffusion gives a way to determine kinetic rates from only first and second moments of the displacement in steady state.
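Schematically, the structure of the result described above can be written as follows; the symbols (the effective mobility, the equilibrium diffusion constant, and the coefficient of the force-quadratic term) are illustrative, not the paper's notation.

```latex
% Effective long-time drift and diffusion under a constant force F:
% \bar{\mu} is the mobility averaged over internal states, and the
% coefficient c of the force-quadratic term encodes the kinetic rates.
\[
\begin{aligned}
  v_{\mathrm{eff}} &= \bar{\mu}\,F,\\
  D_{\mathrm{eff}} &= \underbrace{k_B T\,\bar{\mu}}_{\text{Einstein-type term}}
                    \;+\; \underbrace{c\,F^{2}}_{\text{force-quadratic term}}.
\end{aligned}
\]
```

Measuring both terms of D_eff, as suggested above, then constrains the internal kinetic rates through c.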

We calculate the effective long-term convective velocity and dispersive motion of an ellipsoidal Brownian particle in three dimensions when it is subjected to a constant external force. This long-term motion results as a "net" average behavior from the particle rotation and translation on short time scales. Accordingly, we apply a systematic multi-scale technique to derive the effective equations of motion valid on long times. We verify our theoretical results by comparing them to numerical simulations.

In the absence of RecA-mediated cleavage of the repressor, the λ prophage is exceptionally stable. We develop a stochastic model that predicts the stability of such epigenetic states from the affinities of the molecular components. We find that the stability depends, in particular, on the maximum possible cI protein production and on the number of translated Cro proteins per transcribed mRNA. We apply the model to the behavior of recently published O_R mutants and find, in particular, that a mutant that overexpresses cro behaves differently from what was predicted, thus suggesting that the current view of the O_R switch is incomplete. The approach described here should be generally applicable to the stability of expressed states.

We present a new method to close the Master Equation representing the continuous time dynamics of Ising interacting spins. The method makes use of the theory of Random Point Processes to derive a master equation for local conditional probabilities. We analytically test our solution by studying two known cases, the dynamics of the mean field ferromagnet and the dynamics of the one dimensional Ising system. We then present numerical results comparing our predictions with Monte Carlo simulations in three different models on random graphs with finite connectivity: the Ising ferromagnet, the Random Field Ising model, and the Viana-Bray spin-glass model.

The purpose of this note is to point out analogies between causal analysis in statistics and the correlation-response theory in statistical physics. It is further shown that for some systems the dynamic cavity offers a way to compute the stationary state of a non-equilibrium process effectively, which could then be taken as an alternative starting point of causal analysis.

We introduce an alternative solution to Glauber multispin dynamics on random graphs. The solution is based on the recently introduced cavity master equation (CME), a time-closure turning the, in principle, exact dynamic cavity method into a practical method of analysis and of fast simulation. Running CME once is of comparable computational complexity as one Monte Carlo run on the same problem. We show that CME correctly models the ferromagnetic p-spin Glauber dynamics from high temperatures down to and below the spinodal transition. We also show that CME allows an alternative exploration of the low-temperature spin-glass phase of the model.
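For reference, a single Monte Carlo run of Glauber dynamics, the baseline that CME's cost is compared against above, can be sketched as follows. This is a minimal illustration, not the CME itself; the graph format, the uniform coupling J, and the random sequential update order are assumptions.

```python
import math
import random

def glauber_run(adj, J, T, steps, seed=0):
    """One Monte Carlo run of single-spin Glauber dynamics.

    adj: dict node -> list of neighbors; J: uniform coupling strength;
    T: temperature. Returns the final ±1 spin configuration as a dict.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    s = {i: rng.choice([-1, 1]) for i in nodes}
    beta = 1.0 / T
    for _ in range(steps):
        i = rng.choice(nodes)                  # random sequential update
        h = J * sum(s[j] for j in adj[i])      # local field from neighbors
        # Glauber rule: s_i -> +1 with probability 1 / (1 + exp(-2*beta*h))
        p_up = 1.0 / (1.0 + math.exp(-2.0 * beta * h))
        s[i] = 1 if rng.random() < p_up else -1
    return s
```

Averaging local observables over many such runs is what a CME solution replaces with a single deterministic computation of comparable cost.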

The change of the von Neumann entropy of a set of harmonic oscillators, initially in thermal equilibrium and interacting linearly with an externally driven quantum system, is computed by adapting the Feynman-Vernon influence functional formalism. This quantum entropy production takes the form of the expectation value of three functionals of the forward and backward paths describing the system history in the Feynman-Vernon theory. In the classical limit of Kramers-Langevin dynamics (Caldeira-Leggett model) these functionals combine into three terms: the first is the entropy production functional of stochastic thermodynamics, i.e., the classical work done by the system on the environment in units of k_B T, while the second and third are functionals that have no analogue in stochastic thermodynamics.

We show that a method based on logistic regression, using all the data, solves the inverse Ising problem far better than mean-field calculations relying only on sample pairwise correlation functions, while remaining computationally feasible for hundreds of nodes. The largest improvement in reconstruction occurs for strong interactions. Using two examples, a diluted Sherrington-Kirkpatrick model and a two-dimensional lattice, we also show that interaction topologies can be recovered from few samples with good accuracy and that the use of ℓ1 regularization is beneficial in this process, pushing inference abilities further into low-temperature regimes.
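The logistic-regression idea for one spin (often called pseudolikelihood maximization) can be sketched as follows. This is a minimal illustration under assumptions: plain gradient ascent in pure Python, no ℓ1 regularization, and arbitrarily chosen learning rate and epoch count.

```python
import math

def fit_couplings(samples, i, lr=0.1, epochs=200):
    """Estimate couplings J_ij to spin i by logistic regression on the
    conditional model P(s_i = +1 | s_rest) = sigmoid(2 * sum_j J_ij s_j).

    samples: list of ±1 spin configurations (lists of equal length n).
    Returns a list of n coupling estimates (entry i is left at 0).
    """
    n = len(samples[0])
    J = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for s in samples:
            h = sum(J[j] * s[j] for j in range(n) if j != i)
            p_up = 1.0 / (1.0 + math.exp(-2.0 * h))
            err = (1 if s[i] == 1 else 0) - p_up
            # Gradient of the log-likelihood: d/dJ_j = 2 * (t - p) * s_j
            for j in range(n):
                if j != i:
                    grad[j] += 2.0 * err * s[j]
        for j in range(n):
            J[j] += lr * grad[j] / len(samples)
    return J
```

Each spin is fitted independently this way, which is what keeps the method feasible for hundreds of nodes; adding an ℓ1 penalty to the gradient step sparsifies the recovered topology.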

Is it possible to treat large-scale distributed systems as physical systems? The importance of that question stems from the fact that the behavior of many P2P systems is very hard to analyze analytically, and simulation at the scales of interest can be prohibitive. In Physics, however, one is accustomed to reasoning about large systems, and the limit of very large systems may actually simplify the analysis. As a first example, we here analyze the effect of the density of populated nodes in an identifier space in a P2P system. We show that while the average path length is approximately given by a function of the number of populated nodes, there is a systematic effect which depends on the density. In other words, the dependence is on both the number of address nodes and the number of populated nodes, but only through their ratio. Interestingly, this effect is negative for finite densities, showing that some amount of randomness somewhat shortens the average path length.

The evolution of a planar perturbation in an Einstein-de Sitter Universe is studied using a previously introduced Lagrangian scheme. An approximate discrete dynamical system is derived, which describes the mass agglomeration process. Quantitative predictions for the late-time mean density profile are obtained therefrom, and validated by numerical simulations. A simple but important result is that the characteristic scale of a mass agglomeration is an increasing function of cosmological time t. For one kind of initial conditions we further find a scaling regime for the density profile of a collapsing object. These results are compared with analogous investigations for the adhesion model (Burgers equation with positive viscosity). We further study the mutual motion of two mass agglomerations, and show that they oscillate around each other for long times, like two heavy particles. Individual particles in the two agglomerations do not mix effectively on the time scale of the inter-agglomeration motion.

The dynamics of a 1D self-gravitating medium with initial density almost uniform is studied. Numerical experiments are performed with ordered and with Gaussian random initial conditions. The phase space portraits are shown to be qualitatively similar to shock waves, in particular with initial conditions of Brownian type. The PDF of the mass distribution is investigated.

Transcription regulation is largely governed by the profile and the dynamics of transcription factors' binding to DNA. Stochastic effects are intrinsic to this dynamics, and the binding to functional sites must be controlled with a certain specificity for living organisms to be able to elicit specific cellular responses. Specificity stems here from the interplay between binding affinity and cellular abundance of transcription factor proteins, and the binding of such proteins to DNA is thus controlled by their chemical potential. We combine large-scale protein abundance data in the budding yeast with binding affinities for all transcription factors with known DNA binding site sequences to assess the behavior of their chemical potentials in an exponential growth phase. A sizable fraction of transcription factors is apparently bound non-specifically to DNA, and the observed abundances are marginally sufficient to ensure high occupations of the functional sites. We argue that a biological cause of this feature is related to its noise-filtering consequences: abundances below physiological levels do not yield significant binding of functional targets and mis-expressions of regulated genes may thus be tamed.

We establish a refined version of the Second Law of Thermodynamics for Langevin stochastic processes describing mesoscopic systems driven by conservative or non-conservative forces and interacting with thermal noise. The refinement is based on Monge-Kantorovich optimal mass transport and becomes relevant for processes far from the quasi-stationary regime. The general discussion is illustrated by a numerical analysis of the optimal memory-erasure protocol for a model of a micron-sized particle manipulated by optical tweezers.

Survey propagation is a powerful technique from statistical physics that has been applied to solve the 3-SAT problem both in principle and in practice. We give, using only probability arguments, a common derivation of survey propagation, belief propagation and several interesting hybrid methods. We then present numerical experiments which use WSAT (a widely used random-walk based SAT solver) to quantify the complexity of the 3-SAT formulae as a function of their parameters, both as randomly generated and after simplification, guided by survey propagation. Some properties of WSAT which have not previously been reported make it an ideal tool for this purpose: its mean cost is proportional to the number of variables in the formula (at a fixed ratio of clauses to variables) in the easy-SAT regime and slightly beyond, and its behavior in the hard-SAT regime appears to reflect the underlying structure of the solution space that has been predicted by replica symmetry-breaking arguments. An analysis of the tradeoffs between the various methods of search for satisfying assignments shows WSAT to be far more powerful than has been appreciated, and suggests some interesting new directions for practical algorithm development.
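A minimal WalkSAT-style solver can be sketched as follows. This is an illustration, not the WSAT code used in the experiments; in particular, the greedy move here minimizes the resulting number of unsatisfied clauses rather than the classical break count, and the noise parameter p and clause encoding are assumptions.

```python
import random

def walksat(clauses, n_vars, p=0.5, max_flips=100_000, seed=0):
    """Minimal WalkSAT-style random-walk SAT solver.

    clauses: lists of nonzero ints (DIMACS-style literals: v > 0 means
    variable v True, v < 0 means False). Returns an assignment or None.
    """
    rng = random.Random(seed)
    assign = {v: rng.choice([False, True]) for v in range(1, n_vars + 1)}
    sat = lambda lit: assign[abs(lit)] == (lit > 0)

    def energy_after_flip(var):
        # Number of unsatisfied clauses if `var` were flipped.
        assign[var] = not assign[var]
        e = sum(1 for c in clauses if not any(sat(l) for l in c))
        assign[var] = not assign[var]
        return e

    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(sat(l) for l in c)]
        if not unsat:
            return assign
        clause = rng.choice(unsat)
        if rng.random() < p:                       # random-walk move
            var = abs(rng.choice(clause))
        else:                                      # greedy move
            var = min((abs(l) for l in clause), key=energy_after_flip)
        assign[var] = not assign[var]
    return None
```

The linear scaling of mean cost with the number of variables reported above refers to the number of such flips needed per solved instance at fixed clause-to-variable ratio.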

Minimal absent words (MAW) of a genomic sequence are subsequences that are absent themselves but the subwords of which are all present in the sequence. The characteristic distribution of genomic MAWs as a function of their length has been observed to be qualitatively similar for all living organisms, the bulk being rather short, and only relatively few being long. It has been an open issue whether the reason behind this phenomenon is statistical or reflects a biological mechanism, and what biological information is contained in absent words. In this work we demonstrate that the bulk can be described by a probabilistic model of sampling words from random sequences, while the tail of long MAWs is of biological origin. We introduce the novel concept of a core of a minimal absent word: the sequences present in the genome that are closest to a given MAW. We show that in bacteria and yeast the cores of the longest MAWs, which exist in two or more copies, are located in highly conserved regions, the most prominent example being ribosomal RNAs (rRNAs). We also show that while the distribution of the cores of long MAWs is roughly uniform over these genomes on a coarse-grained level, on a more detailed level it is strongly enhanced in 3' untranslated regions (UTRs) and, to a lesser extent, also in 5' UTRs. This indicates that MAWs and associated MAW cores correspond to fine-tuned evolutionary relationships, and suggests that they can be more widely used as markers for genomic complexity.
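The MAW definition can be made concrete with a brute-force sketch. This is for illustration on short strings only; genome-scale computations use suffix automata or similar index structures, not this enumeration.

```python
def minimal_absent_words(s, alphabet, max_len=4):
    """Brute-force minimal absent words of s, up to length max_len.

    A word w is a MAW if w does not occur in s while both its maximal
    proper prefix w[:-1] and suffix w[1:] do occur (hence all proper
    subwords of w occur in s).
    """
    present = {s[i:j] for i in range(len(s))
               for j in range(i + 1, len(s) + 1)}
    maws = []
    candidates = list(alphabet)  # length-1 words
    for _ in range(max_len):
        next_candidates = []
        for w in candidates:
            if w in present:
                # Only present words are extended, so every longer
                # candidate automatically has a present prefix.
                next_candidates.extend(w + a for a in alphabet)
            elif len(w) == 1 or (w[:-1] in present and w[1:] in present):
                maws.append(w)
        candidates = next_candidates
    return sorted(maws)
```

For example, over the alphabet {a, b}, the string "abba" has the MAWs "aa", "aba", "bab", and "bbb".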

We study stationary states in a diluted asymmetric (kinetic) Ising model. We apply the recently introduced dynamic cavity method to compute magnetizations of these stationary states. Depending on the update rule, different versions of the dynamic cavity method apply. We here study synchronous updates and random sequential updates, and compare local properties computed by the dynamic cavity method to numerical simulations. Using both types of updates, the dynamic cavity method is highly accurate at high enough temperatures. At low enough temperatures, for sequential updates the dynamic cavity method tends to a fixed point, but this does not agree with numerical simulations, while for parallel updates, the dynamic cavity method may display oscillatory behavior. When it converges and is accurate, the dynamic cavity method offers a huge speed-up compared to Monte Carlo, particularly for large systems.

We compare dynamic mean-field and dynamic cavity methods to describe the stationary states of dilute kinetic Ising models. We compute dynamic mean-field theory by expanding in interaction strength to third order, and we compare to the exact dynamic mean-field theory for fully asymmetric networks. We show that in diluted networks, the dynamic cavity method generally predicts magnetizations of individual spins better than both first-order ("naive") and second-order ("TAP") dynamic mean-field theory.

We study the dynamic cavity method for dilute kinetic Ising models with synchronous update rules. For the parallel update rule, we find for fully asymmetric models that the dynamic cavity equations reduce to a Markovian dynamics of the (time-dependent) marginal probabilities. For the random sequential update rule, also an instantiation of a synchronous update rule, we find on the other hand that the dynamic cavity equations do not reduce to a Markovian dynamics, unless an additional assumption of time factorization is introduced. For symmetric models we show that a fixed point of ordinary belief propagation is also a fixed point of the dynamic cavity equations in the time-factorized approximation. For clarity, the conclusions of the paper are formulated as three lemmas.

We study the problem of optimizing released heat or dissipated work in stochastic thermodynamics. In the overdamped limit these functionals have singular solutions, previously interpreted as protocol jumps. We show that a regularization, penalizing a properly defined acceleration, changes the jumps into boundary layers of finite width. We show that in the limit of vanishing boundary layer width no heat is dissipated in the boundary layer, while work can be done. We further give an alternative interpretation of the fact that the optimal protocols in the overdamped limit are given by optimal deterministic transport (Burgers equation).

Thermodynamics of small systems has become an important field of statistical physics. Such systems are driven out of equilibrium by a control, and the question is naturally posed how such a control can be optimized. We show that optimization problems in small system thermodynamics are solved by (deterministic) optimal transport, for which very efficient numerical methods have been developed, and of which there are applications in cosmology, fluid mechanics, logistics, and many other fields. We show, in particular, that minimizing expected heat released or work done during a nonequilibrium transition in finite time is solved by the Burgers equation and mass transport by the Burgers velocity field. Our contribution hence considerably extends the range of solvable optimization problems in small system thermodynamics.
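Schematically, the correspondence stated above can be written as follows; the symbols u and ρ for the optimal velocity field and the transported system density are illustrative.

```latex
% Optimal-transport structure of the finite-time optimization problem:
% the optimal protocol is built from a velocity field u solving the
% inviscid Burgers equation, with the density carried by that field.
\[
\begin{aligned}
  \partial_t u + (u \cdot \nabla)\, u &= 0,
    &&\text{(inviscid Burgers equation for the optimal velocity field)}\\
  \partial_t \rho + \nabla \cdot (\rho\, u) &= 0.
    &&\text{(mass transport of the system density $\rho$ by $u$)}
\end{aligned}
\]
```

Minimizing expected heat or work over the transition then reduces to solving this deterministic transport problem between the initial and final densities.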

In this paper we consider the thermal power of a heat flow through a qubit between two baths. The baths are modeled as sets of harmonic oscillators initially at equilibrium, at two temperatures. Heat is defined as the change of energy of the cold bath, and thermal power is defined as expected heat per unit time, in the long-time limit. The qubit and the baths interact as in the spin-boson model, i.e., through the qubit operator σ_z. We compute thermal power in an approximation analogous to the "noninteracting blip" approximation (NIBA) and express it in the polaron picture as products of correlation functions of the two baths, and a time derivative of a correlation function of the cold bath. In the limit of weak interaction, we recover known results in terms of a sum of correlation functions of the two baths, correlation functions of the cold bath only, and the energy split.

We investigate the growth optimal strategy over a finite time horizon for a stock and bond portfolio in an analytically solvable multiplicative Markovian market model. We show that the optimal strategy consists in holding the amount of capital invested in stocks within an interval around an ideal optimal investment. The size of the holding interval is determined by the intensity of the transaction costs and the time horizon.

We investigate the optimal strategy over a finite time horizon for a portfolio of stock, bond, and a derivative in a multiplicative Markovian market model with transaction costs (friction). The optimization problem is solved by a Hamilton-Jacobi-Bellman equation, which by the verification theorem has well-behaved solutions if certain conditions on a potential are satisfied. In the case at hand, these conditions simply imply arbitrage-free ("Black-Scholes") pricing of the derivative. While pricing is hence not changed, the friction allows the portfolio to fluctuate around a delta hedge. In the limit of weak friction, we determine the optimal control to consist essentially of two parts: a strong control, which tries to bring the stock-and-derivative portfolio towards a Black-Scholes delta hedge; and a weak control, which moves the portfolio by adding or subtracting a Black-Scholes hedge. For simplicity, we assume growth-optimal investment criteria and quadratic friction.

We study the performance and convergence properties of the susceptibility propagation (SusP) algorithm for solving the inverse Ising problem. We first study how the temperature parameter T of the Sherrington-Kirkpatrick model generating the data influences the performance and convergence of the algorithm. We find that in the high-temperature regime (T > 4), the algorithm performs well and its quality is only limited by the quality of the supplied data. In the low-temperature regime (T < 4), we find that the algorithm typically does not converge, yielding diverging values for the couplings. However, we show that by stopping the algorithm at the right time, before the divergence becomes serious, good reconstruction can be achieved down to T ≈ 2. We then show that dense connectivity, loopiness of the connectivity, and high absolute magnetization all have deteriorating effects on the performance of the algorithm. When absolute magnetization is high, we show that other methods can work better than SusP. Finally, we show that for neural data with high absolute magnetization, SusP performs less well than TAP inversion.

In this paper we analyze Belief Propagation over a Gaussian model in a dynamic environment. Recently, this has been proposed as a method to average local measurement values by a distributed protocol (Consensus Propagation, Moallemi C. C. and Van Roy B., IEEE Trans. Inf. Theory, 52 (2006) 4753), where the average is available for read-out at every single node. In the case that the underlying network is constant but the values to be averaged fluctuate ("dynamic data"), convergence and accuracy are determined by the spectral properties of an associated Ruelle-Perron-Frobenius operator. For Gaussian models on Erdős-Rényi graphs, numerical computation points to a spectral gap remaining in the large-size limit, implying exceptionally good scalability. In a model where the underlying network also fluctuates ("dynamic network"), averaging is more effective than in the dynamic data case. Altogether, this implies very good performance of these methods in very large systems, and opens a new field of statistical physics of large (and dynamic) information systems.

We develop a framework to discuss the stability of epigenetic states as first exit problems in dynamical systems with noise. We consider in particular the stability of the lysogenic state of the λ prophage. The formalism defines a quantitative measure of robustness of inherited states.

We propose an alternative notion of time reversal in open quantum systems as represented by linear quantum operations, and a related generalization of classical entropy production in the environment. This functional is the ratio of the probability to observe a transition between two states under the forward and the time-reversed dynamics, and leads, as in the classical case, to fluctuation relations as tautological identities. As in classical dynamics in contact with a heat bath, time reversal is not unique, and we discuss several possibilities. For any bistochastic map, its dual map preserves the trace and describes a legitimate dynamics reversed in time; in that case the entropy production in the environment vanishes. For a generic stochastic map we construct a simple quantum operation which can be interpreted as a time reversal. For instance, the decaying channel, which sends the excited state into the ground state with a certain probability, can be reversed into the channel transforming the ground state into the excited state with the same probability.

We consider the optimization of the average entropy production in inhomogeneous temperature environments within the framework of stochastic thermodynamics. For systems modeled by Langevin equations (e.g. a colloidal particle in a heat bath) it has been recently shown that a space-dependent temperature breaks the time reversal symmetry of the fast velocity degrees of freedom resulting in an anomalous contribution to the entropy production of the overdamped dynamics. We show that optimization of entropy production is determined by an auxiliary deterministic problem formally analogous to motion on a curved manifold in a potential. The "anomalous contribution" to entropy plays the role of the potential and the inverse of the diffusion tensor is the metric. We also find that entropy production is not minimized by adiabatically slow, quasi-static protocols but there is a finite optimal duration for the transport process. As an example we discuss the case of a linearly space-dependent diffusion coefficient.

Particle motion at the microscale is an incessant tug-of-war between thermal fluctuations and applied forces on one side and the strong resistance exerted by fluid viscosity on the other. Friction is so strong that completely neglecting inertia - the overdamped approximation - gives an excellent effective description of the actual particle mechanics. In sharp contrast to this result, here we show that the overdamped approximation dramatically fails when thermodynamic quantities such as the entropy production in the environment are considered, in the presence of temperature gradients. In the limit of vanishingly small, yet finite, inertia, we find that the entropy production is dominated by a contribution that is anomalous, i.e., has no counterpart in the overdamped approximation. This phenomenon, which we call an entropic anomaly, is due to a symmetry breaking that occurs when moving to the small, finite inertia limit. Anomalous entropy production is traced back to futile phase-space cyclic trajectories displaying a fast downgradient sweep followed by a slow upgradient return to the original position.

Motivation: Estimation of bacterial community composition from a high-throughput sequenced sample is an important task in metagenomics applications. As the sample sequence data typically harbors reads of variable lengths and different levels of biological and technical noise, accurate statistical analysis of such data is challenging. Currently popular estimation methods are typically time-consuming in a desktop computing environment.

Results: Using sparsity enforcing methods from the general sparse signal processing field (such as compressed sensing), we derive a solution to the community composition estimation problem by a simultaneous assignment of all sample reads to a pre-processed reference database. A general statistical model based on kernel density estimation techniques is introduced for the assignment task, and the model solution is obtained using convex optimization tools. Further, we design a greedy algorithm for obtaining a fast approximate solution. Our approach offers a reasonably fast community composition estimation method, which is shown to be more robust to input data variation than a recently introduced related method.
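The greedy idea can be illustrated with a minimal matching-pursuit-style sketch. This is not the paper's actual model (which uses kernel density estimates and convex optimization over a reference database); all names, the signature-matrix setup, and the toy data below are our own illustrative assumptions.

```python
import numpy as np

def greedy_composition(A, y, k):
    """Matching-pursuit-style greedy estimate of community proportions.
    A : (m, n) matrix, column j = read-class signature of reference species j
        (columns assumed normalized); y : (m,) observed read-class frequencies;
    k : maximum number of species to select (sparsity level)."""
    x = np.zeros(A.shape[1])
    r = y.astype(float).copy()
    for _ in range(k):
        scores = A.T @ r                 # correlation of each species with residual
        j = int(np.argmax(scores))
        if scores[j] <= 0:               # no species explains remaining signal
            break
        # one-dimensional non-negative least-squares update for column j
        c = max(0.0, float(A[:, j] @ r) / float(A[:, j] @ A[:, j]))
        x[j] += c
        r = r - c * A[:, j]
    return x

# Toy example with orthonormal signatures, where greedy recovery is exact:
A = np.eye(4)[:, :3]
y = 0.6 * A[:, 0] + 0.4 * A[:, 2]
x = greedy_composition(A, y, k=2)
```

With correlated signatures the greedy solution is only approximate, which is the trade-off against the slower convex-optimization variant described in the abstract.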

Availability and implementation: A platform-independent Matlab implementation of the method is freely available at http://www.ee.kth.se/ctsoftware; source code that does not require access to Matlab is currently being tested and will be made available later through the above Web site.

A method to approximately close the dynamic cavity equations for synchronous reversible dynamics on a locally treelike topology is presented. The method builds on (a) a graph expansion to eliminate loops from the normalizations of each step in the dynamics and (b) an assumption that a set of auxiliary probability distributions on histories of pairs of spins mainly have dependencies that are local in time. The closure is then effectuated by projecting these probability distributions on n-step Markov processes. The method is shown in detail at the level of ordinary Markov processes (n = 1) and outlined for higher-order approximations (n > 1). Numerical validations of the technique are provided for the reconstruction of the transient and equilibrium dynamics of the kinetic Ising model on a random graph with arbitrary connectivity symmetry.
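For reference, the model whose dynamics are being reconstructed above can be simulated directly. The sketch below shows one synchronous (parallel) Glauber update of a kinetic Ising model; the function names and the toy parameters are ours, chosen only for illustration.

```python
import numpy as np

def kinetic_ising_step(s, J, h, beta, rng):
    """One synchronous Glauber update of the kinetic Ising model.
    s: spins in {-1, +1}; J: coupling matrix (J[i, j] = coupling j -> i);
    h: external fields; beta: inverse temperature."""
    fields = J @ s + h                                   # local effective fields
    p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * fields))    # P(s_i(t+1) = +1)
    return np.where(rng.random(s.size) < p_up, 1, -1)

# Toy run: no couplings, strong uniform field -- spins align with the field.
rng = np.random.default_rng(0)
N = 8
s = rng.choice([-1, 1], size=N)
J = np.zeros((N, N))
h = 10.0 * np.ones(N)
for _ in range(3):
    s = kinetic_ising_step(s, J, h, beta=5.0, rng=rng)
```

In the synchronous update all spins are drawn simultaneously from their conditional distributions, which is the dynamics the cavity closure above approximates on sparse graphs.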

Macroscopic fluctuation theory has shown that a wide class of non-equilibrium stochastic dynamical systems obey a large deviation principle, but except for a few one-dimensional examples these large deviation principles are in general not known in closed form. We consider the problem of constructing successive approximations to an (unknown) large deviation functional and show that the non-equilibrium probability distribution then takes a Gibbs-Boltzmann form with a set of auxiliary (non-physical) energy functions. The expectation values of these auxiliary energy functions and their conjugate quantities satisfy a closed system of equations, which can imply a considerable reduction of the dimensionality of the dynamics. We show that the accuracy of the approximations can be tested self-consistently without solving the full non-equilibrium equations. We test the general procedure on the simple model problem of a relaxing 1D Ising chain.
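The Gibbs-Boltzmann ansatz described above can be written schematically (in generic notation, not necessarily the paper's) as

```latex
% Non-equilibrium distribution parametrized by auxiliary energies E_a
P_t(\sigma) \;=\; \frac{1}{Z(t)}
  \exp\!\Big( -\textstyle\sum_a \lambda_a(t)\, E_a(\sigma) \Big),
\qquad
Z(t) \;=\; \sum_{\sigma} e^{-\sum_a \lambda_a(t)\, E_a(\sigma)},
```

with the time dependence carried entirely by the conjugate parameters λ_a(t). The closed system of equations then couples the λ_a(t) to the expectation values ⟨E_a⟩_t, so that a few scalar functions replace the full distribution.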

We consider a one-step replica symmetry breaking description of the Edwards–Anderson spin glass model in 2D. The ingredients of this description are a Kikuchi approximation to the free energy and a second-level statistical model built on the extremal points of the Kikuchi approximation, which are also fixed points of a generalized belief propagation (GBP) scheme. We show that a generalized free energy can be constructed where these extremal points are exponentially weighted by their Kikuchi free energy and a Parisi parameter y, and that the Kikuchi approximation of this generalized free energy leads to second-level, one-step replica symmetry breaking (1RSB), GBP equations. We then proceed analogously to the Bethe approximation case for tree-like graphs, where it has been shown that 1RSB belief propagation equations admit a survey propagation solution. We discuss when and how the one-step replica symmetry breaking GBP equations that we obtain also allow a simpler class of solutions, which can be interpreted as a class of generalized survey propagation equations for the single-instance graph case.
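The second-level construction above has the familiar 1RSB form; schematically (our notation), the extremal points α of the Kikuchi approximation are reweighted as

```latex
% Generalized free energy at Parisi parameter y
Z(y) \;=\; \sum_{\alpha} e^{-y\, F^{\mathrm{Kikuchi}}_{\alpha}},
\qquad
\Phi(y) \;=\; -\frac{1}{y} \log Z(y),
```

and applying the Kikuchi approximation once more, now to Φ(y), is what produces the second-level (1RSB) GBP equations discussed in the abstract.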

Direct-coupling analysis is a group of methods to harvest information about coevolving residues in a protein family by learning a generative model in an exponential family from data. In protein families of realistic size, this learning can only be done approximately, and there is a trade-off between inference precision and computational speed. We here show that an earlier introduced l(2)-regularized pseudolikelihood maximization method called plmDCA can be modified so as to be easily parallelizable, as well as inherently faster on a single processor, with negligible difference in accuracy. We test the new incarnation of the method on 143 protein family/structure pairs from the Protein Families database (PFAM), one of the larger tests of this class of algorithms to date.

Spatially proximate amino acids in a protein tend to coevolve. A protein's three-dimensional (3D) structure hence leaves an echo of correlations in the evolutionary record. Reverse engineering 3D structures from such correlations is an open problem in structural biology, pursued with increasing vigor as more and more protein sequences continue to fill the data banks. Within this task lies a statistical inference problem, rooted in the following: correlation between two sites in a protein sequence can arise from firsthand interaction but can also be network-propagated via intermediate sites; observed correlation is not enough to guarantee proximity. To separate direct from indirect interactions is an instance of the general problem of inverse statistical mechanics, where the task is to learn model parameters (fields, couplings) from observables (magnetizations, correlations, samples) in large systems. In the context of protein sequences, the approach has been referred to as direct-coupling analysis. Here we show that the pseudolikelihood method, applied to 21-state Potts models describing the statistical properties of families of evolutionarily related proteins, significantly outperforms existing approaches to the direct-coupling analysis, the latter being based on standard mean-field techniques. This improved performance also relies on a modified score for the coupling strength. The results are verified using known crystal structures of specific sequence instances of various protein families. Code implementing the new method can be found at http://plmdca.csc.kth.se/.
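The key computational property of the pseudolikelihood approach is that the objective decomposes into one term per site, each depending only on that site's fields and couplings. The sketch below evaluates one such per-site term for a Potts model; it is a minimal illustration in our own notation, not the plmDCA implementation (which adds l2 regularization and gradient-based minimization).

```python
import numpy as np

def site_pseudolikelihood(r, data, h, J):
    """Negative log-pseudolikelihood contribution of site r under a Potts model.
    data : (B, L) int array of sequences over q states;
    h    : (L, q) fields; J : (L, L, q, q) couplings, J[r, i] coupling r to i.
    Each site's term depends only on h[r] and J[r, :], so the L terms can be
    minimized independently -- this is what makes the method easy to parallelize."""
    B, L = data.shape
    # Energy of each candidate state a at site r, given the observed context:
    e = np.tile(h[r], (B, 1))                    # (B, q)
    for i in range(L):
        if i != r:
            e = e + J[r, i][:, data[:, i]].T     # add J[r, i][a, s_i] for all a
    logZ = np.log(np.exp(e).sum(axis=1))         # per-sequence normalizer
    return float(np.mean(logZ - e[np.arange(B), data[:, r]]))

# Toy check: with zero parameters every state is equally likely,
# so the per-site term equals log q.
data = np.array([[0, 1], [1, 0]])
h = np.zeros((2, 2))
J = np.zeros((2, 2, 2, 2))
nll = site_pseudolikelihood(0, data, h, J)
```

In a full fit, one such objective (plus regularization) is minimized per site, possibly on separate processors, and the couplings are afterwards combined into a contact score.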

This paper gives a brief summary of our experience in applying a physics-style approach for analyzing the behavior of structured overlay networks that deploy self-organization and self-repair policies. Such systems are not always simple to model analytically, and simulation at the scales of interest can be prohibitive. Physicists deal with scale by characterizing a system using intensive variables, i.e. variables that are size independent. The approach has proved substantially useful when applied to satisfiability theory, and it is our hope that it can be as useful in the field of large-scale distributed systems. We report here our finding of one simple self-organization-related intensive variable, and a more complex self-repair-related intensive variable.

In the majority of structured peer-to-peer overlay networks a graph with a desirable topology is constructed. In most cases, the graph is maintained by a periodic activity performed by each node in the graph to preserve the desirable structure in the face of the continuous change of the set of nodes. The interaction of the autonomous periodic activities of the nodes renders the performance analysis of such systems complex, and simulation at the scales of interest can be prohibitive. Physicists, however, are accustomed to dealing with scale by characterizing a system using intensive variables, i.e. variables that are size independent. The approach has proved its usefulness when applied to satisfiability theory. This work is the first attempt to apply it in the area of distributed systems. The contribution of this paper is two-fold. First, we describe a methodology to be used for analyzing the performance of large scale distributed systems. Second, we show how we applied the methodology to find an intensive variable that describes the characteristic behavior of the Chord overlay network, namely, the ratio of the magnitude of perturbation of the network (joins/failures) to the magnitude of periodic stabilization of the network.

In the majority of structured peer-to-peer overlay networks a graph with a desirable topology is constructed. In most cases, the graph is maintained by a periodic activity performed by each node in the graph to preserve the desirable structure in the face of the continuous change of the set of nodes. The interaction of the autonomous periodic activities of the nodes renders the performance analysis of such systems complex, and simulation at the scales of interest can be prohibitive. Physicists, however, are accustomed to dealing with scale by characterizing a system using intensive variables, i.e. variables that are size independent. The approach has proved its usefulness when applied to satisfiability theory. This work is the first attempt to apply it in the area of distributed systems. The contribution of this paper is two-fold. First, we describe a methodology to be used for analyzing the performance of large scale distributed systems. Second, we show how we applied the methodology to find two intensive variables that describe the characteristic behavior of the Chord overlay network, namely: 1) the density of nodes in the identifier space, and 2) the ratio of the magnitude of perturbation of the network (joins/failures) to the magnitude of periodic stabilization of the network.
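As a toy illustration of the intensive-variable idea (the function name, parameters, and numbers below are ours, not from the papers): if total churn scales with system size, the number of perturbation events per node per stabilization round is size-independent.

```python
def intensive_ratio(churn_events_per_sec, num_nodes, stabilization_period):
    """Toy intensive variable: perturbation events (joins/failures) per node
    per stabilization round. Size-independent when total churn scales with
    the number of nodes."""
    return churn_events_per_sec / num_nodes * stabilization_period

# Ten times the nodes with proportionally more churn gives the same ratio:
r_small = intensive_ratio(10.0, 1_000, 30.0)
r_large = intensive_ratio(100.0, 10_000, 30.0)
```

Characterizing the network by such a ratio, rather than by raw sizes and rates, is what lets results from small simulations be carried over to large deployments.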