It is well known that many complex systems, in both nature and technology, exhibit hierarchical modularity: smaller modules, each of them providing a certain function, are used within larger modules that perform more complex functions. What is not well understood, however, is how this hierarchical structure (which is fundamentally a network property) emerges initially, how it relates to the desired input-output function of the entire system, and how it evolves over time.

We present an optimization framework, called Lexis, that constructs a given set of target strings through a minimum-cost hierarchy of concatenations of shorter strings. Lexis has close ties to the classic Smallest Grammar Problem (SGP) and along this connection, we propose a generalized formulation of SGP that produces smaller models than the current best approximations, and better captures the syntactic structure of natural language.

We also use Lexis as a modeling framework, referred to as Evo-Lexis, that can capture the emergence and dynamics of hierarchical structure in evolving systems. By modeling evolutionary mechanisms such as mutation, recombination, and selection, we show that Evo-Lexis produces deep hierarchies with interesting properties, such as a small but relatively stable core despite frequent changes at the higher layers of the hierarchy. This modeling study provides a plausible explanation for the hourglass (or bow-tie) structure that is often observed in some natural and technological hierarchical systems. Furthermore, our results suggest the existence of "major transitions" and "punctuated equilibria" in the evolutionary trajectory of hourglass-shaped hierarchies. The extinction of central modules is found to be the main factor behind this effect.

Finally, we analyze the iGEM dataset, which relates to a synthetic-DNA competition that has been running for more than 15 years, in order to examine the validity of the Evo-Lexis model. Although the analysis shows that the iGEM Lexis-derived hierarchies are less cost-efficient and less deep than their Evo-Lexis counterparts (mostly due to an expanding set of source nodes), iGEM exhibits the Evo-Lexis bias to reuse certain nodes more often than others. This results in hierarchies that exhibit the hourglass effect with stable core nodes, which is consistent with the predictions of the Evo-Lexis modeling framework.