Some elementary formal grammars in a BNF-like style are used to show how a string with self-referential properties can be
described with an AFFIX operator, which in essence attaches one part of the string to another. In the text, least-energy
methods are for the most part ignored for the sake of clarity and O(n) linearity; thus, the structures and structural components
accepted (matched) by these methods are “possible” structures and structural components only.

Linear – Regular Expressions (RE) – Type 3 Grammars
Not much needs to be said about regular expressions other than that they are a powerful adjunct to context-free
and context-sensitive grammars.
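As an illustration of that adjunct role, a regular expression can cheaply locate windows with the right local shape, for example a G/C-rich 4-base block, a 4-base loop and another G/C-rich block. Requiring the 3' block to be the reverse complement of the 5' block for stems of arbitrary length, however, is beyond a regular language, so in this minimal Python sketch that check is left to a separate step. The pattern, the function names and the test sequence are hypothetical, and T is used rather than U to match the letters in the examples later in the text.

```python
import re

# Hypothetical regular pre-filter: a G/C-rich 4-base block, any 4-base loop,
# then another G/C-rich 4-base block. The lookahead (?=...) makes each match
# zero-width, so overlapping windows are all reported.
WINDOW = re.compile(r"(?=([GC]{4}[ACGT]{4}[GC]{4}))")
COMP = {"A": "T", "T": "A", "G": "C", "C": "G"}


def candidate_windows(seq: str) -> list[tuple[int, str]]:
    """Regular step: every (start, window) with the right local shape."""
    return [(m.start(), m.group(1)) for m in WINDOW.finditer(seq)]


def is_hairpin(window: str, stem: int = 4) -> bool:
    """Non-regular step: the 3' block must be the reverse complement of the 5' block."""
    left, right = window[:stem], window[-stem:]
    return right == "".join(COMP[b] for b in reversed(left))


seq = "TTGGGGAAAACCCCTTGCCGAAAAGCCG"
for start, window in candidate_windows(seq):
    print(start, window, is_hairpin(window))
```

Both windows found by the regular expression have the right shape, but only the first (GGGG...CCCC) also passes the complementarity check.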
Written as a production, an RE may look like this:

Attenuation sequences are systems of stem-loop sequences that can fold in different ways depending on the presence of certain molecules. They may contain three or four regions with complementary bases:

Segment 1 can form a stem with segment 2, and segment 2 can form a stem with segment 3. Segments 3 and 4 can also form a stem.
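Which segments can pair with which can be checked mechanically by testing whether one segment is the reverse complement of another. This is a minimal Python sketch, assuming strict Watson-Crick pairing (no wobble pairs) and made-up segment sequences rather than a real attenuator; the function names are hypothetical.

```python
# Strict Watson-Crick pairs for RNA; wobble (G-U) pairs are deliberately
# ignored, matching the simple bonding rules assumed in this text.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """The strand that would pair antiparallel with `seq`."""
    return "".join(PAIR[base] for base in reversed(seq))


def can_form_stem(upstream: str, downstream: str) -> bool:
    """True if the two segments can pair base-for-base along their full length."""
    return len(upstream) == len(downstream) and downstream == reverse_complement(upstream)


def possible_stems(segments: list[str]) -> list[tuple[int, int]]:
    """All (i, j), i < j, of 1-based segment numbers that could form a stem."""
    return [(i + 1, j + 1)
            for i in range(len(segments))
            for j in range(i + 1, len(segments))
            if can_form_stem(segments[i], segments[j])]


# Made-up segments 1-4 (not a real attenuator leader).
segments = ["GGCAU", "AUGCC", "GGCAU", "AUGCC"]
print(possible_stems(segments))
```

In this toy, exact complementarity forces segment 3 to equal segment 1 and segment 4 to equal segment 2, so the sketch also reports a 1-4 stem. Real attenuators are not exact repeats, and which stem actually forms depends on the presence of the molecules mentioned above.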

On close inspection, the second figure above contains three tandem complementary regions, each with crossing properties (e.g., AGT-TCA), but these can be handled similarly (grammar not shown).

Discussion
In these cases, RNA is considered an abstract string over a 4-letter alphabet that is unable to form true knots. Only a few simple bonding rules are applied here; this is not always the case in real RNA, where wobble pairs and other unusual base pairings may exist. Nonetheless, we show that no matter how complex a pattern is, every secondary pattern that RNA can form reduces to combinations of two configurations: nested and crossing. The actual grammars used to parse real
RNA secondary structures can be quite complex and nuanced, containing rules that allow for least-energy configurations so that computed structures more closely match natural ones. These grammatical methods could, in theory, also describe strings over an alphabet of any size that fold in the same basic ways, such as polypeptides; more rules are simply added to the existing framework.
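The claim that every secondary pattern reduces to nested and crossing configurations can be made concrete with a small classifier over base-pair positions. This is a minimal Python sketch, assuming pairs are given as 0-based (i, j) tuples with each position used at most once; the names are illustrative, not taken from the original grammars. Pairs that simply sit side by side are labelled "sequential": they neither nest nor cross, so they place no constraint on each other.

```python
from itertools import combinations

Pair = tuple[int, int]  # (i, j) with i < j, 0-based positions in the string


def relation(p: Pair, q: Pair) -> str:
    """Classify two base pairs; assumes each position pairs at most once."""
    (i, j), (k, l) = sorted([p, q])  # order the pairs so that i < k
    if k < j < l:
        return "crossing"    # i < k < j < l: the arcs interleave (pseudoknot)
    if l < j:
        return "nested"      # i < k < l < j: q lies inside p (stem or loop)
    return "sequential"      # i < j < k < l: side by side, no constraint


def classify(pairs: list[Pair]) -> dict[str, int]:
    """Count how every two pairs in a structure relate to each other."""
    counts = {"nested": 0, "crossing": 0, "sequential": 0}
    for p, q in combinations(pairs, 2):
        counts[relation(p, q)] += 1
    return counts


hairpin = [(0, 11), (1, 10), (2, 9), (3, 8)]       # ((((::::))))
pseudoknot = [(0, 6), (1, 5), (3, 10), (4, 9)]    # two interleaved stems
print(classify(hairpin))
print(classify(pseudoknot))
```

The hairpin yields only nested relations, while the pseudoknot mixes nested relations (within each stem) with crossing ones (between the stems).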

An Example of a Grammar-only “Least Energy” Filter for Stem-Loop Structures.

Consider ((((::::)))). Filling in the parentheses with nucleotides that fit, we can begin with
AAAAxxxxTTTT, GGGGxxxxCCCC, AGAAxxxxTTCT and so on, all of which are accepted by the example grammar above. In fact, only a few of these configurations, such as GGGGxxxxCCCC, have enough strength through weak hydrogen bonding to exist as natural structures. Thus, a set of filters is needed to weed out weaker configurations. One way is to check the final match against a rule set that is known to work. For a stem of length r = 4 over an alphabet of n = 4 nucleotides there are n^r = 4^4 = 256 possible combinations; 81 of these are rules that work, leaving 256 − 81 = 175 that do not:
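The grammar-then-filter idea can be sketched in Python. The recognizer below is a hypothetical stand-in for the example grammar, S -> b S b' | L, with four nested pairs around a 4-symbol loop L, and it accepts all 4^4 = 256 complementary stems. The filter is also a placeholder. The 81-entry rule set is not reproduced here, so the sketch keeps stems with at least three G-C pairs, purely to show where such a rule set plugs in.

```python
from itertools import product

# Complements written with the letters used in the examples (T rather than U).
COMP = {"A": "T", "T": "A", "G": "C", "C": "G"}
BASES = "ACGT"


def stem_loop(s: str, stem: int = 4, loop: int = 4) -> bool:
    """Hypothetical toy grammar for ((((::::)))):  S -> b S b' | L.

    Each level peels one outer pair and checks complementarity; after `stem`
    levels the remainder must be a loop L of exactly `loop` symbols.
    """
    if stem == 0:
        return len(s) == loop
    if len(s) < 2:
        return False
    return COMP.get(s[0]) == s[-1] and stem_loop(s[1:-1], stem - 1, loop)


def candidates(stem: int = 4, loop: str = "xxxx") -> list[str]:
    """Every stem-loop the grammar accepts: one per choice of the 5' half."""
    out = []
    for left in product(BASES, repeat=stem):
        right = "".join(COMP[b] for b in reversed(left))
        out.append("".join(left) + loop + right)
    return out


def passes_filter(s: str, stem: int = 4) -> bool:
    """Placeholder rule set: keep stems with at least 3 G-C pairs.

    This threshold is an assumption for illustration only; it is NOT the
    81-rule table referred to in the text.
    """
    return sum(b in "GC" for b in s[:stem]) >= 3


accepted = candidates()
print(len(accepted), all(stem_loop(s) for s in accepted))
kept = [s for s in accepted if passes_filter(s)]
print(len(kept), "GGGGxxxxCCCC" in kept, "AAAAxxxxTTTT" in kept)
```

Separating the grammar (which only decides what is *possible*) from the filter (which decides what is *plausible*) keeps the recognizer simple. The filter can then be swapped for a real rule table without touching the grammar.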

While amino acid interactions (proteins) are more complex than nucleotide interactions (RNA), the underlying principles can still be based on similar context-free grammatical production predicates. The secondary shapes of proteins are topologically equivalent to the stem-loop and pseudoknot structures found in RNA.

The figure above represents a toy protein whose helix is held in its shape at 12 points, lettered a to l.

Projected onto a flat topology map (below), the crossing property of each connection becomes clearer: a and c connect across b, and so forth. The second set of letters shows how the connection points are rewritten as a point and its complement (the complement is denoted by a prime). Thus a connects to a' across b, c to c' across b' and d...

If we continue adding amino acids to our toy protein until the chain can fold back upon the helix and interact with it, the topology becomes more complex, with more crossing interactions. Nonetheless, the topology is described in the same way.

or [a b a' c b' d e c' f g e' g f' h g' h' g' d']

We can begin to see a few very long-range amino acid interactions that cross other interactions.

Describing such a shape grammatically turns out to be remarkably easy if we rewrite each connection point as having a complement point: a-a', b-b', c-c'...
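Once every connection is written as a point x and its complement x', recovering the connections and their crossings is mechanical. This is a minimal Python sketch, assuming a prime marks a complement and that each complement closes the earliest still-open point with the same letter (which only matters when a letter repeats). The function names are hypothetical, and the token list is a toy that follows the pattern described for the simple helix (a to a' across b, c to c' across b' and d), not data read from the figure.

```python
from itertools import combinations


def pair_points(tokens: list[str]) -> list[tuple[int, int]]:
    """Turn a point/complement sequence into sorted (i, j) index pairs.

    Assumes a prime marks a complement and that each complement closes the
    earliest still-open point with the same letter.
    """
    open_points: dict[str, list[int]] = {}
    pairs = []
    for j, tok in enumerate(tokens):
        if tok.endswith("'"):
            point = tok[:-1]
            if not open_points.get(point):
                raise ValueError(f"{tok} at position {j} has no preceding {point}")
            pairs.append((open_points[point].pop(0), j))
        else:
            open_points.setdefault(tok, []).append(j)
    unclosed = [p for p, idx in open_points.items() if idx]
    if unclosed:
        raise ValueError(f"points without complements: {unclosed}")
    return sorted(pairs)


def crossings(pairs: list[tuple[int, int]]) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Connections whose arcs interleave, i < k < j < l (pairs sorted by i)."""
    return [(p, q) for p, q in combinations(pairs, 2) if p[0] < q[0] < p[1] < q[1]]


toy = "a b a' c b' d c' d'".split()   # hypothetical toy sequence
pairs = pair_points(toy)
print(pairs)
print(crossings(pairs))
```

Each connection in the toy crosses the next one (a across b, b across c, c across d), which is exactly the chained crossing pattern the flat topology map shows.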

where points
a = a,
b = b,
c = a',
d = c',
e = b', …

The algorithm for defining the protein shape specifies just two things: each point and its complement, and the placement of each point and its complement.

To describe the second, more complex helix:

let a be the complement of a',
let b be the complement of b',
let c be the complement of c',
let d be the complement of d',
let e be the complement of e',
let f be the complement of f',
let g be the complement of g',
let h be the complement of h'

such that a b a' c b' d e c' f g e' g f' h g' h' g' d'
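The two ingredients of the description, the complement declarations and the placement, can be checked against each other mechanically. This is a minimal Python sketch, assuming the placement is a whitespace-separated token list and a prime marks a complement; `check_placement` is a hypothetical name. It confirms that every symbol is declared, that no complement precedes its point, and that every point is eventually closed.

```python
# Complement declarations from the text: "let x be the complement of x'".
COMPLEMENT = {p: p + "'" for p in "abcdefgh"}

# Placement of points and complements for the second helix, as given above.
PLACEMENT = "a b a' c b' d e c' f g e' g f' h g' h' g' d'".split()


def check_placement(tokens: list[str], complement: dict[str, str]) -> list[str]:
    """Return a list of problems (empty if the placement is consistent).

    Checks that every token is a declared point or complement, that no
    complement appears before an open occurrence of its point, and that
    every point is eventually closed by its complement.
    """
    point_of = {c: p for p, c in complement.items()}
    open_count = {p: 0 for p in complement}
    problems = []
    for pos, tok in enumerate(tokens):
        if tok in complement:
            open_count[tok] += 1
        elif tok in point_of:
            point = point_of[tok]
            if open_count[point] == 0:
                problems.append(f"{tok} at {pos} precedes its point {point}")
            else:
                open_count[point] -= 1
        else:
            problems.append(f"undeclared symbol {tok!r} at {pos}")
    problems += [f"{p} is never closed by {complement[p]}"
                 for p, n in open_count.items() if n]
    return problems


print(check_placement(PLACEMENT, COMPLEMENT))
```

On the placement as written, the check reports no problems. Note that g and g' each occur twice while eight pairs are declared; this check tolerates repeated points, whereas a stricter variant could require each point to occur exactly once.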

A very simple grammar for the second helix can be expressed as a single predicate, in which “rho <=” represents an AFFIX operator that builds a complement from its corresponding point.