But Mona Lisa must have had the highway blues.
You can tell by the way she smiles. — Bob Dylan

What if someone hid a Mona Lisa in a genome?
Would we be able to find her?
Could we tell by the way she smiles?

We start with this tiny true-color bitmap image of the Mona Lisa, La
Gioconda.

This is what Tiny Lisa looks like when stretched out in a long line.

Adding a little height so you can actually see her.

Now, this is what a genome of the E. coli bacterium looks like when
represented as a true-color bitmap. It is visually indistinguishable
from random pixels. Amazingly, almost every single pixel in this image
has a unique color!

This gray image shows you where La nostra Bambina is hiding. Now that
you know where she is, take another look above.

I should like to creep
Through the long brown grasses
That are your lashes. — Angelina Ward Grimké

This is a close-up of the image where Mona is
hiding.
You can see a bit of Mona across the center line. She's the mostly dark
green pixels.

Yon strange blue city crowns a scarped steep
No mortal foot hath bloodlessly essayed:
Dreams and illusions beacon from its keep.
But at the gate an Angel bares his blade— Edith Wharton

A simple statistical algorithm quickly found our Lady Lisa. Just take the
genome of E. coli and divide it into segments. Then take the average
(arithmetic mean) of the byte values in each segment and note the one
that stands out.

Mona Lisa, Mona Lisa,
men have named you.
You're so like the lady with the mystic smile. — Nat King Cole

Technical details.

The Escherichia coli K-12 MG1655 genome is about 4 megabases. There are
four available bases (a, c, g, t), so each base takes two bits. It takes
three bases to make a codon, but we will combine four bases into each
8-bit byte, for about a megabyte of data. There are three bytes in a
true-color pixel. Tiny Mona is just 10 x 14 pixels, stretched out in a
line of 140 pixels (420 bytes).

The global average byte value of the E. coli genome, or of a random
sequence, is very close to 127.5. Dividing the data into 2000 segments of
500 bytes each and averaging each segment, we can calculate each
segment's divergence from the global average.
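The segment test can be sketched in a few lines of Python. This is not the original program: the data here is random stand-in bytes rather than the real genome, and the planted 420-byte "dark patch" (its position and its value range) is a hypothetical substitute for Tiny Lisa, tuned only so the signal lands near the figures quoted on this page.

```python
import random
from statistics import fmean


def segment_divergences(data, seg_len):
    """Return the global mean and each segment's absolute distance from it."""
    global_mean = fmean(data)
    divergences = []
    for k in range(len(data) // seg_len):  # an incomplete last segment is ignored
        segment = data[k * seg_len:(k + 1) * seg_len]
        divergences.append(abs(fmean(segment) - global_mean))
    return global_mean, divergences


# Stand-in data (assumption): 2000 * 500 uniformly random bytes.
rng = random.Random(0)
data = bytearray(rng.getrandbits(8) for _ in range(2000 * 500))

# Hypothetical darker-than-average patch of 420 bytes inside segment 1234.
patch_start = 1234 * 500 + 40
data[patch_start:patch_start + 420] = bytes(rng.randrange(184) for _ in range(420))

global_mean, divergences = segment_divergences(data, 500)
outlier = max(range(len(divergences)), key=divergences.__getitem__)
print(f"global mean {global_mean:.2f}, outlier segment {outlier}, "
      f"divergence {divergences[outlier]:.1f}")
```

To try it on real data, replace `data` with the packed genome bytes; everything else stays the same.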

With a random sequence, the typical
maximum divergence is 11-13.
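That range agrees with a back-of-envelope estimate, assuming the bytes are independent and uniform on 0..255. One such byte has a standard deviation of about 73.9, so the mean of 500 of them has a standard deviation of about 3.3, and the largest of 2000 segment means usually sits roughly 3.5 to 4 standard deviations out. The sketch below computes that figure and runs a small simulation; the seed and the number of trials are arbitrary choices.

```python
import math
import random
from statistics import fmean

SEG_LEN, N_SEGMENTS = 500, 2000

# Standard deviation of one uniformly random byte (discrete uniform on 0..255).
sigma_byte = math.sqrt((256 ** 2 - 1) / 12)
# Standard deviation of the mean of SEG_LEN independent bytes.
sigma_segment = sigma_byte / math.sqrt(SEG_LEN)


def max_divergence(rng):
    """Largest |segment mean - global mean| for one random byte sequence."""
    data = bytes(rng.getrandbits(8) for _ in range(SEG_LEN * N_SEGMENTS))
    global_mean = fmean(data)
    return max(
        abs(fmean(data[k * SEG_LEN:(k + 1) * SEG_LEN]) - global_mean)
        for k in range(N_SEGMENTS)
    )


rng = random.Random(2024)
maxima = [max_divergence(rng) for _ in range(3)]
print(f"sigma of a segment mean: {sigma_segment:.2f}")
print("maximum divergences:", [round(m, 1) for m in maxima])
```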

With the E. coli genome, the maximum divergence is just over 17.

But Mona typically makes a strong signal at about 25-35 from the mean,
as can be seen in the "Segment Averages" graph above.

We can test with different segment lengths. The Mona signal tested strong
for segment lengths from 25 to 2000. This graph shows a close-up with
length 25 and indicates a distinct Mona anomaly. Sorta in the shape of a
smile.
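A rough way to repeat the experiment at several segment lengths is sketched below. Again this is stand-in data, not the genome: the planted patch is hypothetical and deliberately quite dark (bytes below 64), so that a single segment still stands out at length 25. A milder patch would need the run of neighboring low segments, the "smile", to be read off the graph instead.

```python
import random
from statistics import fmean


def peak_segment(data, seg_len):
    """Return (index, divergence) of the segment farthest from the global mean."""
    global_mean = fmean(data)
    best_idx, best_div = -1, 0.0
    for k in range(len(data) // seg_len):
        div = abs(fmean(data[k * seg_len:(k + 1) * seg_len]) - global_mean)
        if div > best_div:
            best_idx, best_div = k, div
    return best_idx, best_div


# Stand-in data (assumption): one million random bytes with a dark 420-byte patch.
rng = random.Random(1)
data = bytearray(rng.getrandbits(8) for _ in range(1_000_000))
patch_start, patch_len = 600_000, 420
data[patch_start:patch_start + patch_len] = bytes(
    rng.randrange(64) for _ in range(patch_len)
)

hits = {}
for seg_len in (25, 100, 500, 2000):
    idx, div = peak_segment(data, seg_len)
    seg_start, seg_end = idx * seg_len, (idx + 1) * seg_len
    # Does the strongest segment overlap the planted patch?
    hits[seg_len] = seg_start < patch_start + patch_len and seg_end > patch_start
    print(f"length {seg_len}: peak segment {idx}, divergence {div:.1f}, "
          f"overlaps patch: {hits[seg_len]}")
```

Short segments give a larger divergence per segment but also noisier averages; long segments dilute the patch but quiet the background, which is why the signal can survive across such a wide range of lengths.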

Of course, there are many possible statistical methods that can be used,
and modern genomic analysis stretches the limits of available
computational techniques. However, it is apparent that the crudest
statistical test would find even the teeny tiniest Mona Lisa hiding in an
E. coli genome.