Once again the oft-repeated phrase "More complex than previously thought" has been used to describe new research cataloguing thousands of proteins produced from the human genome.1 This groundbreaking biotech news is undergirded by two recent papers published in the journal Nature that describe what has been called the first rough draft of the human proteome.2,3

Unlike DNA sequencing, the extraction, isolation, and identification of proteins is no easy task. To characterize the large diversity of proteins in different tissues, the technologies and chemistries need to be diverse and complex. Nevertheless, technology and instrumentation have advanced to the point where this can be done on a much larger scale.

Because the population of proteins (the types of proteins and their amounts) differs between tissue types, many different tissues have to be studied. One study sampled 30 different human tissues while the other sampled 27. Not only do populations of proteins differ among tissues, but the same gene that encodes a type of protein can also make different variants of it, called isoforms. Even after a protein is made, it can be altered by cellular machines for different purposes in a process called post-translational modification.4 Thus, a catalogue of proteins for each tissue must be created to fully understand the diversity of the proteome. In fact, the task is so daunting that one of the lead researchers believes "that the human proteome is so extensive and complex that researchers' catalog of it will never be fully complete."1

Perhaps the most interesting aspect of these new reports is the discovery of hundreds of new proteins from regions of the human genome previously thought to be non-coding junk. One paper found 193 such proteins with 140 of those being produced by pseudogenes—a category of DNA formerly classified as broken genes or genomic fossils, but now proven to be important functional features of the genome.2,5 As noted by one of the researchers in an interview, "This was the most exciting part of this study, finding further complexities in the genome" and "The fact that 193 of the proteins came from DNA sequences predicted to be non-coding means that we don't fully understand how cells read DNA, because clearly those sequences do code for proteins."1

Taking a slightly different approach, the other research team found 430 new proteins produced by supposedly non-coding DNA regions of the genome.3 Of those, 404 originated from long intergenic non-coding RNA (lincRNA) regions, which are RNA-producing areas located between protein-coding genes.6 Interestingly, one of the stipulations for classifying a gene as a lincRNA gene is that it supposedly does not produce proteins. It looks like a rule change needs to be considered, or perhaps even better, a paradigm shift that considers intelligent design and biocomplexity the norm. This makes far better sense given that "More complex than previously thought" has now become the standard response of scientists probing the mysteries of the cell.

An analogy: Imagine that a car mechanic has a stack of blueprints (genes) to make several different kinds of cars (proteins). We might assume that he produces exactly one sort of car for each blueprint, but that's not what happens. Instead, the mechanic makes a few changes to each blueprint and produces a variety of cars (isoforms) from it. We may think, "That's fine. I can follow that." But then, after the first mechanic is finished, we see another mechanic come in and start making his own modifications to the cars already made (post-translational modification).
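
For readers who like to see the arithmetic, the analogy can be sketched in a few lines of Python. This is only a toy illustration, not anything taken from the two papers: the gene, its exons, and the modification sites are all hypothetical, and the sketch assumes that isoforms arise by skipping optional exons (alternative splicing, one common mechanism) and that every modification site can be switched on or off independently on every isoform.

```python
from itertools import combinations, product

def splice_isoforms(exons, optional):
    """List every isoform: required exons always stay, optional ones may be skipped."""
    forms = []
    for r in range(len(optional) + 1):
        for kept in combinations(optional, r):
            forms.append(tuple(e for e in exons if e not in optional or e in kept))
    return forms

def modification_patterns(sites):
    """List every on/off combination of modification sites."""
    return list(product((False, True), repeat=len(sites)))

# Hypothetical gene (one "blueprint"): four exons, two of which can be skipped.
exons = ["E1", "E2", "E3", "E4"]
optional = ["E2", "E3"]
# Hypothetical modification sites, assumed to be present on every isoform.
sites = ["site_a", "site_b", "site_c"]

# First mechanic: variants of the blueprint (isoforms).
all_isoforms = splice_isoforms(exons, optional)
# Second mechanic: modifications applied to each finished car.
all_forms = [(iso, pat) for iso in all_isoforms for pat in modification_patterns(sites)]

print(len(all_isoforms), "isoforms")
print(len(all_forms), "distinct protein forms")
```

Even with these small made-up numbers, one blueprint yields 4 isoforms and 32 distinct protein forms, which gives a feel for why a complete catalogue is so hard to finish.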