As clearly illustrated in some of the most highly voted comments right here, imagine what could be accomplished not only in data storage but in the field of energy itself if we could but harness the power of SNARK!!

Why do CPUs work in binary? It's not like an electrical signal can only be on or off. You can have on, 50% on, 100% on, and on and on infinitely. Of course that infinity would be cut down by noise, and 0 couldn't REALLY be a zero because of the noise floor and so on, but still.

The transistors in a computer are either on or off. Yes, we can operate with analog thresholds, but the fundamental logic that pervades computers is Boolean.

That said, there's nothing that says the math inside the computer can't be used to represent base 10 rather than base 2. Long ago at the dawn of the computer age (and by dawn I believe it might have been around the time of Babbage's machines) someone went to the effort of figuring out how many operations would be required to perform additions in base 10 vs. in base 2. It turns out that base 2 requires fewer operations than any other base. And since any base can be transformed into any other base quite trivially, there's no penalty performance-wise. It was a long while until computers actually performed multiplies without adding numbers repetitively, so for all I know it takes fewer operations to multiply in base 10 than in base 2. I certainly don't know the answer to that, but someone else may. However, by that point we were committed to base 2 as the fundamental architecture of computers, and that's not likely to change for a long, long, long time.

(a quick decimal-to-ternary converter in Commodore BASIC, you can run this in the V.I.C.E emulator)

10 input a
20 p=1
30 for x=1 to 100
40 if p<=a then p=p*3
50 next
60 :
70 a$=a$+mid$(str$(int(a/p)),2)
80 if a>=p then a=a-int(a/p)*p
90 if p>1 then p=p/3:goto 70
100 print a$
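For anyone without an emulator handy, here is a rough Python sketch of the same idea. It is not a port of the BASIC listing: it uses repeated division (least significant digit first) instead of walking down from the largest power of 3, and the function name `to_base` is just something made up for illustration.

```python
def to_base(n: int, base: int = 3) -> str:
    """Return the digits of a non-negative integer n in the given base (2..10)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if not 2 <= base <= 10:
        raise ValueError("base must be between 2 and 10")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        # divmod peels off the least significant digit each time around
        n, remainder = divmod(n, base)
        digits.append(str(remainder))
    # digits were collected least significant first, so reverse them
    return "".join(reversed(digits))


print(to_base(5))  # 5 = 1*3 + 2, so this prints 12
```

Unlike the BASIC version, this one does not emit a leading zero.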

In other languages you would normally go binary->decimal->trinary for 12-bit words, or convert byte/word triples directly. Converting to octal just made it easier to do the binary->decimal step with a calculator. As long as the bit count in each chunk is a multiple of 3, the conversion is clean and simple.
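To show why multiples of 3 bits line up so neatly with octal, here is a small Python sketch. The helper name `binary_to_octal` is hypothetical, and it assumes the input is a plain string of 0s and 1s.

```python
def binary_to_octal(bits: str) -> str:
    """Convert a string of 0/1 characters to octal, 3 bits per octal digit."""
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("bits must be a non-empty string of 0s and 1s")
    # Pad on the left so the length is a multiple of 3
    width = -(-len(bits) // 3) * 3
    padded = bits.zfill(width)
    # Each group of 3 bits maps to exactly one octal digit
    return "".join(str(int(padded[i:i + 3], 2)) for i in range(0, width, 3))


print(binary_to_octal("101110"))  # 101 -> 5, 110 -> 6, so this prints 56
```

A 12-bit word splits into exactly four such groups, which is why no padding is needed there.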

Second, your statement is only partially right: while it is true that only a fraction of the DNA actually codes for proteins, (most of) the rest isn't junk. Most of it has a regulatory function. For example, some of it actually codes for start and stop sequences. Other parts are silencers or enhancers. Other parts code just for different types of RNA and not for a protein. Then you also have telomeres, which are repetitive elements that get shortened every time the DNA polymerase synthesizes a new copy of the DNA, which leads to apoptosis once they fall below a critical length. Scientists are still trying to figure out what is really junk (e.g. leftovers from evolution) and what is not. The only real place I know of where "junk" is created is within the B and T cells, as they create their receptors by shuffling different DNA strings and even mixing new nucleotides in. This is of course by design and gives us the ability to have an adaptive immune system.

I understand you though: this "misconception" has been fairly popularized by popular science media over and over again. Luckily (at least here in Germany) the correct statements are already being taught.

Well, if you learned what they taught you, then the correct information is not being taught. The 5% figure is the percentage of the genome that's evolutionarily conserved. Only about 2-3% of that is coding, and the 5% figure includes the regulatory DNA, as enhancers and promoters are generally conserved across species.

(In fact, I worked on a gene where some of the enhancers were more conserved between mice and chickens than the protein as a whole was.)

The majority of the human genome is inactive virus and transposon sequences. The majority of the rest is introns (although these can contain some regulatory sequences). As far as anyone can tell, very little of it is "functional" in the sense of directly contributing to fitness.

The regulatory genes that code for octopus are not conserved in chicken. These genes when compared will not be the same and will be counted as part of the 95%

The heavily conserved genes that are found across species lines are the function library genes that encode low level critical functions. The high level code that uses the library is unique to each custom application.

Compare the source code of two different programs written in C. You will find a number of heavily conserved function calls. This does not mean the remainder is junk code

Why do CPUs work in binary? It's not like an electrical signal can only be on or off. You can have on, 50% on, 100% on, and on and on infinitely. Of course that infinity would be cut down by noise, and 0 couldn't REALLY be a zero because of the noise floor and so on, but still.

It makes the circuit design easier. You check the value to see if it is high end or low end and arbitrarily assign the binary values 1 and 0 to each extreme.
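As a rough sketch of that "check high end or low end" step in Python: the threshold values below are illustrative assumptions only (loosely modelled on classic TTL input levels), and `to_bit` is a made-up name, not anything from a real library.

```python
def to_bit(voltage, v_low_max=0.8, v_high_min=2.0):
    """Map an analog voltage to a logic value.

    Returns 0 at or below v_low_max, 1 at or above v_high_min,
    and None for the undefined band in between.
    The default thresholds are illustrative assumptions.
    """
    if voltage <= v_low_max:
        return 0
    if voltage >= v_high_min:
        return 1
    return None  # neither clearly low nor clearly high


for v in (0.2, 1.4, 3.3):
    print(v, to_bit(v))
```

The wide gap between the two thresholds is what lets a noisy signal still read as a clean 0 or 1.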

Trinary circuits have been designed using high/middle/low values. There was a trinary logic electronic computer built, but the binary machines became standard due to simplicity of binary electronic circuits.

Fuzzy logic systems using decimal have also been designed. Having 10 discrete values per "bit" allows much more flexibility in logic. You can have True/90% True/80% True/70% True/Maybe so/Maybe not/70% False/80% False/90% False/False

Edit: Replaced the sequence below of values for 11 valued logic with the correct sequence for 10 valued logic. Note that base 11 logic has an absolute maybe value that is missing from binary and decimal. **You can have True/90% True/80% True/70% True/60% True/Maybe/60% False/70% False/80% False/90% False/False**

Now try wiring the circuit that encodes that logic using on/off circuits and you will understand the popularity of binary

The regulatory genes that code for octopus are not conserved in chicken. These genes when compared will not be the same and will be counted as part of the 95%

Do you have data to support this? I have not seen any reports of octopus genome sequences being released.

Quote:

The heavily conserved genes that are found across species lines are the function library genes that encode low level critical functions. The high level code that uses the library is unique to each custom application.

Compare the source code of two different programs written in C. You will find a number of heavily conserved function calls. This does not mean the remainder is junk code

Right, because obviously the genome is similar in properties to a C program.

Genomes are comprised of:
1. DNA that is selected for, because it is required for survival and reproduction.
2. DNA that is not selected against, because it has either no deleterious effect, or the deleterious effects are small enough that selection does not occur.

The chicken and octopus lineages diverged quite some time ago, so there are obviously going to be differences. However, it is clear that many genomes have far more DNA than is necessary to allow full function. As an example, frog genome sizes vary over a range of more than 10-fold. Are you sure that the ornate horned frog really has functions for all of the DNA in the much larger genome it has that are not required in the ornate burrowing frog?

Not sure if the genome has been 100% sequenced, but the studies of conserved genes are either based on speculation or sequencing. Which do you consider the more likely basis?

The program encoded in an octopus is not written in C. It is written in a programming language though. The language is interpreted by a biological device. Simple DNA programs have been written for various purposes. Various biological functions have been identified (they are called genes). Several very basic program execution processes have been identified (protein transcription for one example).

The genes that are identified as conserved have been found in multiple organisms. This is why they are called conserved.

In C code, printf() is a highly conserved function. Picture an analyst who is learning C by sequencing ASCII-coded source files, with no prior knowledge beyond having worked out that we use a coding system that assigns 2 unique values to each bit. Once that has been pinned down, they start looking for conserved sequences and find things like the sequence that we who are omniscient (regarding the C source anyways) know is "printf(". They will notice that this sequence of arbitrary bits is highly conserved across examples of C source and will come to believe that it is likely to be very important. They will of course find mutant forms such as sprintf(, and by deliberately introducing errors in the code they can observe the results... usually fatal.
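The "look for conserved sequences" step in that thought experiment can be sketched in a few lines of Python. This is only a toy: the three C fragments are invented for the example, `conserved_substrings` is a hypothetical helper, and it works on characters rather than raw bits.

```python
def conserved_substrings(samples, k):
    """Return the set of length-k substrings that occur in every sample."""
    if not samples:
        return set()

    def kmers(text):
        # every window of k consecutive characters in the text
        return {text[i:i + k] for i in range(len(text) - k + 1)}

    common = kmers(samples[0])
    for sample in samples[1:]:
        common &= kmers(sample)  # keep only what this sample also contains
    return common


# Three made-up C fragments standing in for unrelated "genomes"
samples = [
    'int main(void){printf("hi");return 0;}',
    'void log_x(int x){printf("%d",x);}',
    'static void f(){sprintf(buf,"a");}',
]
hits = conserved_substrings(samples, 7)
print("printf(" in hits)  # prints True
```

Note that the sprintf( "mutant" still counts as a hit here, because it contains the conserved 7-character sequence.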

Come to think of it, that is pretty much how research on biological programs written in DNA is done

Biological engineering does not have to be confined to the laboratories of high-end industry laboratories. Rather, it is desirable to foster a more open culture of biological technology. This talk is an effort to do so; it aims to equip you with basic practical knowledge of biological engineering.

Once you get past the prejudiced attitude that a biological computer cannot be constructed, then you begin to realize that the past 60 years of research has been working on documenting the programming language & interpreters used by naturally occurring biological computers.

I will let you continue the research into compilers that generate biological machines. It will likely be a few decades yet before a paramecium can be written, but that is my bias. Imagine describing the Watson computer playing Jeopardy on television to Dirac in 1930. I am guessing he would either assume you were writing scientific fantasy or were just outright insane.

Edit: The youtube video is a lecture by those idiots at MIT discussing the current state of genetic programming. Literally. Libraries of standardized parts. Custom compilation of user designed bacteria. Where to find prewritten genes. Where to send your programs to have the living material compiled and returned to you. Speculation on desktop DNA synthesizers that are currently in design phase. Safety issues...complete code for things like hemorrhagic fevers are available online ready to feed into a synthesizer (genetic compiler input: genetic sequence, output: living organism or functional virus)

Interesting lecture. The state of biological programming is much more advanced than the last time I looked into it. They are still having problems with the reverse engineering of natural systems, but the International Genetically Engineered Machine (iGEM) competition is an annual contest for genetic hackers, and 2nd-generation genetic programming languages now exist and are being improved. MIT has an online function library with many basic genetic parts prewritten, ready to plug into your own custom creature.

I strongly suggest watching the MIT lecture before you declare an MIT instructor to be an idiot. (No, not me, but this lecture repeats what I said and does it a lot better)

We already store things in base-10, and for us English speakers, base-26. 8-bit ASCII? 16-bit Unicode? You get the idea. What's relevant is the overhead of encoding/decoding and validation/error checking.

As I understood it, while we do store things in higher than base 2 in software, when the data is actually written to physical mediums (like a hard disk) it's converted to binary, which limits how much data can be stored in a single unit. So the software encoding is cumulative with the hardware storage in terms of optimising the space taken up
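To put rough numbers on how a higher base maps onto binary storage, here is a small Python sketch. Both function names are made up for illustration, and it only counts raw information content, ignoring the encoding and error-checking overhead mentioned above.

```python
import math


def bits_per_symbol(base: int) -> float:
    """Information carried by one symbol of the given base, in bits."""
    return math.log2(base)


def bits_needed(base: int, n_symbols: int) -> int:
    """Whole bits needed to store any string of n_symbols symbols in this base."""
    # base**n_symbols distinct values means the largest one is base**n_symbols - 1
    return (base ** n_symbols - 1).bit_length()


print(bits_per_symbol(2))   # prints 1.0
print(bits_needed(10, 3))   # three decimal digits (0..999) fit in 10 bits
print(bits_needed(26, 1))   # one letter of a 26-letter alphabet fits in 5 bits
```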

You totally missed my point. On the assumption that I was unclear, I will try again.

The genome is comprised of a large variety of DNA sequences. Many of these sequences have clear function. Many of these sequences are clearly leftovers from an evolutionary process that tends not to discard useless parts if those useless parts are not actively harmful.

The coding sequence for ubiquitin is highly conserved, and clearly, changing it is usually fatal. However, many other sequences do not play a role in describing the biological functions that are necessary for life.

Much of the genome is the equivalent of a computer program with large quantities of comments, in which many of the comments are nonsense. This does not detract from the usefulness of the non-comment code, but does not mean that all of the characters have any meaning at all.

What we are discussing here is the exact opposite of the IGEM competitions, in which essentially the entirety of the DNA does have a function, and is included intentionally by the people doing the work. The junk DNA present in most eukaryotic genomes is largely the result of random additions to those genomes, with small amounts having useful function, and large amounts acting as a method for wasting nucleotides.

To put it another way, you could argue that the genome is like a C program. However, the C program contains every character ever typed by the programmer, including when that programmer was drunk, when the programmer was writing nasty letters to the IRS, and when the programmer's cat walked across the keyboard. The bad code is (mostly) commented out, and the gibberish is left in because it does not do much of anything, and the compilers (actually, the replication and repair DNA polymerases and the transcription RNA polymerases and transcription factors) do not choke on it.

What is driving the trend to deprecate the term "junk" is the amount of functional code with an unidentified function. Yes there is a lot of code that has been disabled by "maintenance", but there is also a lot of that code that gets activated. Sometimes by rare environmental triggers, sometimes as an unidentified portion of a known gene and sometimes as an unidentified gene.

That there is genuine junk is certain. One of the items mentioned in the lecture is the work of a French lab that recreated an extinct virus by using error correction on the multiple degraded copies the virus left in human DNA. However, the focus of that was the recreation of an extinct retrovirus. It is possible that some of the viral genome is being reused by a human gene, but that is part of the ongoing attempt to try to map out the functions of the code. The difficulty of identifying functional vs non-functional was emphasized in the bit on the refactoring of a bacterial virus. They know that the genes they have identified are required for a functional virus, yet approximately 40% of the viral sequences that are known to be genes have no identified function. In addition to that, there is the efficient packing where a single sequence may be part of 2 or more functional genes. Not mentioned in the lecture is that the transcription process often skips portions of the sequence during read-out.
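The "error correction on multiple copies" idea can be illustrated with a per-position majority vote. This is only a toy sketch in Python, not the lab's actual method: it assumes the copies are already aligned and all the same length, the sequences are invented, and `consensus` is a hypothetical name.

```python
from collections import Counter


def consensus(copies):
    """Majority-vote consensus of equal-length, pre-aligned sequences."""
    if not copies:
        raise ValueError("need at least one sequence")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*copies) yields one column per position; pick its most common base
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


# Four made-up degraded copies, each with at most one point mutation
copies = ["ACGTAC", "ACGTTC", "ACCTAC", "ACGTAC"]
print(consensus(copies))  # prints ACGTAC
```

As long as each copy picked up its mutations independently, the errors tend to be outvoted at each position. Ties are broken by whichever base appears first in the column, which a real pipeline would want to handle more carefully.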

The natural genetic code is not a clean, well-structured design. Instead it is the result of copy errors being used to make the organism more efficient. The coding style is not spaghetti... spaghetti code is far easier to read and understand than evolved code is. Evolved code is closer to what you would get by unrolling a boxful of yarn balls on the floor, adding a few dozen wound-up kittens, then sweeping the result up and stuffing it back in the box. Of course, that is probably still neater and easier to read than the real thing.

It is easy to say that the 3% to 5% of the DNA that is highly conserved is likely critical even if unidentified. What cannot be determined is what percentage of the remainder is functional, marginally functional, deleterious but functioning (cancer genes for example) or non-functional remnants. Some, such as the Vitamin C production genes, may be only partly broken so that the primary function no longer works, but the remaining pieces are still in use for other purposes.

Some people clearly misinterpret the term "junk". Sidney Brenner remarked (he may not have been first, but he said at a seminar I attended) that the difference between "junk" and "garbage" is that both are useless, but that junk is kept, while garbage is discarded. For humans with attics filled with junk, the junk is material that is kept because the space is there, and because it might be useful in the future. The genome is somewhat similar, although the genome sometimes keeps ticking time-bombs, while most humans tend to discard those.

The junk is the source material for, and the residue of, many evolutionary events. Most of the genome (probably at least 70%, and likely more than 90%) has no current function, and it is difficult to justify the statement that deleting it would necessarily be harmful. Some expressed genes are redundant, and can be deleted with no detectable effects on the organism. Some of the non-functional DNA may be activated in the future, but that is not the same as claiming that it has a current role. The term junk is appropriate for the material that may have a function in the future, but exists now merely as additional sequences that the replication polymerases copy because they have no mechanism to allow avoiding doing so.

Is it worth editing the genome to remove the junk? (Humans currently lack the technology to do so for eukaryotic genomes, so the question remains hypothetical.) Some junk might be useful to remove. As an example, chromosome 6 has CYP21 and CYP21A sequences. CYP21A is non-functional, and crossing-over that joins part of CYP21 with CYP21A results in a relatively frequent, potentially fatal, genetic abnormality known as 21-hydroxylase deficiency.

MPAA/RIAA are already after our Internet freedom and privacy. Does this mean they could be going after our DNA in the future?

What I'm concerned about is royalty fees if I want to reproduce copyrighted DNA via procreation (à la Monsanto). Do you think they will fine genetic piracy per copy or per organism? It's going to be interesting when my great-great-great-grandchild is not allowed to procreate because his genes contain an illegal copy. Perhaps there will be genetic DRM that will render sterile people who contain illegal copies of genes.