I'm working on a minimal-code program (a simple how-to for programmers and the general public) that compares and displays chromosomes, using a technique I call "digital banding". Here is the illustration for Orangutan, Gorilla, Chimp, and Human Chromosome 2.

The next experiment I would like to try is to allow selecting how many code letters there are per [insert proper word here]. The only nagging problem is that the closest word I know or could find is "Codon", but the way the word is defined assumes a triplet. I still found indications that the word can be used for other sizes, even though it's not used that way in biology. I'm tempted to just use the word in the program for any number of letters per codon, but thought I had better ask here in case there is a word that more specifically covers singlets, doublets, triplets, quartets, etc.

Thanks for the excellent keywords (BLAST algorithm), which worked great for finding ideas with the Google search engine.

I am familiar with the NCBI’s BLAST search engine and the interface to their supercomputers. Until now, though, the program has been looking for abundances rather than similarity (which introduces an added level of complexity), so I kept the code at the level of basic chemistry. I used terminology from the northwestern.edu link, which describes two “groups” of 32 different “triplets” as well as a singlet “spectrum” that should get interesting when the band width is just right to see what Guenter is describing. In the cognitive computer model that I explain, this corresponds to “addressing” while the genes are the “data”:

The DNA program I am currently writing more or less goes with the Intelligence Design Lab. It is meant to help visualize what should make it surprisingly easy to model the “molecular intelligence” that I describe in the theory of you know what. Thankfully some in the Biology-Online forum(s) knew about it years ago, so I don't feel out of place or obliged to go all over again into the scientific reasons why I had no choice but to call it this:

In my latest software project I started at the whole chromosome level, where it’s then just a matter of zooming in for more detail until eventually individual letters show this type of addressing structure:

The main purpose of the new program is to help figure out how to accurately model not yet fully understood processes. At the chromosome level there are fusions, which the new program already helps show. I now need to see more.

Since DNA is by itself only the lifeless memory core of a living system, a more complete computer model also relies on molecular networks like the ones I described here for parallel programmers and GPU board manufacturers, who all enjoy knowing what’s new in science that scientists next need to model:

While GPU technology is coming of age, I’m working on the problems that do not require a supercomputer to solve. The chromosome banding program is one of them. It does not need to search for similarity like the BLAST algorithm does. It just has to dissect and display the singlet, doublet, triplet, etc. contents of FASTA files as bitmap images that I then convert to PNG format. Even with Visual Basic 6 (which I still use because of its rapid development and because its code is easy to understand and port to other languages) it makes a complicated illustration in much less than a minute, which is fast enough that it only has to be properly compiled once and then it’s done.
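For anyone who does not use VB6, the dissect-and-count step can be roughly sketched in Python. This is only an illustration under stated assumptions, not the program's actual code: the function names, the choice of overlapping windows by default, and the `band_width` parameter are all hypothetical, and the drawing of the bitmap is left out.

```python
from collections import Counter


def read_fasta_sequence(text):
    """Join the sequence lines of FASTA-formatted text, skipping '>' header lines."""
    return "".join(
        line.strip().upper()
        for line in text.splitlines()
        if line and not line.startswith(">")
    )


def count_words(sequence, size, overlapping=True):
    """Count every word of `size` letters (1 = singlet, 2 = doublet, 3 = triplet, ...).

    Assumption: with overlapping=True the window moves one letter at a time;
    with overlapping=False it moves a whole word at a time.
    """
    step = 1 if overlapping else size
    return Counter(
        sequence[i:i + size]
        for i in range(0, len(sequence) - size + 1, step)
    )


def band_counts(sequence, size, band_width):
    """Count words separately inside each consecutive band of `band_width` letters."""
    return [
        count_words(sequence[start:start + band_width], size)
        for start in range(0, len(sequence), band_width)
    ]
```

Each entry returned by `band_counts` holds the abundances for one band, which is the kind of number a drawing routine could turn into a color or brightness for that band.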

I’m now stuck on basic terminology, but your suggestion has already been helpful. I will try using “word” in the code comments (where the instructions and how it works are first explained) to see how well it fits in with all the other terminology the program has to include. There is still something peculiar about the word “Codon” though. It is short and still implies that it is a code unit, which makes it an excellent word to use, as though it was originally intended for any codon size but has lost that meaning because biology (at least mostly) uses triplets. For example, it has been speculatively theorized that two-letter coding led to the current three, in which case it still seems to be a codon, but there is again the problem of what to call a two-letter code element. Since it’s my program I can call it anything I want. But I can't just go off on my own naming things. I need to find the most appropriate word, and would rather use one that is precise but somehow losing its meaning than one that right away causes confusion because it is not specific enough and suggests that two or 4+ letter codons cannot exist anywhere in the universe. It's like my instincts are telling me that "codon" might still be the most specific word to use, even though biology textbooks and dictionaries assume three letters. The word "word" is now competing with it, and I hope that something will soon turn up that makes it easy to decide which of the two I need to use.

Ah ha! Thinking in terms of words/letters sent Google Scholar straight to something I did not know about: a 1982 paper whose Abstract theorizes a two letter codon:

THE POSSIBLE ROLE OF ASSIGNMENT CATALYSTS IN THE ORIGIN OF THE GENETIC CODE

VAHE BEDIAN * Department of Biophysical Sciences, State University of New York at Buffalo, Buffalo, New York, 14214, U.S.A.

(Received 29 January, 1982; Accepted 5 April, 1982)

Abstract.

A model is presented for the emergence of a primitive genetic code through the selection of a family of proteins capable of executing the code and catalyzing their own formation from polynucleotide templates. These proteins are assignment catalysts capable of modulating the rate of incorporation of different amino acids at the position of different codons. The starting point of the model is a polynucleotide based polypeptide construction process which maintains colinearity between template and product, but may not maintain a coded relationship between amino acids and codons. Among the primitive proteins made are assumed to be assignment catalysts characterized by structural and functional parameters which are used to formulate the production kinetics of these catalysts from available templates. Application of the model to the simple case of two letter codon and amino acid alphabets has been analyzed in detail. As the structural, functional, and kinetic parameters are varied, the dynamics undergoes many bifurcations, allowing an initially ambiguous system of catalysts to evolve to a coded, self-reproductive system. The proposed selective pressure of this evolution is the efficiency of utilization of monomers and energy. The model also simulates the qualitative features of suppression, in which a deleterious mutation is partly corrected by the introduction of translational error.

http://www.springerlink.com/content/r5vk622056mk8227/

Since the program is for origin of life science like this, I think I found what definitively settles this dilemma!

I'll morph your idea with that then show you what it looks like on the screen. It has a "Chromotype" listbox to select organism assembly and other controls that (by their function in the program) establish and operationally define the scientific terminology that is then in turn required by the theory. Only words that are absolutely needed end up remaining, resulting in a streamlined vocabulary that helps make all this science easier to figure out and use.

To make it as easy as possible for us to see all the variables and how they are formed I wrote a minimal code subroutine to save on disk (and in RAM) a list of 1 to 8 letter codons in Base4 or Base5. The lines of code to save to disk can later be removed, for use in the chromosome illustration program that currently only works for triplets and needs this added.
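Before the VB6 listing, here is a rough Python sketch of the same idea for readers who want to try it without VB6. It is an illustration under assumptions, not a translation of the subroutine: the names `make_codons`, `complement` and `reverse_complement` are hypothetical, the codons are listed in counting order with A as the lowest digit, "N" is assumed to be its own complement, and nothing is written to disk.

```python
from itertools import product

# Assumption: Base4 uses the letters ACGT and Base5 adds N, in that digit order.
ALPHABETS = {4: "ACGT", 5: "ACGTN"}

# Assumption: N (unknown letter) is treated as its own complement.
COMPLEMENT_TABLE = str.maketrans("ACGTN", "TGCAN")


def make_codons(math_base, codon_size):
    """List every codon of `codon_size` letters, counting up in `math_base`."""
    letters = ALPHABETS[math_base]
    return ["".join(digits) for digits in product(letters, repeat=codon_size)]


def complement(codon):
    """Swap A<->T and C<->G, leaving N unchanged."""
    return codon.translate(COMPLEMENT_TABLE)


def reverse_complement(codon):
    """Complement the codon, then read it backwards."""
    return complement(codon)[::-1]


if __name__ == "__main__":
    # Show how quickly the number of codons grows with size in each base.
    for base in (4, 5):
        for size in range(1, 9):
            print(base, size, base ** size)
```

The list for a given base and size has base^size entries, so 8 letter codons already give 65,536 entries in Base4 and 390,625 in Base5, which is why building the larger lists on demand may be preferable to keeping them all in RAM.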

'==================================================================================================
'Program to create diskfiles for Base4 (ACGT) and Base5 (ACGTN) Codons for checkbox selected Sizes.
'The letter "N" represents an unknown ACTG letter, resulting in more codons of the same Codon Size.
'By Gary S. Gaulin, 2012, as part of the Chromosome Illustrator project which will use this method.
'==================================================================================================
Option Explicit

'Arrays for storing Codon string, its Complement and Reverse Complement strings.
'This program does not need to save codons in RAM, it's included for other programs that do.
Dim Codon() As String
Dim CodonCompl() As String
Dim CodonRevCompl() As String

Private Sub Form_Load()
'After clicking on the compiled .exe file the program starts here.
'On reaching End the form stays on screen waiting for something to be clicked.
End Sub

Private Sub CreateCodonFile(MathBase As Long, CodonSize As Long)
    Dim MathBaseStr As String
    Dim MathBaseNumStr As String
    Dim Base10Str As String
    Dim FormatZeros As String
    Dim TotalCodons As Long
    Dim CodonNum As Long
    Dim LetterNum As Long
    Dim DigitVal As Long
    Dim PowerOf4or5 As Long
    Dim DigitOnes As Double
    Dim Space1 As String
    Dim Space2 As String