JAB5 at VAXB.YORK.AC.UK writes:
>Dear Colleagues,
>I'm trying to collate DNA data from a particular bacterium.
>When calculating GC content, is it wise to take only coding
>sequence? (There is a marked codon usage bias.) It would
>seem that including a large amount of flanking DNA could unduly
>bias the numbers, e.g. alternating py tracts, terminator
>sequences, etc.
>I realise that gross figures for the whole genome are
>sometimes quoted (from physical methods), but what is the consensus
>on deriving the number from sequence data? Surely the constraints
>are only selected for in the coding regions? Can one thus include
>non-translated RNAs in the analysis?
>Any opinions welcomed
>Best wishes,
>Jim Brannigan
I would guess that the answers depend on what you are really
trying to find out. Put more bluntly, why would anyone care what
the GC content is? Most likely what you want to do is compute it
separately for several categories (coding, non-coding, 3rd codon
position) and see whether they disagree with each other, in the
hope of finding an anomaly.
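
As a rough sketch of that kind of comparison, something like the Python
below would do. It assumes the coding sequences are already extracted
in frame and that the flanking DNA is available as plain strings. The
function names (gc_content, third_positions, gc_by_category) and the toy
sequences are invented for illustration and are not from any particular
package or dataset.

```python
def gc_content(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T); 0.0 if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def third_positions(cds):
    """Third base of every complete codon; assumes cds starts in frame."""
    return cds[2::3]


def gc_by_category(coding, noncoding):
    """Pool each category and report its GC content side by side."""
    pooled_coding = "".join(coding)
    pooled_third = "".join(third_positions(c) for c in coding)
    pooled_noncoding = "".join(noncoding)
    return {
        "coding": gc_content(pooled_coding),
        "coding_3rd_position": gc_content(pooled_third),
        "non_coding": gc_content(pooled_noncoding),
    }


# Toy sequences, invented purely for illustration (not real data).
coding = ["ATGGCCGCGTAA", "ATGAAAGGCTGA"]
noncoding = ["TTTTATATCC", "AATTGCAT"]

for category, value in gc_by_category(coding, noncoding).items():
    print(f"{category}: {value:.2f}")
```

Pooling the sequences weights each base equally. Averaging per-gene
values instead would weight each gene equally, which may suit a search
for anomalous genes better.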
Keith Robison
Harvard University
Department of Cellular and Developmental Biology
Department of Genetics / HHMI
robison at biosun.harvard.edu