1
00:00:00 --> 00:00:04
So what I want to do today is recap
a little bit what we talked about
2
00:00:04 --> 00:00:08
last time, reiterate some of the
important points,
3
00:00:08 --> 00:00:13
and then show you how we can learn
something about microorganisms in
4
00:00:13 --> 00:00:17
the environment by talking about
in-situ identification of
5
00:00:17 --> 00:00:21
microorganisms as well as genomics.
We'll first talk about genomics in
6
00:00:21 --> 00:00:26
general and then talk about some
applications of genomics to
7
00:00:26 --> 00:00:30
environmental microbiology because I
think some of the most
8
00:00:30 --> 00:00:35
exciting new developments are in the
area, actually.
9
00:00:35 --> 00:00:44
So last time we talked about
molecular evolution and ecology.
10
00:00:44 --> 00:00:53
And just to recap, some of the main
points were that we can actually use
11
00:00:53 --> 00:01:02
genes or gene sequences for a couple
of very important questions that we
12
00:01:02 --> 00:01:15
want to explore.
The first one was that gene sequences act
13
00:01:15 --> 00:01:33
as evolutionary chronometers.
14
00:01:33 --> 00:01:40
Now, what do I mean by that?
Basically what we said last time
15
00:01:40 --> 00:01:48
was that each gene,
each sequence in the genome
16
00:01:48 --> 00:01:55
accumulates mutations with a certain
probability. So what we mean is
17
00:01:55 --> 00:02:03
that all genes accumulate
mutations over time.
18
00:02:03 --> 00:02:07
Now, these of course are the
mutations that do not kill the
19
00:02:07 --> 00:02:12
organisms, so not the deleterious
mutations, but these are mutations
20
00:02:12 --> 00:02:16
that are either slightly deleterious,
or don't matter,
21
00:02:16 --> 00:02:21
or are beneficial mutations.
OK, and what this entails is that
22
00:02:21 --> 00:02:25
each gene accumulates mutations with
a certain probability over time.
23
00:02:25 --> 00:02:30
It basically means that two
organisms that come from species
24
00:02:30 --> 00:02:34
that are relatively closely related
to each other have gene sequences
25
00:02:34 --> 00:02:39
that will be much more similar to
each other than genes from an
26
00:02:39 --> 00:02:44
organism that comes from a species
that's much more distantly related.
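[Editor's sketch] The idea that closely related species have more similar gene sequences can be illustrated with a simple identity count over aligned sequences. This is a minimal sketch, assuming the sequences are already aligned to equal length; the three sequences and the name `percent_identity` are invented for illustration and are not real gene data.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# Hypothetical toy sequences (not real genes), already aligned.
human = "ACGTTGCAAGGCTTAACGTA"
monkey = "ACGTTGCAAGGCTTAACGTT"     # differs from "human" at 1 position
crocodile = "ACGATGCTAGGCATAACCTT"  # differs from "human" at 5 positions

print(percent_identity(human, monkey))     # 0.95
print(percent_identity(human, crocodile))  # 0.75
```

Real phylogenetic methods use explicit models of mutation rather than a raw identity count, but the ordering of the two values is the point being made here.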
27
00:02:44 --> 00:02:48
So, in practical terms what this
means is your genes are much,
28
00:02:48 --> 00:02:52
much more similar to those of a
monkey than they are to a crocodile,
29
00:02:52 --> 00:02:56
for example. And we can take
advantage of that by applying some
30
00:02:56 --> 00:03:00
algorithms, some mathematical
modeling essentially,
31
00:03:00 --> 00:03:04
to constrain these relationships in
those phylogenetic trees that we
32
00:03:04 --> 00:03:09
talked about last time.
And I also mentioned that the
33
00:03:09 --> 00:03:14
ribosomal RNA genes are particularly
important for that process.
34
00:03:14 --> 00:03:19
In principle, you could do it with
any protein coding gene or any
35
00:03:19 --> 00:03:24
kind of gene in the genome,
but we use the ribosomal RNA genes
36
00:03:24 --> 00:03:29
in particular because all organisms
have them. They're part of a
37
00:03:29 --> 00:03:34
handful of genes that are what we
called universally distributed
38
00:03:34 --> 00:03:38
last time.
And what this allows us to do is
39
00:03:38 --> 00:03:42
then construct phylogenetic
relationships for all living
40
00:03:42 --> 00:03:46
organisms. And I just want to
remind you of the tree of life that
41
00:03:46 --> 00:03:50
I showed you last time where we can
really explore the relationships
42
00:03:50 --> 00:03:54
amongst all living organisms.
And some of the important points
43
00:03:54 --> 00:03:58
there that we made were,
for example, that the tree of life
44
00:03:58 --> 00:04:02
supports the endosymbiont theory,
that when you actually look on the
45
00:04:02 --> 00:04:06
tree where the mitochondria and the
chloroplasts are, they fall
46
00:04:06 --> 00:04:10
into the bacteria.
Now, there is a question where
47
00:04:10 --> 00:04:15
somebody asked in the online survey,
can the mitochondria and
48
00:04:15 --> 00:04:21
chloroplasts still live outside of
the eukaryotic cell?
49
00:04:21 --> 00:04:26
And the answer is no,
they can't anymore because over
50
00:04:26 --> 00:04:31
evolutionary time the two organisms
have become so integrated that the
51
00:04:31 --> 00:04:37
mitochondria and chloroplasts both
lost their ability to live outside
52
00:04:37 --> 00:04:42
of the eukaryotic host cell.
Another important point that we made
53
00:04:42 --> 00:04:46
last time that I want to reiterate
here is that gene sequences,
54
00:04:46 --> 00:04:51
when we go into the environment,
and obtain them directly from the
55
00:04:51 --> 00:04:55
environment act as a proxy for
microbial diversity in
56
00:04:55 --> 00:05:08
the environment.
So, the number of genes recovered
57
00:05:08 --> 00:05:30
directly from the environment is a
measure of diversity.
58
00:05:30 --> 00:05:34
And we said that this actually plays
a very, very important role in the
59
00:05:34 --> 00:05:39
analysis of microbial communities,
and I showed you the example here
60
00:05:39 --> 00:05:43
where we went and took some ocean
water and basically applied this
61
00:05:43 --> 00:05:48
technique that I outlined last time
where we can actually amplify
62
00:05:48 --> 00:05:53
ribosomal RNA genes from
environmental samples,
63
00:05:53 --> 00:05:57
clone them, determine the sequence,
and then construct phylogenetic
64
00:05:57 --> 00:06:01
trees.
And what you see here is a tree
65
00:06:01 --> 00:06:05
where we summarize the major groups
that we found in the sample, and
66
00:06:05 --> 00:06:09
it is only for two of those groups
that we show the entire set of
67
00:06:09 --> 00:06:13
sequences that we actually obtained
because there were so many of them
68
00:06:13 --> 00:06:16
out there. And what we basically
found was that over 1500 bacterial
69
00:06:16 --> 00:06:20
16S ribosomal RNA gene sequences
coexist in this environment.
70
00:06:20 --> 00:06:24
And what we said also last time was
that analyses like these have
71
00:06:24 --> 00:06:28
really taught us that microorganisms
are the most diverse organisms
72
00:06:28 --> 00:06:32
on the planet.
So, most diversity is amongst the
73
00:06:32 --> 00:06:36
microorganisms,
and one of the big questions now is
74
00:06:36 --> 00:06:40
what are all those microorganisms
doing in the environment?
75
00:06:40 --> 00:06:44
And so, today what I want to do
with you is basically explore this
76
00:06:44 --> 00:06:48
question of how we can actually
figure out what those microorganisms
77
00:06:48 --> 00:06:52
are all doing in environmental
samples?
78
00:06:52 --> 00:07:09
So we can say we are exploring the
79
00:07:09 --> 00:07:21
function of microbes in the
environment. At first,
80
00:07:21 --> 00:07:33
I want to cover how we can actually
identify them in the environment.
81
00:07:33 --> 00:07:38
And I want to show you one specific
example, and then I want to talk
82
00:07:38 --> 00:07:44
about genomics in general,
and then basically end with an
83
00:07:44 --> 00:07:50
application of genomics to
environmental questions.
84
00:07:50 --> 00:07:56
So, let's first talk about the
in-situ identification
85
00:07:56 --> 00:08:14
of microorganisms.
86
00:08:14 --> 00:08:18
And the basic problem that I alluded
to already before is that most
87
00:08:18 --> 00:08:23
microbes are only known
[SOUND OFF/THEN ON]
88
00:08:23 --> 00:08:33
-- from 16S ribosomal
89
00:08:33 --> 00:08:48
RNA clone libraries.
90
00:08:48 --> 00:09:03
And we basically want to search and
identify them in the environment.
91
00:09:03 --> 00:09:07
OK, and I'll show you a specific
example of that later on.
92
00:09:07 --> 00:09:12
Now last time, we said that the
ribosomal RNA sequences consist
93
00:09:12 --> 00:09:17
really, like all gene sequences,
in fact, of distinct stretches. We identified several
94
00:09:17 --> 00:09:22
stretches of nucleotides,
types of stretches, that can be
95
00:09:22 --> 00:09:26
found. We said there are A type stretches
and B type stretches that are very
96
00:09:26 --> 00:09:31
important for construction of
phylogenetic relationships,
97
00:09:31 --> 00:09:36
because we can align them and look
for changes in the nucleotide
98
00:09:36 --> 00:09:41
sequences because they are the same
length and only differ in mutation
99
00:09:41 --> 00:09:46
and single nucleotide
base pair changes.
100
00:09:46 --> 00:09:50
But then there's also those C type
stretches, if you remember,
101
00:09:50 --> 00:09:55
and those we said vary at much
faster rates because they are not
102
00:09:55 --> 00:10:00
functionally constrained
in those genes.
103
00:10:00 --> 00:10:08
OK, so they can actually also
accumulate length changes.
104
00:10:08 --> 00:10:17
And, it's these C type stretches
that we can use sort of as
105
00:10:17 --> 00:10:25
diagnostic sequence stretches for
microorganisms.
106
00:10:25 --> 00:10:34
So, what we can say is we identify
organisms by the C type stretches,
107
00:10:34 --> 00:10:47
C type sequence stretches.
And we call those signature
108
00:10:47 --> 00:11:03
sequences. OK,
and they allow the differentiation
109
00:11:03 --> 00:11:16
of closely related organisms,
-- because they vary at very fast
110
00:11:16 --> 00:11:24
rates between organisms.
And the way we do this is that we
111
00:11:24 --> 00:11:32
construct so-called phylogenetic
probes. I should probably
112
00:11:32 --> 00:11:38
write this over here.
Now what are those phylogenetic
113
00:11:38 --> 00:11:42
probes? They're basically short
pieces of DNA that have a
114
00:11:42 --> 00:11:46
fluorescent molecule
attached to them.
115
00:11:46 --> 00:12:03
-- DNA molecules that are roughly 20
116
00:12:03 --> 00:12:11
nucleotides in length,
and they carry a fluorescent molecule.
117
00:12:11 --> 00:12:19
Now what these short,
single-stranded stretches of DNA
118
00:12:19 --> 00:12:27
basically are is they are
complementary to those C type
119
00:12:27 --> 00:12:45
sequence stretches --
120
00:12:45 --> 00:12:52
-- in the ribosomal RNA.
And so basically what we can do is
121
00:12:52 --> 00:12:59
we can collect microbial cells from
the environment --
122
00:12:59 --> 00:13:17
-- make them permeable --
123
00:13:17 --> 00:13:33
-- and then basically mix them with
124
00:13:33 --> 00:13:48
those phylogenetic probes.
125
00:13:48 --> 00:13:52
And these probes will then permeate
into the cell and bind to their
126
00:13:52 --> 00:14:15
complementary sequences.
127
00:14:15 --> 00:14:17
Then we wash away the
unbound probe --
128
00:14:17 --> 00:14:33
-- and we can view it in a
129
00:14:33 --> 00:14:53
microscope under UV light.
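[Editor's sketch] The probe logic above can be sketched in code: a probe is the reverse complement of a signature stretch, and it "binds" only where the ribosomal RNA contains the exact complementary sequence. This is a minimal sketch, assuming perfect-match binding only; the two 20-nucleotide signature stretches and all function names are hypothetical.

```python
# Complement tables: RNA target -> DNA probe, and DNA probe -> RNA target.
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")
DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def design_probe(signature_rna: str) -> str:
    """DNA probe (5'->3') that is the reverse complement of an rRNA stretch."""
    return signature_rna.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def probe_binds(probe_dna: str, rrna: str) -> bool:
    """True if the rRNA contains the exact complement of the probe."""
    target = probe_dna.translate(DNA_TO_RNA_COMPLEMENT)[::-1]
    return target in rrna


# Hypothetical rRNA fragments: same conserved flanks, different signature stretch.
signature_a = "GGAUUCCGAUAGCUAACGGU"
signature_b = "GGAUACCGUUAGCAAACGCU"
rrna_a = "AUGGCC" + signature_a + "CCUUAG"
rrna_b = "AUGGCC" + signature_b + "CCUUAG"

red_probe = design_probe(signature_a)    # would carry the red fluor
green_probe = design_probe(signature_b)  # would carry the green fluor

print(probe_binds(red_probe, rrna_a), probe_binds(red_probe, rrna_b))      # True False
print(probe_binds(green_probe, rrna_a), probe_binds(green_probe, rrna_b))  # False True
```

In practice hybridization tolerates or rejects mismatches depending on temperature and wash conditions, which this exact-match check does not model.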
130
00:14:53 --> 00:14:57
Let me show you an example of this.
What you see here is basically a
131
00:14:57 --> 00:15:01
light micrograph.
So this is what you see basically
132
00:15:01 --> 00:15:06
when you collect microbial cells
from the environment under the
133
00:15:06 --> 00:15:10
microscope. Most bacteria look the
same, so you cannot actually
134
00:15:10 --> 00:15:15
differentiate them all by just
looking at them.
135
00:15:15 --> 00:15:19
But then these cells were fixed and
permeabilized and then basically
136
00:15:19 --> 00:15:24
mixed with two different
phylogenetic probes that identified
137
00:15:24 --> 00:15:28
two different types of organisms.
One was labeled with a red fluor,
138
00:15:28 --> 00:15:33
the other one with a green fluor.
139
00:15:33 --> 00:15:42
And what you see is that you can now
differentiate those two organisms.
140
00:15:42 --> 00:15:52
Now, why is this especially
interesting? Well here's just a
141
00:15:52 --> 00:16:02
specific example where people were
looking for bacteria capable of
142
00:16:02 --> 00:16:12
nitrogen oxidation.
These are bacteria that are very
143
00:16:12 --> 00:16:22
important in, for example,
sewage treatment. And it was known
144
00:16:22 --> 00:16:32
that there were two different types
out there, one that oxidizes
145
00:16:32 --> 00:16:41
ammonia to nitrite,
-- and a second one that
146
00:16:41 --> 00:16:47
oxidizes nitrite to nitrate.
And by doing this type of analysis
147
00:16:47 --> 00:16:53
what people basically learned is
that those two organisms live in
148
00:16:53 --> 00:17:00
very, very close proximity
at all times.
149
00:17:00 --> 00:17:04
So the organisms that oxidize
ammonia to nitrite are really
150
00:17:04 --> 00:17:08
attached, and oftentimes even
surrounded by the organisms that take
151
00:17:08 --> 00:17:12
the nitrite to nitrate.
So, what you have is a very close
152
00:17:12 --> 00:17:16
cooperation between two different
types of microorganisms,
153
00:17:16 --> 00:17:21
and the transfer of one of the
substrates that's a product of the
154
00:17:21 --> 00:17:25
metabolism of one of the organisms
to another one: so an extremely
155
00:17:25 --> 00:17:29
efficient process that really is
very important to take into
156
00:17:29 --> 00:17:33
consideration when you want to
understand processes like sewage
157
00:17:33 --> 00:17:37
treatment, but also nitrogen
biogeochemistry in
158
00:17:37 --> 00:17:45
the environment.
Any questions?
159
00:17:45 --> 00:17:55
OK, so for the remainder of the
lecture I want to talk
160
00:17:55 --> 00:18:03
about genomics,
-- and then in particular also its
161
00:18:03 --> 00:18:07
application to questions of
environmental microbiology and
162
00:18:07 --> 00:18:12
environmental science.
So first, what I want to do is give
163
00:18:12 --> 00:18:16
you a little bit of the definition
of genomics, and then cover how it
164
00:18:16 --> 00:18:21
is actually possible that we can
sequence entire genomes,
165
00:18:21 --> 00:18:25
and I want to give you some
highlights of what we have found by
166
00:18:25 --> 00:18:30
comparing different genomes
to each other.
167
00:18:30 --> 00:18:36
And then I want to talk about this
field of environmental genomics
168
00:18:36 --> 00:18:43
where we can use genomic techniques
to actually learn something about
169
00:18:43 --> 00:18:49
the function of different uncultured
microorganisms in the environment.
170
00:18:49 --> 00:18:56
So first, our definition, it's
basically to interpret
171
00:18:56 --> 00:19:03
or to sequence,
-- interpret, and compare whole
172
00:19:03 --> 00:19:11
genomes. And as you will see the
comparison part actually plays an
173
00:19:11 --> 00:19:18
increasingly important role because
we have now actually genome
174
00:19:18 --> 00:19:26
sequences available from almost all,
or from at least some of the major
175
00:19:26 --> 00:19:32
groups of life.
So this, again,
176
00:19:32 --> 00:19:36
is a different kind of
representation of the tree of life.
177
00:19:36 --> 00:19:40
You have bacteria, archaea, and
eukarya again.
178
00:19:40 --> 00:19:44
And as you can see,
we have a lot of representatives.
179
00:19:44 --> 00:19:49
In fact, this doesn't even come
close to the diversity that we have
180
00:19:49 --> 00:19:53
now sequenced, with well over a hundred
bacterial genome sequences now,
181
00:19:53 --> 00:19:57
several archaeal genomes, and
increasingly also
182
00:19:57 --> 00:20:02
eukaryotic genomes.
Now, genomes, so how is this done?
183
00:20:02 --> 00:20:08
How can we actually sequence
genomes? Well,
184
00:20:08 --> 00:20:13
on the face of it we use very large
facilities where you have sequencing
185
00:20:13 --> 00:20:19
machines present.
There's one very important one at
186
00:20:19 --> 00:20:24
MIT, actually at the Broad Institute,
and here you see all those really
187
00:20:24 --> 00:20:30
industrial scale production
lines actually.
188
00:20:30 --> 00:20:40
But the basic problem is that
genomes are large.
189
00:20:40 --> 00:20:50
E. coli, for example,
has roughly 4.4 million base pairs,
190
00:20:50 --> 00:21:00
and the human genome is even much,
much larger.
191
00:21:00 --> 00:21:08
It has about 3 billion base pairs.
OK, so genomes are very, very large.
192
00:21:08 --> 00:21:22
But a single sequencing reaction--
193
00:21:22 --> 00:21:29
-- gives you only roughly 500 to
1,000 nucleotides or base pairs. So
194
00:21:29 --> 00:21:36
how is it that we can actually
sequence entire genomes?
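[Editor's sketch] The size mismatch just described can be put into numbers. This is a minimal sketch using only the figures quoted in the lecture (4.4 million base pairs for E. coli, about 3 billion for human, 500 to 1,000 base pairs per reaction); the function name `min_reads` and the `coverage` parameter are assumptions added for illustration.

```python
import math


def min_reads(genome_bp: int, read_bp: int, coverage: float = 1.0) -> int:
    """Smallest number of reads whose combined length is coverage x genome size."""
    return math.ceil(genome_bp * coverage / read_bp)


E_COLI_BP = 4_400_000    # figure quoted in the lecture
HUMAN_BP = 3_000_000_000  # figure quoted in the lecture

# Lower bound: reads laid perfectly end to end, with no overlap at all.
print(min_reads(E_COLI_BP, 1_000))  # 4400
print(min_reads(E_COLI_BP, 500))    # 8800
print(min_reads(HUMAN_BP, 500))     # 6000000
```

Because assembly relies on overlapping reads, real projects sequence each position several times over, so the `coverage` value would be well above 1 and these counts are only a floor.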
195
00:21:36 --> 00:21:43
I'm going to walk you through this,
and there is some variation on the
196
00:21:43 --> 00:21:50
theme, but this is still a major
approach that's still used in some
197
00:21:50 --> 00:21:57
of the sequencing facilities.
Now, you start out by extracting
198
00:21:57 --> 00:22:04
genomic DNA from organisms,
and then you use restriction enzymes
199
00:22:04 --> 00:22:11
to cut the DNA into relatively large
pieces of DNA, so about 160
200
00:22:11 --> 00:22:17
kilobase pairs long
on average. This is shown here.
201
00:22:17 --> 00:22:22
Kilo means a thousand, so 160,000
base pairs long.
202
00:22:22 --> 00:22:28
These pieces are then cloned into
specific cloning vectors that are
203
00:22:28 --> 00:22:43
called BAC vectors.
204
00:22:43 --> 00:23:01
So they're for cloning large pieces
of DNA, and BAC stands for Bacterial
205
00:23:01 --> 00:23:17
Artificial Chromosome.
And what they basically are,
206
00:23:17 --> 00:23:31
are plasmids, very special plasmids
that can carry large pieces of
207
00:23:31 --> 00:23:40
genome, or large genome fragments.
So, by cloning into those BAC
208
00:23:40 --> 00:23:45
vectors, what you do is you
basically divide up the genome,
209
00:23:45 --> 00:23:50
and then the step number three is
mostly done for eukaryotic genomes
210
00:23:50 --> 00:23:55
because they are so much larger.
You can actually map and analyze
211
00:23:55 --> 00:24:00
the fragments,
and map them onto genome maps where
212
00:24:00 --> 00:24:05
you know the location of different
restriction fragments and different
213
00:24:05 --> 00:24:10
genes, actually.
For bacteria, this step is mostly
214
00:24:10 --> 00:24:15
skipped, actually.
What you do with each one of those
215
00:24:15 --> 00:24:20
BACs, is you cut them further up
into 1 kilobase pair fragments,
216
00:24:20 --> 00:24:25
so much smaller fragments,
217
00:24:25 --> 00:24:30
and these are cloned then into
normal plasmid vectors.
218
00:24:30 --> 00:24:35
And so you generate what are called
shotgun clones.
219
00:24:35 --> 00:24:40
So, these are then cloned into E.
coli, you go through the same type
220
00:24:40 --> 00:24:45
of steps that we discussed before
already with environmental clone
221
00:24:45 --> 00:24:50
libraries. And you can actually
determine the sequence of each one
222
00:24:50 --> 00:24:55
of those pieces of DNA.
And what you will then get,
223
00:24:55 --> 00:25:00
is small fragments of overlapping
DNA sequences.
224
00:25:00 --> 00:25:04
That is shown here.
You'll find overlaps,
225
00:25:04 --> 00:25:09
basically, which let you piece together the
whole genome. And so,
226
00:25:09 --> 00:25:14
first to assemble, you piece
together these genome fragments that
227
00:25:14 --> 00:25:19
are present in the BACs,
and then finally you piece together
228
00:25:19 --> 00:25:24
the entire genome from those large
sequence pieces,
229
00:25:24 --> 00:25:29
and you get a so-called draft genome
sequence.
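[Editor's sketch] Piecing a sequence together from overlapping fragments can be sketched with a greedy merge: repeatedly join the two fragments with the longest suffix-to-prefix overlap. This is a minimal sketch, assuming error-free reads from one strand and no repeats; it is not the assembler used by any sequencing facility, and the reads and function names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads pairwise by best overlap until no overlaps remain."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining pieces do not overlap; leave them as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three hypothetical overlapping reads, deliberately given out of order.
shotgun_reads = ["TTAGCCTGAAC", "ATGGCGTACG", "GTACGTTAGC"]
contigs = greedy_assemble(shotgun_reads)
print(contigs)  # ['ATGGCGTACGTTAGCCTGAAC']
```

Repeated sequences are what make real assembly hard, since a greedy merge can join the wrong copies; that is one reason large genomes are first divided into mapped BACs.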
230
00:25:29 --> 00:25:39
The next step in this analysis,
then, is that you do so-called
231
00:25:39 --> 00:25:49
genome annotation.
And the first very important step
232
00:25:49 --> 00:26:00
is that you translate the gene
sequences into amino acids.
233
00:26:00 --> 00:26:05
So, the nucleotide sequences into
amino acids. Particularly in
234
00:26:05 --> 00:26:10
prokaryotes, this step can
be done right away --
235
00:26:10 --> 00:26:31
-- and what this allows you to do,
236
00:26:31 --> 00:26:37
is you can look for what we call
open reading frames,
237
00:26:37 --> 00:26:44
or ORFs. And what you look for is a
start codon and a stop codon that
238
00:26:44 --> 00:26:50
basically brackets or frames a
stretch of amino acids encoded by
239
00:26:50 --> 00:26:57
the nucleotides. So
you look for ORFs.
240
00:26:57 --> 00:27:13
And these are your putative genes.
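[Editor's sketch] Looking for open reading frames can be sketched as a scan for a start codon followed, in the same reading frame, by a stop codon. This is a minimal sketch, assuming the standard start codon ATG and stop codons TAA, TAG, and TGA, and scanning only the three forward-strand frames; the example sequence and the `min_codons` threshold are invented for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for each forward-strand ORF, stop codon included."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # frame closed; keep scanning for the next ORF
    return orfs


# Hypothetical sequence with one complete ORF and one start codon that never terminates.
example = "CCATGGCTAAATGACC"
print(find_orfs(example))  # [(2, 14, 'ATGGCTAAATGA')]
```

A fuller scan would also cover the reverse-complement strand and use a much larger minimum length, since short ORFs occur by chance.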
241
00:27:13 --> 00:27:26
The next step that you can do,
242
00:27:26 --> 00:27:32
then, is you can go to databases and
now you compare your ORFs to
243
00:27:32 --> 00:27:39
information that is present in the
databases. So basically,
244
00:27:39 --> 00:27:45
you query the database and ask,
is there a gene sequence present
245
00:27:45 --> 00:27:52
that is statistically significantly
similar to the one that I have, and that
246
00:27:52 --> 00:27:58
allows me to say something about the
function of this particular gene?
247
00:27:58 --> 00:28:05
So function can then be identified
by comparison with databases.
248
00:28:05 --> 00:28:29
Any questions?
249
00:28:29 --> 00:28:34
OK, so that allows you,
then, to basically say something
250
00:28:34 --> 00:28:39
about the different genes that you
have found in the genome,
251
00:28:39 --> 00:28:44
but to give you an impression of how
new this field really is and how
252
00:28:44 --> 00:28:49
little we still know about the
diversity of genes and organisms,
253
00:28:49 --> 00:28:54
on average when we sequence a new
bacterial genome we find that about 30%
254
00:28:54 --> 00:28:59
of the genes, or a third of the
genes, have no known functional
255
00:28:59 --> 00:29:05
analog in the databases.
OK, so there's a lot to learn about
256
00:29:05 --> 00:29:11
the diversity of life and about the
functional diversity of life.
257
00:29:11 --> 00:29:18
In eukaryotes, there are some
little twists,
258
00:29:18 --> 00:29:24
as you all know.
And basically, that is that genes
259
00:29:24 --> 00:29:31
of course consist of introns
and exons, right?
260
00:29:31 --> 00:29:35
And so it's basically relatively
difficult to directly identify those
261
00:29:35 --> 00:29:40
open reading frames.
And what you have to do is that you
262
00:29:40 --> 00:29:45
have to actually oftentimes,
so let's write this down.
263
00:29:45 --> 00:30:01
And what people oftentimes do,
264
00:30:01 --> 00:30:12
then, is that they search for
matching sequences in so-called cDNA
265
00:30:12 --> 00:30:24
libraries. Now what are cDNA
libraries? Let me just show you
266
00:30:24 --> 00:30:32
this on the next slide. Skip this.
Basically what you can do is you can
267
00:30:32 --> 00:30:38
isolate messenger RNA from cells and
then convert the messenger RNA by
268
00:30:38 --> 00:30:43
a process called reverse
transcription, using a viral enzyme
269
00:30:43 --> 00:30:49
that copies RNA into DNA,
so you can convert it into DNA
270
00:30:49 --> 00:30:54
fragments. And you can then clone
those DNA fragments into plasmids,
271
00:30:54 --> 00:31:00
sequence those, and then basically
see what are the pieces that are
272
00:31:00 --> 00:31:06
actually, what are the
introns in the genes?
273
00:31:06 --> 00:31:11
What are the pieces that are excised
when the messenger RNA is actually
274
00:31:11 --> 00:31:29
created from the genome?
275
00:31:29 --> 00:31:33
And so, let me just cover now a few
of the major insights that people
276
00:31:33 --> 00:31:37
have come up with.
Of course, it's a rapidly growing
277
00:31:37 --> 00:31:41
field and a lot of excitement
is coming out.
278
00:31:41 --> 00:31:58
And I first want to talk about
279
00:31:58 --> 00:32:09
bacteria and archaea --
280
00:32:09 --> 00:32:13
-- and then say a few words also
about eukaryotes or eukarya.
281
00:32:13 --> 00:32:17
First of all, what we learned about,
bacteria and archaea,
282
00:32:17 --> 00:32:21
is that their genomes are very
compact.
283
00:32:21 --> 00:32:35
Whenever they have pieces of DNA
284
00:32:35 --> 00:32:43
that are not frequently used,
they're actually lost from the
285
00:32:43 --> 00:32:51
genome. OK, so they lose genes,
I should say, relatively easily, and
286
00:32:51 --> 00:32:59
we can see this in that the genome size
is correlated to metabolic
287
00:32:59 --> 00:33:12
diversity.
288
00:33:12 --> 00:33:23
So, for example,
we have Mycoplasma genitalium and
289
00:33:23 --> 00:33:37
Streptomyces --
290
00:33:37 --> 00:33:42
coelicolor, which are two very different
bacteria. The first one is an
291
00:33:42 --> 00:34:01
obligate intracellular parasite.
292
00:34:01 --> 00:34:08
OK, so, which means it's actually
bathed in a nutrient solution in the
293
00:34:08 --> 00:34:16
eukaryotic cells that it invades.
It doesn't have to make amino acids.
294
00:34:16 --> 00:34:23
It gets it just from the host cell.
And it turns out it has a very
295
00:34:23 --> 00:34:31
small genome, so only 0.58
mega-base pairs, so 580,000
296
00:34:31 --> 00:34:37
base pairs, and only 517 genes.
And interestingly,
297
00:34:37 --> 00:34:41
actually people are now using this
organism to try and ask,
298
00:34:41 --> 00:34:46
well, what's the minimum number of
genes that an organism can actually
299
00:34:46 --> 00:34:50
live with?
And so, they are deleting in a
300
00:34:50 --> 00:34:55
stepwise fashion the different genes
in this organism,
301
00:34:55 --> 00:34:59
and it turns out that you need about
200 to 300 genes minimum in order to
302
00:34:59 --> 00:35:03
make the thing survive.
On the other hand,
303
00:35:03 --> 00:35:15
Streptomyces is a soil bacterium --
304
00:35:15 --> 00:35:20
-- has a very complex lifestyle,
can degrade a lot of environmental
305
00:35:20 --> 00:35:26
substrates, and it has a very big
genome, one of the biggest bacterial
306
00:35:26 --> 00:35:31
genomes. And so,
those two organisms basically span
307
00:35:31 --> 00:35:37
pretty much the range of
bacterial genome sizes.
308
00:35:37 --> 00:35:41
And so, it's thought that Streptomyces has
about 7,846 genes.
309
00:35:41 --> 00:35:57
Now, we also have a very large
310
00:35:57 --> 00:36:09
genetic diversity --
311
00:36:09 --> 00:36:23
-- between species.
And typically what you find is that
312
00:36:23 --> 00:36:38
roughly 15 to 30% of genes are
unique to a specific species.
313
00:36:38 --> 00:36:44
And that's really because bacteria
and archaea have the capability to
314
00:36:44 --> 00:36:50
carry out a lot of chemical reactions
that eukaryotes,
315
00:36:50 --> 00:36:56
for example, cannot.
There's about 20 million known
316
00:36:56 --> 00:37:02
organic substances,
organic chemicals, and almost all of
317
00:37:02 --> 00:37:07
them are biodegradable by bacteria.
Even the minutest compound, if it
318
00:37:07 --> 00:37:12
were not biodegradable by bacteria,
would build up in the environment,
319
00:37:12 --> 00:37:16
OK? So, if it were just a cofactor
that some organism produces because
320
00:37:16 --> 00:37:21
we have such a long period of time
of evolution on this planet and
321
00:37:21 --> 00:37:26
evolutionary history,
you probably would be able to dig it
322
00:37:26 --> 00:37:32
up in your backyard.
One of the other very important and
323
00:37:32 --> 00:37:39
interesting insights that have come
out of comparing genomes of
324
00:37:39 --> 00:37:46
microorganisms is that lateral gene
transfer is a very important process
325
00:37:46 --> 00:38:07
amongst microorganisms.
326
00:38:07 --> 00:38:11
Now what do I mean by lateral gene
transfer? It basically means that
327
00:38:11 --> 00:38:16
we find evidence among bacterial
genomes that they have actually
328
00:38:16 --> 00:38:20
taken genes from completely
unrelated organisms.
329
00:38:20 --> 00:38:25
And I just want to show you one
example here, that of
330
00:38:25 --> 00:38:38
Thermotoga maritima --
331
00:38:38 --> 00:38:47
-- which lives in hot springs.
This is a very interesting
332
00:38:47 --> 00:38:56
bacterium that lives in hot water of
around 80°C and thrives only in
333
00:38:56 --> 00:39:05
those kinds of environments.
And they coexist there with many
334
00:39:05 --> 00:39:14
archaea. And when people sequenced
the genome of Thermotoga maritima
335
00:39:14 --> 00:39:23
what they found was that about 25%
of the genes have their closest
336
00:39:23 --> 00:39:32
relatives in archaeal genomes.
So roughly 25% of genes in
337
00:39:32 --> 00:39:39
Thermotoga are of archaeal origin.
And how can we actually figure
338
00:39:39 --> 00:39:44
something like that out?
Well, the most important technique
339
00:39:44 --> 00:39:49
is, again, phylogenetic tree
construction. And so when you have,
340
00:39:49 --> 00:39:54
for example, gene A, well let me
draw this, actually,
341
00:39:54 --> 00:40:10
on a new board.
342
00:40:10 --> 00:40:15
So you're comparing,
say, three organisms,
343
00:40:15 --> 00:40:21
organism A, B, and C and you compare
gene one with gene two.
344
00:40:21 --> 00:40:27
And you notice that most genes
adhere to this pattern,
345
00:40:27 --> 00:40:33
but that every now and then there's
a gene that gives you this
346
00:40:33 --> 00:40:38
type of pattern.
What you can then conclude is that
347
00:40:38 --> 00:40:43
this gene, C, has not coevolved with
the other genes in the genome of
348
00:40:43 --> 00:40:48
these organisms but was actually
transferred into it from another
349
00:40:48 --> 00:40:53
source. And I don't have time to go
actually into the mechanisms.
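[Editor's sketch] The pattern on the board, where most genes group the organisms one way and an occasional gene groups them another way, can be sketched by asking, gene by gene, which organism is the closest relative and flagging the genes that disagree with the majority. This is a crude stand-in for real phylogenetic tree construction, assuming pre-aligned sequences and simple identity as the similarity measure; all sequences, gene names, and function names are hypothetical.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two aligned, equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_relative(alignment: dict[str, str], focal: str) -> str:
    """Organism whose copy of this gene is most similar to the focal organism's copy."""
    others = [name for name in alignment if name != focal]
    return max(others, key=lambda name: identity(alignment[focal], alignment[name]))


def flag_discordant_genes(alignments: dict[str, dict[str, str]], focal: str) -> list[str]:
    """Genes whose closest relative differs from the one most genes agree on."""
    nearest = {gene: closest_relative(aln, focal) for gene, aln in alignments.items()}
    majority, _ = Counter(nearest.values()).most_common(1)[0]
    return [gene for gene, relative in nearest.items() if relative != majority]


# Hypothetical aligned genes from three organisms A, B, and C.
toy_alignments = {
    "gene1": {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "ACCTAGGTTT"},
    "gene2": {"A": "TTGACCGATA", "B": "TTGACCGTTA", "C": "TAGTCCGTTC"},
    "gene3": {"A": "GGCATTCAGA", "B": "GACTTTCTGC", "C": "GGCATTCAGT"},
}

print(flag_discordant_genes(toy_alignments, focal="A"))  # ['gene3']
```

A flagged gene is only a candidate for lateral transfer; differences in evolutionary rate or gene loss can produce the same signal, which is why full tree construction is the method named in the lecture.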
350
00:40:53 --> 00:40:58
If you're interested, I teach a
graduate class that undergraduates
351
00:40:58 --> 00:41:03
actually take in our department,
environmental microbiology, where we
352
00:41:03 --> 00:41:08
discussed a lot of the mechanisms.
Basically, a lot of viruses can
353
00:41:08 --> 00:41:13
mediate gene transfer, but also
plasmids and transposons.
354
00:41:13 --> 00:41:18
But for bacteria, again, you should
remember that new function
355
00:41:18 --> 00:41:23
actually oftentimes arises by
lateral gene transfer.
356
00:41:23 --> 00:41:28
And one of the interesting things
is that lateral gene transfer is
357
00:41:28 --> 00:41:34
actually very important in the
evolution of pathogenic bacteria.
358
00:41:34 --> 00:41:48
So, the so-called virulence genes,
359
00:41:48 --> 00:41:57
which are the genes that basically
affect pathogenesis. Do
360
00:41:57 --> 00:42:13
you have a question?
Among pathogenic bacteria,
361
00:42:13 --> 00:42:35
often arise by lateral gene transfer.
OK. Any questions?
362
00:42:35 --> 00:42:43
OK, now for eukarya,
I just want to make the point that
363
00:42:43 --> 00:42:52
their genomes are generally orders
of magnitude larger --
364
00:42:52 --> 00:43:16
OK, and that the exons,
365
00:43:16 --> 00:43:23
so the stretches that really encode
the proteins that make up the
366
00:43:23 --> 00:43:30
organism, the exons are only
typically a few percent
367
00:43:30 --> 00:43:37
of the genome.
That's particularly in higher
368
00:43:37 --> 00:43:44
eukaryotes. Yeasts,
for example, have a much more
369
00:43:44 --> 00:43:50
compact genome also.
We, for example, are full of DNA
370
00:43:50 --> 00:43:57
that people still have a very hard
time figuring out what that actually
371
00:43:57 --> 00:44:04
does. But it seems that the
majority of the genome is
372
00:44:04 --> 00:44:10
so-called repeated sequences --
-- many of which seem to be ancient
373
00:44:10 --> 00:44:15
retroviruses that have inserted
themselves into the genome and have
374
00:44:15 --> 00:44:21
since then lost actually their
function. OK,
375
00:44:21 --> 00:44:26
so the remaining time I want to just
give you an example of how we can
376
00:44:26 --> 00:44:31
now use these techniques that I
outlined before to learn something
377
00:44:31 --> 00:44:37
about microorganisms
in the environment.
378
00:44:37 --> 00:44:47
It's called environmental genomics.
379
00:44:47 --> 00:45:04
Basically, the way this all started
380
00:45:04 --> 00:45:08
was by going into the environment
and extracting nucleic acids and
381
00:45:08 --> 00:45:13
treating them exactly the same way
as if you had a single genome.
382
00:45:13 --> 00:45:18
But, again, remember, we have a
very large mixture of microorganisms
383
00:45:18 --> 00:45:22
present in the environment.
And where this was mostly done was
384
00:45:22 --> 00:45:27
in the ocean, actually.
And what people did, was they
385
00:45:27 --> 00:45:32
constructed those BAC clones
directly from the environment and
386
00:45:32 --> 00:45:36
then looked amongst those BAC clones
for specific 16S ribosomal
387
00:45:36 --> 00:45:41
RNA genes.
Remember, this is the marker that we
388
00:45:41 --> 00:45:45
have for microorganisms in the
environment. We know the diversity
389
00:45:45 --> 00:45:49
of microorganisms through those
types of genes,
390
00:45:49 --> 00:45:53
and we have a lot of the data
available. And so,
391
00:45:53 --> 00:45:57
in order to link a specific function
to such an organism that we only
392
00:45:57 --> 00:46:01
know from the 16S ribosomal
RNA genes.
393
00:46:01 --> 00:46:06
So, to ask the question of what
this organism might be carrying
394
00:46:06 --> 00:46:12
out in the environment,
it's very useful to sequence BAC
395
00:46:12 --> 00:46:17
clones that have 16S ribosomal RNA
genes on them,
396
00:46:17 --> 00:46:23
and determine what kinds of protein
coding genes are on there that might
397
00:46:23 --> 00:46:28
reveal some of the function of the
organism in the environment.
398
00:46:28 --> 00:46:34
And one example that I want to show
you is that of the proteorhodopsin.
399
00:46:34 --> 00:46:45
So basically, the initial task was
to sequence BAC clones containing
400
00:46:45 --> 00:46:57
ribosomal RNA genes,
and look for other genes that might
401
00:46:57 --> 00:47:15
reveal some of the function.
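
The screening step described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the clone names, gene names, and category labels are invented, and real annotation pipelines are far more involved. The idea is simply to keep the clones that carry a 16S rRNA marker and list the metabolic genes that sit on the same fragment.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    category: str  # hypothetical labels: "rRNA", "metabolic", "housekeeping"


# Hypothetical annotated BAC clones; all entries are made up for illustration.
clones = {
    "clone_A": [Gene("16S rRNA", "rRNA"),
                Gene("rhodopsin-like", "metabolic"),
                Gene("ftsZ", "housekeeping")],
    "clone_B": [Gene("dnaA", "housekeeping"),
                Gene("recA", "housekeeping")],
    "clone_C": [Gene("16S rRNA", "rRNA"),
                Gene("gyrB", "housekeeping")],
}


def candidate_function_genes(clones, marker="16S rRNA"):
    """For each clone that carries the marker gene, list its metabolic genes."""
    result = {}
    for clone_id, genes in clones.items():
        if any(g.name == marker for g in genes):
            result[clone_id] = [g.name for g in genes
                                if g.category == "metabolic"]
    return result


hits = candidate_function_genes(clones)
print(hits)
```

Clones without the marker are dropped, because without the 16S gene there is no way to tie the fragment back to a known organism.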
402
00:47:15 --> 00:47:18
So, you don't want to look for all
the genes that encode proteins that
403
00:47:18 --> 00:47:22
are important to the cell cycle and
things like that,
404
00:47:22 --> 00:47:25
but really sort of metabolic genes
that might tell you something about
405
00:47:25 --> 00:47:29
the type of metabolism that this
organism carries out
406
00:47:29 --> 00:47:33
in the environment.
And so, the first example that
407
00:47:33 --> 00:47:39
turned out to be really,
really important is that people
408
00:47:39 --> 00:47:44
found rhodopsin genes on one of
those BAC fragments,
409
00:47:44 --> 00:47:50
and it turns out this rhodopsin
catalyzes or these rhodopsin genes
410
00:47:50 --> 00:47:55
produce a protein that inserts
itself into the bacterial membrane,
411
00:47:55 --> 00:48:01
and it's a photoreceptor that when
it's hit by light,
412
00:48:01 --> 00:48:06
it actually becomes a proton pump.
So, it expels protons from the cell
413
00:48:06 --> 00:48:11
interior to the outside,
and you already know that this is
414
00:48:11 --> 00:48:16
important in energy generation in
all living cells.
415
00:48:16 --> 00:48:20
So proton gradients across membranes
basically give the cells sort of a
416
00:48:20 --> 00:48:25
battery status that can be exploited
by ATPase molecules or ATPase
417
00:48:25 --> 00:48:30
proteins that equalize the proton
gradient and effect ATP
418
00:48:30 --> 00:48:35
synthesis in doing so.
Now, why is this so important?
419
00:48:35 --> 00:48:40
Well, it turned out that this type
of protein is present in almost all
420
00:48:40 --> 00:48:45
microbial cells that were previously
thought to be heterotrophs alone in
421
00:48:45 --> 00:48:49
the ocean in the parts of the ocean
that receive enough light.
422
00:48:49 --> 00:48:54
And what this means is that our
estimates of the global carbon
423
00:48:54 --> 00:48:59
budget of the ocean were basically
wrong because most microorganisms in
424
00:48:59 --> 00:49:12
the ocean have this.
425
00:49:12 --> 00:49:31
So most prokaryotes in the ocean
have a light-driven proton pump
426
00:49:31 --> 00:49:52
which is called proteorhodopsin.
And it basically allows them to gain
427
00:49:52 --> 00:50:06
energy from sunlight.
And there's an increasing number of
428
00:50:06 --> 00:50:12
such examples now where we are
learning to interpret environmental
429
00:50:12 --> 00:50:18
communities, and the function of
environmental microbial communities
430
00:50:18 --> 00:50:23
through those genomic approaches.
And it reveals basically an
431
00:50:23 --> 00:50:29
enormous diversity of organisms out
there. And what we also are
432
00:50:29 --> 00:50:35
learning to do now is to assemble
entire genomes from those samples by
433
00:50:35 --> 00:50:40
applying genomic techniques.
And this is an example here where
434
00:50:40 --> 00:50:44
you see, this was published last
year, where people went out and
435
00:50:44 --> 00:50:49
basically were able to piece
together from pieces of genes
436
00:50:49 --> 00:50:53
obtained from the environment,
entire genomes or fragments of
437
00:50:53 --> 00:50:58
entire genomes.
And that's shown here.
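
To give a rough feel for how fragments can be pieced together, here is a toy greedy overlap assembler in Python. It is a sketch only: the reads are invented, the function names are hypothetical, and the lecture does not say which assembly method was used, so this should not be read as the method behind the published study. It repeatedly merges the two fragments with the longest suffix-to-prefix overlap until no overlaps remain.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no pair overlaps any more."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return reads  # whatever is left are the contigs
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)]
        reads.append(merged)


# Invented fragments: three overlap into one contig, one stands alone.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA", "GGGTTTT"]
contigs = greedy_assemble(reads)
print(contigs)
```

The fragment that overlaps nothing stays separate, which is why an environmental sample typically yields many contigs rather than one finished genome.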
438
00:50:58 --> 00:51:02
Those are contiguous sequences.
OK, so if you have any questions
439
00:51:02 --> 00:51:07
let me know by e-mail,
or if you're interested in pursuing
440
00:51:07 --> 00:51:11
this further I also teach another
class in civil and environmental
441
00:51:11 --> 00:51:14
engineering.