Abstract:

In Caenorhabditis elegans, lin-4 and let-7 encode 22- and 21-nucleotide
RNAs, respectively, that function as key regulators of developmental
timing. Because the appearance of these short RNAs is regulated during
development, they are also referred to as "small temporal RNAs" (stRNAs).
We show that many more 21- and 22-nt expressed RNAs, termed microRNAs
(miRNAs), exist in invertebrates and vertebrates, and that some of these
novel RNAs, similar to let-7 stRNA, are also highly conserved. This
suggests that sequence-specific post-transcriptional regulatory
mechanisms mediated by small RNAs are more general than previously
appreciated.

Claims:

1. An isolated nucleic acid molecule having a nucleotide sequence selected
from the group consisting of:
(a) a nucleotide sequence as shown in SEQ ID NO: 199 or SEQ ID NO: 393;
(b) a nucleotide sequence which is the complement of (a);
(c) a nucleotide sequence consisting of 18 to 25 nucleotides which has an
identity of at least 80% to SEQ ID NO: 199 or the complement thereof; and
(d) a nucleotide sequence consisting of 60-80 nucleotides which has an
identity of at least 80% to SEQ ID NO: 393 or the complement thereof.

2. The nucleic acid molecule of claim 1, wherein the identity of sequence
(c) is at least 90%.

3. The nucleic acid molecule of claim 1, wherein the identity of sequence
(c) is at least 95%.

4. The nucleic acid molecule of claim 1 which is a miRNA precursor
molecule having the nucleobase sequence as shown in SEQ ID NO: 393, or a
DNA molecule coding therefor.

5. The nucleic acid molecule of claim 1, which is single-stranded.

6. The nucleic acid molecule of claim 1, which is at least partially
double-stranded.

7. The nucleic acid molecule of claim 1, which is selected from RNA, DNA
or nucleic acid analog molecules.

8. The nucleic acid molecule of claim 7, which is a molecule containing at
least one modified nucleotide analog.

9. A composition comprising at least one nucleic acid molecule of claim 1
and a pharmaceutically acceptable carrier.

10. The composition of claim 9 wherein said pharmaceutically acceptable
carrier is suitable for diagnostic applications.

11. The composition of claim 9 wherein said pharmaceutically acceptable
carrier is suitable for therapeutic applications.

12. The composition of claim 9 as a marker or modulator of developmental
disorders.

13. The composition of claim 9 as a marker or modulator of gene
expression.

14. The nucleic acid molecule of claim 1, wherein the identity of sequence
(c) is 100%.

Description:

[0001]This Application is a divisional of U.S. Ser. No. 11/747,409 filed
May 11, 2007, which is a divisional of U.S. Pat. No. 7,232,806 issued
Jun. 19, 2007, which is a 371 of International Application
PCT/EP2002/10881 filed Sep. 27, 2002, the disclosure of which is
incorporated herein in its entirety by reference.

[0003]In Caenorhabditis elegans, lin-4 and let-7 encode 22- and
21-nucleotide RNAs, respectively (1, 2), that function as key regulators
of developmental timing (3-5). Because the appearance of these short RNAs
is regulated during development, they are also referred to as "microRNAs"
(miRNAs) or small temporal RNAs (stRNAs) (6). lin-4 and let-7 are the
only known miRNAs to date.

[0005]We show that many more short, particularly 21- and 22-nt expressed
RNAs, termed microRNAs (miRNAs), exist in invertebrates and vertebrates,
and that some of these novel RNAs, similar to let-7 RNA (6), are also
highly conserved. This suggests that sequence-specific
post-transcriptional regulatory mechanisms mediated by small RNAs are
more general than previously appreciated.

[0006]The present invention relates to an isolated nucleic acid molecule
comprising:
[0007](a) a nucleotide sequence as shown in Table 1, Table 2, Table 3 or
Table 4,
[0008](b) a nucleotide sequence which is the complement of (a),
[0009](c) a nucleotide sequence which has an identity of at least 80%,
preferably of at least 90% and more preferably of at least 99%, to a
sequence of (a) or (b), and/or
[0010](d) a nucleotide sequence which hybridizes under stringent
conditions to a sequence of (a), (b) and/or (c).
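Part (b) refers to the complement of a sequence from part (a). The Python sketch below shows one way such a complement could be derived. It assumes that "complement" means the antiparallel (reverse) complementary strand written 5' to 3', which the text does not define explicitly, and it treats T as equivalent to U, in line with the U/T substitution permitted in the description. The function name `reverse_complement_rna` is hypothetical.

```python
# Watson-Crick partner of each RNA nucleotide.
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the complementary RNA strand of `seq`, written 5'->3'.

    Assumption: "complement" is read as the antiparallel strand, so the
    input is reversed before each base is replaced by its partner.
    T is accepted and treated as U.
    """
    seq = seq.upper().replace("T", "U")
    try:
        return "".join(RNA_PAIRS[base] for base in reversed(seq))
    except KeyError as err:
        raise ValueError(f"unexpected nucleotide {err.args[0]!r}") from None
```

For an illustrative 22-nt input such as UGAGGUAGUAGGUUGUAUAGUU, the function returns AACUAUACAACCUACUACCUCA; applying it twice returns the original sequence.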

[0012]Preferably the identity of sequence (c) to a sequence of (a) or (b)
is at least 90%, more preferably at least 95%. The determination of
identity (percent) may be carried out as follows:

$$I = \frac{n}{L} \times 100$$

wherein I is the identity in percent, n is the number of identical
nucleotides between a given sequence and a comparative sequence as shown
in Table 1, Table 2, Table 3 or Table 4 and L is the length of the
comparative sequence. It should be noted that the nucleotides A, C, G and
U as depicted in Tables 1, 2, 3 and 4 may denote ribonucleotides,
deoxyribonucleotides and/or other nucleotide analogs, e.g. synthetic
non-naturally occurring nucleotide analogs. Further, nucleobases may be
substituted by corresponding nucleobases capable of forming analogous
H-bonds to a complementary nucleic acid sequence, e.g. U may be
substituted by T.
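The identity formula can be illustrated with a short Python sketch. It assumes the two sequences are already aligned position by position without gaps (the text does not specify an alignment procedure), counts n as the matching positions, and takes L as the length of the comparative sequence; U and T are treated as equivalent, as permitted above. The names `normalize` and `percent_identity` are hypothetical.

```python
def normalize(seq: str) -> str:
    """Upper-case the sequence and write T as U (treated as equivalent)."""
    return seq.upper().replace("T", "U")


def percent_identity(sequence: str, comparative: str) -> float:
    """Return I = n / L * 100 for two position-aligned, ungapped sequences.

    n: number of positions at which both sequences carry the same nucleotide.
    L: length of the comparative sequence. Positions of the comparative
       sequence not covered by `sequence` count as non-identical; extra
       nucleotides of `sequence` beyond L are ignored by zip().
    """
    s, c = normalize(sequence), normalize(comparative)
    if not c:
        raise ValueError("the comparative sequence must not be empty")
    n = sum(1 for a, b in zip(s, c) if a == b)
    return 100.0 * n / len(c)
```

For example, two 22-nt sequences that differ at two positions give I = 20/22 × 100, about 90.9%, which satisfies the 80% and 90% thresholds but not the 95% threshold.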

[0013]Further, the invention encompasses nucleotide sequences which
hybridize under stringent conditions with the nucleotide sequence as
shown in Table 1, Table 2, Table 3 or Table 4, a complementary sequence
thereof or a highly identical sequence. Stringent hybridization
conditions comprise washing for 1 h in 1×SSC and 0.1% SDS at
45° C., preferably at 48° C. and more preferably at
50° C., particularly for 1 h in 0.2×SSC and 0.1% SDS.
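The SSC strengths above can be converted into salt concentrations using the definition given in Example 1 (0.1×SSC corresponds to 15 mM sodium chloride and 1.5 mM sodium citrate, pH 7.0). The Python sketch below scales that definition linearly and computes the volume of a concentrated stock needed for a wash solution. The 20× stock strength used in the usage note is an assumption, and the function names are hypothetical.

```python
# 1x SSC composition, scaled from the 0.1x SSC definition in Example 1
# (0.1x SSC = 15 mM sodium chloride, 1.5 mM sodium citrate, pH 7.0).
NACL_MM_AT_1X = 150.0
CITRATE_MM_AT_1X = 15.0


def ssc_composition(strength: float) -> tuple[float, float]:
    """Return (sodium chloride mM, sodium citrate mM) for an n x SSC solution."""
    if strength < 0:
        raise ValueError("strength must be non-negative")
    return strength * NACL_MM_AT_1X, strength * CITRATE_MM_AT_1X


def stock_volume_ml(final_volume_ml: float, final_strength: float,
                    stock_strength: float) -> float:
    """Volume of concentrated stock needed, from C1 * V1 = C2 * V2."""
    if not 0 < final_strength <= stock_strength:
        raise ValueError("final strength must be positive and not exceed the stock")
    return final_volume_ml * final_strength / stock_strength
```

Under these assumptions, `stock_volume_ml(500, 0.2, 20)` gives 5 mL of a 20×SSC stock for 500 mL of the 0.2×SSC wash, and `ssc_composition(1)` gives 150 mM sodium chloride and 15 mM sodium citrate.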

[0014]The isolated nucleic acid molecules of the invention preferably have
a length of from 18 to 100 nucleotides, and more preferably from 18 to 80
nucleotides. It should be noted that mature miRNAs usually have a length
of 19-24 nucleotides, particularly 21, 22 or 23 nucleotides. The miRNAs,
however, may be also provided as a precursor which usually has a length
of 50-90 nucleotides, particularly 60-80 nucleotides. It should be noted
that the precursor may be produced by processing of a primary transcript
which may have a length of >100 nucleotides.
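The typical length ranges given above can be expressed as a simple classifier for cloned or predicted sequences. This is only a sketch: the boundaries are the usual values stated in the text rather than strict cut-offs, lengths outside them (for example 91 to 100 nt) are left unassigned, and the function name is hypothetical.

```python
def length_class(length_nt: int) -> str:
    """Assign a sequence length to the typical size range stated in the text.

    19-24 nt: mature miRNA; 50-90 nt: precursor; >100 nt: primary transcript.
    These are usual values, not definitions, so other lengths are unassigned.
    """
    if 19 <= length_nt <= 24:
        return "mature miRNA"
    if 50 <= length_nt <= 90:
        return "precursor"
    if length_nt > 100:
        return "primary transcript"
    return "unassigned"
```

Combined with `collections.Counter`, for example `Counter(length_class(len(s)) for s in sequences)`, it can summarize how a set of sequences falls into these ranges.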

[0015]The nucleic acid molecules may be present in single-stranded or
double-stranded form. The miRNA as such is usually a single-stranded
molecule, while the miRNA precursor is usually an at least partially
self-complementary molecule capable of forming double-stranded portions,
e.g. stem- and loop-structures. DNA molecules encoding the miRNA and
miRNA precursor molecules are also encompassed. The nucleic acids may be
selected from RNA, DNA or nucleic acid analog molecules, such as sugar-
or backbone-modified ribonucleotides or deoxyribonucleotides. It should
be noted, however, that other nucleic acid analogs, such as peptide
nucleic acids (PNA) or locked nucleic acids (LNA), are also suitable.

[0016]In an embodiment of the invention the nucleic acid molecule is an
RNA- or DNA molecule, which contains at least one modified nucleotide
analog, i.e. a naturally occurring ribonucleotide or deoxyribonucleotide
is substituted by a non-naturally occurring nucleotide. The modified
nucleotide analog may be located for example at the 5'-end and/or the
3'-end of the nucleic acid molecule.

[0017]Preferred nucleotide analogs are selected from sugar- or
backbone-modified ribonucleotides. It should be noted, however, that
nucleobase-modified ribonucleotides, i.e. ribonucleotides containing a
non-naturally occurring nucleobase instead of a naturally occurring
nucleobase, are also suitable, such as uridines or cytidines modified at
the 5-position, e.g. 5-(2-amino)propyl uridine, 5-bromo uridine;
adenosines and guanosines modified at the 8-position, e.g. 8-bromo
guanosine; deaza nucleotides, e.g. 7-deaza-adenosine; and O- and
N-alkylated nucleotides, e.g. N6-methyl adenosine. In preferred
sugar-modified ribonucleotides the 2'-OH-group is replaced by a group
selected from H, OR, R, halo, SH, SR, NH2, NHR, NR2 or CN, wherein R is
C1-C6 alkyl, alkenyl or alkynyl and halo is F, Cl, Br or I. In preferred
backbone-modified ribonucleotides the phosphoester group connecting
adjacent ribonucleotides is replaced by a modified group, e.g. a
phosphorothioate group. It should be noted that the above modifications
may be combined.

[0018]The nucleic acid molecules of the invention may be obtained by
chemical synthesis methods or by recombinant methods, e.g. by enzymatic
transcription from synthetic DNA-templates or from DNA-plasmids isolated
from recombinant organisms. Typically phage RNA-polymerases are used for
transcription, such as T7, T3 or SP6 RNA-polymerases.

[0019]The invention also relates to a recombinant expression vector
comprising a recombinant nucleic acid operatively linked to an expression
control sequence, wherein expression, i.e. transcription and optionally
further processing results in a miRNA-molecule or miRNA precursor
molecule as described above. The vector is preferably a DNA-vector, e.g.
a viral vector or a plasmid, particularly an expression vector suitable
for nucleic acid expression in eukaryotic, more particularly mammalian
cells. The recombinant nucleic acid contained in said vector may be a
sequence which results in the transcription of the miRNA-molecule as
such, a precursor or a primary transcript thereof, which may be further
processed to give the miRNA-molecule.

[0020]Further, the invention relates to diagnostic or therapeutic
applications of the claimed nucleic acid molecules. For example, miRNAs
may be detected in biological samples, e.g. in tissue sections, in order
to determine and classify certain cell types or tissue types or
miRNA-associated pathogenic disorders which are characterized by
differential expression of miRNA-molecules or miRNA-molecule patterns.
Further, the developmental stage of cells may be classified by
determining temporally expressed miRNA-molecules.

[0021]Further, the claimed nucleic acid molecules are suitable for
therapeutic applications. For example, the nucleic acid molecules may be
used as modulators or targets of developmental processes or disorders
associated with developmental dysfunctions, such as cancer. For example,
miR-15 and miR-16 probably function as tumor-suppressors and thus
expression or delivery of these RNAs or analogs or precursors thereof to
tumor cells may provide therapeutic efficacy, particularly against
leukemias, such as B-cell chronic lymphocytic leukemia (B-CLL). Further,
miR-10 is a possible regulator of the translation of Hox genes,
particularly Hox 3 and Hox 4 (or Scr and Dfd in Drosophila).

[0022]In general, the claimed nucleic acid molecules may be used as a
modulator of the expression of genes which are at least partially
complementary to said nucleic acid. Further, miRNA molecules may act as
targets for therapeutic screening procedures, e.g. inhibition or
activation of miRNA molecules might modulate a cellular differentiation
process, e.g. apoptosis.

[0023]Furthermore, existing miRNA molecules may be used as starting
materials for the manufacture of sequence-modified miRNA molecules, in
order to modify their target specificity, e.g. toward an oncogene, a
multidrug-resistance gene or another therapeutic target gene. The novel
engineered miRNA molecules preferably have an identity of at least 80% to
the starting miRNA, e.g. as depicted in Tables 1, 2, 3 and 4. Further,
miRNA molecules can be modified so that they are symmetrically
processed and then generated as double-stranded siRNAs which are again
directed against therapeutically relevant targets.

[0024]Furthermore, miRNA molecules may be used for tissue reprogramming
procedures, e.g. a differentiated cell line might be transformed by
expression of miRNA molecules into a different cell type or a stem cell.

[0025]For diagnostic or therapeutic applications, the claimed RNA
molecules are preferably provided as a pharmaceutical composition. This
pharmaceutical composition comprises as an active agent at least one
nucleic acid molecule as described above and optionally a
pharmaceutically acceptable carrier.

[0026]The administration of the pharmaceutical composition may be carried
out by known methods, wherein a nucleic acid is introduced into a desired
target cell in vitro or in vivo.

[0027]Commonly used gene transfer techniques include calcium phosphate,
DEAE-dextran, electroporation, microinjection and viral methods [30,
31, 32, 33, 34]. A recent addition to this arsenal of techniques for the
introduction of DNA into cells is the use of cationic liposomes [35].

[0029]The composition may be in the form of a solution, e.g. an
injectable solution, a cream, ointment, tablet, suspension or the like.
The composition may be administered in any suitable way, e.g. by
injection or by oral, topical, nasal or rectal application. The carrier
may be any suitable pharmaceutical carrier. Preferably, a carrier is used
which is capable of increasing the efficiency with which the RNA
molecules enter the target cells. Suitable examples of such carriers are
liposomes, particularly cationic liposomes.

[0030]Further, the invention relates to a method of identifying novel
microRNA-molecules and precursors thereof, in eukaryotes, particularly in
vertebrates and more particularly in mammals, such as humans or mice.
This method comprises: ligating 5'- and 3'-adapter-molecules to the ends
of a size-fractionated RNA-population, reverse transcribing said
adapter-ligated RNA-population, and characterizing said reverse
transcribed RNA-molecules, e.g. by amplification, concatamerization,
cloning and sequencing.

[0031]A method as described above has already been described in (8),
albeit for the identification of siRNA molecules. Surprisingly, it was
found now that the method is also suitable for identifying the miRNA
molecules or precursors thereof as claimed in the present application.

[0032]Further, it should be noted that, for derivatization of the 3'-OH
group of the 3'-adapter, not only 4-hydroxymethylbenzyl but also other
types of derivatization groups, such as alkyl, alkyl amino, ethylene
glycol or 3'-deoxy groups, are suitable.

[0033]Further, the invention shall be explained in more detail by the
following Figures and Examples:

FIGURE LEGENDS

[0034]FIG. 1A. Expression of D. melanogaster miRNAs. Northern blots of
total RNA isolated from staged populations of D. melanogaster were probed
for the indicated miRNAs. The position of 76-nt val-tRNA is also
indicated on the blots. 5S rRNA serves as loading control. E, embryo; L,
larval stage; P, pupae; A, adult; S2, Schneider-2 cells. It should be
pointed out, that S2 cells are polyclonal, derived from an unknown subset
of embryonid tissues, and may have also lost some features of their
tissue of origin while maintained in culture. miR-3 to miR-6 RNAs were
not detectable in S2 cells (data not shown). miR-14 was not detected by
Northern blotting and may be very weakly expressed, which is consistent
with its cloning frequency. Similar miRNA sequences are difficult to
distinguish by Northern blotting because of potential cross-hybridization
of probes.

[0035]FIG. 1B: Expression of vertebrate miRNAs. Northern blots of total
RNA isolated from HeLa cells, mouse kidneys, adult zebrafish, frog
ovaries, and S2 cells were probed for the indicated miRNAs. The position
of 76-nt val-tRNA is also indicated on the blots. 5S rRNA from the
preparations of total RNA from the indicated species is also shown. The
gels used for probing of miR-18, miR-19a, miR-30, and miR-31 were not run
as far as the other gels (see tRNA marker position). miR-32 and miR-33
were not detected by Northern blotting, which is consistent with their
low cloning frequency. Oligodeoxynucleotides used as Northern probes
were:

[0036]FIG. 2. Genomic organization of miRNA gene clusters. The precursor
structure is indicated as box and the location of the miRNA within the
precursor is shown in gray; the chromosomal location is also indicated to
the right. (A) D. melanogaster miRNA gene clusters. (B) Human miRNA gene
clusters. The cluster of let-7a-1 and let-7f-1 is separated by 26500 nt
from a copy of let-7d on chromosome 9 and 17. A cluster of let-7a-3 and
let-7b, separated by 938 nt on chromosome 22, is not illustrated.

[0037]FIG. 3. Predicted precursor structures of D. melanogaster miRNAs.
RNA secondary structure prediction was performed using mfold version 3.1
[28] and manually refined to accommodate G/U wobble base pairs in the
helical segments. The miRNA sequence is underlined. The actual size of
the stem-loop structure is not known experimentally and may be slightly
shorter or longer than represented. Multicopy miRNAs and their
corresponding precursor structures are also shown.

[0041]FIG. 7. Predicted precursor structures of miRNAs, sequence accession
numbers and homology information. RNA secondary structure prediction was
performed using mfold version 3.1 and manually refined to accommodate G/U
wobble base pairs in the helical segments. Dashes were inserted into the
secondary structure presentation when asymmetrically bulged nucleotides
had to be accommodated. The excised miRNA sequence is underlined. The
actual size of the stem-loop structure is not known experimentally and
may be slightly shorter or longer than represented. Multicopy miRNAs and
their corresponding precursor structures are also shown. In cases where
no mouse precursors were yet available, those which correspond to
D. melanogaster or human sequences are included. Published C. elegans
miRNAs [36, 37] are also
included in the table. A recent set of new HeLa cell miRNAs is also
indicated [46]. If several ESTs were retrieved for one organism in the
database, only those with different precursor sequences are listed.
miRNA homologs found in other species are indicated. Chromosomal
location, sequence accession numbers and clusters of miRNA genes are
indicated. Sequences from cloned miRNAs were searched against mouse and
human in GenBank (including trace data), and against Fugu rubripes and
Danio rerio at www.jgi.doe.gov and www.sanger.ac.uk, respectively.

EXAMPLE 1

MICRORNAS FROM D. MELANOGASTER AND HUMAN

[0042]We previously developed a directional cloning procedure to isolate
siRNAs after processing of long dsRNAs in Drosophila melanogaster embryo
lysate (8). Briefly, 5' and 3' adapter molecules were ligated to the ends
of a size-fractionated RNA population, followed by reverse transcription,
PCR amplification, concatamerization, cloning and sequencing. This
method, originally intended to isolate siRNAs, led to the simultaneous
identification of 14 novel 20- to 23-nt short RNAs which are encoded in
the D. melanogaster genome and which are expressed in 0 to 2 h embryos
(Table 1). The method was adapted to clone RNAs in a similar size range
from HeLa cell total RNA (14), which led to the identification of 19
novel human stRNAs (Table 2), thus providing further evidence for the
existence of a large class of small RNAs with potential regulatory roles.
Because of their small size, we refer to these novel RNAs as microRNAs
or miRNAs. The miRNAs are abbreviated as miR-1 to miR-33, and the genes
encoding miRNAs are named mir-1 to mir-33. Highly homologous miRNAs are
distinguished by an added lowercase letter, and multiple genomic copies
of a mir gene are designated by a dash followed by a number.
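The naming scheme above (a number, an optional lowercase letter for highly homologous miRNAs, and an optional dash and number for genomic copies) can be captured with a small regular expression. The Python sketch below is hypothetical: it assumes that let-7 and lin-4 names follow the same pattern (as in let-7a-1), and it deliberately returns `None` for names outside that pattern, such as miR-142-as.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: prefix, number, optional homolog letter, optional copy.
# "miR" denotes the RNA and "mir" the gene; "let"/"lin" names are assumed
# to follow the same scheme.
_NAME_RE = re.compile(
    r"^(?P<prefix>miR|mir|let|lin)-(?P<number>\d+)"
    r"(?P<variant>[a-z]?)(?:-(?P<copy>\d+))?$"
)


class MirName(NamedTuple):
    prefix: str
    number: int
    variant: Optional[str]      # lowercase letter for highly homologous miRNAs
    copy_number: Optional[int]  # genomic copy, e.g. the "1" in let-7a-1


def parse_mir_name(name: str) -> Optional[MirName]:
    """Split a name such as 'let-7a-1' into its parts; None if it does not parse."""
    match = _NAME_RE.match(name)
    if match is None:
        return None
    copy = match.group("copy")
    return MirName(
        prefix=match.group("prefix"),
        number=int(match.group("number")),
        variant=match.group("variant") or None,
        copy_number=int(copy) if copy is not None else None,
    )
```

For example, `parse_mir_name("let-7a-1")` yields prefix "let", number 7, variant "a" and copy number 1, while `parse_mir_name("mir-33")` has neither a variant nor a copy number.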

[0043]The expression and size of the cloned, endogenous short RNAs were
also examined by Northern blotting (FIG. 1, Tables 1 and 2). Total RNA
isolation was performed by acid guanidinium thiocyanate-phenol-chloroform
extraction [45]. Northern analysis was performed as described [1], except
that the total RNA was resolved on a 15% denaturing polyacrylamide gel,
transferred onto Hybond-N+ membrane (Amersham Pharmacia Biotech), and the
hybridization and wash steps were performed at 50° C.
Oligodeoxynucleotides used as Northern probes were 5'-32P-phosphorylated,
complementary to the miRNA sequence and 20 to 25 nt in length.

[0044]5S rRNA was detected by ethidium staining of polyacrylamide gels
prior to transfer. Blots were stripped by boiling in 0.1% aqueous sodium
dodecylsulfate/0.1×SSC (15 mM sodium chloride, 1.5 mM sodium
citrate, pH 7.0) for 10 min, and were re-probed up to 4 times until the
21-nt signals became too weak for detection. Finally, blots were probed
for val-tRNA as size marker.

[0045]For analysis of D. melanogaster RNAs, total RNA was prepared from
different developmental stages, as well as cultured Schneider-2 (S2)
cells, which originally derive from 20-24 h D. melanogaster embryos [15]
(FIG. 1, Table 1). miR-3 to miR-7 are expressed only during embryogenesis
and not at later developmental stages. The temporal expression of miR-1,
miR-2 and miR-8 to miR-13 was less restricted. These miRNAs were observed
at all developmental stages, though significant variations in the
expression levels were sometimes observed. Interestingly, miR-1, miR-3 to
miR-6, and miR-8 to miR-11 were completely absent from cultured S2
cells, while miR-2, miR-7, miR-12, and miR-13 were present in S2 cells,
indicating cell type-specific miRNA expression. miR-1, miR-8, and miR-12
expression patterns are similar to those of lin-4 stRNA in C. elegans,
as their expression is strongly upregulated in larvae and sustained to
adulthood [16]. miR-9 and miR-11 are present at all stages but are
strongly reduced in the adult, which may reflect a maternal contribution
from germ cells or expression in one sex only.

[0046]The mir-3 to mir-6 genes are clustered (FIG. 2A), and mir-6 is
present as a triple repeat with slight variations in the mir-6 precursor
sequence but not in the miRNA sequence itself. The expression profiles of
miR-3 to miR-6 are highly similar (Table 1), which suggests that a
single embryo-specific precursor transcript may give rise to the
different miRNAs, or that the same enhancer regulates miRNA-specific
promoters. Several other fly miRNAs are also found in gene clusters (FIG.
2A).

[0047]The expression of HeLa cell miR-15 to miR-33 was examined by
Northern blotting using HeLa cell total RNA, in addition to total RNA
prepared from mouse kidneys, adult zebrafish, Xenopus laevis ovary, and
D. melanogaster S2 cells (FIG. 1B, Table 2). miR-15 and miR-16 are
encoded in a gene cluster (FIG. 2B) and are detected in mouse kidney,
fish, and very weakly in frog ovary, which may result from miRNA
expression in somatic ovary tissue rather than oocytes. mir-17 to mir-20
are also clustered (FIG. 2B), and are expressed in HeLa cells and fish,
but undetectable in mouse kidney and frog ovary (FIG. 1, Table 2), and
therefore represent a likely case of tissue-specific miRNA expression.

[0048]The majority of vertebrate and invertebrate miRNAs identified in
this study are not related by sequence, but a few exceptions, similar to
the highly conserved let-7 RNA [6], do exist. Sequence analysis of the D.
melanogaster miRNAs revealed four such examples of sequence conservation
between invertebrates and vertebrates. miR-1 homologs are encoded in the
genomes of C. elegans, C. briggsae, and humans, and are found in cDNAs
from zebrafish, mouse, cow and human. The expression of miR-1 was
detected by Northern blotting in total RNA from adult zebrafish and C.
elegans, but not in total RNA from HeLa cells or mouse kidney (Table 2
and data not shown). Interestingly, while miR-1 and let-7 are both
expressed in adult flies (FIG. 1A) [6] and are both undetected in S2
cells, miR-1 is, in contrast to let-7, undetectable in HeLa cells. This
represents another case of tissue-specific expression of a miRNA, and
indicates that miRNAs may not only play a regulatory role in
developmental timing, but also in tissue specification. miR-7 homologs
were found by database searches in mouse and human genomic and expressed
sequence tag sequences (ESTs). Two mammalian miR-7 variants are predicted
by sequence analysis in mouse and human, and were detected by Northern
blotting in HeLa cells and fish, but not in mouse kidney (Table 2).
Similarly, we identified mouse and human miR-9 and miR-10 homologs by
database searches but only detected miR-10 expression in mouse kidney.

[0049]The identification of evolutionarily related miRNAs, which have
already acquired multiple sequence mutations, was not possible by
standard bioinformatic searches. Direct comparison of the D. melanogaster
miRNAs with the human miRNAs identified an 11-nt segment shared between
D. melanogaster miR-6 and HeLa miR-27, but no further relationships were
detected. One may speculate that most miRNAs only act on a single target
and therefore allow for rapid evolution by covariation, and that highly
conserved miRNAs act on more than one target sequence, and therefore have
a reduced probability for evolutionary drift by covariation [6]. An
alternative interpretation is that the sets of miRNAs from D.
melanogaster and humans are fairly incomplete and that many more miRNAs
remain to be discovered, which will provide the missing evolutionary
links.

[0050]lin-4 and let-7 stRNAs were predicted to be excised from longer
transcripts that contain approximately 30 base-pair stem-loop structures
[1, 6]. Database searches for newly identified miRNAs revealed that all
miRNAs are flanked by sequences that have the potential to form stable
stem-loop structures (FIGS. 3 and 4). In many cases, we were able to
detect the predicted, approximately 70-nt precursors by Northern blotting
(FIG. 1).
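Whether a flanking sequence can fold back into a stem-loop can be illustrated with a compact base-pair maximization (Nussinov-style) dynamic program in Python. This is a deliberately simplified stand-in for, not a reimplementation of, the mfold free-energy method used for the figures: it only maximizes the number of nested Watson-Crick and G/U pairs, ignores stacking energies, and assumes a minimum hairpin loop of 3 nt. The names `max_base_pairs`, `CAN_PAIR` and `MIN_LOOP` are hypothetical.

```python
# Nucleotide pairs allowed in the stem: Watson-Crick plus G/U wobble.
CAN_PAIR = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumption: a hairpin loop needs at least 3 unpaired nucleotides


def max_base_pairs(seq: str) -> tuple[int, str]:
    """Nussinov-style base-pair maximization (no free energies).

    Returns the maximal number of nested base pairs and one optimal
    structure in dot-bracket notation.
    """
    s = seq.upper().replace("T", "U")
    n = len(s)
    if n == 0:
        return 0, ""
    # dp[i][j] = maximal number of pairs within s[i..j] (inclusive)
    dp = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case 1: position i stays unpaired
            for k in range(i + MIN_LOOP + 1, j + 1):  # case 2: i pairs with k
                if (s[i], s[k]) in CAN_PAIR:
                    right = dp[k + 1][j] if k < j else 0
                    best = max(best, dp[i + 1][k - 1] + 1 + right)
            dp[i][j] = best

    # Traceback: recover one structure that attains dp[0][n - 1].
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + MIN_LOOP + 1, j + 1):
            if (s[i], s[k]) in CAN_PAIR:
                right = dp[k + 1][j] if k < j else 0
                if dp[i + 1][k - 1] + 1 + right == dp[i][j]:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    if k < j:
                        stack.append((k + 1, j))
                    break
    return dp[0][n - 1], "".join(structure)
```

For a toy hairpin such as GGGGAAAACCCC the function returns 4 pairs with the structure ((((....)))). Applied to a predicted ~70-nt precursor, the pair count gives only a rough indication of whether a stem of about 30 base pairs can form; an energy-based tool such as mfold remains the method used in the description.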

[0051]Some miRNA precursor sequences were also identified in mammalian
cDNA (EST) databases [27], indicating that primary transcripts longer
than 70-nt stem-loop precursors do also exist. We never cloned a 22-nt
RNA complementary to any of the newly identified miRNAs, and it is as yet
unknown how the cellular processing machinery distinguishes between the
miRNA and its complementary strand. Comparative analysis of the precursor
stem-loop structures indicates that the loops adjacent to the base-paired
miRNA segment can be located on either side of the miRNA sequence (FIGS.
3 and 4), suggesting that the 5' or 3' location of the stem-closing loop
is not the determinant of miRNA excision. It is also unlikely that the
structure, length or stability of the precursor stem is the critical
determinant as the base-paired structures are frequently imperfect and
interspersed by less stable, non-Watson-Crick base pairs such as G/A,
U/U, C/U, A/A, and G/U wobbles. Therefore, a sequence-specific
recognition process is a likely determinant for miRNA excision, perhaps
mediated by members of the Argonaute (rde-1/ago1/piwi) protein family.
Two members of this family, alg-1 and alg-2, have recently been shown to
be critical for stRNA processing in C. elegans [13]. Members of the
Argonaute protein family are also involved in RNAi and PTGS. In D.
melanogaster, these include argonaute2, a component of the
siRNA-endonuclease complex (RISC) [17], and its relative aubergine, which
is important for silencing of repeat genes [18]. In other species, these
include rde-1, argonaute1, and qde-2, in C. elegans [19], Arabidopsis
thaliana [20], and Neurospora crassa [21], respectively. The Argonaute
protein family therefore represents, besides the RNase III Dicer [12,
13], another evolutionary link between RNAi and miRNA maturation.

[0052]Despite advanced genome projects, computer-assisted detection of
genes encoding functional RNAs remains problematic [22]. Cloning of
expressed short functional RNAs, similar to EST approaches (RNomics),
is a powerful alternative and probably the most efficient method for
identification of such novel gene products [23-26]. The number of
functional RNAs has been widely underestimated and is expected to grow
rapidly because of the development of new functional RNA cloning
methodologies.

[0053]The challenge for the future is to define the function and the
potential targets of these novel miRNAs by using bioinformatics as well
as genetics, and to establish a complete catalogue of time- and
tissue-specific distribution of the already identified and yet to be
uncovered miRNAs. lin-4 and let-7 stRNAs negatively regulate the
expression of proteins encoded by mRNAs whose 3' untranslated regions
contain sites of complementarity to the stRNA [3-5].
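Searching a 3' untranslated region for sites of partial complementarity to a miRNA can be sketched as an ungapped sliding-window scan. The Python example below is a hypothetical illustration, not a method from the description: it pairs the miRNA antiparallel with each UTR window of the same length, counts Watson-Crick (and optionally G/U) pairs, and reports windows above a threshold. The 0.8 default for `min_fraction` is an arbitrary assumption, and real sites with bulges or loops would be missed.

```python
# Watson-Crick pairs, plus G/U wobble pairs (optional).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def scan_target_sites(mirna: str, utr: str, min_fraction: float = 0.8,
                      allow_wobble: bool = True) -> list[tuple[int, float]]:
    """Return (start, fraction_paired) for each UTR window of miRNA length
    whose antiparallel pairing with the miRNA reaches `min_fraction`.

    The i-th miRNA nucleotide (from its 5' end) is paired with the window
    nucleotide at position length - 1 - i. No gaps or bulges are modelled.
    """
    m = mirna.upper().replace("T", "U")
    u = utr.upper().replace("T", "U")
    length = len(m)
    if length == 0:
        raise ValueError("miRNA sequence must not be empty")
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    hits = []
    for start in range(len(u) - length + 1):
        window = u[start:start + length]
        paired = sum(1 for i in range(length)
                     if (m[i], window[length - 1 - i]) in allowed)
        fraction = paired / length
        if fraction >= min_fraction:
            hits.append((start, fraction))
    return hits
```

A fully complementary site embedded in a UTR is reported with a fraction of 1.0 at its start position; lowering `min_fraction` admits increasingly imperfect sites.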

[0054]Thus, a series of 33 novel genes, coding for 19- to 23-nucleotide
microRNAs (miRNAs), has been cloned from fly embryos and human cells.
Some of these miRNAs are highly conserved between vertebrates and
invertebrates and are developmentally or tissue-specifically expressed.
Two of the characterized human miRNAs may function as tumor suppressors
in B-cell chronic lymphocytic leukemia. miRNAs are related to a small
class of previously described 21- and 22-nt RNAs (lin-4 and let-7 RNAs),
so-called small temporal RNAs (stRNAs), and regulate developmental timing
in C. elegans and other species. Similar to stRNAs, miRNAs are presumed
to regulate translation of specific target mRNAs by binding to partially
complementary sites, which are present in their 3'-untranslated regions.

[0055]Deregulation of miRNA expression may be a cause of human disease,
and detection of miRNA expression may become useful as a diagnostic tool.
Regulated expression of miRNAs in cells or tissues devoid of particular
miRNAs may be useful for tissue engineering, and delivery or transgenic
expression of miRNAs may be useful for therapeutic intervention. miRNAs
may also represent valuable drug targets themselves. Finally, miRNAs and
their precursor sequences may be engineered to recognize therapeutically
valuable targets.

EXAMPLE 2

MIRNAs FROM MOUSE

[0056]To gain more detailed insights into the distribution and function of
miRNAs in mammals, we investigated the tissue-specific distribution of
miRNAs in adult mouse. Cloning of miRNAs from specific tissues was
preferred over whole organism-based cloning because low-abundance miRNAs
that normally go undetected by Northern blot analysis are identified
clonally. Also, in situ hybridization techniques for detecting 21-nt RNAs
have not yet been developed. Therefore, 19- to 25-nucleotide RNAs were
cloned and sequenced from total RNA, which was isolated from
18.5-week-old BL6 mice. Cloning of miRNAs was performed as follows: 0.2
to 1 mg of
total RNA was separated on a 15% denaturing polyacrylamide gel and RNA of
19- to 25-nt size was recovered. A 5'-phosphorylated 3'-adapter
oligonucleotide (5'-pUUUaaccgcgaattccagx: uppercase, RNA; lowercase, DNA;
p, phosphate; x, 3'-Amino-Modifier C-7, ChemGenes, Ashland, Mass., USA,
Cat. No. NSS-1004; SEQ ID NO:54) and a 5'-adapter oligonucleotide
(5'-acggaattcctcactAAA: uppercase, RNA; lowercase, DNA; SEQ ID NO:55)
were ligated to the short RNAs. RT/PCR was performed with 3'-primer
(5'-GACTAGCTGGAATTCGCGGTTAAA; SEQ ID NO:56) and 5'-primer
(5'-CAGCCAACGGAATTCCTCACTAAA; SEQ ID NO:57). In order to introduce Ban I
restriction sites, a second PCR was performed using the primer pair
5'-CAGCCAACAGGCACCGAATTCCTCACTAAA (SEQ ID NO:57) and
5'-GACTAGCTTGGTGCCGAATTCGCGGTTAAA (SEQ ID NO:56), followed by
concatamerization after Ban I digestion and T4 DNA ligation. Concatamers
of 400 to 600 base pairs were cut out from 1.5% agarose gels and recovered
by Biotrap (Schleicher & Schuell) electroelution (1× TAE buffer)
and by ethanol precipitation. Subsequently, the 3' ends of the
concatamers were filled in by incubating for 15 min at 72° C. with
Taq polymerase in standard PCR reaction mixture. This solution was
diluted 3-fold with water and directly used for ligation into pCR2.1 TOPO
vectors. Clones were screened for inserts by PCR and 30 to 50 samples
were subjected to sequencing. Because RNA was prepared from the pooled
tissues of several mice, minor sequence variations that were detected
multiple times in multiple clones may reflect polymorphisms rather than
RT/PCR mutations. Public database searching was used to identify the
genomic sequences encoding the approx. 21-nt RNAs. The occurrence of a
20- to 30-base-pair fold-back structure involving the immediate upstream or
downstream flanking sequences was used to assign miRNAs [36-38].
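Reading the cloned small RNAs out of the sequenced concatamers amounts to finding the sequence between the adapter-derived flanks. In the primers given above, the 5'-primer ends in ...CTCACTAAA and the reverse complement of the 3'-primer begins with TTTAACCGCG..., so each insert should sit between these two flanks. The Python sketch below relies on that reading; the flank choice, the search of both strands, and the function names are assumptions rather than the procedure actually used, and inserts that themselves begin with A or end with T residues may be trimmed ambiguously.

```python
import re

# Flanks taken from the adapter/primer sequences above (DNA form):
# 3' end of the 5'-adapter, and 5' part of the 3'-adapter.
LEFT_FLANK = "CTCACTAAA"
RIGHT_FLANK = "TTTAACCGCG"
# Non-greedy insert of 19-25 nt, matching the size-fractionation range.
INSERT_RE = re.compile(LEFT_FLANK + r"([ACGT]{19,25}?)" + RIGHT_FLANK)
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp_dna(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def extract_inserts(read: str) -> list[str]:
    """Return all flanked inserts in a read, written as RNA (T -> U).

    Both strands are searched because the orientation of a read
    relative to the cloned RNA is not assumed to be known.
    """
    read = read.upper()
    inserts = []
    for strand in (read, revcomp_dna(read)):
        for match in INSERT_RE.finditer(strand):
            inserts.append(match.group(1).replace("T", "U"))
    return inserts
```

A read assembled from the 5'-primer, a 22-nt insert and the reverse complement of the 3'-primer yields that insert as RNA, whichever strand is supplied.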

[0057] We examined 9 different mouse tissues and identified 34 novel
miRNAs, some of which are expressed in a highly tissue-specific manner
(Table 3 and FIG. 5). Furthermore, we identified 33 new miRNAs from
different mouse tissues and also from human Saos-2 osteosarcoma cells
(Table 4). miR-1 was previously shown by Northern analysis to be strongly
expressed in adult heart, but not in brain, liver, kidney, lung or colon
[37]. Here we show that miR-1 accounts for 45% of all mouse miRNAs cloned
from heart. miR-1 was also expressed at a low level in liver and midbrain,
even though it remained undetectable there by Northern analysis. Three
copies or polymorphic alleles of miR-1 were found in mice. The conservation of
tissue-specific miR-1 expression between mouse and human provides
additional evidence for a conserved regulatory role of this miRNA. In
liver, variants of miR-122 account for 72% of all cloned miRNAs, and
miR-122 was not detected in any other tissue analyzed. In spleen, miR-143
appeared to be the most abundant, at a frequency of approx. 30%. In colon,
miR-142-as was cloned several times and also appeared at a frequency of
approx. 30%. In small intestine, too few miRNA sequences were obtained to permit
statistical analysis. This was due to strong RNase activity in this
tissue, which caused significant breakdown of abundant non-coding RNAs,
e.g. rRNA, so that the fraction of miRNA in the cloned sequences was very
low. For the same reason, no miRNA sequences were obtained from pancreas.
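The tissue percentages above (for example, miR-1 at 45% of heart miRNAs and miR-122 variants at 72% of liver miRNAs) are cloning frequencies among the miRNA clones obtained from one tissue. A minimal Python sketch of that tally follows, assuming clone names follow the naming used in the tables and that variants sharing a gene number (such as miR-122a and miR-122b) are pooled; the function names and the pooling rule are illustrative assumptions, not the authors' procedure.

```python
import re
from collections import Counter


def gene_family(clone_name: str) -> str:
    """Pool variants under their gene number, e.g. 'miR-122a' -> 'miR-122',
    'miR-142-as' -> 'miR-142'. Illustrative rule only."""
    match = re.match(r"miR-\d+", clone_name)
    if match is None:
        raise ValueError(f"unrecognized miRNA name: {clone_name!r}")
    return match.group(0)


def cloning_frequencies(clone_names: list[str]) -> dict[str, float]:
    """Fraction of all miRNA clones from one tissue contributed by each gene
    family, ordered from most to least frequent."""
    if not clone_names:
        raise ValueError("no miRNA clones; frequencies are undefined")
    counts = Counter(gene_family(name) for name in clone_names)
    total = sum(counts.values())
    return {family: n / total for family, n in counts.most_common()}
```

As the small-intestine and pancreas results show, such fractions are only meaningful when a tissue yields enough miRNA clones; the sketch refuses an empty list but does not otherwise judge sample size.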

[0058] To gain insight into the distribution of miRNAs in neural tissue,
we analyzed cortex, cerebellum and midbrain. Similar to heart, liver and small
intestine, variants of a particular miRNA, miR-124, dominated and
accounted for 25 to 48% of all brain miRNAs. miR-101, -127, -128, -131,
and -132, also cloned from brain tissues, were further analyzed by
Northern blotting and shown to be predominantly brain-specific. Northern
blot analysis was performed as described in Example 1. tRNAs and 5S rRNA
were detected by ethidium staining of polyacrylamide gels prior to
transfer to verify equal loading. Blots were stripped by boiling in
deionized water for 5 min, and reprobed up to 4 times until the 21-nt
signals became too weak for detection.

[0059] miR-125a and miR-125b are very similar in sequence to C. elegans
lin-4 stRNA and may represent its orthologs (FIG. 6A). This is of great
interest because, unlike let-7, which was readily detected in other
species, lin-4 has acquired a few mutations in its central region and
thus escaped bioinformatic database searches. Using the mouse sequence
miR-125b, we could readily identify its ortholog in the D. melanogaster
genome. miR-125a and miR-125b differ only by a central diuridine
insertion and a U to C change. miR-125b is very similar to lin-4 stRNA
with the differences located only in the central region, which is
presumed to be bulged out during target mRNA recognition [41]. miR-125a
and miR-125b were cloned from brain tissue, but expression was also
detected by Northern analysis in other tissues, consistent with the role
of lin-4 in regulating neuronal remodeling by controlling lin-14
expression [43]. Unfortunately, orthologs of C. elegans lin-14 have not
been described, and miR-125 targets remain to be identified in D.
melanogaster or mammals. Finally, miR-125b expression is also
developmentally regulated and only detectable in pupae and adults, but not
in embryos or larvae, of D. melanogaster (FIG. 6B).

[0060] Sequence comparison of mouse miRNAs with previously described miRNAs
reveals that miR-99b and miR-99a are similar to D. melanogaster, mouse
and human miR-10 as well as C. elegans miR-51 [36], miR-141 is similar to
D. melanogaster miR-8, miR-29b is similar to C. elegans miR-83, and
miR-131 and miR-142-s are similar to D. melanogaster miR-4 and C. elegans
miR-79 [36]. miR-124a is conserved between invertebrates and vertebrates.
In this respect, it should be noted that almost every miRNA cloned
from mouse was also encoded in the human genome and was frequently detected
in other vertebrates, such as the pufferfish, Fugu rubripes, and the
zebrafish, Danio rerio. Sequence conservation may point to conservation
in function of these miRNAs. Comprehensive information about orthologous
sequences is listed in FIG. 7.

[0061] In two cases, both strands of a miRNA precursor were cloned
(Table 3), which had previously been observed once for a C. elegans miRNA
[36]. It is thought that the more frequently cloned strand of a miRNA
precursor represents the functional miRNA; in these two cases, these are
miR-30c-s and miR-142-as, where s and as indicate the 5' or 3' side of the
fold-back structure, respectively.
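The selection rule stated above can be written as a small helper. This is a sketch only: it assumes clone counts per arm are available, labels arms with the -s/-as suffixes used here, and treats a tie as undecidable because the text does not say how ties would be resolved; the function names are hypothetical.

```python
def functional_arm(arm_counts: dict[str, int]) -> str:
    """Return the more frequently cloned arm of one miRNA precursor.

    arm_counts maps arm names ending in '-s' (5' side of the fold-back) or
    '-as' (3' side) to clone counts. A tie raises, since no arm dominates.
    """
    if not arm_counts:
        raise ValueError("no arms given")
    for name in arm_counts:
        if not name.endswith(("-s", "-as")):
            raise ValueError(f"{name!r} lacks an -s/-as suffix")
    ranked = sorted(arm_counts.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        raise ValueError("both arms cloned equally often")
    return ranked[0][0]


def fold_back_side(arm_name: str) -> str:
    """'-s' arms come from the 5' side of the fold-back, '-as' arms from the 3' side."""
    if arm_name.endswith("-as"):
        return "3'"
    if arm_name.endswith("-s"):
        return "5'"
    raise ValueError(f"{arm_name!r} lacks an -s/-as suffix")
```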

[0062] The mir-142 gene is located on chromosome 17 and was also found at
the breakpoint junction of a t(8;17) translocation, which causes an
aggressive B-cell leukemia due to strong up-regulation of a translocated
MYC gene [44]. The translocated MYC gene, which was also truncated at the
first exon, was located only 4 nt downstream of the 3' end of the miR-142
precursor. This suggests that translocated MYC was under the control of
the upstream miR-142 promoter. Alignment of mouse and human
miR-142-containing EST sequences indicates a conserved sequence element of
approximately 20 nt downstream of the mir-142 hairpin. This element was
lost in the translocation. It is conceivable that the absence of the
conserved downstream sequence element in the putative miR-142/MYC fusion
transcript prevented its recognition as a miRNA precursor and may
therefore have caused accumulation of fusion transcripts and
overexpression of MYC.

[0063] miR-155, which was cloned from colon, is excised from the known
noncoding BIC RNA [47]. BIC was originally identified as a gene
transcriptionally activated by promoter insertion at a common retroviral
integration site in B cell lymphomas induced by avian leukosis virus.
Comparison of BIC cDNAs from human, mouse and chicken revealed 78%
identity over 138 nucleotides [47]. The identity region covers the
miR-155 fold-back precursor and a few conserved boxes downstream of the
fold-back sequence. The relatively high level of expression of BIC in
lymphoid organs and cells in human, mouse and chicken implies an
evolutionarily conserved function, but BIC RNA has also been detected at
low levels in non-hematopoietic tissues [47].

[0064] Another interesting observation was that segments of perfect
complementarity to miRNAs are not observed in mRNA sequences or in
genomic sequences outside the miRNA inverted repeat. Although this could
be fortuitous, based on the link between RNAi and miRNA processing [11,
13, 43], it may be speculated that miRNAs retain the potential to cleave
perfectly complementary target RNAs. If so, the absence of perfectly
complementary sites may reflect a preference for translational control
over mRNA degradation, since translational control without target
degradation could provide more flexibility.
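The observation above rests on a simple search: a segment perfectly complementary to a miRNA is an exact copy of the miRNA's reverse complement. A minimal Python sketch follows, assuming plain sequence strings (U read as T) and exact matching only; it is not the database search the authors used, and the function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq: str) -> str:
    """Uppercase and read U as T."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Reverse complement, returned as DNA."""
    return to_dna(seq).translate(_COMPLEMENT)[::-1]


def _find_all(text: str, probe: str) -> list[int]:
    """All 0-based start positions of probe in text, overlaps included."""
    hits = []
    start = text.find(probe)
    while start != -1:
        hits.append(start)
        start = text.find(probe, start + 1)
    return hits


def perfect_sites(mirna: str, target: str) -> list[int]:
    """Start positions in `target` (read 5' to 3') of segments perfectly
    complementary to `mirna`, i.e. exact copies of its reverse complement."""
    return _find_all(to_dna(target), reverse_complement(mirna))


def perfect_sites_both_strands(mirna: str, genomic: str) -> dict[str, list[int]]:
    """For double-stranded DNA, also search the opposite strand: a perfectly
    complementary segment on the minus strand shows up on the plus strand as
    a copy of the miRNA itself. Positions are plus-strand coordinates."""
    return {
        "plus": perfect_sites(mirna, genomic),
        "minus": _find_all(to_dna(genomic), to_dna(mirna)),
    }
```

When scanning genomic DNA, the miRNA's own locus always yields a hit on the strand opposite the one encoding it, which is why the observation is restricted to sequences outside the miRNA inverted repeat.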

[0065] In summary, 63 novel miRNAs were identified from mouse and 4 novel
miRNAs were identified from human Saos-2 osteosarcoma cells (Table 3 and
Table 4). These miRNAs are conserved in human and often also in
non-mammalian vertebrates. A few of these miRNAs appear to be extremely
tissue-specific, suggesting a critical role for some miRNAs in
tissue specification and cell lineage decisions. We may also have
identified the fruit fly and mammalian orthologs of C. elegans lin-4 stRNA.
The establishment of a comprehensive list of miRNA sequences will be
instrumental for bioinformatic approaches that make use of completed
genomes and the power of phylogenetic comparison in order to identify
miRNA-regulated target mRNAs.

[0113] Human miRNAs. Of 220 short RNAs sequenced, 100 (45%) corresponded
to miRNAs, 53 (24%) to already characterized functional RNAs (rRNA,
snRNAs, tRNAs), and 67 (30%) to sequences with no database entry. Results of
Northern blotting of total RNA isolated from different vertebrate species
and S2 cells are indicated. For legend, see Table 1.
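The breakdown in this legend can be reproduced with a few lines of Python. The counts are those given above; the function name is illustrative. The three categories add up to the 220 sequences, and the rounded shares (45%, 24% and 30%) total 99% only because of rounding.

```python
def rounded_shares(counts: dict[str, int]) -> dict[str, int]:
    """Percentage of the total for each category, rounded to a whole percent."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


# Counts from the human miRNA legend above.
HUMAN_CLONES = {
    "miRNAs": 100,
    "characterized functional RNAs (rRNA, snRNAs, tRNAs)": 53,
    "no database entry": 67,
}

if __name__ == "__main__":
    print("total:", sum(HUMAN_CLONES.values()))
    for label, pct in rounded_shares(HUMAN_CLONES).items():
        print(f"{label}: {HUMAN_CLONES[label]} ({pct}%)")
```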

[0114] Mouse miRNAs. The sequences indicated represent the longest miRNA
sequences identified by cloning. The 3' terminus of miRNAs is often
truncated by one or two nucleotides. miRNAs that are more than 85%
identical in sequence (i.e. share 18 out of 21 nucleotides) or contain 1-
or 2-nucleotide internal deletions are referred to by the same gene
number followed by a lowercase letter. Minor sequence variations between
related miRNAs are generally found near the ends of the miRNA sequence
and are thought not to compromise target RNA recognition. Minor sequence
variations may also represent A to G and C to U changes, which are
accommodated as G-U wobble base pairs during target recognition. The
suffixes -s and -as indicate miRNAs derived from the 5' half or the
3' half of a miRNA precursor, respectively. Mouse brains were dissected
into midbrain, mb; cortex, cx; cerebellum, cb. The tissues analyzed were
heart, ht; liver, lv; small intestine, si; colon, co; cortex, ct;
cerebellum, cb; midbrain, mb.
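The naming rule in this legend can be stated as a short Python sketch. It is one reading of the rule, not the authors' procedure: it assumes ungapped comparison for equal-length miRNAs and, for miRNAs of different length, a single contiguous internal deletion with otherwise identical sequence; the function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("ungapped identity needs equal-length sequences")
    same = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100 * same / len(a)


def differs_by_internal_deletion(longer: str, shorter: str, max_del: int = 2) -> bool:
    """True if deleting one contiguous run of 1..max_del nucleotides, not
    touching either terminal nucleotide, turns `longer` into `shorter`."""
    longer, shorter = longer.upper(), shorter.upper()
    k = len(longer) - len(shorter)
    if not 1 <= k <= max_del:
        return False
    # The deleted run occupies positions i .. i+k-1; keep it away from both ends.
    return any(longer[:i] + longer[i + k:] == shorter
               for i in range(1, len(longer) - k))


def same_gene_number(a: str, b: str) -> bool:
    """Illustrative reading of the rule: >85% ungapped identity for equal-length
    sequences, or an exact match after a 1- or 2-nt internal deletion."""
    if len(a) == len(b):
        return percent_identity(a, b) > 85
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return differs_by_internal_deletion(longer, shorter)
```

With 21-nt miRNAs, 18 identical positions give 100 × 18/21 ≈ 85.7%, just above the threshold, whereas 17 identical positions (≈ 81.0%) fall below it.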

[0115] The originally described miR-30 was renamed miR-30a-as in order
to distinguish it from the miRNA derived from the opposite strand of the
precursor encoded by the mir-30a gene. miR-30a-s is equivalent to miR-97
[46].

[0116] (b) A 1-nt length heterogeneity is found at both the 5' and 3' ends.
The 22-nt miRNA sequence is shown, but only 21-nt miRNAs were cloned.

[0117] Mouse and human miRNAs. The sequences indicated represent the
longest miRNA sequences identified by cloning. The 3' terminus of miRNAs
is often truncated by one or two nucleotides. miRNAs that are more than
85% identical in sequence (i.e. share 18 out of 21 nucleotides) or
contain 1- or 2-nucleotide internal deletions are referred to by the same
gene number followed by a lowercase letter. Minor sequence variations
between related miRNAs are generally found near the ends of the miRNA
sequence and are thought not to compromise target RNA recognition. Minor
sequence variations may also represent A to G and C to U changes, which
are accommodated as G-U wobble base pairs during target recognition.
Mouse brains were dissected into midbrain, mb; cortex, cx; cerebellum,
cb. The tissues analyzed were lung, ln; liver, lv; spleen, sp; kidney,
kd; skin, sk; testis, ts; ovary, ov; thymus, thy; eye, ey; cortex, ct;
cerebellum, cb; midbrain, mb. The human osteosarcoma Saos-2 cells
contained an inducible p53 gene (p53-, uninduced p53; p53+, induced p53);
the differences in miRNAs identified from induced and uninduced Saos-2
cells were not statistically significant.
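The wobble argument in these legends can be made concrete. If a miRNA variant carries G where the reference miRNA has A, the target U that paired with A now forms a G-U pair; likewise, U in place of C faces the target G as a U-G pair. The reverse changes (G to A, U to C) are not accommodated in this way against a Watson-Crick-paired site. A minimal Python sketch follows, assuming a full-length, ungapped, antiparallel duplex; this is a simplification, since natural miRNA-target duplexes often contain bulges and mismatches, and the function names are hypothetical.

```python
# Base pairs accepted between a miRNA nucleotide and a target nucleotide.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rna_reverse_complement(seq: str) -> str:
    """Reverse complement as RNA (T read as U)."""
    return seq.upper().replace("T", "U").translate(_RNA_COMPLEMENT)[::-1]


def pairs_fully(mirna: str, site: str, allow_wobble: bool = True) -> bool:
    """True if every miRNA nucleotide pairs with the target site.

    `site` is target RNA written 5' to 3'; pairing is antiparallel, so
    mirna[i] faces site[-1 - i]. Assumes a full-length, ungapped duplex."""
    mirna, site = mirna.upper(), site.upper()
    if len(mirna) != len(site):
        return False
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    return all((m, t) in allowed for m, t in zip(mirna, reversed(site)))
```

For a site built as `rna_reverse_complement(reference)`, an A to G or C to U variant of the reference should still pair fully when `allow_wobble=True` and fail when it is `False`, while a G to A variant fails either way.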