Abstract

The invention relates to novel spike-in oligonucleotides, specifically for use in the normalization of small RNA sequencing data. The invention specifically provides sets, each comprising at least two subsets of single-stranded nucleic acid molecules, each nucleic acid molecule comprising a 5' phosphate, a sequence of at least 3 randomized nucleotides, a core sequence of at least 8 core nucleotides containing at least one mismatch compared to a target sequence, a sequence of at least 3 randomized nucleotides, and a 3' modification, wherein each subset comprises a plurality of nucleic acid molecules having an identical core nucleotide sequence and different randomized nucleotides, and wherein the nucleic acid molecules of each subset differ in at least one nucleotide of the core nucleotide sequence. The invention further relates to the generation of a library containing the sets, to reference values in nucleotide sequencing, and to a method for determining the amount of target sequences in a sample.

Claims

1. A set comprising at least two subsets of single-stranded nucleic acid molecules, each nucleic acid molecule comprising, in the 5' to 3' direction:

a) a 5' phosphate,

b) a sequence of at least 3 randomized nucleotides,

c) a core sequence of at least 8 nucleotides which sequence contains two or more mismatches compared to a target sequence,

d) a sequence of at least 3 randomized nucleotides, and

e) a 3' modification,

wherein each subset comprises a plurality of nucleic acid molecules having an identical core nucleotide sequence and different randomized nucleotides, and

wherein the nucleic acid molecules of each subset differ in at least one nucleotide of the core nucleotide sequence.

2. The set according to claim 1, wherein the plurality of nucleic acid molecules comprises randomized nucleotide sequences containing all combinations of the four nucleotides A, C, G, U or A, C, G, T.

3. The set according to any one of claims 1 to 2, wherein the nucleic acid molecule is an RNA molecule, specifically mimicking a small RNA, specifically selected from the group consisting of siRNA, tasiRNA, snRNA, miRNA, snoRNA, piRNA and tRNA and any precursors thereof.

4. The set according to any one of claims 1 to 3, wherein the core nucleotide sequence comprises from 8 to 25 nucleotides, preferably from 10 to 20 nucleotides, preferably from 12 to 18 nucleotides, preferably 13 nucleotides.

5. The set according to any one of claims 1 to 4, wherein the sequence of randomized nucleotides comprises from 3 to 7 nucleotides, preferably from 3 to 5 nucleotides, preferably 4 nucleotides.
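By way of illustration only, the composition of one subset can be sketched in Python as follows. The sketch assumes the preferred lengths named above (a 13-nucleotide core flanked on each side by 4 randomized nucleotides) and the RNA alphabet A, C, G, U. The core sequences and the function name `subset_molecules` are hypothetical and are not taken from the specification. The 5' phosphate and the 3' modification are chemical features and are not represented in the sequence strings.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; "ACGT" would be used for DNA molecules


def subset_molecules(core, flank_len=4, alphabet=ALPHABET):
    """Yield every sequence of the form <5' flank><core><3' flank>.

    Each flank runs over all combinations of the alphabet, so the core
    stays identical within a subset while the flanks differ.
    """
    flanks = ["".join(p) for p in product(alphabet, repeat=flank_len)]
    for five_prime in flanks:
        for three_prime in flanks:
            yield five_prime + core + three_prime


# Hypothetical 13-nucleotide cores, one per subset; they differ in the
# last nucleotide so that the subsets can be told apart after sequencing.
cores = ["GUACGAUCCAGUA", "GUACGAUCCAGUC"]

subsets = {core: list(subset_molecules(core)) for core in cores}

for core, molecules in subsets.items():
    print(core, len(molecules), "molecules of length", len(molecules[0]))
```

Under these assumptions each subset holds 4^4 x 4^4 = 65536 distinct molecules of 21 nucleotides. Shorter or longer randomized stretches within the claimed range of 3 to 7 nucleotides change the count accordingly.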

6. The set according to any one of claims 1 to 5, wherein the 5' phosphate is selected from the group consisting of monophosphate, diphosphate and triphosphate and wherein the 3' modification is selected from the group consisting of 2'-O-methylation [2'-O-methyl group] and hydroxylation [hydroxyl group].

7. The set according to any one of claims 1 to 6, wherein the subsets are present in an amount from 1 to 10000 amol, preferably from 10 to 5000 amol, specifically comprising different amounts of each subset.

8. The set according to any one of claims 1 to 7, wherein the target sequence can be any sequence of interest, specifically a genome or transcriptome of an organism, or a sequence originating from a virus, bacteria, animals or plants, specifically an RNA, a small RNA, or a dynamic small RNA population.

9. Use of a set according to any one of claims 1 to 8 as spike-in probes for normalizing sequencing data.

10. A method for determining the absolute amount of one or more target sequences in a sample, specifically in a cell, tissue or organ sample, using a set according to any one of claims 1 to 9.
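By way of illustration only, one way in which spike-in subsets added in different known amounts could serve as reference values is sketched below. The claims do not prescribe a particular calculation. The log-log linear fit, the function names and all numerical values are assumptions made for this example.

```python
import math


def fit_log_log(amounts_amol, read_counts):
    """Least-squares fit of log10(reads) = slope * log10(amol) + intercept."""
    xs = [math.log10(a) for a in amounts_amol]
    ys = [math.log10(r) for r in read_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_amount(reads, slope, intercept):
    """Invert the fitted line to turn a read count into an amount in amol."""
    return 10 ** ((math.log10(reads) - intercept) / slope)


# Hypothetical data: amount of each subset added to the sample (amol)
# and the number of reads assigned to its core sequence.
spike_amounts = [10, 100, 1000, 5000]
spike_reads = [100, 1000, 10000, 50000]

slope, intercept = fit_log_log(spike_amounts, spike_reads)

# Hypothetical read count of a target small RNA from the same library.
target_reads = 500
print(round(estimate_amount(target_reads, slope, intercept), 2), "amol")
```

The estimate is only meaningful within the range of amounts covered by the spike-in subsets, here 10 to 5000 amol.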