Abstract:
A brief introduction to high-throughput technologies for
measuring and analyzing gene expression is given. Various
supervised and unsupervised data mining methods for analyzing
the produced high-dimensional data are discussed. The main
emphasis is on supervised machine learning methods for
classification and prediction of tumor gene expression
profiles. Methods for ranking genes by their importance to
the classification are also explored. The
approaches are illustrated by exploratory studies using two
examples of retrospective clinical data from routine tests:
diagnostic prediction of small round blue cell tumors of
childhood, and determination of the estrogen receptor status of
sporadic breast cancer. The classification performance is
gauged using blind tests. These studies demonstrate the
feasibility of machine-learning-based molecular cancer
classification.