
1.

Explain what is meant by data normalisation and discuss why it is an important step before the analysis of the output of microarray experiments.

2.

Describe the main sources of variability in microarray data. Provide examples of how you might use normalisation methods to address them.
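Differences in overall intensity distribution between arrays (for example from unequal amounts of starting RNA, labelling efficiency or scanner settings) are one source of technical variability. One widely used between-array correction is quantile normalisation, sketched below with the Python standard library only. It assumes every array measures the same probes in the same order and breaks ties by position as a simplification. The intensities are hypothetical, not data from this sheet.

```python
"""Quantile normalisation sketch: force every array to share one intensity distribution."""
from statistics import mean


def quantile_normalise(arrays):
    """Return quantile-normalised copies of equal-length intensity lists.

    Each array's k-th smallest value is replaced by the mean of the k-th
    smallest values across all arrays. Ties keep their original index order
    (a simplification).
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must measure the same probes")
    # Sort each array once and average the values rank by rank.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(col) for col in zip(*sorted_arrays)]
    normalised = []
    for a in arrays:
        # order[k] is the index of the k-th smallest value in this array.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalised.append(out)
    return normalised


if __name__ == "__main__":
    # Hypothetical intensities for 4 probes on 3 arrays.
    arrays = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.5, 2.0], [3.0, 4.0, 6.0, 8.0]]
    for row in quantile_normalise(arrays):
        print(row)
```

After normalisation every array contains exactly the same set of values, so only the within-array ranking of probes (the biologically interesting part) still differs.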

3.

You have collected data from a cDNA microarray. The green channel measures gene expression in normal tissue, and the red channel measures gene expression in diseased tissue. You believe that the output may be biased towards the green channel. Explain what type of plot you could use to test this assumption, and how you could correct the measured results.
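One common diagnostic is the MA plot: for each spot, plot M = log2(R/G) against A = 0.5 · log2(R·G). If most genes are unchanged between the two tissues, the cloud of points should centre on M = 0; a green bias shows up as the cloud sitting below zero. The sketch below computes M and A and applies the simplest correction, global median centring, under the assumption that most genes are not differentially expressed. The intensities are hypothetical; an intensity-dependent (loess) fit of M on A, not shown here, is the usual refinement when the bias varies with A.

```python
"""M-A transform and global (median) normalisation of two-channel cDNA data."""
import math
from statistics import median


def ma_values(red, green):
    """Return (M, A) lists: M = log2(R/G), A = 0.5 * log2(R*G)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def median_centre(m):
    """Global normalisation: shift M so that its median is 0.

    Assumes most genes are not differentially expressed, so an overall
    offset in M reflects dye/channel bias rather than biology.
    """
    shift = median(m)
    return [x - shift for x in m]


if __name__ == "__main__":
    # Hypothetical background-corrected intensities for five spots.
    red = [800.0, 1500.0, 300.0, 4000.0, 650.0]     # red channel: diseased tissue
    green = [1600.0, 2900.0, 640.0, 8200.0, 300.0]  # green channel: normal tissue
    m, a = ma_values(red, green)
    for mi, ai in zip(median_centre(m), a):
        print(f"A={ai:6.2f}  normalised M={mi:+.2f}")
```

In this toy data four of the five raw M values sit near -1 (the green bias); after centring they sit near 0, while the fifth spot stands out as higher in the diseased sample.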

4.

Two labs are running experiments on the APO1 gene. Suggest one method that would allow them to compare their results.

5.

In the context of microarray experiments, explain what is meant by biological and technical variability of the data.

(Review questions on data classification)

6.

Explain how microarrays can be used as a basis for both diagnostic and prognostic tools.

7.

Describe what is meant by a decision tree and how it can be used to classify data.

8.

Describe the generic decision tree construction algorithm and explain how information gain is used to choose the tests at the internal nodes of the tree.
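The generic algorithm can be sketched as a short recursive ID3-style learner: stop at a pure node (or take a majority vote when no attributes remain); otherwise pick the attribute with the highest information gain, create one branch per value and recurse. This is a minimal sketch for categorical attributes only, with no pruning. The tuple/dict tree representation and the function names are illustrative choices, and the demo data are the exotic-dishes training records of Question 11, transcribed from this sheet.

```python
"""Minimal ID3-style decision tree learner for categorical attributes."""
import math
from collections import Counter

# Training data transcribed from Question 11 (exotic dishes).
ATTRS = ["Temperature", "Taste", "Size"]
DISHES = [
    ("Hot", "Salty", "Small", "No"), ("Cold", "Sweet", "Large", "No"),
    ("Cold", "Sweet", "Large", "No"), ("Cold", "Sour", "Small", "Yes"),
    ("Hot", "Sour", "Small", "Yes"), ("Hot", "Salty", "Large", "No"),
    ("Hot", "Sour", "Large", "Yes"), ("Cold", "Sweet", "Small", "Yes"),
    ("Cold", "Sweet", "Small", "Yes"), ("Hot", "Salty", "Large", "No"),
]


def entropy(labels):
    """I(p, n) generalised to any number of classes, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy of the parent minus the weighted entropy of its children."""
    total = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """Return a class label (leaf) or (attr, {value: subtree}) (internal node)."""
    if len(set(labels)) == 1:          # pure node: stop
        return labels[0]
    if not attrs:                      # no tests left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in {r[best] for r in rows}:
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                     [a for a in attrs if a != best])
    return (best, branches)


def classify(tree, record):
    """Follow the tests from the root down to a leaf label.

    Raises KeyError if the record has a value never seen at that node.
    """
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[record[attr]]
    return tree


if __name__ == "__main__":
    rows = [dict(zip(ATTRS, d[:3])) for d in DISHES]
    labels = [d[3] for d in DISHES]
    tree = build_tree(rows, labels, ATTRS)
    print(tree)
    for rec in (("Hot", "Salty", "Small"), ("Cold", "Sweet", "Large")):
        print(rec, "->", classify(tree, dict(zip(ATTRS, rec))))
```

The same `classify` walk is what "using a decision tree to classify data" means in practice: each internal node applies one attribute test until a leaf supplies the class.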

9.

Describe the basic idea behind the naïve Bayes classifier.
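The basic idea can be sketched as: score each class by its prior probability times the product of the per-attribute likelihoods P(value | class), and pick the highest score. Multiplying the likelihoods is only justified under the "naive" assumption that attributes are conditionally independent given the class. This minimal sketch handles categorical attributes and uses raw relative frequencies with no smoothing (so an unseen value zeroes a class's score); the function names are illustrative, and the demo data are the Question 13 training samples transcribed from this sheet.

```python
"""Minimal categorical naive Bayes classifier (raw frequency estimates, no smoothing)."""
from collections import Counter, defaultdict

GENES = ["Gene 1", "Gene 2", "Gene 3", "Gene 4"]
# Training samples A-E transcribed from Question 13: four gene levels, then Diseased.
SAMPLES = [
    ("High", "Medium", "High", "Medium", "Yes"),
    ("Low", "Medium", "Low", "High", "Yes"),
    ("Medium", "High", "High", "Medium", "No"),
    ("Low", "Low", "Low", "Low", "Yes"),
    ("Medium", "Medium", "High", "Medium", "No"),
]


def train(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute) -> Counter of values
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(label, attr)][value] += 1
    return class_counts, value_counts


def scores(model, record):
    """Unnormalised P(class) * prod_i P(x_i | class) for every class."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    result = {}
    for label, n_label in class_counts.items():
        score = n_label / total
        for attr, value in record.items():
            # Naive assumption: attributes independent given the class.
            # Counter returns 0 for unseen values (zero-frequency problem).
            score *= value_counts[(label, attr)][value] / n_label
        result[label] = score
    return result


def predict(model, record):
    """Return the class with the largest unnormalised score."""
    s = scores(model, record)
    return max(s, key=s.get)


if __name__ == "__main__":
    rows = [dict(zip(GENES, s[:4])) for s in SAMPLES]
    labels = [s[4] for s in SAMPLES]
    model = train(rows, labels)
    x = dict(zip(GENES, ("Low", "Medium", "High", "Low")))  # sample X
    print(scores(model, x), predict(model, x))
```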

(Problems)

10.

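For each of the following sets of class values, calculate the I(p,n) metric given by

$$I(p,n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}$$

where p and n are the numbers of p and n entries in the set (with 0 · log2 0 taken as 0).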

Data Set 1: {p, p, p, p, p, p, p, p}.

Data Set 2: {p, p, p, p, p, p, p, n}.

Data Set 3: {p, p, p, p, p, p, n, n}.

Data Set 4: {p, p, p, p, p, n, n, n}.

Data Set 5: {p, p, p, p, n, n, n, n}.

Data Set 6: {p, p, p, n, n, n, n, n}.

Data Set 7: {p, p, n, n, n, n, n, n}.

Data Set 8: {p, n, n, n, n, n, n, n}.

Data Set 9: {n, n, n, n, n, n, n, n}.

Can you explain your findings? How do they relate to the information gain concept introduced in the lectures?
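The arithmetic can be checked with a short script. It assumes the standard ID3 form of I(p,n) stated in the question and the convention 0 · log2 0 = 0; the function name `info` is an illustrative choice.

```python
"""Evaluate I(p, n) for the nine class-value sets of Question 10."""
import math


def info(p, n):
    """I(p, n) in bits, using the convention 0 * log2(0) = 0."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:  # skip empty classes (0 log 0 = 0)
            frac = count / total
            result -= frac * math.log2(frac)
    return result


if __name__ == "__main__":
    # Data Set k has (9 - k) p's and (k - 1) n's, for k = 1..9.
    for k in range(1, 10):
        p, n = 9 - k, k - 1
        print(f"Data Set {k}: p={p}, n={n}, I(p,n)={info(p, n):.3f}")
```

I(p,n) is 0 for the pure sets (1 and 9), reaches its maximum of 1 bit for the evenly mixed Set 5, and is symmetric in p and n. Information gain rewards tests whose branches move the class mix towards the pure end of this curve.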

11.

Consider the following training data set about exotic dishes.

a.

What is the information gain associated with choosing the attribute “Taste” as the root of the decision tree?

b.

Draw the full decision tree whose root is given by “Taste”.

c.

Use the tree to predict the class value of test instances 11 and 12 (Temperature, Taste, Size) given below.

12.

Consider the following training data set.

a.

Based on the data set, calculate the amount of information needed to decide whether an arbitrary record belongs to class 1 or class 0.

b.

Construct a decision tree from this training data set, using information gain as the metric for choosing the nodes of the tree.

c.

Predict the class value for the following two records.

| Instance | A1 | A2 | Class Value |
|---|---|---|---|
| 11 | F | N | ? |
| 12 | T | N | ? |

d.

What is your prediction confidence in each case?

e.

Generate two decision rules from the tree.

Records for Question 11c:

| Instance | Temperature | Taste | Size | Appealing |
|---|---|---|---|---|
| 11 | Hot¹ | Salty | Small | ? |
| 12 | Cold | Sweet | Large | ? |

¹ Note: there was an error in the printed sheet. The correct entry here is “Hot”, not “Sour”.

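Training data for Question 12:

| Instance | A1 | A2 | Class Value |
|---|---|---|---|
| 1 | M | N | 1 |
| 2 | T | N | 1 |
| 3 | F | N | 1 |
| 4 | F | N | 1 |
| 5 | T | O | 1 |
| 6 | M | N | 0 |
| 7 | F | O | 0 |
| 8 | T | O | 0 |
| 9 | T | N | 0 |
| 10 | F | O | 0 |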
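A worked sketch of the arithmetic for Question 12(a) and the root choice in 12(b), assuming p counts class-1 records, n counts class-0 records, and I(p,n) is the standard ID3 entropy used in Question 10. The counts per branch are read directly from the training table (A1: M has 1/1, T has 2/2, F has 2/2; A2: N has 4/2, O has 1/3).

```latex
\begin{aligned}
I(5,5) &= -\tfrac{5}{10}\log_2\tfrac{5}{10} - \tfrac{5}{10}\log_2\tfrac{5}{10} = 1 \text{ bit}\\
E(A1) &= \tfrac{2}{10} I(1,1) + \tfrac{4}{10} I(2,2) + \tfrac{4}{10} I(2,2) = 1\\
\text{Gain}(A1) &= 1 - 1 = 0\\
E(A2) &= \tfrac{6}{10} I(4,2) + \tfrac{4}{10} I(1,3) \approx 0.6(0.918) + 0.4(0.811) \approx 0.875\\
\text{Gain}(A2) &\approx 1 - 0.875 = 0.125
\end{aligned}
```

On these figures A2 would be chosen at the root, and each branch is then split the same way using only that branch's records.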

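Training data for Question 11 (exotic dishes):

| Instance | Temperature | Taste | Size | Appealing |
|---|---|---|---|---|
| 1 | Hot | Salty | Small | No |
| 2 | Cold | Sweet | Large | No |
| 3 | Cold | Sweet | Large | No |
| 4 | Cold | Sour | Small | Yes |
| 5 | Hot | Sour | Small | Yes |
| 6 | Hot | Salty | Large | No |
| 7 | Hot | Sour | Large | Yes |
| 8 | Cold | Sweet | Small | Yes |
| 9 | Cold | Sweet | Small | Yes |
| 10 | Hot | Salty | Large | No |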
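A worked sketch of the information gain for “Taste” at the root (Question 11a), assuming p counts “Yes” records, n counts “No” records, and the standard ID3 form of I(p,n). From the exotic-dishes table, Salty has 0 Yes / 3 No, Sweet has 2 Yes / 2 No and Sour has 3 Yes / 0 No.

```latex
\begin{aligned}
I(5,5) &= 1 \text{ bit}\\
E(\text{Taste}) &= \tfrac{3}{10} I(0,3) + \tfrac{4}{10} I(2,2) + \tfrac{3}{10} I(3,0) = 0 + 0.4 + 0 = 0.4\\
\text{Gain}(\text{Taste}) &= 1 - 0.4 = 0.6 \text{ bits}
\end{aligned}
```

For comparison, the same calculation gives about 0.029 bits for Temperature and about 0.278 bits for Size, so Taste is also the attribute that information gain would select.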

13.

Consider the following training data set.

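| Sample | Gene 1 | Gene 2 | Gene 3 | Gene 4 | Diseased |
|---|---|---|---|---|---|
| A | High | Medium | High | Medium | Yes |
| B | Low | Medium | Low | High | Yes |
| C | Medium | High | High | Medium | No |
| D | Low | Low | Low | Low | Yes |
| E | Medium | Medium | High | Medium | No |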

a.

Show how a naïve Bayesian classifier would classify the following sample:

| Sample | Gene 1 | Gene 2 | Gene 3 | Gene 4 | Diseased |
|---|---|---|---|---|---|
| X | Low | Medium | High | Low | ? |
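A worked sketch of the naive Bayes scores for sample X, assuming probabilities are estimated by raw relative frequencies from the five training samples (3 “Yes”, 2 “No”) with no smoothing. Each factor is P(gene level | class) read from the training table, in the order Gene 1 to Gene 4.

```latex
\begin{aligned}
P(\text{Yes})\prod_i P(x_i \mid \text{Yes}) &= \tfrac{3}{5}\cdot\tfrac{2}{3}\cdot\tfrac{2}{3}\cdot\tfrac{1}{3}\cdot\tfrac{1}{3} = \tfrac{4}{135} \approx 0.0296\\
P(\text{No})\prod_i P(x_i \mid \text{No}) &= \tfrac{2}{5}\cdot\tfrac{0}{2}\cdot\tfrac{1}{2}\cdot\tfrac{2}{2}\cdot\tfrac{0}{2} = 0
\end{aligned}
```

Under these estimates X would be classified as “Yes”. The “No” score is zero only because no “No” sample has Gene 1 = Low or Gene 4 = Low; smoothing such as a Laplace (add-one) correction is commonly used to stop a single unseen value from zeroing a class's score.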

14.

Consider the following training data set, collected during a drug efficacy study for CMV-buster. The data show gene expression measurements for three genes, A, B and C, measured in blood samples collected from people suffering from Cytomegalovirus infection before they were given CMV-buster, and indicate whether each gene was under-expressed or over-expressed compared with a control sample from healthy individuals. The last column indicates whether the treatment was effective.