Microarray Data Analysis

Process

For this assignment, I began with the raw GLN3 data. Before analysis, the values must be scaled and centered so that they can be compared to one another more accurately. To do this, I found the average and standard deviation for each replicate at each time point. Once the data were scaled and centered, I performed the statistical analysis: I found the average log fold change across the replicates for each time point, then used those values to find the p-value for each gene at every time point. With this, I filtered the genes and counted how many showed a significant change in expression at predetermined p-value cutoffs. This allowed me to see how gene expression changed in response to the cold shock and to determine whether there was significant up- or down-regulation.
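The scaling and centering step can be illustrated with a short Python sketch. It assumes that "scaled and centered" means subtracting the column average and dividing by the sample standard deviation (the same quantity Excel's STDEV returns); the function name and the numbers are hypothetical and are not taken from the GLN3 data.

```python
from statistics import mean, stdev


def scale_and_center(values):
    """Return (value - average) / standard deviation for one replicate column."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (assumed, as in Excel's STDEV)
    return [(v - mu) / sigma for v in values]


# Hypothetical log ratios for a single replicate (illustration only).
replicate = [0.42, -1.10, 0.05, 1.73, -0.60]
scaled = scale_and_center(replicate)

# After the transformation the column has average 0 and standard deviation 1.
print(round(mean(scaled), 6), round(stdev(scaled), 6))
```

Because every replicate ends up on the same scale, values from different replicates and time points can be averaged and compared directly.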

Questions

The number of replicates for each time point in the data.

There were four replicates for each of the time points: t15, t30, t60, t90, and t120.

Why is the use of the dollar sign symbols in front of the number important?

The dollar signs in front of the number ensure that the equation always refers to the cells that hold the average and standard deviation. Without them, Excel would shift the reference to the wrong cells as the master equation is copied and pasted down the whole column.

How many genes have p value < 0.05?

t15: 781

t30: 1539

t60: 1559

t90: 538

t120: 564

What about p < 0.01?

t15: 218

t30: 456

t60: 384

t90: 129

t120: 114

What about p < 0.001?

t15: 21

t30: 55

t60: 51

t90: 9

t120: 16

What about p < 0.0001?

t15: 1

t30: 4

t60: 10

t90: 3

t120: 5
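The counts above came from filtering the p-value column at each cutoff. A minimal Python sketch of the same counting, assuming the p-values for one time point are available as a list, is shown here; the function name and the p-values are hypothetical.

```python
def count_below(pvalues, thresholds):
    """Count how many p-values fall strictly below each threshold."""
    return {t: sum(1 for p in pvalues if p < t) for t in thresholds}


# Hypothetical p-values for one time point (illustration only).
pvalues = [0.2, 0.03, 0.008, 0.0004, 0.00002, 0.6]
counts = count_below(pvalues, [0.05, 0.01, 0.001, 0.0001])

for threshold, n in counts.items():
    print(f"p < {threshold}: {n}")
```

Each stricter cutoff can only keep the same number of genes or fewer, which matches the pattern in the counts reported for every time point.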

How many of the genes are still significantly changed at p < 0.05 after the Bonferroni correction?

t15: 1

t30: 0

t60: 2

t90: 1

t120: 0
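For reference, the Bonferroni correction multiplies each p-value by the number of tests performed (here, the number of genes) and caps the result at 1. The sketch below uses hypothetical p-values and a hypothetical gene count of five, not the real dataset.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    n_tests = len(pvalues)
    return [min(p * n_tests, 1.0) for p in pvalues]


# Hypothetical p-values, one per gene (illustration only).
pvalues = [0.00001, 0.004, 0.03, 0.2, 0.6]
corrected = bonferroni(pvalues)

# Genes that remain significant at p < 0.05 after the correction.
survivors = sum(1 for p in corrected if p < 0.05)
print(survivors)
```

With thousands of genes the multiplier is very large, which is why so few genes remain significant after the correction.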

For time t60, keeping the "Pval" filter at p < 0.05, filter the "AvgLogFC" column to show all genes with an average log fold change greater than zero. How many meet these two criteria?

760

Keeping the "Pval" filter at p < 0.05, filter the "AvgLogFC" column to show all genes with an average log fold change less than zero. How many meet these two criteria?

799

Keeping the "Pval" filter at p < 0.05, how many have an average log fold change of > 0.25 and p < 0.05?

727

How many have an average log fold change of < -0.25 and p < 0.05?

745
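The four counts above combine a p-value filter with a filter on the direction and size of the average log fold change. A minimal Python sketch of that combined filter follows; the gene names, values, field names, and the helper function are all hypothetical.

```python
# Hypothetical rows standing in for the "AvgLogFC" and "Pval" columns.
genes = [
    {"id": "geneA", "avg_log_fc": 0.80, "pval": 0.010},
    {"id": "geneB", "avg_log_fc": 0.10, "pval": 0.040},
    {"id": "geneC", "avg_log_fc": -0.50, "pval": 0.002},
    {"id": "geneD", "avg_log_fc": -0.05, "pval": 0.030},
    {"id": "geneE", "avg_log_fc": 1.20, "pval": 0.300},
]


def count_changed(rows, direction, min_abs_fc=0.0, alpha=0.05):
    """Count significant genes changed in one direction beyond a cutoff."""
    count = 0
    for row in rows:
        if row["pval"] >= alpha:
            continue  # not significant at the chosen level
        fc = row["avg_log_fc"]
        if direction == "up" and fc > min_abs_fc:
            count += 1
        elif direction == "down" and fc < -min_abs_fc:
            count += 1
    return count


print(count_changed(genes, "up"), count_changed(genes, "down"))
print(count_changed(genes, "up", 0.25), count_changed(genes, "down", 0.25))
```

Raising the fold change cutoff from 0 to 0.25 removes genes whose change is statistically significant but small in magnitude.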

Find NSR1 in your dataset. Is its expression significantly changed at any timepoint? Record the average fold change and p value for NSR1 for each timepoint in your dataset.

Average Fold Change

t15: 1.2

t30: 1.98

t60: 1.96

t90: -0.75

t120: -0.63

P-Value

t15: 0.0046

t30: 0.0180

t60: 0.0151

t90: 0.0676

t120: 0.1061

For t15, t30, and t60, there is a significant change in expression at the p < 0.05 level; however, only t15 is significant at the p < 0.01 level.
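The NSR1 conclusion can be checked directly from the p-values recorded above. In this short Python sketch the p-values are the ones listed for NSR1, while the dictionary and function names are only illustrative.

```python
# NSR1 p-values as recorded above, keyed by time point.
nsr1_pvalues = {
    "t15": 0.0046,
    "t30": 0.0180,
    "t60": 0.0151,
    "t90": 0.0676,
    "t120": 0.1061,
}


def significant_timepoints(pvalues, alpha):
    """Return the time points whose p-value is below alpha."""
    return [t for t, p in pvalues.items() if p < alpha]


print(significant_timepoints(nsr1_pvalues, 0.05))
print(significant_timepoints(nsr1_pvalues, 0.01))
```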

Which gene has the smallest p value in your dataset (at any time point)? Why do you think the cell is changing this gene's expression upon cold shock?

SFH5 (YJL145W) has the smallest p-value in the data set, at time t90. This gene is responsible for protein transport to the plasma membrane, as well as transfer from the Golgi body. It makes sense that this gene is downregulated during recovery: as the cell recuperates after the cold shock, most of its effort goes toward repair within the cell, and there is less need to bring molecules into the cell.