Statistical evaluation of methods for quantifying gene expression by autoradiography in histological sections.

Lazic SE - BMC Neurosci (2009)

Bottom Line:
Image segmentation based on thresholding can be subject to floor-effects and lead to biased results. Finally, converting grey level pixel intensities to optical densities or units of radioactivity is unnecessary for most applications and can lead to data with poor statistical properties. Based on these results, suggestions are made to reduce bias, increase precision, and ultimately provide more meaningful results of gene expression data.

Background: In situ hybridisation (ISH) combined with autoradiography is a standard method of measuring the amount of gene expression in histological sections, but the methods used to quantify gene expression in the resulting digital images vary greatly between studies and can potentially give conflicting results.

Results: The present study examines commonly used methods for analysing ISH images and demonstrates that these methods are not optimal. Image segmentation based on thresholding can be subject to floor-effects and lead to biased results. In addition, including the area of the structure or region of interest in the calculation of gene expression can lead to a large loss of precision and can also introduce bias. Finally, converting grey level pixel intensities to optical densities or units of radioactivity is unnecessary for most applications and can lead to data with poor statistical properties. A modification of an existing method for selecting the structure or region of interest is introduced which performs better than alternative methods in terms of bias and precision.

Conclusion: Based on these results, suggestions are made to reduce bias, increase precision, and ultimately provide more meaningful results of gene expression data.

Figure 4: Thresholding floor-effect. The dentate gyrus (A) was thresholded and the selected pixels are displayed in black (B). There are pixels that are a part of the DG (indicated in red) that should be included in the calculation of the mean GL, but have fallen below the threshold level and are therefore excluded. This biases the GL value upward, and creates a floor-effect, where no pixels below the threshold are included in the calculation of the mean GL.

Mentions:
The SD thresholding method selects the foreground as being above a certain level of the background, and the mixture model partitions the pixels into one of two Gaussian distributions (one foreground and the other background). A major disadvantage of these approaches is that spatial information is not used when distinguishing the foreground from the background; only the GL value of each pixel is considered. Thus, when thresholds are used to select the structure, especially when gene expression in the structure or region of interest is low relative to background levels, parts of the structure might be omitted: pixels with GL values lower than the threshold are excluded from the calculation of the mean GL value, even though they lie within the structure of interest and should therefore be included. This has the potential to bias the results upwards and can be seen in Figure 4, where parts of the dentate gyrus (in red) are excluded from the analysis. An analogy is trying to determine the mean size of fish in a lake by casting a net into the water; fish that are smaller than the holes in the net will slip through and will not be included in the calculation of the mean, resulting in an estimated mean that is higher than the true population value. The lighter coloured pixels below the threshold but in the DG are analogous to the smaller fish that slip through the net. This is the likely explanation for the low CV of the two thresholding methods (Fig. 2); the range of values is restricted because none can be lower than the threshold. This was examined directly by plotting the values for the line method (Method 1) and the SD thresholding method (Method 3) against each other (Fig. 5A). The diagonal line is not a regression line but the line of identity (y = x); points above the line are those for which the threshold method gave greater values than the line method, and points below the line are those for which it gave smaller values.
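The floor-effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the analysis used in the study: pixel GL values inside the structure are drawn from a Gaussian distribution, and the threshold (100), standard deviation (10), and sample size are hypothetical values chosen for illustration. The mean of all pixels in the structure is compared with the mean of only those pixels that survive the threshold.

```python
import random
import statistics


def thresholded_mean(pixels, threshold):
    """Mean GL of the pixels at or above the threshold (what a threshold-based selection measures)."""
    kept = [p for p in pixels if p >= threshold]
    return statistics.fmean(kept) if kept else float("nan")


def simulate(true_mean, sd=10.0, threshold=100.0, n=5000, seed=0):
    """Return (mean of all structure pixels, mean of pixels surviving the threshold).

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    pixels = [rng.gauss(true_mean, sd) for _ in range(n)]
    return statistics.fmean(pixels), thresholded_mean(pixels, threshold)


# As the true mean approaches the threshold, more of the structure is
# excluded and the thresholded mean increasingly overestimates the true mean.
for true_mean in (130, 115, 105, 100):
    full, kept = simulate(true_mean)
    print(f"true mean {true_mean}: all pixels {full:.1f}, "
          f"thresholded {kept:.1f}, bias {kept - full:.1f}")
```

In this sketch the thresholded mean can never fall below the threshold, which also restricts the range of values and so would be expected to lower the CV, as suggested in the text.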
A Tukey mean-difference plot was used to better examine the relationship between the two methods (Fig. 5B, [38,39]), where the difference between the threshold method and the line method is plotted on the y-axis and the mean of the two methods is plotted on the x-axis. As in the previous graph, values above the horizontal y = 0 line (grey) are those for which the threshold method gave the larger value, and values below the line are those for which the line method gave the larger value. Ideally, the points would fall along the y = 0 line, indicating that on average the two methods give the same values. Alternatively, if there were an additive shift of, say, five units above the y = 0 line, this would represent the threshold method consistently giving higher values than the line method. From such a shift alone we could not determine whether the threshold method overestimates or the line method underestimates, but given the semiquantitative nature of the technique, such a result would not pose any problems for analysis or interpretation. However, a trend in the values on the mean-difference plot indicates that the two methods produce different results at different levels of pixel intensity. In this case the threshold method has higher values at lower GL values, indicating that the threshold method values do not decrease as quickly as the line method values at lower GL values, consistent with a floor-effect (p = 0.014). While the trend is relatively small with these data, this is a serious limitation of thresholding methods and they should therefore be avoided. A floor-effect in a dataset would not be apparent unless the results were compared with those of another method.
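The quantities plotted in a mean-difference plot are simple to compute, and a minimal sketch is given here. The paired values are hypothetical and are not data from the study. The source does not state which test produced p = 0.014, so a least-squares slope of the differences on the means is used here only as one simple way to summarise a trend.

```python
import statistics


def mean_difference(method_a, method_b):
    """Return (means, differences) for paired measurements; differences are a - b."""
    if len(method_a) != len(method_b):
        raise ValueError("paired measurements must have equal length")
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    diffs = [a - b for a, b in zip(method_a, method_b)]
    return means, diffs


def ols_slope_intercept(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical paired mean GL values for the same sections.
line_method = [60, 70, 80, 90, 100, 110, 120]
threshold_method = [72, 78, 85, 93, 102, 111, 121]

means, diffs = mean_difference(threshold_method, line_method)
slope, intercept = ols_slope_intercept(means, diffs)
print(f"slope of difference on mean: {slope:.3f}")

# A purely additive shift (threshold = line + 5) gives a flat trend.
shifted = [v + 5 for v in line_method]
flat_slope, flat_intercept = ols_slope_intercept(*mean_difference(shifted, line_method))
print(f"slope for a constant offset: {flat_slope:.3f}")
```

A negative slope in the first case means that the difference between the methods grows as the GL values fall, which is the pattern the text attributes to a floor-effect. The constant-offset case corresponds to the harmless additive shift.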
