Learn how one team developed algorithms to automatically identify tissues from big whole-slide images

In Part 1 of this series, we gave an overview of this project and explained how we scaled down the images. Part 2 showed how we investigated image filters and settled on a set of filters for effective tissue segmentation with our data set. Now, in this article, we'll explain morphology operators and show how we combined filters and applied them to multiple images.

Morphology

Information about image morphology can be found at
https://en.wikipedia.org/wiki/Mathematical_morphology. The primary morphology operators are erosion, dilation, opening, and closing. With erosion, pixels along the edges of an object are removed. With dilation, pixels along the edges of an object are added. Opening is erosion followed by dilation. Closing is dilation followed by erosion. With morphology operators, a structuring element (such as a square, circle, or cross) is passed over the image to perform the operations. Morphology operators are typically applied to binary and grayscale images. In our examples, we apply morphology operators to binary images (2-dimensional arrays with two values, such as True/False, 1.0/0.0, or 255/0).
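As a rough illustration of these four operators, the following sketch applies them to a tiny binary array with a cross-shaped structuring element. It uses the scipy.ndimage equivalents of the scikit-image functions rather than our project's wrapper functions; the array and structuring element are invented for the example.

```python
import numpy as np
from scipy import ndimage

# Small binary image: a 3x3 square of "on" pixels in a 7x7 array
img = np.zeros((7, 7), dtype=bool)
img[2:5, 2:5] = True

# Cross-shaped structuring element
cross = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]], dtype=bool)

eroded = ndimage.binary_erosion(img, structure=cross)    # removes edge pixels
dilated = ndimage.binary_dilation(img, structure=cross)  # adds edge pixels
opened = ndimage.binary_opening(img, structure=cross)    # erosion then dilation
closed = ndimage.binary_closing(img, structure=cross)    # dilation then erosion
```

Erosion shrinks the square to its single center pixel, dilation grows it into a plus shape, opening keeps only what survives erosion (then re-dilates it), and closing returns the solid square unchanged since it has no holes.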

Erosion

Let’s have a look at an erosion example. We create a binary image by calling the filter_grays() function on the original RGB image. The filter_binary_erosion() function uses a disk as the structuring element that erodes the edges of the “No Grays” binary image. We demonstrate erosion with disk structuring elements of radius 5 and radius 20.
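To make the role of the disk radius concrete, here is a minimal sketch (not the project's filter_binary_erosion() code) that builds a boolean disk footprint by hand and erodes a small square with two different radii; a larger radius removes more of the object. The image sizes and radii are invented for the example.

```python
import numpy as np
from scipy import ndimage

def disk_structure(radius):
    """Boolean disk footprint, analogous to skimage.morphology.disk(radius)."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

img = np.zeros((15, 15), dtype=bool)
img[3:12, 3:12] = True  # 9x9 square of "on" pixels

small = ndimage.binary_erosion(img, structure=disk_structure(2))  # mild erosion
large = ndimage.binary_erosion(img, structure=disk_structure(4))  # heavy erosion
```

With radius 2, the 9x9 square shrinks to a 5x5 square; with radius 4, only the single center pixel survives.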

Dilation

The filter_binary_dilation() function uses a disk structuring element in a similar manner as the corresponding erosion function. We’ll utilize the same “No Grays” binary image from the previous example and dilate the image using a disk radius of 5 pixels followed by a disk radius of 20 pixels.

Opening

Opening is a fairly expensive operation because it is an erosion followed by a dilation. The compute time increases with the size of the structuring element. A 5-pixel disk radius for the structuring element results in a 0.25s operation, whereas a 20-pixel disk radius results in a 2.45s operation.

Remove small objects

The scikit-image remove_small_objects() function removes objects less than a particular minimum size. The filter_remove_small_objects() function wraps this and adds additional functionality. This can be useful for removing small islands of noise from images. We’ll demonstrate it here with two sizes, 100 pixels and 10,000 pixels, and we’ll perform this on the “No Grays” binary image.
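The following is a simplified pure SciPy/NumPy equivalent of this operation (the real filter_remove_small_objects() wraps scikit-image's remove_small_objects()); the toy image is invented for the example.

```python
import numpy as np
from scipy import ndimage

def remove_small_objects_np(binary, min_size):
    """Keep only connected components with at least min_size pixels
    (a simplified equivalent of skimage.morphology.remove_small_objects)."""
    labeled, num_components = ndimage.label(binary)
    out = np.zeros_like(binary, dtype=bool)
    for label in range(1, num_components + 1):
        component = labeled == label
        if component.sum() >= min_size:
            out |= component
    return out

img = np.zeros((10, 10), dtype=bool)
img[1:4, 1:4] = True  # 9-pixel object
img[7, 7] = True      # 1-pixel "island" of noise
```

With a min_size of 5, only the 9-pixel object survives; raising min_size high enough removes everything, which is the overmasking risk discussed later in this article.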

Notice the many small, scattered objects in the “No Grays” binary image.

Original slide

No grays

After removing small objects with a connected size less than 100 pixels, we see that the smallest objects have been removed from the binary image. With a minimum size of 10,000 pixels, we see that many larger objects have also been removed from the binary image.

Remove small holes

The scikit-image remove_small_holes() function is similar to the remove_small_objects() function except it removes holes rather than objects from binary images. Here we demonstrate this using the filter_remove_small_holes() function with sizes of 100 pixels and 10,000 pixels.

Fill holes

The scikit-image binary_fill_holes() function is similar to the remove_small_holes() function. Using its default settings, it generates results similar to but typically not identical to the remove_small_holes() function with a high minimum size value.

In the following code, we’ll display the result of the filter_binary_fill_holes() function on the image after gray shades have been removed. After this, we’ll perform exclusive-or operations to look at the differences between “Fill holes” and “Remove small holes” with size values of 100 and 10,000.
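scipy.ndimage provides binary_fill_holes() directly; this small sketch fills a one-pixel hole and uses XOR to show exactly which pixels changed, mirroring the comparison described above. The toy ring image is invented for the example.

```python
import numpy as np
from scipy import ndimage

# A 5x5 "ring": a solid square with a single-pixel hole in the middle
ring = np.ones((5, 5), dtype=bool)
ring[2, 2] = False

filled = ndimage.binary_fill_holes(ring)

# XOR highlights exactly which pixels the fill operation changed
diff = filled ^ ring
```

The hole is enclosed, so binary_fill_holes() fills it; the XOR difference contains exactly that one pixel.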

Entropy

The scikit-image entropy() function allows us to filter images based on complexity. Because areas such as slide backgrounds are less complex than areas of interest such as cell nuclei, filtering on entropy offers interesting possibilities for tissue identification.

In the following code, we use the filter_entropy() function to filter the grayscale image based on entropy. We display the resulting binary image. After that, we mask the original image with the entropy mask and the inverse of the entropy mask.
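scikit-image computes local entropy efficiently with skimage.filters.rank.entropy(); as a rough illustration of what such a filter measures, here is a slow pure-NumPy sketch that computes the Shannon entropy of each pixel's neighborhood and thresholds it into a binary mask. The window size, threshold, and test images are arbitrary choices for the example.

```python
import numpy as np

def local_entropy(gray, size=3):
    """Shannon entropy (bits) of the pixel-value histogram in each size x size window."""
    pad = size // 2
    padded = np.pad(gray, pad, mode='edge')
    out = np.zeros(gray.shape, dtype=float)
    for r in range(gray.shape[0]):
        for c in range(gray.shape[1]):
            window = padded[r:r + size, c:c + size]
            _, counts = np.unique(window, return_counts=True)
            p = counts / counts.sum()
            out[r, c] = -(p * np.log2(p)).sum()
    return out

flat = np.full((6, 6), 200, dtype=np.uint8)                          # uniform "background"
busy = (np.indices((6, 6)).sum(axis=0) % 2 * 255).astype(np.uint8)   # checkerboard "texture"

# Threshold entropy to build a binary mask of "complex" regions
mask = local_entropy(busy) > 0.5
```

A uniform region has zero entropy everywhere, while the high-variation checkerboard exceeds the threshold everywhere, which is the behavior that makes entropy useful for separating background from tissue.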

The results of the original image with the inverse of the entropy mask are particularly interesting. Notice that much of the white background, including the shadow region at the top of the slide, has been filtered out. Additionally, notice that for the stained regions, a significant amount of the pink eosin-stained area has been filtered out, while a smaller proportion of the purple hematoxylin-stained area has been filtered out. This makes sense because hematoxylin stains structures such as cell nuclei, which have significant complexity. Therefore, entropy seems like a potential tool for identifying regions of interest where mitoses are occurring.

Original with entropy mask

Original with inverse of entropy mask

A drawback of using entropy is that it is computationally expensive; the entropy filter takes over 3 seconds to run in this example.

Canny edge detection

The scikit-image canny() function returns a binary edge map for the detected edges in an input image. In the following example, we call the filter_canny() function on the grayscale image and display the resulting Canny edges. After this, we crop a 600×600 area of the original slide and display it. We apply the inverse of the Canny mask to the cropped area and display it for comparison.
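Canny itself involves Gaussian smoothing, non-maximum suppression, and hysteresis thresholding; as a simplified stand-in that conveys the idea of a binary edge map, the following sketch thresholds the Sobel gradient magnitude. This is not the scikit-image canny() implementation, and the threshold fraction and test image are invented for the example.

```python
import numpy as np
from scipy import ndimage

def edge_mask(gray, frac=0.5):
    """Simplified stand-in for Canny: threshold the Sobel gradient magnitude.
    (Real Canny adds smoothing, non-maximum suppression, and hysteresis.)"""
    g = gray.astype(float)
    magnitude = np.hypot(ndimage.sobel(g, axis=0), ndimage.sobel(g, axis=1))
    if magnitude.max() == 0:
        return np.zeros(gray.shape, dtype=bool)
    return magnitude > frac * magnitude.max()

step = np.zeros((5, 6))
step[:, 3:] = 1.0  # vertical step edge between columns 2 and 3
edges = edge_mask(step)
```

Only the two columns straddling the intensity step are marked as edges; the flat regions on either side produce no gradient and stay False.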

Combining filters

Because our image filters use NumPy arrays, it is straightforward to combine our filters. For example, when we have filters that return boolean images for masking, we can use standard boolean algebra on our arrays to perform operations such as AND, OR, XOR, and NOT. We can also run filters on the results of other filters.

As an example, the following code runs our green pen and blue pen filters on the original RGB image to filter out the green and blue pen marks on the slide. We combine the resulting masks with a boolean AND (&) operation, and we display the resulting mask and this mask applied to the original image, masking out the green and blue pen marks from the image.
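A minimal sketch of this masking pattern, using invented color-range tests in place of the project's actual pen filters (the synthetic image and thresholds are arbitrary):

```python
import numpy as np

# Tiny synthetic RGB slide: mostly pink "tissue" with one green and one blue "pen" pixel
rgb = np.full((4, 4, 3), (230, 160, 180), dtype=np.uint8)
rgb[0, 0] = (40, 180, 60)   # green pen mark
rgb[3, 3] = (50, 60, 200)   # blue pen mark

# Hypothetical simplified pen filters: True where the pixel is NOT pen
no_green_pen = ~((rgb[:, :, 1] > 150) & (rgb[:, :, 0] < 100))
no_blue_pen = ~((rgb[:, :, 2] > 150) & (rgb[:, :, 0] < 100))

combined = no_green_pen & no_blue_pen         # boolean AND of the masks
masked_rgb = rgb * combined[..., np.newaxis]  # zero out the pen pixels
```

Because the masks are plain boolean NumPy arrays, combining them is a single `&`, and applying the result to the RGB image is a broadcasted multiply.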

Let’s try another combination of filters that should give us fairly good tissue segmentation for this slide, removing the slide background and the blue and green pen marks. We can do this by ANDing together the “No Grays” filter, the “Green Channel” filter, the “No Green Pen” filter, and the “No Blue Pen” filter. Additionally, we can use our “Remove Small Objects” filter to remove small islands from the mask. We display the resulting mask. We apply this mask and the inverse of the mask to the original image to visually see which parts of the slide are passed through and which parts are masked out.
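A hedged sketch of this kind of filter combination, with simplified stand-ins for the project's filter functions (the thresholds and the synthetic image are invented for the example):

```python
import numpy as np

def filter_green_channel(rgb, green_thresh=200):
    """Keep pixels whose green channel value is below the threshold."""
    return rgb[:, :, 1] < green_thresh

def filter_grays(rgb, tolerance=15):
    """Mask out pixels whose R, G, and B values are all within `tolerance` of each other."""
    c = rgb.astype(int)
    is_gray = ((np.abs(c[:, :, 0] - c[:, :, 1]) <= tolerance)
               & (np.abs(c[:, :, 0] - c[:, :, 2]) <= tolerance)
               & (np.abs(c[:, :, 1] - c[:, :, 2]) <= tolerance))
    return ~is_gray

# Synthetic slide: white background with a purple "tissue" block
rgb = np.full((20, 20, 3), 255, dtype=np.uint8)
rgb[5:15, 5:15] = (150, 80, 160)

mask = filter_green_channel(rgb) & filter_grays(rgb)  # AND the keep-masks
tissue = rgb * mask[..., np.newaxis]                  # what passes through
background = rgb * (~mask)[..., np.newaxis]           # what is masked out
```

Applying the mask and its inverse side by side, as the article does, makes it easy to check that only background was removed.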

In the wsi/filter.py file, the apply_filters_to_image(slide_num, save=True, display=False) function is the primary way we apply a set of filters to an image with the goal of identifying the tissue in the slide. This function lets us see the results of each filter and the combined results of different filters. If the save parameter is True, the various filter results will be saved to the file system. If the display parameter is True, the filter results will be displayed on the screen. The function returns a tuple consisting of the resulting NumPy array image and a dictionary of information that is used elsewhere for generating an HTML page to view the various filter results for multiple slides, as we will see later.

The apply_filters_to_image() function calls the apply_image_filters() function, which creates green channel, grays, red pen, green pen, and blue pen masks and combines these into a single mask using boolean ANDs. After this, small objects are removed from the mask.

After each of the above masks is created, it is applied to the original image and the resulting image is saved to the file system, displayed to the screen, or both.

Let’s try out this function. In this example, we run apply_filters_to_image() on slide #337 and display the results to the screen.

filter.apply_filters_to_image(337, display=True, save=False)

Note that this function uses the scaled-down .png image for slide #337. If we have not generated .png images for all of the slides (typically by calling slide.multiprocess_training_slides_to_images()), we can generate the individual scaled-down .png image and then apply the filters to this image.

We see the original slide #337 and the green channel filter applied to it. The original slide is marked as 0.12% masked because a small number of pixels in the original image are black (0 values for the red, green, and blue channels). Notice that the green channel filter with a default threshold of 200 removes most of the white background but only a relatively small fraction of the green pen. The green channel filter masks 72.60% of the original slide.

Slide 337, F001

Slide 337, F002

Here, we see the results of the grays filter and the red pen filter. For this slide, the grays filter masks 68.01% of the slide, which is actually less than the green channel filter. The red pen filter masks only 0.18% of the slide, which makes sense because there are no red pen marks on the slide.

Slide 337, F003

Slide 337, F004

The green pen filter masks 3.81% of the slide. Visually, we see that it does a decent job of masking out the green pen marks on the slide. The blue pen filter masks 0.12% of the slide, which is accurate because there are no blue pen marks on the slide.

Slide 337, F005

Slide 337, F006

Combining the previous filters with a boolean AND results in 74.57% masking. Cleaning up these results by removing small objects results in a masking of 76.11%. This potentially gives a good tissue segmentation that we can use for deep learning.
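The masking percentages quoted throughout this walkthrough can be computed from a boolean keep-mask in one line; a minimal sketch (the project's actual helper may differ):

```python
import numpy as np

def mask_percent(mask):
    """Percentage of pixels masked out (False) in a boolean keep-mask."""
    return 100 * (1 - np.count_nonzero(mask) / mask.size)

mask = np.ones((10, 10), dtype=bool)
mask[:5, :] = False  # mask out the top half of the image
```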

Slide 337, F007

Slide 337, F008

In the console, we see that processing slide #337 takes approximately 12.6 seconds in this example. The filtering itself is only a small fraction of this time. If we set display to False, processing takes approximately 2.3 seconds.

Because the apply_filters_to_image() function returns the resulting image as a NumPy array, we can perform further processing on the image. If we look at the apply_filters_to_image() results for slide #337, we can see that some grayish-green pen marks remain on the slide. We can filter out some of these marks using our filter_green() function with different threshold values and our filter_grays() function with an increased tolerance value.

We’ll compare the results by cropping two regions of the slide before and after this additional processing and displaying all four of these regions together.

After the additional processing, we see that the pen marks in the displayed regions have been significantly reduced.

Remove more green and more gray

As another example, here we can see a summary of filters applied to a slide by apply_filters_to_image() and the resulting masked image.

Filter Example

Applying filters to multiple images

When designing our set of tissue-selecting filters, one very important requirement is the ability to visually inspect the filter results across multiple slides. Ideally, we should easily be able to alternate between displaying the results for a single image, a select subset of our training image data set, and our entire data set. Additionally, multiprocessing can result in a significant performance boost, so we should be able to multiprocess our image processing if desired.

A simple, powerful way to visually inspect our filter results is to generate an HTML page for a set of images.

The following functions in the wsi/filter.py file can be used to apply filters to multiple images:

The apply_filters_to_image_list() function takes a list of image numbers for processing. It does not generate an HTML page, but it does generate information that other functions can use to generate one. If the save parameter is True, the generated images are saved to the file system. If the display parameter is True, the generated images are displayed on the screen. If several slides are being processed, display should be set to False.

The apply_filters_to_image_range() function is similar to the apply_filters_to_image_list() function except that rather than taking a list of image numbers, it takes a starting index number and ending index number for the slides in the training set. Like the apply_filters_to_image_list() function, the apply_filters_to_image_range() function has save and display parameters.

The singleprocess_apply_filters_to_images() and multiprocess_apply_filters_to_images() functions are the primary functions that should be called to apply filters to multiple images. Both of these functions feature save and display parameters. If the additional html parameter is True, an HTML page is generated for displaying the filter results for the image set. Both functions also feature an image_num_list parameter that specifies a list of image numbers to process. If image_num_list is not supplied, all training images are processed.
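The pattern behind these functions can be sketched as a pool mapping a per-slide function over either a supplied list of image numbers or the full training range. This sketch uses a thread pool and a stub filter function purely for illustration; the real code uses a process pool and the actual filtering pipeline, and the names here are invented.

```python
from multiprocessing.dummy import Pool  # thread pool stand-in for a process pool

def apply_filters_stub(slide_num):
    """Hypothetical stand-in for per-slide filtering; returns (slide_num, pct_masked)."""
    return (slide_num, 75.0)

def apply_filters_to_images_sketch(image_num_list=None, num_training_slides=500):
    """Map the per-slide function over a list of image numbers, or over all slides."""
    nums = (image_num_list if image_num_list is not None
            else list(range(1, num_training_slides + 1)))
    with Pool(4) as pool:
        return pool.map(apply_filters_stub, nums)

results = apply_filters_to_images_sketch(image_num_list=[1, 2, 3])
```

Because pool.map preserves input order, the per-slide results come back in slide-number order regardless of which worker finished first.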

As an example, let’s use a single process to apply our filters to images 1, 2, and 3. We can accomplish this with the following code:

In addition to saving the filtered images to the file system, this creates a filters.html file that displays all of the filtered slide images. If we open the filters.html file in a browser, we see 8 images for each slide, with each slide displayed on a separate row. In the following images, we see the filter results for slides #1, #2, and #3 displayed in a browser.

Filters for slides 1, 2, 3

To apply all filters to all images in the training set using multiprocessing, we can use the
multiprocess_apply_filters_to_images() function. Because there are 9 generated images per slide
(8 of which are shown in the HTML summary) and 500 slides, this results in a total of 4,500 images
and 4,500 thumbnails. Generating .png images and .jpg thumbnails, this takes approximately 24 minutes on
my MacBook Pro.

filter.multiprocess_apply_filters_to_images()

If we display the filters.html file in a browser, we see that the filter results for the first 50 slides are displayed. By default, results are paginated at 50 slides per page. Pagination can be turned on and off using the FILTER_PAGINATE constant. The pagination size can be adjusted using the FILTER_PAGINATION_SIZE constant.
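Pagination of this kind amounts to slicing the slide list into fixed-size chunks; a minimal sketch (the FILTER_PAGINATE and FILTER_PAGINATION_SIZE constants belong to the project, while this helper is invented for illustration):

```python
def paginate(slide_nums, page_size=50):
    """Split a list of slide numbers into per-page chunks."""
    return [slide_nums[i:i + page_size] for i in range(0, len(slide_nums), page_size)]

pages = paginate(list(range(1, 501)), page_size=50)  # 500 slides, 50 per page
```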

One useful action we can take is to group similar slides into categories. For example,
we can group slides into sets that have red, green, and blue pen marks on them.

In this way, we can make tweaks to specific filters or combinations of specific filters and see how these changes apply to the subset of relevant training images without requiring reprocessing of the entire training data set.

Red pen slides with filter results

Overmask avoidance

When developing filters and filter settings to perform tissue segmentation on the entire training
set, we have to deal with a great amount of variation in the slide samples. To begin with, some slides have a large amount of tissue on them, while other slides only have a minimal amount of tissue. There is a great deal of variation in tissue staining. We also need to deal with additional issues such as pen marks and shadows on some of the slides.

Slide #498 is an example of a slide with a large tissue sample. After filtering, the slide is 46% masked.

Slide with large tissue sample

Slide with large tissue sample after filtering

Slide #127 is an example of a small tissue sample. After filtering, the slide is 93% masked. With such a small tissue sample to begin with, we need to be careful that our filters don’t overmask this slide.

Slide with small tissue sample

Slide with small tissue sample after filtering

Being aggressive in our filtering might generate excellent results for many of the slides but might result in overmasking of other slides, where too much of the slide, including tissue, is masked out. For example, if 99% of a slide is masked, it has most likely been overmasked.

Avoiding overmasking across the entire training data set can be difficult. For example, suppose we have a slide that has only a proportionally small amount of tissue on it to start, say 10%. If this particular tissue sample has been poorly stained so that it is perhaps a light purplish grayish color, applying our grays or green channel filters might result in a significant portion of the tissue being masked out. This could also potentially result in small islands of non-masked tissue, and because we use a filter to remove small objects, this could result in the further masking out of additional tissue regions. In such a situation, masking of 95% to 100% of the slide is possible.

Slide #424 has a small tissue sample and its staining is a very faint lavender color. Slide #424 is
at risk for overmasking with our given combination of filters.

Slide with small tissue sample and faint staining

Therefore, rather than having fixed settings, we can automatically have our filters tweak parameter values to avoid overmasking, if desired. As examples, the filter_green_channel() and filter_remove_small_objects() functions have this ability. If masking exceeds a certain overmasking threshold, a parameter value can be changed to lower the amount of masking until the masking is below the overmasking threshold.

For the filter_green_channel() function, if a green_thresh value of 200 results in masking over 90%, the function will try with a higher green_thresh value (228) and the masking level will be checked. This will continue until the masking doesn’t exceed the overmask threshold of 90% or the threshold is 255.

For the filter_remove_small_objects() function, if a min_size value of 3000 results in a masking level over 95%, the function retries with a lower min_size value (1500), and the masking level is checked again. These min_size reductions continue until the masking level is not over 95% or the minimum size reaches 0. For the image filtering specified in the apply_image_filters() function, a starting min_size value of 500 is used for filter_remove_small_objects().
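The retry loop for filter_green_channel() can be sketched as follows. The halfway-to-255 update rule matches the 200 → 228 jump described above, but the details are a simplified reconstruction, not the project's exact code, and the synthetic slide is invented for the example.

```python
import math
import numpy as np

def mask_percent(mask):
    """Percentage of pixels masked out (False)."""
    return 100 * (1 - np.count_nonzero(mask) / mask.size)

def filter_green_channel(rgb, green_thresh=200, avoid_overmask=True, overmask_thresh=90):
    """Keep pixels with a green channel value below green_thresh. If that masks
    at least overmask_thresh percent of the image, retry with a threshold
    halfway between the current value and 255 (assumed update rule)."""
    mask = rgb[:, :, 1] < green_thresh
    if (avoid_overmask and mask_percent(mask) >= overmask_thresh
            and green_thresh < 255):
        new_thresh = math.ceil((255 - green_thresh) / 2 + green_thresh)  # 200 -> 228
        return filter_green_channel(rgb, new_thresh, avoid_overmask, overmask_thresh)
    return mask

# Faintly stained synthetic slide: green channel is 210 everywhere
faint = np.full((8, 8, 3), (180, 210, 190), dtype=np.uint8)

strict = filter_green_channel(faint, avoid_overmask=False)  # everything masked out
relaxed = filter_green_channel(faint, avoid_overmask=True)  # threshold raised, tissue kept
```

With the fixed threshold of 200, this faint slide is 100% masked; with avoid_overmask enabled, one retry at 228 keeps the tissue.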

Examining our full set of images using the multiprocess_apply_filters_to_images() function, we can identify slides that are at risk for overmasking. We can create a list of these slide numbers and use multiprocess_apply_filters_to_images() with this list of slide numbers to generate the filters.html page that allows us to visually inspect the filters applied to this set of slides.

Let’s have a look at how we reduce overmasking on slide 21, which is a slide that has very faint staining.

Slide 21

We’ll run our filters on slide #21.

filter.singleprocess_apply_filters_to_images(image_num_list=[21])

If we set the filter_green_channel() and filter_remove_small_objects() avoid_overmask parameters to False, 97.69% of the original image is masked by the “green channel” filter and 99.92% of the original image is masked by the subsequent “remove small objects” filter. This is significant overmasking.

Overmasked by green channel filter (97.69%)

Overmasked by remove small objects filter (99.92%)

If we set avoid_overmask to True for the filter_remove_small_objects() function, we see that the “remove small objects” filter does not perform any further masking because the 97.69% masking from the previous “green channel” filter already exceeds its overmasking threshold of 95%.

Overmasked by green channel filter (97.69%)

Avoid overmask by remove small objects filter (97.69%)

If we set avoid_overmask back to False for the filter_remove_small_objects() function and we set avoid_overmask to True for the filter_green_channel() function, we see that 87.91% of the original image is masked by the “green channel” filter (under the 90% overmasking threshold for the filter) and 97.40% of the image is masked by the subsequent “remove small objects” filter.

Avoid overmask by green channel filter (87.91%)

Overmask by remove small objects filter (97.40%)

If we set avoid_overmask to True for both the filter_green_channel() and filter_remove_small_objects() functions, we see that the resulting masking after the “remove small objects” filter has been reduced to 94.88%, which is under its overmasking threshold of 95%.

Avoid overmask by green channel filter (87.91%)

Avoid overmask by remove small objects filter (94.88%)

Thus, in this example, we’ve reduced the masking from 99.92% to 94.88%.

Summary

This third article in our series on automatic identification of tissues from big whole-slide images explained morphology operators and showed how we combined filters and applied them to multiple images. In Part 4, we’ll end the series with a discussion of tiling and top tile retrieval.