Thesis Examination Committee

Prof Weichuan YU, ECE/HKUST (Chairperson)

Prof Tim CHENG, ECE/HKUST (Thesis Supervisor)

Prof Albert CHUNG, CSE/HKUST

Abstract

Whole slide histopathological images, which provide rich information about pathological changes in the tissue microenvironment, are an important clue for the diagnosis and prognosis of cancers. Automatically detecting cancer in giga-pixel whole slide images (WSIs) is one of the biggest challenges in medical imaging.

Most recent advances in automated WSI classification employ a two-stage flow built on a patch classifier: patch-level classification followed by WSI-level aggregation. Training the patch classifier requires supervised or weakly supervised labels for each patch. This either relies on exactly labeled patches, which are time-consuming to collect and unavailable in most cases, or assumes that most patches of a WSI are discriminative for diagnosis; otherwise the WSI-level label cannot serve as a surrogate for the patch-level labels when training the patch classifier. In many cases, however, tumor regions occupy only a small part of a WSI, and such label-surrogate methods are no longer valid.

In this thesis we propose a data-driven feature aggregation approach that learns to automatically identify discriminative features for diagnosis. Specifically, given all the patches of a WSI, we first group them into clusters based on their phenotypes. Average pooling is applied to the patches in each cluster. The cluster centroids are then arranged into a sequence according to their likelihood of being a tumor pattern. Finally, a recurrent neural network is trained on the aggregated features to produce a slide-level inference.
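
The aggregation steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes that each patch is already represented by a feature vector, uses plain k-means as a stand-in for the phenotype clustering, and takes a hypothetical per-patch `tumor_scores` array as the source of the tumor likelihood used for ordering. The function names `kmeans` and `aggregate_slide` are illustrative only, and the recurrent network that would consume the resulting sequence is omitted.

```python
import numpy as np


def kmeans(features, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (a stand-in for the phenotype clustering).

    Returns one cluster index per patch.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct patches.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance of every patch to every centroid: (n, k).
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            members = features[assign == c]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[c] = members.mean(axis=0)
    return assign


def aggregate_slide(features, tumor_scores, k=4):
    """Turn the patch features of one slide into an ordered centroid sequence.

    features:     (n_patches, dim) patch feature vectors (assumed given).
    tumor_scores: (n_patches,) hypothetical per-patch tumor likelihoods.
    Returns an array of shape (n_nonempty_clusters, dim), most tumor-like first.
    """
    features = np.asarray(features, dtype=float)
    tumor_scores = np.asarray(tumor_scores, dtype=float)
    assign = kmeans(features, k)

    pooled, scores = [], []
    for c in range(k):
        mask = assign == c
        if mask.any():
            # Average pooling over the patches of this cluster.
            pooled.append(features[mask].mean(axis=0))
            # Cluster-level tumor likelihood: mean of its patch scores.
            scores.append(tumor_scores[mask].mean())

    # Order clusters by decreasing tumor likelihood.
    order = np.argsort(scores)[::-1]
    return np.stack(pooled)[order]
```

Calling `aggregate_slide(features, tumor_scores, k=4)` on the patch features of a single slide yields a short sequence of at most four vectors, which a recurrent network could read one step at a time to produce a slide-level prediction.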

To demonstrate the effectiveness of our method in recognizing rare cancers, we use it for slide-level prediction of macro- and micro-metastases in sentinel lymph nodes of breast cancer patients from the Camelyon dataset, where only small regions of the tumor slides show discriminative tumor patterns. To the best of our knowledge, we are the first to analyze this dataset without supervised patch-level labels.