SYMPOSIUM - ORIGINAL ARTICLE
Year : 2013  Volume : 4  Issue : 1  Page : 11

A Gamma-Gaussian mixture model for detection of mitotic cells in breast cancer histopathology images

Adnan Mujahid Khan^{1}, Hesham ElDaly^{2}, Nasir M Rajpoot^{3}
^{1} Department of Computer Science, University of Warwick, Coventry, UK
^{2} Department of Pathology, Addenbrookes Hospital, Cambridge, UK
^{3} Department of Computer Science, University of Warwick, Coventry, UK; Department of Computer Science and Engineering, Qatar University, Qatar

In this paper, we propose a statistical approach for mitosis detection in breast cancer histological images. The proposed algorithm models the pixel intensities in mitotic and nonmitotic regions with a Gamma-Gaussian mixture model (GGMM) and employs context aware postprocessing (CAPP) to reduce false positives. Experimental results demonstrate the ability of this simple yet effective method to detect mitotic cells (MCs) in standard H & E breast cancer histology images.

Context: Counting of MCs in breast cancer histopathology images is one of three components (the other two being tubule formation and nuclear pleomorphism) required for developing computer assisted grading of breast cancer tissue slides. This is very challenging since the biological variability of the MCs makes their detection extremely difficult. In addition, standard H & E stains all chromatin rich structures (nuclei, apoptotic cells, and MCs) dark blue, so it becomes extremely difficult to detect the latter given that the former two are densely localized in the tissue sections.

Aims: In this paper, a robust MC detection technique is developed and tested on 35 breast histopathology images belonging to five different tissue slides.

Settings and Design: Our approach mimics a pathologist's approach to MC detection.
The idea is (1) to isolate tumor areas from nontumor areas (lymphoid/inflammatory/apoptotic cells), (2) to search for MCs in the reduced space by statistically modeling the pixel intensities from mitotic and nonmitotic regions, and finally (3) to evaluate the context of each potential MC in terms of its texture.

Materials and Methods: Our experimental dataset consisted of 35 digitized images of breast cancer biopsy slides with paraffin embedded sections stained with H & E and scanned at × 40 using an Aperio ScanScope slide scanner.

Statistical Analysis Used: We propose a GGMM for detecting MCs in breast histology images. Image intensities are modeled as random variables sampled from one of two distributions: gamma and Gaussian. Intensities from MCs are modeled by a gamma distribution and those from nonmitotic regions are modeled by a Gaussian distribution. The Gamma-Gaussian pairing was chosen mainly because the shapes of these distributions match the data they model well. The experimental results show that the proposed system achieves a high sensitivity of 0.82 with a positive predictive value (PPV) of 0.29. Employing CAPP on these results produces a 241% increase in PPV at the cost of a less than 15% decrease in sensitivity.

Conclusions: In this paper, we presented a GGMM for detection of MCs in breast cancer histopathological images. In addition, we introduced CAPP as a tool to increase the PPV with a minimal loss in sensitivity. We evaluated the performance of the proposed detection algorithm in terms of sensitivity and PPV over a set of 35 breast histology images selected from five different tissue slides and showed that a reasonably high value of sensitivity can be retained while increasing the PPV. Our future work will aim at increasing the PPV further by modeling the spatial appearance of regions surrounding mitotic events.

Introduction

Counting of mitotic cells (MCs) in breast histopathology images is one of three components (the other two being tubule formation and nuclear pleomorphism) required for developing computer assisted grading of breast cancer tissue slides. [1] This is very challenging since the biological variability of the MCs makes their detection extremely difficult [Figure 1]. In addition, standard H & E stains all chromatin rich structures (nuclei, apoptotic cells, and MCs) dark blue, so it becomes extremely difficult to detect the latter given that the former two are densely localized in the tissue sections. As a consequence, two categories of relevant work have been reported in the literature. One uses an additional stain (e.g., PHH3) to stain MCs exclusively and then detects the exclusively stained MCs in the images. [2] The other uses a video sequence to detect MCs over time by incorporating spatiotemporal information. [3] Since the exclusive stain incurs additional cost and videos are not used in standard histopathological practice, a gap exists in the literature.

{Figure 1}

In this paper, a robust MC detection technique is developed and tested on 35 breast histopathology images belonging to five different tissue slides. To the best of our knowledge, there is no existing method in the literature for detection of MCs in standard H & E breast histology images. The proposed method mimics a pathologist's approach to MC detection under the microscope. The main idea is to isolate the tumor region from nontumor areas (lymphoid/inflammatory/apoptotic cells) and search for MCs in the reduced space by statistically modeling the pixel intensities from mitotic and nonmitotic regions. In order to further enhance the positive predictive value (PPV), context aware postprocessing (CAPP) has been introduced. The experimental results show that the proposed system achieves a high sensitivity of 0.82 with a PPV of 0.29.
Employing CAPP on these results produces a 241% increase in PPV at the cost of a less than 15% decrease in sensitivity.

The Proposed Algorithm

Stain Normalization

Tissue staining is commonly used to highlight distinct structures in histology images. Among the many different stains, H & E is one of the most commonly used. It selectively stains nuclear structures blue and cytoplasm pink. Although staining enables better visualization of tissue structures, the lack of standardization in the histopathological workflow means that stained images vary considerably in color and intensity. Stain normalization is used to achieve a consistent color and intensity appearance. We found the algorithm proposed by Magee et al. [4] very effective for normalizing histology images.

Tumor Segmentation

Breast cancer histology images can be divided into two regions: tumor and nontumor. MCs may exist in both tumor and nontumor regions; however, only MCs present in tumor regions are considered for grading. Therefore, an intelligent MC detection system must first remove nontumor areas from the tissue slide in order to minimize the search space. We have used a feature based texture segmentation framework, random projections with ensemble clustering, [5] to segment tumor regions. Broadly, the algorithm follows this pipeline: (1) a library of texture features is computed over a range of scales and orientations, (2) a low dimensional embedding (using random projections) is performed to avoid overfitting and the curse of dimensionality, and finally (3) tumor segmentation is performed in the low dimensional space. This produces an accurate and totally unsupervised tumor segmentation. To account for MCs present on the boundary between tumor and nontumor regions, morphological dilation is performed on the tumor segmentation results.
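
The boundary-widening effect of morphological dilation can be illustrated with a minimal sketch in plain Python. The structuring element and its size are not stated in the text, so the square element and the `radius` parameter below are assumptions, and the function name `dilate` and the toy mask are hypothetical.

```python
def dilate(mask, radius=1):
    """Binary dilation of a 2D 0/1 mask with a square structuring element.

    The (2 * radius + 1) square element is an assumption made for
    illustration; the text does not specify the element that was used.
    """
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                # Switch on every pixel within `radius` of a tumor pixel.
                for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        out[rr][cc] = 1
    return out


# Toy tumor mask with a single tumor pixel in the centre.
tumor_mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
dilated = dilate(tumor_mask, radius=1)
for row in dilated:
    print(row)
```

A larger radius recovers more boundary MCs but also admits more of the neighbouring nontumor tissue into the search space.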
Although this dilation increases the chances of detecting boundary MCs, it also brings some lymphoid/inflammatory cells into the tumor regions, which appear as false positives (FPs) when detecting MCs in breast histology slides.

Statistical Modeling of MCs

MCs appear as relatively dark, jagged, and irregularly textured structures [Figure 1]. Owing to sectioning artifacts, some appear too dim to notice with the naked eye. In terms of shape, color, and textural characteristics, the lymphoid/inflammatory cells and apoptotic cells that are densely present in tissue slides are very similar to MCs and can easily be confused with them. In this paper, we propose a Gamma-Gaussian mixture model (GGMM) for detecting MCs in breast histology images. Image intensities (L channel of the L*a*b* color space) are modeled as random variables sampled from one of two distributions: gamma and Gaussian. Intensities from MCs are modeled by a gamma distribution and those from nonmitotic regions are modeled by a Gaussian distribution. The Gamma-Gaussian pairing was chosen mainly because the shapes of these distributions match the data they model well [Figure 2].

{Figure 2}

GGMM

[Figure 2] shows two marginal distributions (solid lines) and their fitted models (dotted lines). The left and right marginal distributions show the probability distributions of pixels belonging to mitotic and nonmitotic regions, respectively. The GGMM achieved a close fit to both marginal distributions. The GGMM is a parametric technique for estimating a probability density function. In our context, it can be formulated as follows. For pixel intensities x, the proposed mixture model is given by:

$$f(x; \theta) = \rho_1 \, \Gamma(x, \alpha, \beta) + \rho_2 \, G(x, \mu, \sigma) \tag{1}$$

where ρ1 and ρ2 represent the mixing proportions (priors) of intensities belonging to mitotic and nonmitotic regions, and ρ1 + ρ2 = 1. Γ(x, α, β) represents the gamma density function parameterized by α (the shape parameter) and β (the scale parameter).
G(x, μ, σ) represents the Gaussian density function parameterized by μ (mean) and σ (standard deviation). θ = [α, β, μ, σ, ρ1, ρ2] represents the vector of all unknown parameters in the model.

Parameter Estimation

In order to estimate the unknown parameters (θ), we employ maximum likelihood estimation (MLE). Given image intensities x_i, i = 1, 2, …, n, where n is the number of pixels, the log-likelihood function (l) of the parameter vector θ is given by

$$l(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta) \tag{2}$$

where f(x_i; θ) is the mixture density function in equation (1). The MLE of θ can be represented by

$$\hat{\theta} = \arg\max_{\theta} \, l(\theta)$$

A convenient approach to obtain a numerical solution to the above maximization problem is provided by the expectation maximization (EM) algorithm. [6] In our context, the EM algorithm can be set up as follows. Let z_ik, k = 1, 2, be indicator variables showing the component membership of each pixel x_i in the mixture model (1). Note that these indicator variables are hidden (unobserved). The log-likelihood (2) can be extended as follows:

$$l_c(\theta) = \sum_{i=1}^{n} \Big[ z_{i1} \log\big(\rho_1 \, \Gamma(x_i, \alpha, \beta)\big) + z_{i2} \log\big(\rho_2 \, G(x_i, \mu, \sigma)\big) \Big]$$

The EM algorithm finds the MLE of θ iteratively, as outlined in [Algorithm 1]. Let θ(m) be the estimate of θ after m iterations of Algorithm 1. The EM algorithm seeks the MLE of the marginal likelihood by iteratively applying Expectation and Maximization steps.

Classification

The posterior probabilities of a pixel x_i belonging to class 1 (mitotic) or class 2 (nonmitotic) are calculated as follows:

$$p(1 \mid x_i) = \frac{\rho_1 \, \Gamma(x_i, \alpha, \beta)}{f(x_i; \theta)}, \qquad p(2 \mid x_i) = \frac{\rho_2 \, G(x_i, \mu, \sigma)}{f(x_i; \theta)}$$

Given the pixelwise posterior probability maps, Otsu thresholding is then used to classify mitotic and nonmitotic pixels. It was found empirically that the area of an MC was between 60 and 1,000 pixels. Therefore, area thresholding is performed to remove all potentially mitotic regions whose area falls outside this range.

CAPP

The algorithmic steps described so far achieve 86% sensitivity; however, given the large number of similar looking objects (apoptotic cells, lymphoid/inflammatory cells, etc.), a number of FPs are also obtained.
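
The posterior computation and the area filter from the Classification step can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names and the values chosen for α, β, μ, and σ are hypothetical placeholders, and the area range is assumed to be inclusive. Only the priors and the 60 to 1,000 pixel range are taken from the text. Otsu thresholding of the posterior map is omitted.

```python
import math


def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta; zero for x <= 0."""
    if x <= 0:
        return 0.0
    log_p = ((alpha - 1) * math.log(x) - x / beta
             - math.lgamma(alpha) - alpha * math.log(beta))
    return math.exp(log_p)


def gaussian_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def mitotic_posterior(x, alpha, beta, mu, sigma, rho1, rho2):
    """Posterior probability that intensity x belongs to the gamma (mitotic) component."""
    p1 = rho1 * gamma_pdf(x, alpha, beta)
    p2 = rho2 * gaussian_pdf(x, mu, sigma)
    total = p1 + p2  # value of the mixture density at x
    return p1 / total if total > 0 else 0.0


def keep_by_area(region_areas, min_area=60, max_area=1000):
    """Keep candidate regions whose pixel area lies in [min_area, max_area]."""
    return [a for a in region_areas if min_area <= a <= max_area]


# Hypothetical component parameters; only rho1 and rho2 come from the text.
params = dict(alpha=4.0, beta=5.0, mu=70.0, sigma=10.0,
              rho1=0.0014, rho2=0.9986)

for intensity in (10.0, 40.0, 70.0):
    print(intensity, mitotic_posterior(intensity, **params))

print(keep_by_area([30, 60, 500, 1000, 1500]))
```

With placeholder parameters like these, dark intensities receive a high mitotic posterior despite the very small prior ρ1, because the Gaussian component assigns them almost no density.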
In order to reduce the FPs without significantly reducing sensitivity, CAPP is performed on the classification results. A small context window [Figure 3] is defined around the bounding box of each potential MC. In each context window, four representative features are computed over a set of textural features. The representative features are used to train a support vector machine (SVM) classifier with a Gaussian kernel. The trained classifier is then used to predict unseen candidate contexts of MCs.

{Figure 3}

Results

Our experimental dataset consisted of 35 digitized images of breast cancer biopsy slides with paraffin embedded sections stained with H & E and scanned at × 40 using an Aperio ScanScope slide scanner. After stain normalization, background removal, and unsupervised tumor segmentation over all 35 images, seven images were selected to extract mitotic and nonmitotic pixel intensities (L channel of the L*a*b* color space) for model fitting using the GGMM. We chose 500 iterations and a tolerance (f = 0.01) for the EM algorithm. Although EM provides estimates of the priors (ρ1 and ρ2), a more accurate estimate (ρ1 = 0.0014 and ρ2 = 0.9986) was used, based on the ratio of mitotic to nonmitotic data used for model fitting. [Figure 4] shows the plot of sensitivity against PPV when the area threshold is varied on the candidate MCs. The set of textural features extracted from a window of size 30 × 30 pixels around the bounding box of each candidate mitosis is as follows: 32 Phase Gradient (PG) features (16 orientations, 2 scales), [7] 1 roughness feature, and 1 entropy feature. From each of these 34 features, 4 representative features were computed: (1) mean, (2) standard deviation, (3) skewness, and (4) kurtosis. This gave a 136-dimensional (34 × 4) feature vector summarizing the pixels inside each context window.
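
The 34 × 4 = 136 construction can be illustrated with a short sketch in plain Python. The text does not say which estimators were used, so population (biased) moments and non-excess kurtosis are assumptions here. The function names are hypothetical, and random numbers stand in for the phase gradient, roughness, and entropy responses.

```python
import math
import random


def summary_stats(values):
    """Return (mean, standard deviation, skewness, kurtosis) of a list of numbers.

    Population moments and non-excess kurtosis are assumed for illustration.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(m2)
    if std == 0:
        # A constant feature map has no defined shape statistics.
        return (mean, 0.0, 0.0, 0.0)
    return (mean, std, m3 / std ** 3, m4 / m2 ** 2)


def context_descriptor(feature_maps):
    """Concatenate the four statistics of every feature map in one context window."""
    descriptor = []
    for values in feature_maps:
        descriptor.extend(summary_stats(values))
    return descriptor


# Stand-in data: 34 feature maps, each flattened to 900 pixel responses.
random.seed(0)
feature_maps = [[random.random() for _ in range(900)] for _ in range(34)]
descriptor = context_descriptor(feature_maps)
print(len(descriptor))
```

Summarizing each feature map by its first four moments gives every candidate a descriptor of the same length, regardless of how many pixels its context window contains.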
The resulting 136-dimensional vector was used in training and testing of the SVM.

{Figure 4}

Since the set of candidate MCs identified before CAPP was applied was unbalanced (mitotic 29.1%, nonmitotic 70.9%), a balanced mix of mitotic and nonmitotic examples was randomly selected as training data. A total of 69.90% of the data was used for training and the remaining 30.10% for testing. Grid search was used to find optimal parameters for the Gaussian kernel of the SVM. [Figure 5] demonstrates the efficacy of the proposed MC detection algorithm.

{Figure 5}

A higher misclassification penalty was set in the SVM for the mitotic class, since the original data was unbalanced. [Table 1] provides details of the quantitative results obtained with fivefold cross-validation. According to these results, PPV was increased by more than 200% at the cost of a less than 15% reduction in sensitivity.

{Table 1}

Conclusion

In this paper, we presented a GGMM for detection of MCs in breast cancer histopathological images. In addition, we introduced CAPP as a tool to increase the PPV with a minimal loss in sensitivity. We evaluated the performance of the proposed detection algorithm in terms of sensitivity and PPV over a set of 35 breast histology images selected from 5 different tissue slides and showed that a reasonably high value of sensitivity can be retained while increasing the PPV. Our future work will aim at increasing the PPV further by modeling the spatial appearance of regions surrounding mitotic events.

Acknowledgments

The authors would like to thank the organizers of the International Conference on Pattern Recognition (ICPR) 2012 contest for mitosis detection in breast cancer. The images used in this paper are part of the MITOS dataset, a dataset set up for the ANR French project MICO. The authors would also like to thank Dr. Derek Magee for sharing the executable of his stain normalization algorithm.
The first author gratefully acknowledges the financial support provided by the Warwick Postgraduate Research Scholarship scheme and the Department of Computer Science at the University of Warwick.

References


