Journal of Pathology Informatics

SYMPOSIUM - ORIGINAL ARTICLE
Year : 2013  |  Volume : 4  |  Issue : 1  |  Page : 11

A gamma-gaussian mixture model for detection of mitotic cells in breast cancer histopathology images


Adnan Mujahid Khan1, Hesham ElDaly2, Nasir M Rajpoot3
1 Department of Computer Science, University of Warwick, Coventry, UK
2 Department of Pathology, Addenbrookes Hospital, Cambridge, UK
3 Department of Computer Science, University of Warwick, Coventry, UK; Department of Computer Science and Engineering, Qatar University, Qatar

Correspondence Address:
Nasir M Rajpoot
Department of Computer Science, University of Warwick, Coventry, UK; Department of Computer Science and Engineering, Qatar University, Qatar

Abstract

In this paper, we propose a statistical approach for mitosis detection in breast cancer histological images. The proposed algorithm models the pixel intensities in mitotic and non-mitotic regions by a Gamma-Gaussian mixture model (GGMM) and employs context aware post-processing (CAPP) to reduce false positives. Experimental results demonstrate the ability of this simple yet effective method to detect mitotic cells (MCs) in standard H & E breast cancer histology images.

Context: Counting of MCs in breast cancer histopathology images is one of three components (the other two being tubule formation and nuclear pleomorphism) required for developing computer assisted grading of breast cancer tissue slides. This is very challenging since the biological variability of the MCs makes their detection extremely difficult. In addition, standard H & E stains chromatin rich structures, such as nuclei, apoptotic cells, and MCs, dark blue, so it becomes extremely difficult to detect the latter given that the former two are densely localized in the tissue sections.

Aims: In this paper, a robust MC detection technique is developed and tested on 35 breast histopathology images belonging to five different tissue slides.

Settings and Design: Our approach mimics a pathologist's approach to MC detection. The idea is (1) to isolate tumor areas from non-tumor areas (lymphoid/inflammatory/apoptotic cells), (2) to search for MCs in the reduced space by statistically modeling the pixel intensities from mitotic and non-mitotic regions, and finally (3) to evaluate the context of each potential MC in terms of its texture.

Materials and Methods: Our experimental dataset consisted of 35 digitized images of breast cancer biopsy slides with paraffin embedded sections stained with H and E and scanned at ×40 using an Aperio ScanScope slide scanner.

Statistical Analysis Used: We propose GGMM for detecting MCs in breast histology images. Image intensities are modeled as random variables sampled from one of two distributions: Gamma and Gaussian. Intensities from MCs are modeled by a Gamma distribution and those from non-mitotic regions are modeled by a Gaussian distribution. The choice of the Gamma-Gaussian mixture is mainly due to the observation that the characteristics of these distributions match well with the data they model. The experimental results show that the proposed system achieves a high sensitivity of 0.82 with a positive predictive value (PPV) of 0.29. Employing CAPP on these results produces a 241% increase in PPV at the cost of a less than 15% decrease in sensitivity.

Conclusions: In this paper, we presented a GGMM for detection of MCs in breast cancer histopathological images. In addition, we introduced CAPP as a tool to increase the PPV with a minimal loss in sensitivity. We evaluated the performance of the proposed detection algorithm in terms of sensitivity and PPV over a set of 35 breast histology images selected from five different tissue slides and showed that a reasonably high value of sensitivity can be retained while increasing the PPV. Our future work will aim at increasing the PPV further by modeling the spatial appearance of regions surrounding mitotic events.



How to cite this article:
Khan AM, ElDaly H, Rajpoot NM. A gamma-gaussian mixture model for detection of mitotic cells in breast cancer histopathology images. J Pathol Inform 2013;4:11-11


How to cite this URL:
Khan AM, ElDaly H, Rajpoot NM. A gamma-gaussian mixture model for detection of mitotic cells in breast cancer histopathology images. J Pathol Inform [serial online] 2013 [cited 2019 Oct 23 ];4:11-11
Available from: http://www.jpathinformatics.org/text.asp?2013/4/1/11/112696



 Introduction



Counting of mitotic cells (MCs) in breast histopathology images is one of three components (the other two being tubule formation and nuclear pleomorphism) required for developing computer assisted grading of breast cancer tissue slides. [1] This is very challenging since the biological variability of the MCs makes their detection extremely difficult [Figure 1]. In addition, standard H & E stains chromatin rich structures, such as nuclei, apoptotic cells, and MCs, dark blue, so it becomes extremely difficult to detect the latter given that the former two are densely localized in the tissue sections. As a consequence, two categories of relevant work have been reported in the literature. The first uses an additional stain (e.g., PHH3) to stain MCs exclusively and then detects the exclusively stained MCs in the images. [2] The second uses a video sequence to detect MCs over time by incorporating spatio-temporal information. [3] Since the exclusive stain incurs additional cost and videos are not used at all in standard histopathological practice, a gap exists in the literature.{Figure 1}

In this paper, a robust MC detection technique is developed and tested on 35 breast histopathology images belonging to five different tissue slides. To the best of our knowledge, there is no existing method in the literature for detection of MCs in standard H and E breast histology images. The proposed method mimics a pathologist's approach to MC detection under the microscope. The main idea is to isolate the tumor region from non-tumor areas (lymphoid/inflammatory/apoptotic cells) and search for MCs in the reduced space by statistically modeling the pixel intensities from mitotic and non-mitotic regions. In order to further enhance the positive predictive value (PPV), context aware post-processing (CAPP) has been introduced. The experimental results show that the proposed system achieves a high sensitivity of 0.82 with a PPV of 0.29. Employing CAPP on these results produces a 241% increase in PPV at the cost of a less than 15% decrease in sensitivity.

 The Proposed Algorithm



Stain Normalization

Tissue staining is commonly used to highlight distinct structures in histology images. Among the many different stains, H & E is one of the most commonly used. It selectively stains nuclear structures blue and cytoplasm pink. Although staining enables better visualization of tissue structures, stained images vary considerably in color and intensity because the histopathological workflow is not standardized. Stain normalization is used to achieve a consistent color and intensity appearance. We found the algorithm proposed by Magee et al. [4] very effective for normalizing histology images.

Tumor Segmentation

Breast cancer histology images can be divided into two regions: tumor and non-tumor. MCs may exist in both tumor and non-tumor regions; however, only those MCs that are present in tumor regions are considered for grading. Therefore, an intelligent MC detection system must first remove non-tumor areas from the tissue slide in order to minimize the search space. We have used a feature based texture segmentation framework, random projections with ensemble clustering (RanPEC), [5] to segment tumor regions. Broadly, the algorithm follows this pipeline: (1) a library of texture features is computed over a range of scales and orientations, (2) a low dimensional embedding (using random projections) is performed to avoid overfitting and the curse of dimensionality, and finally (3) tumor segmentation is performed in the low dimensional space. This produces an accurate and fully unsupervised tumor segmentation.
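Step (2) of this pipeline can be illustrated with a minimal sketch of a Gaussian random projection. This is not the RanPEC implementation from [5]; the function name `random_projection`, the target dimension `k`, and the fixed `seed` are assumptions made purely for illustration.

```python
import random

def random_projection(features, k, seed=0):
    """Project d-dimensional feature vectors onto k random directions.

    Illustrative only: entries of the projection matrix are drawn from a
    standard normal and scaled by 1/sqrt(k), a common textbook choice.
    """
    rng = random.Random(seed)
    d = len(features[0])
    # k x d random matrix, shared by every feature vector.
    matrix = [[rng.gauss(0.0, 1.0) / k ** 0.5 for _ in range(d)]
              for _ in range(k)]
    return [[sum(row[j] * vec[j] for j in range(d)) for row in matrix]
            for vec in features]
```

Each per-pixel texture vector is reduced from d to k values before clustering; the ensemble clustering stage described in [5] is not shown here.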

To account for MCs present on the boundary between tumor and non-tumor regions, morphological dilation is performed on the tumor segmentation results. Although this increases the chances of detecting boundary MCs, it also includes some lymphoid/inflammatory cells in the tumor regions, which appear as false positives (FPs) when detecting MCs in breast histology slides.
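A minimal sketch of binary dilation on a tumor mask is given below. The square structuring element and the `radius` parameter are assumptions; the text does not state which structuring element or size was used.

```python
def dilate(mask, radius=1):
    """Binary dilation of a 2-D 0/1 mask with a square structuring element.

    The element has side 2 * radius + 1 (an assumed shape and size).
    """
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                # Switch on every pixel within `radius` of a tumor pixel,
                # clipping the neighbourhood at the image border.
                for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        out[rr][cc] = 1
    return out
```

A larger `radius` recovers more boundary MCs but also admits more of the neighbouring non-tumor cells, which is the trade-off described above.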

Statistical Modeling of MCs

MCs appear as relatively dark, jagged, and irregularly textured structures [Figure 1]. Owing to sectioning artifacts, some appear too dim to notice with the naked eye. In terms of shape, color, and textural characteristics, lymphoid/inflammatory cells and apoptotic cells, which are densely present in tissue slides, closely resemble MCs and can easily be confused with them.

In this paper, we propose a gamma-Gaussian mixture model (GGMM) for detecting MCs in breast histology images. Image intensities (L channel of the L*a*b* color space) are modeled as random variables sampled from one of two distributions: gamma or Gaussian. Intensities from MCs are modeled by a gamma distribution and those from non-mitotic regions by a Gaussian distribution. The choice of the gamma-Gaussian mixture is mainly due to the observation that the characteristics of these distributions match well with the data they model [Figure 2].{Figure 2}
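For readers who want to reproduce the intensity channel, the sketch below computes the CIE L* value of a single pixel. It assumes 8-bit sRGB input and a D65 white point, neither of which is stated in the text; the scanner's actual color encoding may differ.

```python
def srgb_to_lightness(r, g, b):
    """Return CIE L* (0 to 100) for an 8-bit sRGB pixel (D65 white assumed)."""
    def linearize(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    # Relative luminance Y of the pixel; reference white has Y = 1.
    y = 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)

    delta = 6.0 / 29.0
    if y > delta ** 3:
        f = y ** (1.0 / 3.0)
    else:
        f = y / (3.0 * delta ** 2) + 4.0 / 29.0
    return 116.0 * f - 16.0
```

Dark, chromatin rich structures map to low L* values and pale stroma to high values, which is the contrast the two mixture components are meant to capture.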

GGMM

[Figure 2] shows two marginal distributions (solid lines) and their fitted models (dotted lines). The left and right marginal distributions show the probability distributions of pixels belonging to mitotic and non-mitotic regions, respectively. The GGMM achieves a close fit to both marginal distributions. The GGMM is a parametric technique for estimating a probability density function. In our context, it can be formulated as follows.

For pixel intensities x, the proposed mixture model is given by:

[INLINE:1]

where ρ1 and ρ2 represent the mixing proportions (priors) of intensities belonging to mitotic and non-mitotic regions, with ρ1 + ρ2 = 1. Γ(x; α, β) represents the gamma density function parameterized by α (the shape parameter) and β (the scale parameter). G(x; μ, σ) represents the Gaussian density function parameterized by μ (mean) and σ (standard deviation). θ = [α, β, μ, σ, ρ1, ρ2] represents the vector of all unknown parameters in the model.
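Read from these definitions, the mixture density would be f(x; θ) = ρ1 Γ(x; α, β) + ρ2 G(x; μ, σ). The sketch below evaluates it with the standard library only; the function names are illustrative and the shape/scale form of the gamma density follows the parameter description above.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta; zero for x <= 0."""
    if x <= 0:
        return 0.0
    log_p = ((alpha - 1.0) * math.log(x) - x / beta
             - math.lgamma(alpha) - alpha * math.log(beta))
    return math.exp(log_p)

def gaussian_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, alpha, beta, mu, sigma, rho1):
    """Two-component density; rho2 is implied by rho1 + rho2 = 1."""
    rho2 = 1.0 - rho1
    return rho1 * gamma_pdf(x, alpha, beta) + rho2 * gaussian_pdf(x, mu, sigma)
```

Because the gamma density is skewed and supported on positive values only, it can follow the asymmetric histogram of dark mitotic pixels, while the symmetric Gaussian covers the much larger non-mitotic population.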

Parameter Estimation

In order to estimate the unknown parameters (θ), we employ maximum likelihood estimation (MLE). Given image intensities x_i, i = 1, 2, …, n, where n is the number of pixels, the log-likelihood function (l) of the parameter vector θ is given by

[INLINE:2]

where f(x_i; θ) is the mixture density function in equation (1). The MLE of θ can be represented by

[INLINE:3]
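Assuming the pixel intensities are treated as independent samples, equations (2) and (3) would take the standard form below. This is written out from the surrounding definitions (mixture density f, parameter vector θ) rather than copied from the typeset equations.

```latex
\ell(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta)
             = \sum_{i=1}^{n} \log\bigl[\rho_1\,\Gamma(x_i; \alpha, \beta)
               + \rho_2\,G(x_i; \mu, \sigma)\bigr],
\qquad
\hat{\theta} = \arg\max_{\theta}\; \ell(\theta)
```

The logarithm of a sum has no closed-form maximizer in θ, which is why an iterative scheme is needed.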

A convenient approach to obtain a numerical solution to the above maximization problem is provided by the expectation maximization (EM) algorithm. [6] In our context, the EM algorithm can be set up as follows.

Let z_ik, k = 1, 2, be indicator variables showing the component membership of each pixel x_i in the mixture model (1). Note that these indicator variables are hidden (unobserved). The log-likelihood (2) can be extended as follows:

[INLINE:4]

The EM algorithm finds the MLE of θ iteratively, as outlined in [Algorithm 1[SUPPORTING:1]]. Let θ(m) be the estimate of θ after m iterations of Algorithm 1. The EM algorithm seeks the MLE of the marginal likelihood by iteratively applying Expectation and Maximization steps.
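The following is a minimal sketch of such an EM loop, not a transcription of Algorithm 1. Several choices are assumptions: the gamma shape is updated with a widely used closed-form approximation to its maximum likelihood estimate (the exact update needs a numerical root solve), responsibilities are initialized by splitting the data at its mean, all intensities are assumed strictly positive, and `synthetic_intensities` is a hypothetical helper that only generates toy data. The defaults `max_iter=500` and `tol=0.01` mirror the settings reported in the Results.

```python
import math
import random

def gamma_pdf(x, alpha, beta):
    if x <= 0:
        return 0.0
    return math.exp((alpha - 1.0) * math.log(x) - x / beta
                    - math.lgamma(alpha) - alpha * math.log(beta))

def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def m_step(xs, resp):
    """Re-estimate parameters from resp[i] = P(mitotic | x_i)."""
    n1 = sum(resp)
    n2 = len(xs) - n1
    # Gamma component: weighted mean and weighted mean of logs.
    mean1 = sum(r * x for r, x in zip(resp, xs)) / n1
    mean_log1 = sum(r * math.log(x) for r, x in zip(resp, xs)) / n1
    s = math.log(mean1) - mean_log1
    # Closed-form approximation to the shape MLE (assumption, see lead-in).
    alpha = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    beta = mean1 / alpha
    # Gaussian component: weighted mean and standard deviation.
    mu = sum((1.0 - r) * x for r, x in zip(resp, xs)) / n2
    var = sum((1.0 - r) * (x - mu) ** 2 for r, x in zip(resp, xs)) / n2
    return alpha, beta, mu, math.sqrt(var), n1 / len(xs)

def fit_ggmm(xs, max_iter=500, tol=0.01):
    """Fit the two-component mixture; stops when the log-likelihood settles."""
    split = sum(xs) / len(xs)  # crude initial split (assumption)
    resp = [1.0 if x < split else 0.0 for x in xs]
    prev_ll = None
    for _ in range(max_iter):
        alpha, beta, mu, sigma, rho1 = m_step(xs, resp)
        # E-step: posterior probability of the gamma (mitotic) component.
        resp, ll = [], 0.0
        for x in xs:
            a = rho1 * gamma_pdf(x, alpha, beta)
            b = (1.0 - rho1) * gaussian_pdf(x, mu, sigma)
            total = max(a + b, 1e-300)  # guard against underflow
            resp.append(a / total)
            ll += math.log(total)
        if prev_ll is not None and abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return {"alpha": alpha, "beta": beta, "mu": mu, "sigma": sigma,
            "rho1": rho1, "log_likelihood": ll}

def synthetic_intensities(n=500, seed=0):
    """Toy data: n dark gamma samples followed by n bright Gaussian samples."""
    rng = random.Random(seed)
    dark = [rng.gammavariate(4.0, 5.0) for _ in range(n)]
    bright = [rng.gauss(150.0, 10.0) for _ in range(n)]
    return dark + bright
```

Calling `fit_ggmm(synthetic_intensities())` should recover a Gaussian mean near 150 and a mixing proportion near 0.5 on this toy data. On real images the mitotic class is far rarer, so a data-driven initialization or fixed priors may be preferable.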

Classification

The posterior probabilities of a pixel x_i belonging to class 1 (mitotic) or class 2 (non-mitotic) are calculated as follows:

[INLINE:5]
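By Bayes' rule, and using the notation of the mixture model (1), these posteriors would presumably read as below. This is a reconstruction from the definitions in the text, not the typeset equation.

```latex
P(z_{i1} = 1 \mid x_i; \theta) =
  \frac{\rho_1\,\Gamma(x_i; \alpha, \beta)}
       {\rho_1\,\Gamma(x_i; \alpha, \beta) + \rho_2\,G(x_i; \mu, \sigma)},
\qquad
P(z_{i2} = 1 \mid x_i; \theta) =
  \frac{\rho_2\,G(x_i; \mu, \sigma)}
       {\rho_1\,\Gamma(x_i; \alpha, \beta) + \rho_2\,G(x_i; \mu, \sigma)}
```

The two posteriors sum to one for every pixel, so a single map of the mitotic posterior is enough for the thresholding step.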

Given the pixel-wise posterior probability maps, Otsu thresholding is then used to classify mitotic and non-mitotic pixels. It was found empirically that the area of an MC was between 60 and 1,000 pixels. Therefore, area thresholding is performed to remove all potentially mitotic regions whose area falls outside this range.
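The two steps can be sketched as follows. The histogram resolution (`bins=256`) and the use of 4-connectivity to define a region are assumptions, since the text does not specify them; the area limits default to the 60 and 1,000 pixel bounds quoted above.

```python
from collections import deque

def otsu_threshold(values, bins=256):
    """Otsu threshold for values in [0, 1]; values >= result form the upper class."""
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0.0
    for t in range(bins - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        # Between-class variance, up to a constant factor.
        var_between = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return (best_t + 1) / bins

def filter_by_area(mask, min_area=60, max_area=1000):
    """Keep 4-connected regions of a 0/1 mask with min_area <= size <= max_area."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected region.
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if min_area <= len(region) <= max_area:
                for y, x in region:
                    out[y][x] = 1
    return out
```

In use, the mitotic posterior map would be flattened and passed to `otsu_threshold`, the map binarized at the returned value, and the resulting mask passed to `filter_by_area`.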

 CAPP



The algorithmic steps described so far achieve 86% sensitivity; however, given the large number of similar-looking objects (apoptotic cells, lymphoid/inflammatory cells, etc.), a number of FPs are also obtained. In order to reduce the FPs without significantly reducing sensitivity, CAPP is performed on the classification results. A small context window [Figure 3] is defined around the bounding box of each potential MC. In each context window, four representative features are computed over a set of textural features. The representative features are used to train a support vector machine (SVM) classifier with a Gaussian kernel. The trained classifier is then used to classify unseen candidate MC contexts.{Figure 3}
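For reference, the Gaussian (radial basis function) kernel compares two feature vectors as sketched below. The width parameter `gamma` and its default value are hypothetical; in practice it is one of the parameters chosen by a search over candidate values.

```python
import math

def gaussian_kernel(u, v, gamma=0.01):
    """K(u, v) = exp(-gamma * ||u - v||^2); gamma is a hypothetical width."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)
```

The kernel equals 1 for identical context descriptors and decays towards 0 as they move apart, so the SVM decision is driven by the training contexts that most resemble the candidate.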

 Results



Our experimental dataset consisted of 35 digitized images of breast cancer biopsy slides with paraffin embedded sections stained with H and E and scanned at ×40 using an Aperio ScanScope slide scanner. After stain normalization, background removal, and unsupervised tumor segmentation over all 35 images, seven images were selected to extract mitotic and non-mitotic pixel intensities (L channel of the L*a*b* color space) for model fitting using GGMM. We chose 500 iterations and a tolerance of 0.01 for the EM algorithm. Although EM provides estimates of the priors (ρ1 and ρ2), a more accurate estimate (ρ1 = 0.0014 and ρ2 = 0.9986) was used, based on the ratio of mitotic to non-mitotic data used for model fitting. [Figure 4] shows the plot of sensitivity against PPV when the area threshold is varied on the candidate MCs.

The following textural features are extracted from a window of size 30 × 30 pixels around the bounding box of each candidate mitosis: 32 phase gradient (PG) features (16 orientations, 2 scales), [7] 1 roughness feature, and 1 entropy feature. From each of these 34 features, 4 representative features were computed: (1) mean, (2) standard deviation, (3) skewness, and (4) kurtosis. This gave a 136-dimensional feature vector for each context window, which was used in training and testing of the SVM.{Figure 4}
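The arithmetic 34 × 4 = 136 can be made concrete with the sketch below, which summarizes each feature map over a window and concatenates the results. Population moments and non-excess kurtosis are assumed, as the text does not say which estimators were used, and the feature maps themselves (PG, roughness, entropy) are taken as given.

```python
import math

def summary_stats(values):
    """Mean, standard deviation, skewness and kurtosis of a flat list."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        # A constant map has no spread, so shape statistics are set to zero.
        return [mean, 0.0, 0.0, 0.0]
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return [mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2]

def context_descriptor(feature_maps):
    """Concatenate the four statistics of every feature map in one window.

    feature_maps: one flat list of per-pixel values per textural feature.
    """
    descriptor = []
    for fmap in feature_maps:
        descriptor.extend(summary_stats(fmap))
    return descriptor
```

With 34 feature maps per window, `context_descriptor` returns a list of 136 numbers regardless of the window size, so candidates with different bounding boxes remain comparable.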

Since the set of candidate MCs identified before CAPP was unbalanced (mitotic 29.1%, non-mitotic 70.9%), a balanced mix of mitotic and non-mitotic examples was randomly selected as training data. A total of 69.90% of the data was used for training and the remaining 30.10% for testing. Grid search was used to find optimal parameters for the Gaussian kernel of the SVM. [Figure 5] demonstrates the efficacy of the proposed MC detection algorithm.{Figure 5}

A higher penalty for misclassification in the SVM was set for the mitotic class, since the original data was unbalanced. [Table 1] provides details of the quantitative results obtained with five-fold cross-validation. According to these results, PPV increased by more than 200% at the cost of a less than 15% reduction in sensitivity.{Table 1}
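The two reported metrics, and the relative changes quoted for them, follow the usual definitions sketched below. The counts in the usage note are hypothetical and are not taken from Table 1.

```python
def sensitivity(tp, fn):
    """Fraction of true MCs that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)

def ppv(tp, fp):
    """Fraction of detections that are true MCs: TP / (TP + FP)."""
    return tp / (tp + fp)

def relative_change_percent(before, after):
    """Percentage change of a metric relative to its earlier value."""
    return 100.0 * (after - before) / before
```

For example, with hypothetical counts of 82 detected and 18 missed MCs, `sensitivity(82, 18)` gives 0.82. A metric that moves from 0.2 to 0.5 has a `relative_change_percent` of 150, which is how an increase above 100% should be read.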

 Conclusion



In this paper, we presented a GGMM for detection of MCs in breast cancer histopathological images. In addition, we introduced CAPP as a tool to increase the PPV with a minimal loss in sensitivity. We evaluated the performance of the proposed detection algorithm in terms of sensitivity and PPV over a set of 35 breast histology images selected from five different tissue slides and showed that a reasonably high value of sensitivity can be retained while increasing the PPV. Our future work will aim at increasing the PPV further by modeling the spatial appearance of regions surrounding mitotic events.

 Acknowledgments



The authors would like to thank the organizers of the International Conference on Pattern Recognition (ICPR) 2012 contest for mitosis detection in breast cancer. The images used in this paper are part of the MITOS dataset, a dataset set up for the ANR French project MICO. The authors would also like to thank Dr. Derek Magee for sharing the executable of his stain normalization algorithm. The first author gratefully acknowledges the financial support provided by the Warwick Post-graduate Research Scholarship scheme and the Department of Computer Science at the University of Warwick.

References

1. Elston CW, Ellis IO. Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: Experience from a large study with long-term follow-up. Histopathology 1991;19:403-10.
2. Roullier V, Lézoray O, Ta VT, Elmoataz A. Multi-resolution graph-based analysis of histopathological whole slide images: Application to mitotic cell extraction and visualization. Comput Med Imaging Graph 2011;35:603-15.
3. Huh S, Ker DF, Bise R, Chen M, Kanade T. Automated mitosis detection of stem cell populations in phase-contrast microscopy images. IEEE Trans Med Imaging 2011;30:586-96.
4. Magee D, Treanor D, Chomphuwiset P, Quirke P. Context aware colour classification in digital microscopy. In: Proceedings Medical Image Understanding and Analysis. United Kingdom: British Machine Vision Association (BMVA); 2010. p. 1-5.
5. Khan AM, El-Daly H, Rajpoot N. RanPEC: Random projections with ensemble clustering for segmentation of tumor areas in breast histology images. In: Medical Image Understanding and Analysis (MIUA). Swansea, UK: British Machine Vision Association (BMVA); 2012. p. 17-23.
6. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Series B Stat Methodol 1977;39:1-38.
7. Khan AM, El-Daly H, Simmons E, Rajpoot NM. HyMaP: A hybrid magnitude-phase approach to unsupervised segmentation of tumor areas in breast cancer histology images. J Pathol Inform 2013;4:1.