

SYMPOSIUM  ORIGINAL ARTICLE 



J Pathol Inform 2013, 4:11
A Gamma-Gaussian mixture model for detection of mitotic cells in breast cancer histopathology images
Adnan Mujahid Khan^{1}, Hesham El-Daly^{2}, Nasir M Rajpoot^{3}
^{1} Department of Computer Science, University of Warwick, Coventry, UK
^{2} Department of Pathology, Addenbrooke's Hospital, Cambridge, UK
^{3} Department of Computer Science, University of Warwick, Coventry, UK; Department of Computer Science and Engineering, Qatar University, Qatar
Date of Submission: 30-Mar-2013
Date of Acceptance: 31-Mar-2013
Date of Web Publication: 30-May-2013
Correspondence Address: Nasir M Rajpoot Department of Computer Science, University of Warwick, Coventry, UK; Department of Computer Science and Engineering, Qatar University, Qatar
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/2153-3539.112696
Abstract   
In this paper, we propose a statistical approach for mitosis detection in breast cancer histological images. The proposed algorithm models the pixel intensities in mitotic and nonmitotic regions by a Gamma-Gaussian mixture model (GGMM) and employs context-aware postprocessing (CAPP) to reduce false positives. Experimental results demonstrate the ability of this simple yet effective method to detect mitotic cells (MCs) in standard H & E breast cancer histology images.
Context: Counting of MCs in breast cancer histopathology images is one of three components (the other two being tubule formation and nuclear pleomorphism) required for developing computer-assisted grading of breast cancer tissue slides. This is very challenging, since the biological variability of MCs makes their detection extremely difficult. In addition, standard H & E stains all chromatin-rich structures (nuclei, apoptotic cells, and MCs) dark blue, so MCs are extremely difficult to detect given that nuclei and apoptotic cells are densely localized in the tissue sections.
Aims: In this paper, a robust MC detection technique is developed and tested on 35 breast histopathology images belonging to five different tissue slides.
Settings and Design: Our approach mimics a pathologist's approach to MC detection. The idea is (1) to isolate tumor areas from nontumor areas (lymphoid/inflammatory/apoptotic cells), (2) to search for MCs in the reduced space by statistically modeling the pixel intensities from mitotic and nonmitotic regions, and finally (3) to evaluate the context of each potential MC in terms of its texture.
Materials and Methods: Our experimental dataset consisted of 35 digitized images of breast cancer biopsy slides with paraffin-embedded sections stained with H & E and scanned at × 40 using an Aperio ScanScope slide scanner.
Statistical Analysis Used: We propose a GGMM for detecting MCs in breast histology images. Image intensities are modeled as random variables sampled from one of two distributions: Gamma and Gaussian. Intensities from MCs are modeled by a Gamma distribution and those from nonmitotic regions are modeled by a Gaussian distribution. The Gamma-Gaussian mixture was chosen mainly because the characteristics of these distributions match well with the data they model.
Results: The experimental results show that the proposed system achieves a high sensitivity of 0.82 with a positive predictive value (PPV) of 0.29. Employing CAPP on these results produces a 241% increase in PPV at the cost of a decrease of less than 15% in sensitivity.
Conclusions: In this paper, we presented a GGMM for detection of MCs in breast cancer histopathological images. In addition, we introduced CAPP as a tool to increase the PPV with a minimal loss in sensitivity. We evaluated the performance of the proposed detection algorithm in terms of sensitivity and PPV over a set of 35 breast histology images selected from five different tissue slides and showed that a reasonably high value of sensitivity can be retained while increasing the PPV. Our future work will aim at increasing the PPV further by modeling the spatial appearance of regions surrounding mitotic events.
Keywords: Breast cancer grading, histopathology image analysis, mitotic cell detection, statistical modeling of mitotic cells
How to cite this article: Khan AM, El-Daly H, Rajpoot NM. A Gamma-Gaussian mixture model for detection of mitotic cells in breast cancer histopathology images. J Pathol Inform 2013;4:11
Introduction   
Counting of mitotic cells (MCs) in breast histopathology images is one of three components (the other two being tubule formation and nuclear pleomorphism) required for developing computer-assisted grading of breast cancer tissue slides. ^{[1]} This is very challenging, since the biological variability of MCs makes their detection extremely difficult [Figure 1]. In addition, standard H & E stains all chromatin-rich structures (nuclei, apoptotic cells, and MCs) dark blue, so MCs are extremely difficult to detect given that nuclei and apoptotic cells are densely localized in the tissue sections. As a consequence, two categories of relevant work have been reported in the literature. The first uses an additional stain (e.g., PHH3) that stains MCs exclusively and then detects the exclusively stained MCs in the images. ^{[2]} The second uses a video sequence to detect MCs over time by incorporating spatiotemporal information. ^{[3]} Since the exclusive stain adds cost and videos are not used at all in standard histopathological practice, a gap exists in the literature.
In this paper, a robust MC detection technique is developed and tested on 35 breast histopathology images belonging to five different tissue slides. To the best of our knowledge, there is no existing method in the literature for detection of MCs in standard H & E breast histology images. The proposed method mimics a pathologist's approach to MC detection under the microscope. The main idea is to isolate the tumor region from nontumor areas (lymphoid/inflammatory/apoptotic cells) and search for MCs in the reduced space by statistically modeling the pixel intensities from mitotic and nonmitotic regions. In order to further enhance the positive predictive value (PPV), context-aware postprocessing (CAPP) is introduced. The experimental results show that the proposed system achieves a high sensitivity of 0.82 with a PPV of 0.29. Employing CAPP on these results produces a 241% increase in PPV at the cost of a decrease of less than 15% in sensitivity.
The Proposed Algorithm   
Stain Normalization
Tissue staining is commonly used to highlight distinct structures in histology images. Among many different stains, H & E is one of the most commonly used. It selectively stains nuclear structures blue and cytoplasm pink. Although staining enables better visualization of tissue structures, the lack of standardization in the histopathological workflow means that stained images vary considerably in color and intensity. Stain normalization is used to achieve a consistent color and intensity appearance. We found the algorithm proposed by Magee et al. ^{[4]} very effective for normalizing histology images.
Tumor Segmentation
Breast cancer histology images can be divided into two regions: tumor and nontumor. MCs may exist in both tumor and nontumor regions; however, only those MCs that are present in tumor regions are considered for grading. Therefore, an intelligent MC detection system must first remove nontumor areas from the tissue slide in order to minimize the search space. We used a feature-based texture segmentation framework, random projections with ensemble clustering, ^{[5]} to segment tumor regions. Broadly, the algorithm follows this pipeline: (1) a library of texture features is computed over a range of scales and orientations, (2) low-dimensional embedding (using random projections) is performed to avoid overfitting and the curse of dimensionality, and finally (3) tumor segmentation is performed in the low-dimensional space. This produces an accurate and totally unsupervised tumor segmentation.
In order to account for MCs present on the boundary of tumor and nontumor regions, morphological dilation is performed on the tumor segmentation results. Although this increases the chances of detecting boundary MCs, it also brings some lymphoid/inflammatory cells into the tumor regions, which appear as false positives (FPs) when detecting MCs in breast histology slides.
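The boundary-widening step can be illustrated with a minimal sketch of binary dilation on a tumor mask. The square structuring element, the `radius` parameter, and the function name `dilate` are assumptions made for illustration; the paper does not state which structuring element or size was used.

```python
def dilate(mask, radius=1):
    """Binary dilation of a 2D mask (list of lists of 0/1) with a square
    structuring element of side 2*radius + 1 (an assumed shape)."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                # Switch on every pixel covered by the element centred here,
                # clipping the element at the image border.
                for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        out[rr][cc] = 1
    return out


# Toy tumor mask with a single tumor pixel in the centre.
tumor = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
grown = dilate(tumor, radius=1)
```

A larger radius recovers more boundary MCs but also admits more of the neighbouring nontumor cells, which is the trade-off described above.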
Statistical Modeling of MCs
MCs appear as relatively dark, jagged, and irregularly textured structures [Figure 1]. Owing to sectioning artifacts, some appear too dim to notice with the naked eye. In terms of shape, color, and texture, the lymphoid/inflammatory cells and apoptotic cells that are densely present in tissue slides have very similar characteristics and can thus easily be confused with MCs.
In this paper, we propose a Gamma-Gaussian mixture model (GGMM) for detecting MCs in breast histology images. Image intensities (L channel of the La*b* color space) are modeled as random variables sampled from one of two distributions: Gamma and Gaussian. Intensities from MCs are modeled by a Gamma distribution and those from nonmitotic regions are modeled by a Gaussian distribution. The Gamma-Gaussian mixture was chosen mainly because the characteristics of these distributions match well with the data they model [Figure 2].
Figure 2: Marginal distributions (vertical bars) and fitted models (solid lines) by the two-component Gamma-Gaussian mixture model
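As a rough illustration of where the modeled intensities come from, the sketch below computes the CIE L* (lightness) value of a single pixel. It assumes 8-bit sRGB input and a D65 white point; the paper does not say which RGB-to-La*b* conversion was used, and the function name is hypothetical.

```python
def srgb_to_lightness(r, g, b):
    """Convert an 8-bit sRGB triple to CIE L* (0 = black, 100 = white).

    Assumes the sRGB transfer function and a D65 white point.
    """
    def linearize(v):
        v = v / 255.0
        return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

    # Relative luminance Y from linear RGB (white maps to Y = 1).
    y = 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)
    delta = 6.0 / 29.0
    f = y ** (1.0 / 3.0) if y > delta ** 3 else y / (3 * delta ** 2) + 4.0 / 29.0
    return 116.0 * f - 16.0
```

Dark, chromatin-rich structures such as MCs yield low L* values, while pale stroma and background yield high ones.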
GGMM
[Figure 2] shows two marginal distributions (solid lines) and their fitted models (dotted lines). The left and right marginal distributions show the probability distributions of pixels belonging to mitotic and nonmitotic regions, respectively. The GGMM achieved a close fit to both marginal distributions. The GGMM is a parametric technique for estimating a probability density function. In our context, it can be formulated as follows.
For pixel intensities x, the proposed mixture model is given by:
$$ f(x; \theta) = \rho_{1}\, \Gamma(x; \alpha, \beta) + \rho_{2}\, G(x; \mu, \sigma) \qquad (1) $$
where ρ_{1} and ρ_{2} represent the mixing proportions (priors) of intensities belonging to mitotic and nonmitotic regions, and ρ_{1} + ρ_{2} = 1. Γ(x; α, β) represents the Gamma density function parameterized by α (the shape parameter) and β (the scale parameter). G(x; μ, σ) represents the Gaussian density function parameterized by μ (mean) and σ (standard deviation). θ = [α, β, μ, σ, ρ_{1}, ρ_{2}] represents the vector of all unknown parameters in the model.
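The mixture density can be evaluated directly from these definitions. The following is a minimal sketch using only the Python standard library; the function names and the parameter values in `theta` are hypothetical and are not the fitted values from the paper.

```python
import math


def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    log_pdf = ((alpha - 1) * math.log(x) - x / beta
               - math.lgamma(alpha) - alpha * math.log(beta))
    return math.exp(log_pdf)


def gaussian_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def mixture_pdf(x, theta):
    """Two-component density: rho1 * Gamma + rho2 * Gaussian."""
    alpha, beta, mu, sigma, rho1, rho2 = theta
    return rho1 * gamma_pdf(x, alpha, beta) + rho2 * gaussian_pdf(x, mu, sigma)


# Hypothetical parameters: a low-intensity Gamma component (mitotic) and a
# brighter Gaussian component (nonmitotic).
theta = (2.0, 5.0, 60.0, 10.0, 0.3, 0.7)
density_at_20 = mixture_pdf(20.0, theta)
```

Working in log space inside `gamma_pdf` avoids overflow in the Gamma function for large shape values.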
Parameter Estimation
In order to estimate the unknown parameters (θ), we employ maximum likelihood estimation (MLE). Given image intensities x_{i}, i = 1, 2, …, n, where n is the number of pixels, the log-likelihood function (l) of the parameter vector θ is given by
$$ l(\theta) = \sum_{i=1}^{n} \log f(x_{i}; \theta) \qquad (2) $$
where f(x_{i}; θ) is the mixture density function in equation (1). The MLE of θ can be represented by
$$ \hat{\theta} = \arg\max_{\theta} \; l(\theta) $$
A convenient approach to obtain a numerical solution to the above maximization problem is provided by the expectation maximization (EM) algorithm. ^{[6]} In our context, the EM algorithm can be set up as follows.
Let z_{ik}, k = 1, 2, be indicator variables showing the component membership of each pixel x_{i} in the mixture model (1). Note that these indicator variables are hidden (unobserved). The log-likelihood (2) can be extended as follows:
$$ l_{c}(\theta) = \sum_{i=1}^{n} \left[ z_{i1} \log\left( \rho_{1}\, \Gamma(x_{i}; \alpha, \beta) \right) + z_{i2} \log\left( \rho_{2}\, G(x_{i}; \mu, \sigma) \right) \right] $$
The EM algorithm finds the MLE of θ iteratively, as outlined in Algorithm 1 [Additional file 1]. Let θ^{(m)} be the estimate of θ after m iterations of Algorithm 1. The EM algorithm seeks the MLE of the marginal likelihood by iteratively applying Expectation and Maximization steps.
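Algorithm 1 itself is in the additional file and is not reproduced here, so the sketch below is only one plausible way to set up EM for a Gamma-Gaussian mixture, not the authors' implementation. Two choices are assumptions: the Gamma shape is updated with a common closed-form approximation to its weighted maximum likelihood estimate (the exact update has no closed form), and iteration stops when the log-likelihood changes by less than `tol`. The defaults echo the 500 iterations and tolerance of 0.01 reported in the Results; the synthetic data and starting values are hypothetical.

```python
import math
import random


def gamma_pdf(x, alpha, beta):
    return math.exp((alpha - 1) * math.log(x) - x / beta
                    - math.lgamma(alpha) - alpha * math.log(beta))


def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def em_ggmm(xs, theta, max_iter=500, tol=0.01):
    """Fit theta = (alpha, beta, mu, sigma, rho1, rho2) to positive data xs."""
    alpha, beta, mu, sigma, rho1, rho2 = theta
    n, prev_ll = len(xs), None
    for _ in range(max_iter):
        # E-step: posterior weight of the Gamma component for every sample,
        # plus the log-likelihood under the current parameters.
        w, ll = [], 0.0
        for x in xs:
            a = rho1 * gamma_pdf(x, alpha, beta)
            b = rho2 * gaussian_pdf(x, mu, sigma)
            w.append(a / (a + b))
            ll += math.log(a + b)
        if prev_ll is not None and abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
        # M-step: mixing proportions and weighted Gaussian moments.
        w1 = sum(w)
        w2 = n - w1
        rho1, rho2 = w1 / n, w2 / n
        mu = sum((1 - wi) * x for wi, x in zip(w, xs)) / w2
        sigma = math.sqrt(sum((1 - wi) * (x - mu) ** 2 for wi, x in zip(w, xs)) / w2)
        # M-step for the Gamma part: closed-form approximation of the shape
        # from s = log(weighted mean) - weighted mean of logs (assumption).
        mean1 = sum(wi * x for wi, x in zip(w, xs)) / w1
        mean_log1 = sum(wi * math.log(x) for wi, x in zip(w, xs)) / w1
        s = math.log(mean1) - mean_log1
        alpha = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)
        beta = mean1 / alpha
    return alpha, beta, mu, sigma, rho1, rho2


# Synthetic intensities: 300 Gamma(shape 4, scale 5) and 700 Gaussian(70, 8).
rng = random.Random(0)
data = ([rng.gammavariate(4.0, 5.0) for _ in range(300)]
        + [rng.gauss(70.0, 8.0) for _ in range(700)])
fitted = em_ggmm(data, (2.0, 10.0, 60.0, 15.0, 0.5, 0.5))
```

Because the shape update is approximate, this is strictly a generalized EM variant; an exact version would solve the shape equation numerically at each M-step.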
Classification
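The posterior probabilities of a pixel x_{i} belonging to class 1 (mitotic) or class 2 (nonmitotic) are calculated as follows:
$$ P(1 \mid x_{i}) = \frac{\rho_{1}\, \Gamma(x_{i}; \alpha, \beta)}{\rho_{1}\, \Gamma(x_{i}; \alpha, \beta) + \rho_{2}\, G(x_{i}; \mu, \sigma)}, \qquad P(2 \mid x_{i}) = \frac{\rho_{2}\, G(x_{i}; \mu, \sigma)}{\rho_{1}\, \Gamma(x_{i}; \alpha, \beta) + \rho_{2}\, G(x_{i}; \mu, \sigma)} $$
Given the pixelwise posterior probability maps, Otsu thresholding is then used to classify pixels as mitotic or nonmitotic. It was found empirically that the area of an MC was between 60 and 1,000 pixels. Therefore, area thresholding is performed to remove all potentially mitotic regions whose area lies outside this range.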
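The thresholding and area-filtering steps can be sketched as follows. This is an illustrative implementation under stated assumptions: Otsu's method is applied to a 256-bin histogram of posterior values in [0, 1], and regions are formed with 4-connectivity. Neither detail is given in the paper, and the function names and toy posterior map are hypothetical.

```python
from collections import deque


def otsu_threshold(values, bins=256):
    """Otsu threshold for values in [0, 1]; maximizes between-class variance."""
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(bins - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        var_between = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return (best_t + 1) / bins  # values at or above this are foreground


def filter_by_area(mask, min_area=60, max_area=1000):
    """Keep 4-connected regions whose pixel count is within [min_area, max_area]."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if min_area <= len(region) <= max_area:
                for y, x in region:
                    out[y][x] = 1
    return out


# Toy posterior map: one 10 x 10 candidate and one 2 x 2 speck.
posterior = [[0.05] * 20 for _ in range(20)]
for r in range(2, 12):
    for c in range(2, 12):
        posterior[r][c] = 0.9
for r in range(16, 18):
    for c in range(16, 18):
        posterior[r][c] = 0.8
thr = otsu_threshold([v for row in posterior for v in row])
mask = [[1 if v >= thr else 0 for v in row] for row in posterior]
kept = filter_by_area(mask)
```

In the toy map, the small speck passes the Otsu threshold but is discarded by the area filter because it has fewer than 60 pixels.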
CAPP
The algorithmic steps described so far achieve 86% sensitivity; however, given the large number of similar-looking objects (apoptotic cells, lymphoid/inflammatory cells, etc.), a number of FPs are also obtained. In order to reduce the FPs without significantly reducing sensitivity, CAPP is performed on the classification results. A small context window [Figure 3] is defined around the bounding box of each potential MC. In each context window, four representative features are computed over a set of textural features. The representative features are used to train a support vector machine (SVM) classifier with a Gaussian kernel. The trained classifier is then used to classify the contexts of unseen candidate MCs.
Figure 3: Four examples of 50 × 50 context patches, cropped around the bounding box of candidate MCs (detected using the proposed algorithm). First 2 (from left) are false positives, last 2 are MCs
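Extracting a context window around a candidate can be sketched as below. The centring rule (window centred on the bounding-box centre and shifted to stay inside the image) and the function name are assumptions; the window size is left as a parameter because the paper does not describe the cropping procedure in detail.

```python
def crop_context(image, bbox, size=30):
    """Crop a size x size window centred on a candidate's bounding box.

    image: 2D list of pixel values.
    bbox:  (row_min, col_min, row_max, col_max), inclusive coordinates.
    The window is shifted, not shrunk, when it would cross the image border.
    """
    rows, cols = len(image), len(image[0])
    r0, c0, r1, c1 = bbox
    centre_r, centre_c = (r0 + r1) // 2, (c0 + c1) // 2
    top = min(max(centre_r - size // 2, 0), max(rows - size, 0))
    left = min(max(centre_c - size // 2, 0), max(cols - size, 0))
    return [row[left:left + size] for row in image[top:top + size]]


# Toy 100 x 100 image whose pixel value encodes its own coordinates.
image = [[r * 100 + c for c in range(100)] for r in range(100)]
patch = crop_context(image, (95, 95, 99, 99), size=30)
```

Shifting the window at the border keeps every patch the same size, which matters when the features of all patches feed one classifier.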
Results   
Our experimental dataset consisted of 35 digitized images of breast cancer biopsy slides with paraffin-embedded sections stained with H & E and scanned at × 40 using an Aperio ScanScope slide scanner. After stain normalization, background removal, and unsupervised tumor segmentation over all 35 images, seven images were selected to extract mitotic and nonmitotic pixel intensities (L channel of the La*b* color space) for model fitting using the GGMM. We chose 500 iterations and a tolerance (f = 0.01) for the EM algorithm. Although EM provides estimates of the priors (ρ_{1} and ρ_{2}), a more accurate estimate of the priors (ρ_{1} = 0.0014 and ρ_{2} = 0.9986) was used, based on the ratio of mitotic to nonmitotic data used for model fitting. [Figure 4] shows the plot of sensitivity against PPV when the area threshold is varied on the candidate MCs.
The set of textural features extracted from a window of size 30 × 30 pixels around the bounding box of each candidate mitosis is as follows: 32 Phase Gradient (PG) features (16 orientations, 2 scales), ^{[7]} 1 roughness feature, and 1 entropy feature. From each of these 34 features, 4 representative features were computed: (1) mean, (2) standard deviation, (3) skewness, and (4) kurtosis. This gave a 136-dimensional feature vector for each pixel inside the context window. The resulting 136-dimensional vector was used in training and testing of the SVM.
Figure 4: Plot of sensitivity versus positive predictive value (PPV) when the area threshold is varied on the candidate mitotic cells. High sensitivity and low PPV are obtained when small values of the area threshold are used. Table 1 shows how the introduction of CAPP improves PPV without significantly degrading sensitivity
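The arithmetic behind the 136 dimensions (34 features × 4 statistics) can be made concrete with a short sketch. It assumes each textural feature is available as a 2D map over the context window, that the statistics are population (biased) moments, and that kurtosis is not excess kurtosis. The paper does not specify these conventions, and the random feature maps below are placeholders for real PG, roughness, and entropy responses.

```python
import math
import random


def representative_features(values):
    """Return (mean, standard deviation, skewness, kurtosis) of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        return (mean, 0.0, 0.0, 0.0)  # constant map: higher moments undefined
    sd = math.sqrt(m2)
    return (mean, sd, m3 / sd ** 3, m4 / m2 ** 2)


def context_descriptor(feature_maps):
    """Concatenate the four statistics of every feature map in one window."""
    descriptor = []
    for fmap in feature_maps:
        flat = [v for row in fmap for v in row]
        descriptor.extend(representative_features(flat))
    return descriptor


# Placeholder input: 34 random 30 x 30 feature maps for one context window.
rng = random.Random(1)
maps = [[[rng.random() for _ in range(30)] for _ in range(30)] for _ in range(34)]
descriptor = context_descriptor(maps)
```

Summarizing each map by four moments makes the descriptor length independent of the window size.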
Since the set of candidate MCs identified before CAPP was applied was unbalanced (mitotic 29.1%, nonmitotic 70.9%), a balanced mix of mitotic and nonmitotic examples was randomly selected as training data. A total of 69.90% of the data was used for training and the remaining 30.10% for testing. Grid search was used to find optimal parameters for the Gaussian kernel of the SVM. [Figure 5] demonstrates the efficacy of the proposed MC detection algorithm.
Figure 5: Visual results of mitotic cell (MC) detection in a sample image: (a) Original image with ground truth marked MCs shown in yellow color; (b) Results of tumor segmentation (as outlined in Section 2.2) where nontumor areas are shown in a slightly darker contrast with blue boundaries; (c) Results of MC detection (in yellow color) without CAPP (Sensitivity = 0.87, positive predictive value [PPV] = 0.54) and (d) Results of MC detection (in yellow color) with CAPP (Sensitivity = 0.87, PPV = 0.87)
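The exact sampling procedure is not specified, so the sketch below shows only one plausible reading: split each class roughly 70/30, then undersample the larger class within the training part so that both classes are equally represented. The function name, the fixed seed, and the toy label list are hypothetical.

```python
import random


def stratified_balanced_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx) for binary labels (1 = mitotic).

    Each class is split train_fraction / (1 - train_fraction); the larger
    class is then undersampled in the training part to match the smaller one.
    Majority-class samples dropped by undersampling appear in neither set.
    """
    rng = random.Random(seed)
    train_by_class, test_idx = {}, []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = int(round(train_fraction * len(idx)))
        train_by_class[cls] = idx[:cut]
        test_idx.extend(idx[cut:])
    size = min(len(train_by_class[0]), len(train_by_class[1]))
    train_idx = train_by_class[0][:size] + train_by_class[1][:size]
    rng.shuffle(train_idx)
    return train_idx, sorted(test_idx)


# Toy candidate set with roughly the class ratio reported above.
labels = [1] * 29 + [0] * 71
train_idx, test_idx = stratified_balanced_split(labels)
```

The test part keeps the original class ratio, so PPV measured on it reflects the imbalance the detector faces in practice.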
A higher misclassification penalty was set in the SVM for the mitotic class, since the original data was unbalanced. [Table 1] provides details of the quantitative results obtained with fivefold cross-validation. According to these results, PPV increased by more than 200% at the cost of a reduction of less than 15% in sensitivity.
Table 1: Quantitative comparison of sensitivity and PPV with and without using CAPP for a fixed value of area threshold = 120. By employing CAPP, PPV is doubled on unseen data, without drastically reducing the sensitivity (i.e., less than 15% only)
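For reference, the two reported measures and a relative change between two values can be computed as in the sketch below. The counts in the example are purely illustrative and are not taken from Table 1; the function names are hypothetical.

```python
def sensitivity(tp, fn):
    """Fraction of true mitotic cells that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def ppv(tp, fp):
    """Fraction of detections that are true mitotic cells: TP / (TP + FP)."""
    return tp / (tp + fp)


def relative_change(before, after):
    """Relative change from before to after; 1.5 means a 150% increase."""
    return (after - before) / before


# Illustrative counts only: 8 detected MCs, 2 missed MCs, 12 false detections.
example_sensitivity = sensitivity(tp=8, fn=2)
example_ppv = ppv(tp=8, fp=12)
```

Reading a statement such as "a 241% increase in PPV" as a relative change means the new value is 3.41 times the old one, which is the convention `relative_change` follows.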
Conclusion   
In this paper, we presented a GGMM for detection of MCs in breast cancer histopathological images. In addition, we introduced CAPP as a tool to increase the PPV with a minimal loss in sensitivity. We evaluated the performance of the proposed detection algorithm in terms of sensitivity and PPV over a set of 35 breast histology images selected from 5 different tissue slides and showed that a reasonably high value of sensitivity can be retained while increasing the PPV. Our future work will aim at increasing the PPV further by modeling the spatial appearance of regions surrounding mitotic events.
Acknowledgments   
The authors would like to thank the organizers of the International Conference on Pattern Recognition (ICPR) 2012 contest for mitosis detection in breast cancer. The images used in this paper are part of the MITOS dataset, a dataset set up for the ANR French project MICO. The authors would also like to thank Dr. Derek Magee for sharing the executable of his stain normalization algorithm. The first author gratefully acknowledges the financial support provided by the Warwick Postgraduate Research Scholarship scheme and the Department of Computer Science at the University of Warwick.
References   
1.  Elston CW, Ellis IO. Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: Experience from a large study with long-term follow-up. Histopathology 1991;19:403-10.
2.  Roullier V, Lézoray O, Ta VT, Elmoataz A. Multi-resolution graph-based analysis of histopathological whole slide images: Application to mitotic cell extraction and visualization. Comput Med Imaging Graph 2011;35:603-15.
3.  Huh S, Ker DF, Bise R, Chen M, Kanade T. Automated mitosis detection of stem cell populations in phase-contrast microscopy images. IEEE Trans Med Imaging 2011;30:586-96.
4.  Magee D, Treanor D, Chomphuwiset P, Quirke P. Context aware colour classification in digital microscopy. In: Proceedings Medical Image Understanding and Analysis. United Kingdom: British Machine Vision Association (BMVA); 2010. p. 1-5.
5.  Khan AM, El-Daly H, Rajpoot N. RanPEC: Random projections with ensemble clustering for segmentation of tumor areas in breast histology images. In: Medical Image Understanding and Analysis (MIUA). Swansea, UK: British Machine Vision Association (BMVA); 2012. p. 17-23.
6.  Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Series B Stat Methodol 1977;39:1-38.
7.  Khan AM, El-Daly H, Simmons E, Rajpoot NM. HyMaP: A hybrid magnitude-phase approach to unsupervised segmentation of tumor areas in breast cancer histology images. J Pathol Inform 2013;4:1.