Journal of Pathology Informatics
SYMPOSIUM - ORIGINAL ARTICLE
J Pathol Inform 2013,  4:11

A gamma-gaussian mixture model for detection of mitotic cells in breast cancer histopathology images


1 Department of Computer Science, University of Warwick, Coventry, UK
2 Department of Pathology, Addenbrookes Hospital, Cambridge, UK
3 Department of Computer Science, University of Warwick, Coventry, UK; Department of Computer Science and Engineering, Qatar University, Qatar

Date of Submission: 30-Mar-2013
Date of Acceptance: 31-Mar-2013
Date of Web Publication: 30-May-2013

Correspondence Address:
Nasir M Rajpoot
Department of Computer Science, University of Warwick, Coventry, UK; Department of Computer Science and Engineering, Qatar University, Qatar


Source of Support: None, Conflict of Interest: None


DOI: 10.4103/2153-3539.112696

   Abstract 

In this paper, we propose a statistical approach for mitosis detection in breast cancer histological images. The proposed algorithm models the pixel intensities in mitotic and non-mitotic regions with a Gamma-Gaussian mixture model (GGMM) and employs context aware post-processing (CAPP) to reduce false positives. Experimental results demonstrate the ability of this simple yet effective method to detect mitotic cells (MCs) in standard H & E breast cancer histology images.

Context: Counting of MCs in breast cancer histopathology images is one of three components (the other two being tubule formation and nuclear pleomorphism) required for developing computer assisted grading of breast cancer tissue slides. This is very challenging since the biological variability of MCs makes their detection extremely difficult. In addition, standard H & E stains chromatin rich structures, such as nuclei, apoptotic cells, and MCs, dark blue, so it becomes extremely difficult to detect MCs given that the former two are densely localized in the tissue sections.

Aims: In this paper, a robust MC detection technique is developed and tested on 35 breast histopathology images belonging to five different tissue slides.

Settings and Design: Our approach mimics a pathologist's approach to MC detection. The idea is (1) to isolate tumor areas from non-tumor areas (lymphoid/inflammatory/apoptotic cells), (2) to search for MCs in the reduced space by statistically modeling the pixel intensities from mitotic and non-mitotic regions, and finally (3) to evaluate the context of each potential MC in terms of its texture.

Materials and Methods: Our experimental dataset consisted of 35 digitized images of breast cancer biopsy slides with paraffin embedded sections stained with H & E and scanned at ×40 using an Aperio ScanScope slide scanner.

Statistical Analysis Used: We propose a GGMM for detecting MCs in breast histology images. Image intensities are modeled as random variables sampled from one of two distributions: Gamma and Gaussian. Intensities from MCs are modeled by a Gamma distribution and those from non-mitotic regions are modeled by a Gaussian distribution. The Gamma-Gaussian mixture was chosen mainly because the characteristics of these distributions match well with the data they model. The experimental results show that the proposed system achieves a high sensitivity of 0.82 with a positive predictive value (PPV) of 0.29. Employing CAPP on these results produces a 241% increase in PPV at the cost of a less than 15% decrease in sensitivity.

Conclusions: In this paper, we presented a GGMM for detection of MCs in breast cancer histopathological images. In addition, we introduced CAPP as a tool to increase the PPV with a minimal loss in sensitivity. We evaluated the performance of the proposed detection algorithm in terms of sensitivity and PPV over a set of 35 breast histology images selected from five different tissue slides and showed that a reasonably high value of sensitivity can be retained while increasing the PPV. Our future work will aim at increasing the PPV further by modeling the spatial appearance of regions surrounding mitotic events.

Keywords: Breast cancer grading, histopathology image analysis, mitotic cell detection, statistical modeling of mitotic cells


How to cite this article:
Khan AM, ElDaly H, Rajpoot NM. A gamma-gaussian mixture model for detection of mitotic cells in breast cancer histopathology images. J Pathol Inform 2013;4:11

How to cite this URL:
Khan AM, ElDaly H, Rajpoot NM. A gamma-gaussian mixture model for detection of mitotic cells in breast cancer histopathology images. J Pathol Inform [serial online] 2013 [cited 2017 Nov 23];4:11. Available from: http://www.jpathinformatics.org/text.asp?2013/4/1/11/112696


   Introduction


Counting of mitotic cells (MCs) in breast histopathology images is one of three components (the other two being tubule formation and nuclear pleomorphism) required for developing computer assisted grading of breast cancer tissue slides. [1] This is very challenging since the biological variability of MCs makes their detection extremely difficult [Figure 1]. In addition, standard H & E stains chromatin rich structures, such as nuclei, apoptotic cells, and MCs, dark blue, so it becomes extremely difficult to detect the latter given that the former two are densely localized in the tissue sections. As a consequence, two categories of relevant work have been reported in the literature. The first uses an additional stain (e.g., PHH3) to stain MCs exclusively and detects the exclusively stained MCs in the images. [2] The second uses a video sequence to detect MCs over time by incorporating spatio-temporal information. [3] Since the exclusive stain incurs additional cost and videos are not used at all in standard histopathological practice, a gap exists in the literature.
Figure 1: How hard is it to identify mitotic cells in breast?



In this paper, a robust MC detection technique is developed and tested on 35 breast histopathology images belonging to five different tissue slides. To the best of our knowledge, there is no existing method in the literature for detection of MCs in standard H & E breast histology images. The proposed method mimics a pathologist's approach to MC detection under the microscope. The main idea is to isolate tumor regions from non-tumor areas (lymphoid/inflammatory/apoptotic cells) and search for MCs in the reduced space by statistically modeling the pixel intensities from mitotic and non-mitotic regions. In order to further enhance the positive predictive value (PPV), context aware post-processing (CAPP) is introduced. The experimental results show that the proposed system achieves a high sensitivity of 0.82 with a PPV of 0.29. Employing CAPP on these results produces a 241% increase in PPV at the cost of a less than 15% decrease in sensitivity.


   The Proposed Algorithm


Stain Normalization

Tissue staining is commonly used to highlight distinct structures in histology images. Among many different stains, H & E is one of the most commonly used. It selectively stains nuclear structures blue and cytoplasm pink. Although staining enables better visualization of tissue structures, stained images vary considerably in color and intensity owing to non-standardized histopathological workflows. Stain normalization is used to achieve a consistent color and intensity appearance. We found the algorithm proposed by Magee et al. [4] very effective for normalizing histology images.

Tumor Segmentation

Breast cancer histology images can be divided into two regions: tumor and non-tumor. MCs may exist in both tumor and non-tumor regions; however, only those MCs that are present in tumor regions are considered for grading. Therefore, an intelligent MC detection system must first remove non-tumor areas from the tissue slide in order to minimize the search space. We have used a feature based texture segmentation framework, random projections with ensemble clustering, [5] to segment tumor regions. Broadly, the algorithm follows this pipeline: (1) a library of texture features is computed over a range of scales and orientations, (2) low dimensional embedding (using random projections) is performed to avoid overfitting and the curse of dimensionality, and finally (3) tumor segmentation is performed in the low dimensional space. This produces an accurate and totally unsupervised tumor segmentation.
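As a rough illustration of step (2) only, the sketch below projects high-dimensional texture feature vectors to a lower dimension with a Gaussian random matrix. The function name `random_projection`, the Gaussian choice of matrix, and the dimensions used are assumptions made for illustration; the framework in [5] may construct its projections differently, and the ensemble clustering stage is not shown.

```python
import math
import random

def random_projection(features, k, seed=0):
    """Project d-dimensional feature rows to k dimensions (hypothetical helper).

    Entries of the projection matrix are drawn from N(0, 1/k), so squared
    Euclidean distances are preserved in expectation.
    """
    rng = random.Random(seed)
    d = len(features[0])
    scale = 1.0 / math.sqrt(k)
    proj = [[rng.gauss(0.0, scale) for _ in range(k)] for _ in range(d)]
    return [
        [sum(row[i] * proj[i][j] for i in range(d)) for j in range(k)]
        for row in features
    ]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

Comparing `squared_distance` before and after projection for a pair of pixels gives a quick check that the embedding roughly preserves the geometry that the subsequent clustering relies on.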

In order to account for MCs present on the boundary between tumor and non-tumor regions, morphological dilation is performed on the tumor segmentation results. Although this increases the chances of detecting boundary MCs, it also brings some lymphoid/inflammatory cells into the tumor regions, and these appear as false positives (FPs) when detecting MCs in breast histology slides.
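The dilation step can be sketched as follows for a binary tumor mask stored as a list of rows. The square structuring element and the `radius` parameter are assumptions for illustration; the text does not state which structuring element or size was used.

```python
def dilate(mask, radius=1):
    """Binary dilation with a (2 * radius + 1) x (2 * radius + 1) square element.

    mask is a list of rows containing 0 (non-tumor) or 1 (tumor).
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                # Switch on every pixel covered by the element centred at (r, c)
                for y in range(max(0, r - radius), min(h, r + radius + 1)):
                    for x in range(max(0, c - radius), min(w, c + radius + 1)):
                        out[y][x] = 1
    return out
```

A larger radius recovers more boundary MCs but also admits more neighbouring non-tumor cells, which is the trade-off described above.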

Statistical Modeling of MCs

MCs appear as relatively dark, jagged, and irregularly textured structures [Figure 1]. Owing to sectioning artifacts, some appear too dim to notice with the naked eye. Lymphoid/inflammatory cells and apoptotic cells, which are densely present in tissue slides, have very similar shape, color, and textural characteristics and can therefore easily be confused with MCs.

In this paper, we propose a Gamma-Gaussian mixture model (GGMM) for detecting MCs in breast histology images. Image intensities (L channel of the L*a*b* color space) are modeled as random variables sampled from one of two distributions: Gamma and Gaussian. Intensities from MCs are modeled by a Gamma distribution and those from non-mitotic regions are modeled by a Gaussian distribution. The Gamma-Gaussian mixture was chosen mainly because the characteristics of these distributions match well with the data they model [Figure 2].
Figure 2: Marginal distributions (vertical bars) and fitted models (solid lines) by the two-component gamma-gaussian mixture model



GGMM

[Figure 2] shows two marginal distributions (solid lines) and their fitted models (dotted lines). The left and right marginal distributions show the probability distributions of pixels belonging to mitotic and non-mitotic regions, respectively. The GGMM achieved a close fit to the marginal distributions. The GGMM is a parametric technique for estimating a probability density function. In our context, it can be formulated as follows.

For pixel intensities x, the proposed mixture model is given by:

$$f(x; \theta) = \rho_1 \, \Gamma(x; \alpha, \beta) + \rho_2 \, G(x; \mu, \sigma) \qquad (1)$$

where ρ1 and ρ2 represent the mixing proportions (priors) of intensities belonging to mitotic and non-mitotic regions, and ρ1 + ρ2 = 1. Γ(x; α, β) represents the Gamma density function parameterized by α (the shape parameter) and β (the scale parameter). G(x; μ, σ) represents the Gaussian density function parameterized by μ (mean) and σ (standard deviation). θ = [α, β, μ, σ, ρ1, ρ2] represents the vector of all unknown parameters in the model.
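A minimal sketch of this mixture density, using only the Python standard library, is given below. The function names are illustrative, and any parameter values passed to them are placeholders rather than values fitted in this work.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    log_pdf = ((alpha - 1) * math.log(x) - x / beta
               - math.lgamma(alpha) - alpha * math.log(beta))
    return math.exp(log_pdf)

def gaussian_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def ggmm_pdf(x, alpha, beta, mu, sigma, rho1, rho2):
    """Mixture density: rho1 * Gamma + rho2 * Gaussian, with rho1 + rho2 = 1."""
    return rho1 * gamma_pdf(x, alpha, beta) + rho2 * gaussian_pdf(x, mu, sigma)
```

Because the two priors sum to one and each component integrates to one, the mixture is itself a valid density, which can be checked numerically by summing `ggmm_pdf` over a fine grid of intensities.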

Parameter Estimation

In order to estimate the unknown parameters (θ), we employ maximum likelihood estimation (MLE). Given image intensities x_i, i = 1, 2, …, n, where n is the number of pixels, the log-likelihood function (l) of the parameter vector θ is given by

$$l(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta) \qquad (2)$$

where f(x_i; θ) is the mixture density function in equation (1). The MLE of θ can be represented by

$$\hat{\theta} = \arg\max_{\theta} \; l(\theta) \qquad (3)$$

A convenient approach to obtain a numerical solution to the above maximization problem is provided by the expectation maximization (EM) algorithm. [6] In our context, the EM algorithm can be set up as follows.

Let z_ik, k = 1, 2, be indicator variables showing the component membership of each pixel x_i in the mixture model (1). Note that these indicator variables are hidden (unobserved). The log-likelihood (2) can be extended as follows:

$$l_c(\theta) = \sum_{i=1}^{n} \sum_{k=1}^{2} z_{ik} \log\big(\rho_k \, f_k(x_i)\big) \qquad (4)$$

where f_1(x) = Γ(x; α, β) and f_2(x) = G(x; μ, σ) denote the two component densities of the mixture model (1).

The EM algorithm finds the MLE of θ iteratively, as outlined in Algorithm 1 [Additional file 1]. Let θ(m) be the estimate of θ after m iterations of Algorithm 1. The EM algorithm seeks the MLE of the marginal likelihood by iteratively applying Expectation and Maximization steps.
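Algorithm 1 itself is provided only in the additional file, so the following is a sketch of one plausible EM loop for a two-component Gamma-Gaussian mixture, not a transcription of that algorithm. Two assumptions are worth flagging: the Gamma shape update uses a standard closed-form approximation to the weighted maximum likelihood solution (the exact update requires a numerical solve), and convergence is declared when the log-likelihood changes by less than `tol`.

```python
import math

def gamma_pdf(x, alpha, beta):
    if x <= 0:
        return 0.0
    return math.exp((alpha - 1) * math.log(x) - x / beta
                    - math.lgamma(alpha) - alpha * math.log(beta))

def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def em_ggmm(xs, theta, max_iter=500, tol=0.01):
    """Fit theta = (alpha, beta, mu, sigma, rho1, rho2) to positive samples xs."""
    alpha, beta, mu, sigma, rho1, rho2 = theta
    n, prev_ll = len(xs), -math.inf
    for _ in range(max_iter):
        # E-step: posterior weight of the Gamma component for each sample
        r1, ll = [], 0.0
        for x in xs:
            p1 = rho1 * gamma_pdf(x, alpha, beta)
            p2 = rho2 * gaussian_pdf(x, mu, sigma)
            total = p1 + p2 + 1e-300
            r1.append(p1 / total)
            ll += math.log(total)
        if abs(ll - prev_ll) < tol:  # assumed stopping rule
            break
        prev_ll = ll
        # M-step: mixing proportions
        n1 = sum(r1)
        n2 = n - n1
        rho1, rho2 = n1 / n, n2 / n
        # M-step: Gaussian component (weighted mean and standard deviation)
        mu = sum((1 - r) * x for r, x in zip(r1, xs)) / n2
        var = sum((1 - r) * (x - mu) ** 2 for r, x in zip(r1, xs)) / n2
        sigma = max(math.sqrt(var), 1e-6)
        # M-step: Gamma component (approximate weighted MLE of the shape)
        m = sum(r * x for r, x in zip(r1, xs)) / n1
        mlog = sum(r * math.log(x) for r, x in zip(r1, xs) if x > 0) / n1
        s = max(math.log(m) - mlog, 1e-9)
        alpha = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)
        beta = m / alpha
    return alpha, beta, mu, sigma, rho1, rho2
```

The defaults `max_iter=500` and `tol=0.01` mirror the iteration count and tolerance reported in the Results section; the initial `theta` must be supplied by the caller, and poor starting values can lead EM to a local optimum.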

Classification

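The posterior probabilities of a pixel x_i belonging to class 1 (mitotic) or 2 (non-mitotic) are calculated as follows:

$$p(z_{ik} = 1 \mid x_i) = \frac{\rho_k \, f_k(x_i)}{\rho_1 \, f_1(x_i) + \rho_2 \, f_2(x_i)}, \quad k = 1, 2 \qquad (5)$$

where f_1(x) = Γ(x; α, β) and f_2(x) = G(x; μ, σ) are the Gamma and Gaussian component densities.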

Given the pixel-wise posterior probability maps, Otsu thresholding is then used to classify pixels as mitotic or non-mitotic. It was found empirically that the area of an MC was between 60 and 1,000 pixels. Therefore, area thresholding is performed to remove all potentially mitotic regions whose area lies outside this range.
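The thresholding and area filtering steps can be sketched as below for a posterior map stored as a list of rows with values in [0, 1]. The histogram bin count, the use of 4-connectivity for grouping pixels into regions, and the function names are assumptions for illustration; only the 60 to 1,000 pixel area range comes from the text.

```python
from collections import deque

def otsu_threshold(values, bins=256):
    """Otsu's threshold for values in [0, 1]: maximise between-class variance."""
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0.0
    for t in range(bins):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return (best_t + 1) / bins  # values at or above this are foreground

def filter_regions(mask, min_area=60, max_area=1000):
    """Keep 4-connected regions whose pixel count lies in [min_area, max_area]."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, region = deque([(r, c)]), []
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if min_area <= len(region) <= max_area:
                for y, x in region:
                    out[y][x] = 1
    return out

def classify_pixels(posterior, min_area=60, max_area=1000):
    """Threshold a mitotic posterior map and discard regions of implausible area."""
    t = otsu_threshold([p for row in posterior for p in row])
    mask = [[1 if p >= t else 0 for p in row] for row in posterior]
    return filter_regions(mask, min_area, max_area)
```

Each region that survives `classify_pixels` is a candidate MC; varying `min_area` corresponds to the area-threshold sweep discussed in the Results section.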


   CAPP


The algorithmic steps described so far achieve 86% sensitivity; however, given the large number of similar looking objects (apoptotic cells, lymphoid/inflammatory cells, etc.), a number of FPs are also obtained. In order to reduce the FPs without significantly reducing sensitivity, CAPP is performed on the classification results. A small context window [Figure 3] is defined around the bounding box of each potential MC. In each context window, four representative features are computed over a set of textural features. The representative features are used to train a support vector machine (SVM) classifier with a Gaussian kernel. The trained classifier is then used to classify the contexts of unseen candidate MCs.
Figure 3: Four examples of 50 × 50 context patches, cropped around the bounding box of candidate MCs (detected using the proposed algorithm). First 2 (from left) are false positives, last 2 are MCs




   Results


Our experimental dataset consisted of 35 digitized images of breast cancer biopsy slides with paraffin embedded sections stained with H & E and scanned at ×40 using an Aperio ScanScope slide scanner. After stain normalization, background removal, and unsupervised tumor segmentation over all 35 images, seven images were selected to extract mitotic and non-mitotic pixel intensities (L channel of the L*a*b* color space) for model fitting using the GGMM. We chose 500 iterations and a tolerance of 0.01 for the EM algorithm. Although EM provides estimates of the priors (ρ1 and ρ2), a more accurate estimate (ρ1 = 0.0014 and ρ2 = 0.9986), based on the ratio of mitotic to non-mitotic data used for model fitting, was used instead. [Figure 4] shows the plot of sensitivity against PPV when the area threshold is varied on the candidate MCs.

The following textural features were extracted from a window of size 30 × 30 pixels around the bounding box of each candidate mitosis: 32 Phase Gradient (PG) features (16 orientations, 2 scales), [7] 1 roughness feature, and 1 entropy feature. From each of these 34 features, 4 representative features were computed: (1) mean, (2) standard deviation, (3) skewness, and (4) kurtosis. This gave a 136-dimensional feature vector (34 features × 4 statistics) for each context window. The resulting 136-dimensional vector was used in training and testing of the SVM.
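The construction of the 136-dimensional descriptor can be sketched as follows, assuming the 34 textural feature responses inside one context window are already available as flat lists of per-pixel values. The computation of the PG, roughness, and entropy features themselves is not shown, and the use of population (rather than sample) moments and non-excess kurtosis is an assumption.

```python
import math

def summary_stats(values):
    """Return [mean, standard deviation, skewness, kurtosis] of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [mean, 0.0, 0.0, 0.0]  # constant response: higher moments undefined
    skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    kurt = sum((v - mean) ** 4 for v in values) / (n * std ** 4)
    return [mean, std, skew, kurt]

def context_descriptor(feature_maps):
    """Concatenate the four statistics of each feature map in a context window.

    feature_maps: one flat list of per-pixel responses per textural feature.
    With 34 feature maps the result has 34 * 4 = 136 entries.
    """
    descriptor = []
    for responses in feature_maps:
        descriptor.extend(summary_stats(responses))
    return descriptor
```

Summarizing each feature map by four moments makes the descriptor length independent of the window size, so candidates with different bounding boxes yield vectors of the same dimension for the SVM.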
Figure 4: Plot of sensitivity versus positive predictive value (PPV) when the area threshold is varied on the candidate mitotic cells. High sensitivity and low PPV are obtained when small values of the area threshold are used. Table 1 shows how the introduction of CAPP improves PPV without significantly degrading sensitivity



Because the set of candidate MCs identified before CAPP was applied was unbalanced (mitotic 29.1%, non-mitotic 70.9%), a balanced mix of mitotic and non-mitotic examples was randomly selected as training data. A total of 69.90% of the data was used for training and the remaining 30.10% for testing. Grid search was used to find the optimal parameters for the Gaussian kernel of the SVM. [Figure 5] demonstrates the efficacy of the proposed MC detection algorithm.
Figure 5: Visual results of mitotic cell (MC) detection in a sample image: (a) Original image with ground truth MCs marked in yellow; (b) Results of tumor segmentation (as outlined in the Tumor Segmentation section), where non-tumor areas are shown in slightly darker contrast with blue boundaries; (c) Results of MC detection (in yellow) without CAPP (sensitivity = 0.87, positive predictive value [PPV] = 0.54); and (d) Results of MC detection (in yellow) with CAPP (sensitivity = 0.87, PPV = 0.87)



A higher misclassification penalty was set in the SVM for the mitotic class, since the original data was unbalanced. [Table 1] provides details of the quantitative results obtained with five-fold cross-validation. According to these results, PPV increased by more than 200% at the cost of a less than 15% reduction in sensitivity.
Table 1: Quantitative comparison of sensitivity and PPV with and without CAPP for a fixed area threshold of 120. By employing CAPP, PPV is doubled on unseen data without drastically reducing the sensitivity (a reduction of less than 15%)
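For readers less familiar with the two measures, the sketch below shows how sensitivity, PPV, and a relative change between two values are computed from detection counts. The counts in the usage note are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from Table 1.

```python
def sensitivity(tp, fn):
    """Fraction of ground truth MCs that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)

def ppv(tp, fp):
    """Fraction of detections that are true MCs: TP / (TP + FP)."""
    return tp / (tp + fp)

def relative_change(before, after):
    """Relative change from 'before' to 'after', as a fraction of 'before'."""
    return (after - before) / before
```

With hypothetical counts of 40 true positives, 10 false negatives, and 60 false positives, sensitivity is 40 / 50 = 0.8 and PPV is 40 / 100 = 0.4. If post-processing removed 50 of the false positives while keeping every true positive, PPV would rise to 40 / 50 = 0.8, a relative increase of 100%, with sensitivity unchanged.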




   Conclusion


In this paper, we presented a GGMM for detection of MCs in breast cancer histopathological images. In addition, we introduced CAPP as a tool to increase the PPV with a minimal loss in sensitivity. We evaluated the performance of the proposed detection algorithm in terms of sensitivity and PPV over a set of 35 breast histology images selected from 5 different tissue slides and showed that a reasonably high value of sensitivity can be retained while increasing the PPV. Our future work will aim at increasing the PPV further by modeling the spatial appearance of regions surrounding mitotic events.


   Acknowledgments


The authors would like to thank the organizers of the International Conference on Pattern Recognition (ICPR) 2012 contest for mitosis detection in breast cancer. The images used in this paper are part of the MITOS dataset, a dataset set up for the ANR French project MICO. The authors would also like to thank Dr. Derek Magee for sharing the executable of his stain normalization algorithm. The first author gratefully acknowledges the financial support provided by the Warwick Post-graduate Research Scholarship scheme and the Department of Computer Science at the University of Warwick.

 
   References

1. Elston CW, Ellis IO. Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: Experience from a large study with long-term follow-up. Histopathology 1991;19:403-10.

2. Roullier V, Lézoray O, Ta VT, Elmoataz A. Multi-resolution graph-based analysis of histopathological whole slide images: Application to mitotic cell extraction and visualization. Comput Med Imaging Graph 2011;35:603-15.

3. Huh S, Ker DF, Bise R, Chen M, Kanade T. Automated mitosis detection of stem cell populations in phase-contrast microscopy images. IEEE Trans Med Imaging 2011;30:586-96.

4. Magee D, Treanor D, Chomphuwiset P, Quirke P. Context aware colour classification in digital microscopy. In: Proceedings Medical Image Understanding and Analysis. United Kingdom: British Machine Vision Association (BMVA); 2010. p. 1-5.

5. Khan AM, El-Daly H, Rajpoot N. RanPEC: Random projections with ensemble clustering for segmentation of tumor areas in breast histology images. In: Medical Image Understanding and Analysis (MIUA). Swansea, UK: British Machine Vision Association (BMVA); 2012. p. 17-23.

6. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Series B Stat Methodol 1977;39:1-38.

7. Khan AM, El-Daly H, Simmons E, Rajpoot NM. HyMaP: A hybrid magnitude-phase approach to unsupervised segmentation of tumor areas in breast cancer histology images. J Pathol Inform 2013;4:1.



 

 