|SYMPOSIUM - ORIGINAL RESEARCH
|J Pathol Inform 2013,
Stain guided mean-shift filtering in automatic detection of human tissue nuclei
Yu Zhou1, Derek Magee1, Darren Treanor2, Andrew Bulpitt1
1 School of Computing, University of Leeds, Leeds, United Kingdom
2 Pathology and Tumor Biology, Leeds Institute of Molecular Medicine, University of Leeds, Leeds, United Kingdom
|Date of Submission||23-Jan-2013|
|Date of Acceptance||23-Jan-2013|
|Date of Web Publication||30-Mar-2013|
School of Computing, University of Leeds, Leeds
Source of Support: None, Conflict of Interest: None
| Abstract|| |
Background: As a critical technique in a digital pathology laboratory, automatic nuclear detection has been investigated for more than a decade. Conventional methods work directly on the raw images, whose color/intensity homogeneity within tissue/cell areas is undermined by artefacts such as uneven staining, making the subsequent binarization process prone to error. This paper concerns detecting cell nuclei automatically from digital pathology images by enhancing the color homogeneity as a pre-processing step.
Methods: Unlike previous watershed based algorithms that rely on post-processing of the watershed, we present a new method that incorporates the staining information of pathological slides in the analysis. This pre-processing step strengthens the color homogeneity within the nuclear areas as well as the background areas, while keeping the nuclear edges sharp. Proof of convergence for the proposed algorithm is also provided. After pre-processing, Otsu's threshold is applied to binarize the image, which is further segmented via watershed. To strike a compromise between separating overlapping nuclei and avoiding over-segmentation, a naive Bayes classifier is designed to refine the splits suggested by the watershed segmentation.
Results: The method is validated with 10 sets of 1000 × 1000 pathology images of lymphoma from one digital slide. The mean precision and recall rates are 87% and 91%, corresponding to a mean F-score equal to 89%. Standard deviations for these performance indicators are 5.1%, 1.6% and 3.2% respectively.
Conclusion: The precision/recall performance obtained indicates that the proposed method outperforms several other alternatives. In particular, for nuclear detection, stain guided mean-shift (SGMS) is more effective than the direct application of mean-shift in pre-processing. Our experiments also show that pre-processing the digital pathology images with SGMS gives better results than conventional watershed algorithms. Nevertheless, as only one type of tissue is tested in this paper, a further study is planned to enhance the robustness of the algorithm so that other types of tissues/stains can also be processed reliably.
Keywords: Digital pathology, k-means, mean-shift, watershed
|How to cite this article:|
Zhou Y, Magee D, Treanor D, Bulpitt A. Stain guided mean-shift filtering in automatic detection of human tissue nuclei. J Pathol Inform 2013;4:6
| Background|| |
In digital pathology, a stained tissue section is first placed under a microscope for scanning. Then digital imaging data can be collected, archived and shown to users such as pathologists. Overall, digital pathology offers unprecedented flexibility to pathologists' clinical work, including a modern platform for carrying out pathological inspection over the internet.
As an important problem in digital pathology with many biological and medical applications, automatic nuclear detection has been investigated for more than 10 years. Numerous algorithms have been used, or have the potential to be used, in nuclear detection, including the Hough transform, mean-shift, quick-shift, normalized cut, graph cut, watershed, color-texture and cumulative distribution function methods. Recently, segmentation algorithms for 3D cellular data collected with laser scanning microscopy and confocal microscopy have also been emerging.
Although a variety of methods exist in the literature, for color pathology images generated by devices such as Aperio's ScanScope series there is still room for improvement in accuracy and efficiency, especially when compared with cell images from fluorescence microscopy. One main reason is that color pathology images have richer texture detail than fluorescence microscopy images. Although rich texture is desirable for human diagnosis, it makes computer-based nuclear detection rather challenging.
This paper offers a fully automatic algorithm for detecting nuclei from color pathology images. Instead of relying only on post-processing of the watershed output, such as merging, this paper also incorporates a pre-processing step. Section 2 presents the details of this algorithm. The experimental results are presented in Section 3. Section 4 concludes this paper.
| Methods|| |
[Figure 1] presents an overview of our method. The first key step is pre-processing of the image which uses stain guided mean-shift (SGMS) filtering. The morphological watershed algorithm is then applied to generate the segmentation result. Finally, the watershed segmentation result is refined, leading to the final output.
[Figure 2] shows virtual pathology images of the lymphoma. The tissue shown in [Figure 2] was stained with haematoxylin (H) and eosin (E). H has a purple color and E a pink color. Due to the non-uniformity of tissue structure and staining, the intensities of the stains vary across [Figure 2], even within the same nucleus. This is one of the issues that degrade the performance of segmentation algorithms. Here, the hypothesis we wish to test is: if the color uniformity within the image area can be enhanced while the edges of the nuclei are preserved, the segmentation performance will be improved.
|Figure 2: (a) Digital pathology image of lymphoma; (b) selected area of interest in (a)|
As a non-parametric clustering algorithm, mean-shift has attracted wide attention. Comaniciu and Meer applied mean-shift filtering to the analysis of cell images; their method incorporates a sub-sampling procedure and is widely used as a reference method in nuclear detection.
For every pixel in the image, a color vector can be constructed. Within the RGB color space, every pixel has a color vector $x = (r, g, b)$. For a color vector $x_0$, the standard mean-shift algorithm updates this color vector with the following equation:
$$x_0 \leftarrow \frac{\sum_{x_i \in N(x_0)} k(x_i, x_0)\, x_i}{\sum_{x_i \in N(x_0)} k(x_i, x_0)} \tag{1}$$
where $k(x_i, x_0)$ is the kernel function and $N(x_0)$ is the set of neighborhood pixels of $x_0$.
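As a minimal sketch of the update just described, the snippet below replaces a color vector by the kernel-weighted mean of its neighborhood colors. It is illustrative only and not the authors' implementation: the function names are hypothetical, the kernel is assumed to be a Gaussian of the color distance alone, colors are assumed to be scaled to [0, 1], and the neighborhood is assumed to include the pixel itself so that the normalizer is never zero.

```python
import math

def gaussian_kernel(xi, x0, sigma=1.0):
    """Gaussian weight from the squared Euclidean distance between two color vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(xi, x0))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def mean_shift_update(x0, neighbors, sigma=1.0):
    """Return the kernel-weighted mean of the neighborhood colors around x0.

    Assumption: `neighbors` contains x0 itself, so the weight sum is positive.
    """
    weights = [gaussian_kernel(xi, x0, sigma) for xi in neighbors]
    total = sum(weights)
    return tuple(
        sum(w * xi[c] for w, xi in zip(weights, neighbors)) / total
        for c in range(len(x0))
    )

# Usage: a pixel with two slightly different neighbors (hypothetical values).
pixel = (0.5, 0.5, 0.5)
neighborhood = [(0.4, 0.4, 0.4), pixel, (0.6, 0.6, 0.6)]
updated = mean_shift_update(pixel, neighborhood)
```

Repeating this update for every pixel until the colors stop changing is what turns the single step into a filter.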
Unlike the classic mean-shift algorithm, which processes each pixel using its neighborhood area only, here we incorporate prior knowledge of pathology dyes into the filtering process to enhance the pre-processing. As shown in [Figure 1], k-means is applied to extract the three main color vectors, which represent the three dominant colors within one image: two stains and the background color induced by the white lighting.
For instance, apart from the purple and pink colors, [Figure 2] also contains white areas where no tissue is in the field of view. This white color is the color of the light source, which also affects the image in a global manner.
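The extraction of three dominant colors can be illustrated with a plain Lloyd-style k-means on RGB vectors. This is a sketch under stated assumptions rather than the procedure used in the paper: the function name, the caller-supplied initial centers and the fixed iteration cap are all hypothetical choices, and colors are assumed to be scaled to [0, 1].

```python
def kmeans_colors(pixels, init_centers, iterations=20):
    """Cluster RGB vectors and return one mean color per cluster.

    Assumption: `init_centers` holds rough guesses for the two stains and
    the white background; the paper does not state how k-means is initialized.
    """
    centers = [tuple(c) for c in init_centers]
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest center.
        groups = [[] for _ in centers]
        for p in pixels:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[dists.index(min(dists))].append(p)
        # Update step: each center moves to the mean of its group.
        new_centers = []
        for c, g in zip(centers, groups):
            if g:
                new_centers.append(tuple(sum(ch) / len(g) for ch in zip(*g)))
            else:
                new_centers.append(c)  # an empty cluster keeps its center
        if new_centers == centers:
            break
        centers = new_centers
    return centers

# Usage with a few hypothetical purple, pink and white pixels.
pixels = [(0.4, 0.2, 0.6), (0.5, 0.3, 0.7),
          (0.9, 0.6, 0.7), (1.0, 0.7, 0.8),
          (1.0, 1.0, 1.0), (0.9, 0.9, 0.9)]
stain_colors = kmeans_colors(pixels, [(0.4, 0.2, 0.6), (0.9, 0.6, 0.7), (1.0, 1.0, 1.0)])
```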
Therefore, by introducing the stain color vectors into the filtering process, the update of the color vector in (1) is modified as follows:
$$x_0 \leftarrow \frac{\sum_{x_i \in N(x_0)} k(x_i, x_0)\, x_i + \sum_{s_j \in S} k(s_j, x_0)\, s_j}{\sum_{x_i \in N(x_0)} k(x_i, x_0) + \sum_{s_j \in S} k(s_j, x_0)} \tag{2}$$
In the numerator of (2), the first summation represents the effect of local neighborhood colors and the second summation represents the effect of the stain color vectors. The denominator is the normalization coefficient. $S$ denotes the set of color vectors for the two stains plus the light source. Compared with (1), (2) is governed by the stain and lighting color vectors; it is therefore named SGMS.
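The stain-guided update can be sketched in the same way: the weighted sum over neighborhood colors is extended by a weighted sum over the stain and light-source color vectors, and the normalizer covers both. The code below is an illustration only; the function names are hypothetical, the kernel is assumed to be a Gaussian of the color distance, and colors are assumed to be scaled to [0, 1].

```python
import math

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian weight from the squared Euclidean distance between two color vectors."""
    d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def sgms_update(x0, neighbors, stains, sigma=1.0):
    """One stain-guided update of the color vector x0.

    `neighbors` are the local neighborhood colors; `stains` is the set S
    (two stain colors plus the light-source color).
    """
    local_w = [gaussian_kernel(xi, x0, sigma) for xi in neighbors]
    stain_w = [gaussian_kernel(s, x0, sigma) for s in stains]
    norm = sum(local_w) + sum(stain_w)  # normalization coefficient
    return tuple(
        (sum(w * xi[c] for w, xi in zip(local_w, neighbors))
         + sum(w * s[c] for w, s in zip(stain_w, stains))) / norm
        for c in range(len(x0))
    )

# Usage: a purple-ish pixel is pulled toward the nearest global color (hypothetical values).
S = [(0.4, 0.2, 0.6), (0.9, 0.6, 0.7), (1.0, 1.0, 1.0)]
pixel = (0.5, 0.3, 0.7)
updated = sgms_update(pixel, [pixel], S, sigma=0.1)
```

With a narrow kernel the distant stain vectors receive almost no weight, which is why each pixel drifts mainly toward the global color closest to it.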
In the appendix, we prove that SGMS in (2) is convergent by converting (2) into a matrix format. This proof assures the user that the iteration can be stopped at some point without missing important information. However, due to the inherent nonlinearity of the problem, an explicit expression for the convergent solution is not yet known, i.e. the iterations still have to be run until a stable result is reached.
After applying the SGMS filtering, the colors within the image are agglomerated, in the cytoplasm areas as well as the background areas. As shown in the Appendix, each pixel tends to converge to its closest global color model. Thus, after binarizing with Otsu's threshold, the nuclear areas, which are darker, can be segmented with the watershed algorithm. [Figure 3] shows an example in which [Figure 3]a is the original image; [Figure 3]b is the result of SGMS filtering; and [Figure 3]c shows the inverted distance transform obtained after applying a threshold to [Figure 3]b.
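The binarization step can be illustrated with a small histogram-based version of Otsu's threshold, which picks the gray level that maximizes the between-class variance. This sketch assumes the filtered image has already been converted to integer gray levels in 0 to 255 (the text does not say which channel is thresholded), and the function name is hypothetical.

```python
def otsu_threshold(gray_values, levels=256):
    """Return the level t maximizing between-class variance.

    Pixels with value <= t form the dark class (nuclei under the
    assumption that nuclei are darker than the background).
    """
    hist = [0] * levels
    for v in gray_values:
        hist[v] += 1
    total = len(gray_values)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0  # pixel count and intensity sum of the dark class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # dark class still empty
        if w1 == 0:
            break  # bright class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t

# Usage on hypothetical gray values: dark nuclei and bright background.
values = [20, 22, 25, 30] * 10 + [200, 210, 220] * 10
t = otsu_threshold(values)
mask = [v <= t for v in values]  # True marks candidate nuclear pixels
```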
|Figure 3: Watershed segmentation of stain guided mean-shift (SGMS) images (a) raw picture; (b) SGMS; (c) inverted distance transform; (d) stand-alone catchment basins; (e) touching nuclei and (f) final result|
Detecting Isolated Nuclei
As shown in [Figure 3]c, after watershed processing, many catchment basins can be identified. A stand-alone catchment basin corresponds to one single nucleus area. In [Figure 3]d, these units represent the nuclei that are stained with H and are not touching other cells.
From these stand-alone nuclei, statistics of size, shape and color can be extracted. Here, the area of the nuclear region represents the size, the aspect ratio characterizes the shape, and the original color vectors provide the color statistics. Together these give five features: one for size, one for shape and three color components. Statistics of these five features can be used to refine the over-segmented areas in the watershed, which usually correspond to touching nuclei. The blue areas in [Figure 3]e are examples of touching nucleus units.
Separating Touching Nuclei
To refine the watershed segmentation result, a naive Bayes mechanism is employed.
Firstly, for every touching nucleus unit in the initial segmentation, a hypothesis is made that is based on combinations of the catchment basins. Because there might be multiple catchment basins within one connected component, as shown in [Figure 3]e, different hypotheses, i.e. combinations of these catchment basins, are tested.
Secondly, the size, shape and color statistics of the areas in the hypothesis are extracted and forwarded to the naive Bayes classifier whose output is defined as follows:
where $f(i)$ is the binarized output obtained by testing one feature's fitness against the obtained nuclei statistics. If the classifier output in (3) is positive, the hypothesis is accepted. Otherwise, the null hypothesis, namely that the combination is invalid, is retained.
The above hypothesis-test is applied to every possible combination of merging certain catchment basins. In addition, because one area can only have one nucleus, if one catchment basin passes more than one test, the hypothesis with the best fit to the nuclei statistics obtained in Section 2.3 will be accepted.
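The hypothesis testing over combinations of catchment basins might be organized as in the sketch below. Several parts are assumptions, since the classifier equation is not reproduced in this text: each basin is a hypothetical dict with `area`, `bbox` as (x0, y0, x1, y1) and mean `color`; each binarized test passes when a feature lies within `z` standard deviations of the isolated-nuclei mean; a hypothesis is accepted only when all five tests pass; and spatial adjacency of the basins is ignored.

```python
from itertools import combinations

def merge_features(basins):
    """Area, aspect ratio and mean RGB of the region formed by merging basins."""
    area = sum(b["area"] for b in basins)
    x0 = min(b["bbox"][0] for b in basins)
    y0 = min(b["bbox"][1] for b in basins)
    x1 = max(b["bbox"][2] for b in basins)
    y1 = max(b["bbox"][3] for b in basins)
    w, h = x1 - x0, y1 - y0
    aspect = max(w, h) / min(w, h)  # assumes a non-degenerate bounding box
    color = tuple(
        sum(b["area"] * b["color"][c] for b in basins) / area for c in range(3)
    )
    return (area, aspect) + color

def feature_tests(features, stats, z=1.96):
    """Binarized f(i): 1 if feature i is within z std devs of its mean (assumed rule)."""
    return [1 if abs(v - m) <= z * s else 0 for v, (m, s) in zip(features, stats)]

def accepted_hypotheses(basins, stats):
    """Return every combination of basins whose merged features pass all tests."""
    accepted = []
    for r in range(1, len(basins) + 1):
        for combo in combinations(basins, r):
            if all(feature_tests(merge_features(combo), stats)):
                accepted.append(combo)
    return accepted

# Usage: two half-nucleus basins and hypothetical isolated-nuclei statistics
# given as (mean, std) for area, aspect ratio, R, G, B.
basins = [
    {"area": 50, "bbox": (0, 0, 10, 5), "color": (0.4, 0.2, 0.6)},
    {"area": 50, "bbox": (0, 5, 10, 10), "color": (0.4, 0.2, 0.6)},
]
stats = [(100, 10), (1.2, 0.2), (0.4, 0.05), (0.2, 0.05), (0.6, 0.05)]
result = accepted_hypotheses(basins, stats)
```

When several hypotheses involving the same basin pass, the text selects the one that fits the nuclei statistics best; that ranking step is left out of this sketch.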
[Figure 3]f shows the final result of applying the proposed refinement scheme. As can be seen, the touching nuclei are split effectively and the spurious tiny noise is eliminated.
| Results|| |
[Figure 4] shows an overview of the images selected for evaluating the nuclei detection algorithm in this paper. The original size of the images in [Figure 4] is 1000 × 1000 pixels. These images correspond to 10 randomly selected regions of one lymphoma tissue sample from a patient, which was formalin fixed and paraffin embedded and then cut into sections 5 μm thick. After staining the tissue section with H and E, the digital slide was acquired with an Aperio AT scanner (Aperio, Vista, CA, USA) at a spatial resolution of 0.25 μm per pixel. To evaluate the performance of the proposed algorithm, the ground truth of nuclei centers was generated by manually clicking on the raw data.
|Figure 4: Ten sets of lymphoma images for evaluating the nuclear detection algorithm|
Detection Methods for Comparison
Here, several nuclear detection methods are employed as references. These methods include: K-means of intensity, Canny operator, Laplacian of Gaussian (LoG), quick-shift, watershed, anisotropic diffusion (A-Diff), and mean-shift.
For SGMS, a standard Gaussian kernel is used (σ = 1). The algorithm exits the loop when the update of the image is smaller than a given threshold or after more than 100 iterations. In the naive Bayes classifier, we choose P = 0.05.
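The stopping rule can be sketched as a generic loop around any per-image update function. The names `iterate_until_stable`, `tol` and `max_iter` are hypothetical, the image is represented as a flat list of floats, and measuring the update by the largest absolute change is an assumption, since the text does not say how the update size is computed.

```python
def iterate_until_stable(update, image, tol=1e-3, max_iter=100):
    """Apply `update` repeatedly until the change is below `tol` or `max_iter` is reached.

    Returns the final image and the number of iterations performed.
    """
    for iteration in range(1, max_iter + 1):
        new_image = update(image)
        # Assumed change measure: largest absolute per-value difference.
        change = max(abs(a - b) for a, b in zip(new_image, image))
        image = new_image
        if change < tol:
            break
    return image, iteration

# Usage with a toy update that halves every value (stand-in for a filtering pass).
final, n_iter = iterate_until_stable(lambda img: [0.5 * v for v in img], [1.0])
```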
| Evaluation of Results|| |
For nuclear segmentation/detection, there are four different types of results: correct detection, type A under-segmentation, type B under-segmentation and over-segmentation. Here, type A under-segmentation is a false negative, while type B under-segmentation means that one unit covers more than one nucleus center. [Figure 5] gives a schematic diagram illustrating the meaning of these four types of nuclear detection results.
|Figure 5: Four types of detection results. '+'s are manually assigned nucleus centres. (2)(3)(4) - correct detections; (5) type A under-segmentation; (1) type B under-segmentation and (6) over-segmentation|
We denote the number of nuclei in the ground truth as $n_g$, the number of detected nuclei as $n_d$, and the numbers of the four types of results as $n_I$, $n_{II}$, $n_{III}$ and $n_{IV}$ respectively. The type B under-segmentation can be counted in two ways. One way is to count the corresponding number of ground truth nuclei, while the other way is to count the number of detected nucleus units. The relationship between these numbers can be summarized as follows:
To compare the performance of the different algorithms, we use precision/recall and F-measure to measure the results.
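For reference, the snippet below computes precision, recall and F-measure using their standard definitions from true positives, false positives and false negatives. How the four detection types map onto these counts is not spelled out in this text, so the counts are treated as given inputs and the function names are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

def precision_recall_f(tp, fp, fn):
    """Standard precision, recall and F-measure from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f_measure(precision, recall)

# Worked check with the mean precision (87%) and recall (91%) reported in the abstract.
f_from_means = f_measure(0.87, 0.91)
```

Evaluating `f_measure(0.87, 0.91)` gives roughly 0.89, in line with the reported mean F-score of 89%, although the mean of per-image F-scores need not equal the F-score of the mean precision and recall.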
Apart from using precision/recall, we can also evaluate the performance of algorithms based on the four types of detection results. Since the true/false positive/negative based method cannot reveal the different under-segmentations shown in [Figure 5], here, four performance indicators are proposed as follows:
where the denominator $D$ is the maximum of $n_g$ and $n_d$.
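One plausible reading of these indicators is that each is the count of one detection type divided by D. The sketch below follows that reading; it is an assumption, because the defining equations are not reproduced in this text, and it leaves open which of the two ways of counting type B under-segmentation is used. The function name and the example counts are hypothetical.

```python
def detection_indicators(n_g, n_d, n_I, n_II, n_III, n_IV):
    """Ratios of the four detection types to D = max(n_g, n_d) (assumed form)."""
    D = max(n_g, n_d)
    return {
        "P_I": n_I / D,      # correct detection
        "P_II": n_II / D,    # type A under-segmentation
        "P_III": n_III / D,  # type B under-segmentation
        "P_IV": n_IV / D,    # over-segmentation
    }

# Usage with made-up counts for a single image.
indicators = detection_indicators(n_g=100, n_d=96, n_I=88, n_II=4, n_III=6, n_IV=2)
```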
| Results|| |
For the 10 images, the mean value plus the standard deviation of performance indicators are extracted to characterize the performance of eight algorithms. [Table 1] and [Table 2] summarize the results of experiments in which eight different algorithms are tested with respect to the 10 lymphoma images.
|Table 2: Records of four types of detections (mean value and standard deviation)|
[Table 1] shows the nuclear detection results in precision/recall and F-measure. According to [Table 1], SGMS gives the best mean F-measure as well as the best consistency of F-measure. For both the precision and recall rates, SGMS gives the best results with good consistency levels.
[Table 2] summarizes the four types of detection results as defined in (6-9). It can be seen from [Table 2] that the correct detection ratio for SGMS is the best among the alternatives. In addition, according to $P_{II}$ in [Table 2], SGMS has the lowest level of type A under-segmentation, lower even than the standard watershed. For type B under-segmentation ($P_{III}$) and over-segmentation ($P_{IV}$), SGMS also gives good results.
[Figure 6] presents the computational cost of applying SGMS to the images in [Figure 4]. The measurements were made on a personal computer with a dual core processor (3.40 GHz) and 16 GB of installed memory. As shown in [Figure 6], processing typically takes 90 s when the number of iterations is set to 50. In practice, we set the number of iterations to 10, which gives good results in general.
|Figure 6: Computational time versus number of iterations in applying stain guided mean-shift|
| Conclusion|| |
This study investigates automatic detection of tissue nuclei from pathological images. Instead of using watershed directly, we propose an algorithm to smooth the input image while preserving the edges of the nuclei. Unlike the standard mean-shift algorithm which is based on local colors only, here by introducing the stain color vectors, global stain information is incorporated into the filtering process. We tested the algorithm and compared the results with several alternative algorithms. Based on the results shown in [Table 1] and [Table 2], one can conclude that by processing the input image with SGMS filtering, the accuracy of watershed based nuclear detection is improved.
This study was exploratory on a small set of images. When compared with more complicated pathology images, the image set tested in this study is relatively simple since many nuclear borders are well separated and overlapping nuclei are a minority. Therefore, for the next step of research, enhancing the algorithm on other types of human tissue constitutes a good direction.
| Acknowledgments|| |
This work was funded through WELMEC, a Centre of Excellence in Medical Engineering funded by the Wellcome Trust and EPSRC, under grant number WT 088908/Z/09/Z.
Appendix: Proof of Convergence for (2)
Suppose the input image is m × n. Firstly, the update equation in (2) can be rewritten in vector-matrix format as follows:
$$X_{k+1} = A_k X_k + B_k S \tag{10}$$
where $X_k$ is the vector representing the current image in the k-th iteration and $S$ represents the stain vectors. $A_k$ is an mn × mn square matrix and $B_k$ is an mn × 3 matrix. Both $A_k$ and $B_k$ are determined by the selected kernel function and they are normalized to satisfy
$$\sum_j A_k(i, j) + \sum_j B_k(i, j) = 1 \tag{11}$$
If the kernel function in (2) is Gaussian, we have $A_k(i, j) > 0$ and $B_k(i, j) > 0$. Therefore,
$$\sum_j A_k(i, j) < 1, \quad \text{i.e.} \quad \|A_k\|_\infty < 1 \tag{12}$$
According to modern control theory, (12) means $X_k$ in (10) is convergent, i.e., (2) is convergent.
| References|| |
|1.||Gurcan MN, Boucheron LE, Can A, Madabhushi A, Rajpoot NM, Yener B. Histopathological image analysis: A review. IEEE Rev Biomed Eng 2009;2:147-71. |
|2.||Magee D, Chomphuwiset P, Treanor D, Quirke P. Context aware colour classification in digital microscopy. In: Proceedings of Medical Image Understanding and Analysis. London; 2011. p. 1-5. |
|3.||Comaniciu D, Meer P. Cell image segmentation for diagnostic pathology. In: Suri JS, Setarehdan S, Singh S, editors. Advances in Computer Vision and Pattern Recognition. London: Springer; 2002. p. 541-58. |
|4.||Vedaldi A, Fulkerson B. VLFeat: An open and portable library of computer vision algorithms. In: Proc ACM Multimedia Int Conf; 2010. p. 1469-72. |
|5.||Bernardis E, Yu SXW. Pop out many small structures from a very large microscopic image. Med Image Anal 2011;15:690-707. |
|6.||Al-Kofahi Y, Lassoued W, Lee W, Roysam B. Improved automatic detection and segmentation of cell nuclei in histopathology images. IEEE Trans Biomed Eng 2010;57:841-52. |
|7.||Chen X, Zhou X, Wong ST. Automated segmentation, classification, and tracking of cancer cell nuclei in time-lapse microscopy. IEEE Trans Biomed Eng 2006;53:762-6. |
|8.||Kong H, Gurcan M, Belkacem-Boussaid K. Partitioning histopathological images: An integrated framework for supervised color-texture segmentation and cell splitting. IEEE Trans Med Imaging 2011;30:1661-77. |
|9.||Hagwood C, Bernal J, Halter M, Elliott J. Evaluation of segmentation algorithms on cell populations using CDF curves. IEEE Trans Med Imaging 2012;31:380-90. |
|10.||Krivá Z, Mikula K, Peyriéras N, Rizzi B, Sarti A, Stasová O. 3D early embryogenesis image filtering by nonlinear partial differential equations. Med Image Anal 2010;14:510-26. |
|11.||Ram S, Rodríguez JJ, Bosco G. Segmentation and detection of fluorescent 3D spots. Cytometry A 2012;81:198-212. |
|12.||Comaniciu D, Meer P. Mean Shift: A robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 2002;24:603-19. |
|13.||Perona P, Malik J. Scale-space and edge detection using anisotropic diffusion. IEEE Trans Pattern Anal Mach Intell 1990;12:629-39. |
|14.||Méndez-Rial R, Martín-Herrero J. Efficiency of semi-implicit schemes for anisotropic diffusion in the hypercube. IEEE Trans Image Process 2012;21:2389-98. |
|15.||Astrom K, Murray R. Feedback systems: An introduction for scientists and engineers. Princeton, NJ, USA: Princeton University Press; 2008. p. 102-19. |