SYMPOSIUM ORIGINAL RESEARCH
Year : 2013 Volume : 4 Issue : 2 Page : 6
Stain guided meanshift filtering in automatic detection of human tissue nuclei
Yu Zhou^{1}, Derek Magee^{1}, Darren Treanor^{2}, Andrew Bulpitt^{1}
^{1} School of Computing, University of Leeds, Leeds, United Kingdom
^{2} Pathology and Tumor Biology, Leeds Institute of Molecular Medicine, University of Leeds, Leeds, United Kingdom
Background: As a critical technique in a digital pathology laboratory, automatic nuclear detection has been investigated for more than one decade. Conventional methods work directly on the raw images, in which the color/intensity homogeneity within tissue/cell areas is undermined by artefacts such as uneven staining, making the subsequent binarization process prone to error. This paper concerns detecting cell nuclei automatically from digital pathology images by enhancing the color homogeneity as a preprocessing step.
Methods: Unlike previous watershed-based algorithms that rely on postprocessing of the watershed, we present a new method that incorporates the staining information of pathological slides in the analysis. This preprocessing step strengthens the color homogeneity within the nuclear areas as well as the background areas, while keeping the nuclear edges sharp. A proof of convergence for the proposed algorithm is also provided. After preprocessing, Otsu's threshold is applied to binarize the image, which is further segmented via watershed. To balance separating overlapping nuclei against avoiding oversegmentation, a naive Bayes classifier is designed to refine the splits suggested by the watershed segmentation.
Results: The method is validated with 10 sets of 1000 × 1000 pathology images of lymphoma from one digital slide. The mean precision and recall rates are 87% and 91%, corresponding to a mean F-score of 89%. The standard deviations of these performance indicators are 5.1%, 1.6% and 3.2% respectively.
Conclusion: The precision/recall performance obtained indicates that the proposed method outperforms several other alternatives. In particular, for nuclear detection, stain guided meanshift (SGMS) is more effective than the direct application of meanshift in preprocessing. Our experiments also show that preprocessing the digital pathology images with SGMS gives better results than conventional watershed algorithms. Nevertheless, as only one type of tissue is tested in this paper, a further study is planned to enhance the robustness of the algorithm so that other types of tissues/stains can also be processed reliably.
Background
In digital pathology, a stained tissue section is first scanned under a microscope. The digital imaging data can then be collected, archived and shown to users such as pathologists. Overall, digital pathology offers unprecedented flexibility to pathologists' clinical work, including a modern platform for carrying out pathological inspection over the internet.
Automatic nuclear detection is an important problem in digital pathology with a range of biological and medical applications, and it has been investigated for more than 10 years. [1] Numerous algorithms have been used, or have the potential to be used, in nuclear detection, including the Hough transform, [2] meanshift, [3] quickshift, [4] normalized cut, [5] graph cut, [6] watershed, [7] color-texture [8] and cumulative distribution function [9] methods. Recently, segmentation algorithms for 3D cellular data collected with laser scanning microscopy [10] and confocal microscopy [11] have also been emerging.
Although there are a variety of methods in the literature, for analyzing color pathology images generated by devices such as Aperio's ScanScope series there is still room for improvement in accuracy and efficiency, especially when compared with cell images from fluorescence microscopy. [7] One main reason is that color pathology images have richer texture details than fluorescence microscopy images. Although this richness of texture is desirable for human diagnosis, it makes computer-based nuclear detection rather challenging.
This paper offers a fully automatic algorithm for detecting nuclei from color pathology images. Instead of relying only on postprocessing of a watershed algorithm, such as merging, this paper incorporates a preprocessing step in the loop as well. Section 2 presents the details of this algorithm. The experimental results are presented in Section 3. Section 4 concludes this paper.
Methods
[Figure 1] presents an overview of our method. The first key step is preprocessing of the image, which uses stain guided meanshift (SGMS) filtering. The morphological watershed algorithm is then applied to generate the segmentation result. Finally, the watershed segmentation result is refined, leading to the final output.{Figure 1}
SGMS Filtering
[Figure 2] shows the virtual pathology images of the lymphoma. The tissue shown in [Figure 2] was stained with haematoxylin (H) and eosin (E). H has a purple color and E exhibits a pink color. Due to the nonuniformity of tissue structure and staining, the intensities of the stains vary across [Figure 2], even within the same nucleus. This is one of the issues that degrades the performance of the segmentation algorithm. Here, the hypothesis we wish to test is: if the color uniformity within the image area can be enhanced while the edge of the nucleus is preserved, the segmentation performance will be improved.{Figure 2}
As a nonparametric clustering algorithm, meanshift has attracted wide attention. [12] Comaniciu and Meer [3] applied meanshift filtering in analyzing cell images; their method incorporates a subsampling procedure and is widely used as a reference method in nuclear detection. [8] For every pixel in the image, a color vector can be constructed. Within the RGB color space, every pixel has a color vector x = (r, g, b). For a color vector x0, the standard meanshift algorithm updates this color vector with the following equation: [INLINE:1] where k(xi, x0) is the kernel function and N(x0) is the set of neighborhood pixels of x0. Unlike the classic meanshift algorithm, which processes the input image within its neighborhood area only, here we incorporate prior knowledge of pathology dyes into the filtering process to enhance the preprocessing.
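To make the standard meanshift update in (1) concrete, the following is a minimal sketch of one update of a single color vector as the kernel-weighted mean of its neighborhood colors. The function names, the Gaussian kernel on color distance, the scaling of colors to [0, 1] and the bandwidth value are all assumptions made for illustration; the paper does not specify them at this point.

```python
import numpy as np

def gaussian_kernel(xi, x0, bandwidth):
    """Gaussian weight between color vectors xi (shape (n, 3)) and x0 (shape (3,)).

    Assumption: the kernel acts on Euclidean distance in color space.
    """
    d2 = np.sum((xi - x0) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def meanshift_update(x0, neighbors, bandwidth=0.1):
    """One meanshift update: kernel-weighted mean of the neighborhood colors."""
    w = gaussian_kernel(neighbors, x0, bandwidth)
    return (w[:, None] * neighbors).sum(axis=0) / w.sum()

# Toy example with colors scaled to [0, 1] (hypothetical values).
x0 = np.array([0.40, 0.20, 0.50])
neighbors = np.array([
    [0.42, 0.22, 0.52],   # similar color, large weight
    [0.38, 0.18, 0.48],   # similar color, large weight
    [0.90, 0.90, 0.90],   # very different color, negligible weight
])
x1 = meanshift_update(x0, neighbors)
print(x1)
```

In this toy case the distant color contributes almost nothing, so the updated vector stays near the two similar neighbors. This is the edge-preserving behavior that motivates using meanshift as a smoothing filter.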
As shown in [Figure 1], k-means is applied to extract three color vectors, which represent the three dominant colors within one image: the two stains and the background color induced by the white lighting. For instance, in [Figure 2], apart from the purple and pink colors, there are also white areas, showing no tissue in the field of view. This white color is the color of the light source, which also affects the image in a global manner. Therefore, by introducing the stain color vectors into the filtering process, the update of the color vector in (1) is modified as follows: [INLINE:2] where [INSIDE:1]. In the numerator of (2), the first summation represents the effect of local neighborhood colors and the second summation represents the effect of the stain color vectors. The denominator is the normalization coefficient. S denotes the set of color vectors for the two stains plus the light source. Compared with (1), (2) is governed by the stain and lighting color vectors; it is therefore named SGMS.
In the Appendix, we prove that SGMS in (2) is convergent by converting (2) into a matrix format. This proof assures the user that at some point the iteration can be stopped without missing important information. However, due to the inherent nonlinearity of the problem, an explicit expression for the convergent solution is not yet known, i.e. the computer still needs to run iterations until it reaches a stable result.
Watershed Segmentation
After applying the SGMS filtering, the color within the image is agglomerated, including the cytoplasm areas as well as the background areas. As shown in the Appendix, each pixel tends to converge to its closest global color model. Thus, by applying Otsu's threshold, the nuclei areas, which are of darker color, can be segmented with the watershed algorithm. [7] [Figure 3] shows an example in which [Figure 3]a is the original image; [Figure 3]b is the result of SGMS filtering; and [Figure 3]c shows the inverse distance transform result after applying a threshold to [Figure 3]b.{Figure 3}
Detecting Isolated Nuclei
As shown in [Figure 3]c, after watershed processing, many catchment basins can be identified. A standalone catchment basin corresponds to one single nucleus area. In [Figure 3]d, these units represent the nuclei that are stained with H and are not touching other cells. With these standalone nuclei available, statistics of the size, shape, and color can be extracted. Here, the area of the nucleus region represents the size, the aspect ratio characterizes the shape, and the original color vectors provide the color statistics. Counting the three components of the color vector, there are five features to evaluate altogether. Statistics of these five features can be used to refine the oversegmented areas in the watershed, which usually correspond to touching nuclei. The blue areas in [Figure 3]e are examples of touching nucleus units.
Separating Touching Nuclei
To refine the watershed segmentation result, a naive Bayes mechanism is employed. Firstly, for every touching nucleus unit in the initial segmentation, a hypothesis is made that is based on combinations of the catchment basins. Because there might be multiple catchment basins within one connected component, as shown in [Figure 3]e, different hypotheses, i.e. combinations of these catchment basins, are tested. Secondly, the size, shape and color statistics of the areas in the hypothesis are extracted and forwarded to the naive Bayes classifier, whose output is defined as follows: [INLINE:3] where f(i) is the binarized output obtained by testing one feature's fitness against the obtained nuclei statistics. If the classifier output in (3) is positive, the hypothesis is accepted.
Otherwise, the null hypothesis, which means the combination is invalid, will not be rejected. The above hypothesis test is applied to every possible combination of merging certain catchment basins. In addition, because one area can only have one nucleus, if one catchment basin passes more than one test, the hypothesis with the best fit to the nuclei statistics obtained in Section 2.3 will be accepted. [Figure 3]f shows the final result of applying the proposed refinement scheme. As can be seen, the touching nuclei are split effectively and the spurious tiny noise is eliminated.
Results
Lymphoma Data
[Figure 4] shows an overview of the images selected for evaluating the nuclei detection algorithm in this paper. The original size of the images in [Figure 4] is 1000 × 1000. These images correspond to 10 randomly selected regions of one lymphoma tissue sample from a patient, which was formalin fixed and paraffin embedded and then cut into sections 5 μm thick. After staining the tissue section with H and E, the digital slide was collected with an Aperio AT scanner (Aperio, Vista, CA, USA) at a spatial resolution of 0.25 μm per pixel. To evaluate the performance of the proposed algorithm, the ground truth of nuclei centers was generated by manually clicking on the raw data.{Figure 4}
Detection Methods for Comparison
Here, several nuclear detection methods are employed as references. These methods include: k-means of intensity, Canny operator, Laplacian of Gaussian (LoG), quickshift, [4] watershed, anisotropic diffusion (ADiff) [13],[14] and meanshift. [3] For SGMS, a standard Gaussian kernel is used (σ = 1). The algorithm exits the loop when the update of the image is smaller than a given threshold, or after more than 100 iterations. In the naive Bayesian classifier, we choose P = 0.05.
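The SGMS iteration and its stopping rule can be sketched as follows. This is only one plausible reading of (2), not the authors' implementation: each pixel is replaced by a kernel-weighted mean of its local neighbors plus the global stain vectors, and the loop stops when the largest change falls below a tolerance or after 100 passes. The function names, the 3 × 3 window, the color-space Gaussian kernel, the bandwidth, the tolerance and the stain colors are all hypothetical. In particular, the paper does not state which domain σ = 1 refers to, so a bandwidth of 0.1 on colors scaled to [0, 1] is assumed here.

```python
import numpy as np

def sgms_step(img, stains, radius=1, bandwidth=0.1):
    """One stain guided update over an (H, W, 3) float image (assumed form of (2))."""
    h, w, _ = img.shape
    padded = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    num = np.zeros_like(img)
    den = np.zeros((h, w, 1))
    # First summation: local neighborhood colors (window includes the pixel itself).
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            nb = padded[dy:dy + h, dx:dx + w]
            d2 = np.sum((nb - img) ** 2, axis=2, keepdims=True)
            wgt = np.exp(-d2 / (2.0 * bandwidth ** 2))
            num += wgt * nb
            den += wgt
    # Second summation: global stain and light-source color vectors.
    for s in stains:
        d2 = np.sum((s - img) ** 2, axis=2, keepdims=True)
        wgt = np.exp(-d2 / (2.0 * bandwidth ** 2))
        num += wgt * s
        den += wgt
    return num / den  # the denominator normalizes the weights

def sgms_filter(img, stains, tol=1e-4, max_iter=100, **kwargs):
    """Iterate until the largest update is below tol or max_iter passes are done."""
    x = img.astype(float)
    for it in range(1, max_iter + 1):
        x_new = sgms_step(x, stains, **kwargs)
        change = np.max(np.abs(x_new - x))
        x = x_new
        if change < tol:
            break
    return x, it

# Toy image: a noisy purple half and a noisy pink half (hypothetical colors).
rng = np.random.default_rng(0)
purple = np.array([0.35, 0.20, 0.55])
pink = np.array([0.90, 0.60, 0.75])
white = np.array([1.00, 1.00, 1.00])
stains = np.stack([purple, pink, white])
img = np.empty((16, 16, 3))
img[:, :8] = purple
img[:, 8:] = pink
img = np.clip(img + rng.normal(0.0, 0.03, img.shape), 0.0, 1.0)

out, n_iter = sgms_filter(img, stains)
print(n_iter, img[:, :8].std(), out[:, :8].std())
```

In the paper the stain vectors come from k-means on the image itself; fixed values are used here only to keep the sketch short. Because every output pixel is a convex combination of input colors and stain vectors, the filtered values stay within the range spanned by those colors.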
Evaluation of Results
For nuclear segmentation/detection, there are four different types of results: correct detection, undersegmentation type A, undersegmentation type B and oversegmentation. Here, type A undersegmentation is a false negative, while type B undersegmentation means that one unit covers more than one nucleus center. [Figure 5] gives a schematic diagram illustrating the meaning of these four types of nuclear detection results.{Figure 5}
We denote the number of nuclei in the ground truth as ng, the number of detected nuclei as nd, and the four types of results as nI, nII, nIII, nIV respectively. The type B undersegmentation can be counted in two ways. One way is to count the corresponding number of ground truth nuclei [INSIDE:2], while the other way is to count the number of detected nucleus units [INSIDE:3]. The relationship between these numbers can be summarized as follows: [INLINE:4]
To compare the performance of the different algorithms, we use precision/recall and the F-measure. Apart from using precision/recall, we can also evaluate the performance of algorithms based on the four types of detection results. Since the true/false positive/negative based method cannot reveal the different undersegmentations shown in [Figure 5], four performance indicators are proposed as follows: [INLINE:5] where the denominator D is the maximum of ng and nd.
Results
For the 10 images, the mean and standard deviation of each performance indicator are computed to characterize the performance of eight algorithms. [Table 1] and [Table 2] summarize the results of experiments in which eight different algorithms are tested on the 10 lymphoma images.{Table 1}{Table 2}
[Table 1] shows the nuclear detection results in precision/recall and F-measure. According to [Table 1], SGMS gives the best mean F-measure as well as the best consistency of F-measure.
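As a quick check of how precision, recall and the F-measure relate, the sketch below uses the standard definitions, with the F-measure taken as the harmonic mean of precision and recall. It is assumed, not stated in the text, that these are the definitions behind [Table 1]. The function names and the detection counts are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Standard precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

def f_measure(precision, recall):
    """F-measure as the harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

# Hypothetical counts for one image: 90 correct detections,
# 10 spurious detections and 10 missed nuclei.
p, r = precision_recall(tp=90, fp=10, fn=10)
f_example = f_measure(p, r)

# Mean precision and recall reported in the abstract (87% and 91%).
f_reported = f_measure(0.87, 0.91)
print(round(f_example, 2), round(f_reported, 2))
```

The mean of per-image F-measures need not equal the F-measure of the mean precision and recall, but for the values reported in the abstract the two agree to two decimal places (about 0.89).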
For both the precision and recall rates, SGMS gives the best results with good consistency levels. [Table 2] summarizes the four types of detection results as defined in (6) to (9). It can be seen from [Table 2] that the correct detection ratio for SGMS is the best among the alternatives. In addition, according to PII in [Table 2], SGMS has the lowest level of type A undersegmentation, which is even lower than that of the standard watershed. For type B undersegmentation (PIII) and oversegmentation (PIV), SGMS also gives good results.
[Figure 6] presents the computational cost of applying SGMS to the images in [Figure 4]. This is based on a personal computer with a dual core processor (3.40 GHz) and 16 GB of installed memory. As shown in [Figure 6], processing typically takes 90 s when the number of iterations is set to 50. In practice, we set the number of iterations to 10, which gives good results in general.{Figure 6}
Conclusion
This study investigates automatic detection of tissue nuclei from pathological images. Instead of using watershed directly, we propose an algorithm to smooth the input image while preserving the edges of the nuclei. Unlike the standard meanshift algorithm, which is based on local colors only, here global stain information is incorporated into the filtering process by introducing the stain color vectors. We tested the algorithm and compared the results with several alternative algorithms. Based on the results shown in [Table 1] and [Table 2], one can conclude that processing the input image with SGMS filtering improves the accuracy of watershed-based nuclear detection. This study was exploratory and used a small set of images. Compared with more complicated pathology images, the image set tested in this study is relatively simple, since many nuclear borders are well separated and overlapping nuclei are a minority.
Therefore, extending the algorithm to other types of human tissue is a good direction for the next step of research.
Acknowledgments
This work was funded through WELMEC, a Centre of Excellence in Medical Engineering funded by the Wellcome Trust and EPSRC, under grant number WT 088908/Z/09/Z.
Appendix: Proof of Convergence for (2)
Suppose the input image is m × n. Firstly, the update equation in (2) can be rewritten in vector-matrix format as follows: [INLINE:6] where Xk is the vector representing the current image in the kth iteration and S represents the stain vectors. Ak is an mn × mn square matrix and Bk is an mn × 3 matrix. Both Ak and Bk are determined by the selected kernel function and they are normalized to satisfy [INSIDE:4]. If the kernel function in (2) is Gaussian, we have Ak(i, j) > 0 and Bk(i, j) > 0. Therefore, [INLINE:7] According to modern control theory, [15] (12) means Xk in (10) is convergent, i.e., (2) is convergent.
References


