Andrew Janowczyk, Anant Madabhushi. J Pathol Inform 2016, 7:29 (26 July 2016). DOI:10.4103/2153-3539.186902 PMID:27563488
Background: Deep learning (DL) is a representation learning approach ideally suited for image analysis challenges in digital pathology (DP). The variety of image analysis tasks in the context of DP includes detection and counting (e.g., mitotic events), segmentation (e.g., nuclei), and tissue classification (e.g., cancerous vs. non-cancerous). Unfortunately, issues with slide preparation, variations in staining and scanning across sites and vendor platforms, as well as biological variance, such as the presentation of different grades of disease, make these image analysis tasks particularly challenging. Traditional approaches, wherein domain-specific cues are manually identified and developed into task-specific "handcrafted" features, can require extensive tuning to accommodate these variances. However, DL takes a more domain-agnostic approach, combining both feature discovery and implementation to maximally discriminate between the classes of interest. While DL approaches have performed well in a few DP-related image analysis tasks, such as detection and tissue classification, the currently available open-source tools and tutorials do not provide guidance on challenges such as (a) selecting appropriate magnification, (b) managing errors in annotations in the training (or learning) dataset, and (c) identifying a suitable training set containing information-rich exemplars. These foundational concepts, which are needed to successfully translate the DL paradigm to DP tasks, are non-trivial for (i) DL experts with minimal digital histology experience, and (ii) DP and image processing experts with minimal DL experience, to derive on their own, thus meriting a dedicated tutorial.
Aims: This paper investigates these concepts through seven unique DP tasks as use cases to elucidate the techniques needed to produce results comparable, and in many cases superior, to those of state-of-the-art handcrafted feature-based classification approaches.
Results: Specifically, in this tutorial on DL for DP image analysis, we show how an open-source framework (Caffe), with a singular network architecture, can be used to address: (a) nuclei segmentation (F-score of 0.83 across 12,000 nuclei), (b) epithelium segmentation (F-score of 0.84 across 1735 regions), (c) tubule segmentation (F-score of 0.83 from 795 tubules), (d) lymphocyte detection (F-score of 0.90 across 3064 lymphocytes), (e) mitosis detection (F-score of 0.53 across 550 mitotic events), (f) invasive ductal carcinoma detection (F-score of 0.7648 on 50 k testing patches), and (g) lymphoma classification (classification accuracy of 0.97 across 374 images).
Conclusion: This paper represents the largest comprehensive study of DL approaches in DP to date, with over 1200 DP images used during evaluation. The supplemental online material that accompanies this paper consists of step-by-step instructions for the usage of the supplied source code, trained models, and input data.
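The F-scores reported for the segmentation and detection tasks combine precision and recall into a single number. As a minimal sketch, assuming the standard F1 definition (the harmonic mean of precision and recall; the abstract does not state which variant was used), the computation from true-positive, false-positive, and false-negative counts could look like this. The function name `f_score` and the counts in the usage line are illustrative and are not taken from the paper.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score from detection counts.

    Assumes the standard F1 definition: the harmonic mean of
    precision = tp / (tp + fp) and recall = tp / (tp + fn).
    """
    if tp == 0:
        # No true positives: precision and recall are 0 (or undefined),
        # so the score is reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not results from the paper).
example = f_score(tp=90, fp=10, fn=10)
```

Because the score depends only on the three counts, the same function applies whether the counted units are pixels, objects (e.g., nuclei), or image patches.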
Aicha BenTaieb, Masoud S Nosrati, Hector Li-Chang, David Huntsman, Ghassan Hamarneh. J Pathol Inform 2016, 7:28 (26 July 2016). DOI:10.4103/2153-3539.186899 PMID:27563487
Context: It has been shown that ovarian carcinoma subtypes are distinct pathologic entities with differing prognostic and therapeutic implications. Histotyping by pathologists has good reproducibility, but occasional cases are challenging and require immunohistochemistry and subspecialty consultation. Motivated by the need for more accurate and reproducible diagnoses and to facilitate pathologists' workflow, we propose an automatic framework for ovarian carcinoma classification.
Materials and Methods: Our method is inspired by pathologists' workflow. We analyze imaged tissues at two magnification levels and extract clinically inspired color, texture, and segmentation-based shape descriptors using image-processing methods. We propose a carefully designed machine learning technique composed of four modules: a dissimilarity matrix, dimensionality reduction, feature selection, and a support vector machine classifier to separate the five ovarian carcinoma subtypes using the extracted features.
Results: This paper presents the details of our implementation and its validation on a clinically derived dataset of 80 high-resolution histopathology images. The proposed system achieved a multiclass classification accuracy of 95.0% when classifying unseen tissues. Assessment of the classifier's confusion (confusion matrix) between the five different ovarian carcinoma subtypes agrees with clinicians' confusion and reflects the difficulty in diagnosing endometrioid and serous carcinomas.
Conclusions: Our results from this first study highlight the difficulty of ovarian carcinoma diagnosis, which originates from the intrinsic class imbalance observed among subtypes, and suggest that the automatic analysis of ovarian carcinoma subtypes could be valuable to clinicians' diagnostic procedure by providing a second opinion.
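The multiclass accuracy and the confusion matrix mentioned in the Results can both be derived from pairs of true and predicted labels. The following is a minimal sketch with toy labels, not data or code from the study; the helper names and the label "other" (a stand-in for the remaining subtypes, which the abstract does not name) are assumptions made for illustration.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a matrix where matrix[i][j] counts samples whose true class
    is labels[i] and whose predicted class is labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correct predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy labels for illustration only (hypothetical, not the study's data).
labels = ["serous", "endometrioid", "other"]
y_true = ["serous", "serous", "endometrioid", "endometrioid", "other", "other"]
y_pred = ["serous", "endometrioid", "endometrioid", "serous", "other", "other"]

cm = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(cm)
```

Off-diagonal entries show which pairs of classes are mistaken for one another; in this toy example all errors fall between the "serous" and "endometrioid" rows, the kind of pattern the abstract describes as matching clinicians' confusion.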