Identification of histological correlates of overall survival in lower grade gliomas using a bag-of-words paradigm: A preliminary analysis based on hematoxylin & eosin stained slides from the lower grade glioma cohort of The Cancer Genome Atlas
Reid Trenton Powell¹, Adriana Olar², Shivali Narang³, Ganesh Rao⁴, Erik Sulman⁵, Gregory N Fuller⁶, Arvind Rao⁷
1 Center for Translational Cancer Research, Texas A&M Health Science Center, Institute of Biosciences and Technology, Houston, TX 77030, USA
2 Department of Hematopathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
3 Department of Bioinformatics and Computational Biology, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
4 Department of Neurosurgery, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
5 Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
6 Department of Pathology (Section of Neuropathology), The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
7 Department of Bioinformatics and Computational Biology, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center; Department of Neurosurgery, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
Correspondence Address:
Arvind Rao Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, 1400 Pressler Street, Unit 1410, Houston, Texas 77030 USA
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/jpi.jpi_43_16
Background: Glioma, the most common primary brain neoplasm, is a heterogeneous tumor type comprising multiple histologic subtypes and cellular origins. At clinical presentation, gliomas are graded according to the World Health Organization (WHO) guidelines, which reflect the malignant characteristics of the tumor based on histopathological and molecular features. Lower grade diffuse gliomas (LGGs) (WHO Grade II–III) have fewer malignant characteristics than high-grade gliomas (WHO Grade IV) and a better clinical prognosis; however, accurate discrimination of overall survival (OS) remains a challenge. In this study, we aimed to identify tissue-derived image features using a machine learning approach to predict OS in a cohort of LGG patients of mixed histology and grade. To achieve this aim, we used hematoxylin and eosin (H&E)-stained slides from the public LGG cohort of The Cancer Genome Atlas (TCGA) to create a machine-learned dictionary of “image-derived visual words” associated with OS. We then evaluated the combined efficacy of these visual words in predicting short versus long OS by training a generalized machine learning model. Finally, we mapped these predictive visual words back to molecular signaling cascades to infer potential drivers of the machine-learned survival-associated phenotypes.
Methods: We analyzed digitized histological sections downloaded from the LGG cohort of TCGA using a bag-of-words approach. This method identified a diverse set of histological patterns, which were then tested for association with OS, histology, and molecular signaling activity using Cox regression, analysis of variance, and Spearman correlation, respectively. A support vector machine (SVM) model was constructed to classify patients into short and long OS groups, dichotomized at 24 months.
Results: This method identified disease-relevant phenotypes associated with OS, some of which correlated with disease-associated molecular pathways.
From these image-derived phenotypes, we obtained a generalized SVM model that could discriminate 24-month OS (area under the curve, 0.76).
Conclusion: Here, we demonstrated one potential strategy to incorporate image features derived from H&E-stained slides into predictive models of OS. In addition, we showed how these image-derived phenotypic characteristics correlate with molecular signaling activity underlying the etiology or behavior of LGG.
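To make the bag-of-words idea concrete, the following is a minimal sketch of how slides might be encoded as histograms of "visual words". It is not the authors' pipeline: the abstract does not state the patch descriptors, the dictionary size, or the clustering algorithm, so the use of k-means, the two-dimensional toy descriptors, and all function and variable names here are illustrative assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, dictionary):
    """Index of the visual word (centroid) closest to a patch descriptor."""
    return min(range(len(dictionary)),
               key=lambda k: squared_distance(descriptor, dictionary[k]))


def learn_dictionary(descriptors, n_words, n_iter=20):
    """Learn visual words with plain k-means (Lloyd's algorithm).

    Assumption: k-means stands in for whatever clustering the study used.
    Centroids start from evenly spaced descriptors so the run is
    deterministic; a real pipeline would likely use a better initialization.
    """
    if n_words < 1 or n_words > len(descriptors):
        raise ValueError("n_words must be between 1 and the number of descriptors")
    step = len(descriptors) // n_words
    centroids = [list(descriptors[i * step]) for i in range(n_words)]
    for _ in range(n_iter):
        # Assign every descriptor to its nearest centroid.
        clusters = [[] for _ in range(n_words)]
        for d in descriptors:
            clusters[nearest_word(d, centroids)].append(d)
        # Move each centroid to the mean of its members (keep it if empty).
        for k, members in enumerate(clusters):
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def encode_slide(patch_descriptors, dictionary):
    """Normalized histogram of visual-word frequencies for one slide."""
    counts = [0] * len(dictionary)
    for d in patch_descriptors:
        counts[nearest_word(d, dictionary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(dictionary)


# Hypothetical toy data: each slide is a list of 2-D patch descriptors.
slides = {
    "slide_a": [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [5.0, 5.1]],
    "slide_b": [[5.1, 5.0], [4.9, 5.0], [5.0, 4.9], [0.0, 0.0]],
}

# The dictionary is learned from descriptors pooled across all slides.
pooled = [d for patches in slides.values() for d in patches]
dictionary = learn_dictionary(pooled, n_words=2)

hist_a = encode_slide(slides["slide_a"], dictionary)
hist_b = encode_slide(slides["slide_b"], dictionary)
print(hist_a, hist_b)
```

Each slide is reduced to a fixed-length vector of word frequencies regardless of how many patches it contains, which is what allows per-word association tests and a downstream classifier such as an SVM to operate on slides of different sizes.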