RESEARCH ARTICLE Year : 2019  Volume : 10  Issue : 1  Page : 30 Statistical analysis of survival models using feature quantification on prostate cancer histopathological images Jian Ren^{1}, Eric A Singer^{2}, Evita Sadimin^{3}, David J Foran^{4}, Xin Qi^{4}, ^{1} Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ, USA ^{2} Department of Pathology and Laboratory Medicine, Section of Urologic Oncology; Center for Biomedical Imaging and Informatics, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA ^{3} Department of Pathology and Laboratory Medicine, Section of Urologic Oncology, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA ^{4} Center for Biomedical Imaging and Informatics, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA Correspondence Address: Background: Grading of prostatic adenocarcinoma is based on the Gleason scoring system and the more recently established prognostic grade groups. Typically, prostate cancer grading is performed by pathologists based on the morphology of the tumor on hematoxylin and eosin (H and E) slides. In this study, we investigated the histopathological image features with various survival models and attempted to study their correlations. Methods: Three texture methods (speededup robust features, histogram of oriented gradient, and local binary pattern) and two convolutional neural network (CNN)based methods were applied to quantify histopathological image features. Five survival models were assessed on those image features in the context with other prostate clinical prognostic factors, including primary and secondary Gleason patterns, prostatespecific antigen levels, age, and clinical tumor stages. Results: Based on statistical comparisons among different image features with survival models, image features from CNNbased method with a recurrent neural network called CNNlongshortterm memory provided the highest hazard ratio of prostate cancer recurrence under Cox regression with an elastic net penalty. Conclusions: This approach outperformed the other image quantification methods listed above. Using this approach, patient outcomes were highly correlated with the histopathological image features of the tissue samples. In future studies, we plan to investigate the potential use of this approach for predicting recurrence in a wider range of cancer types.
Introduction Survival analysis is a means for predicting patient outcomes by providing invaluable information for selecting treatment. Predicting prostate cancer survival outcomes is a significant challenge. Following radical prostatectomy, men must be closely monitored for the evidence of recurrence. This is typically done via prostatespecific antigen (PSA) blood tests. A detectable or rising PSA after surgery is the evidence of biochemical recurrence. The measure of time from surgery to biochemical recurrence is biochemical recurrencefree survival (bRFS). Multiple studies examined predictors of bRFS using quantitative histopathological features with some survival models.[1],[2],[3],[4] However, numerous prediction tools[5],[6],[7],[8],[9],[10],[11] utilized wholeslide images (WSIs) to assess prostate cancer recurrence and predicted the likely outcomes resulting from treatments. Few of these studies simultaneously considered clinical factors (primary and secondary Gleason patterns, PSA value, age, tumor stage) and tissue WSIs to correlate with recurrence under different survival models. The Gleason scoring system for prostate cancer remains one of the best predictors for prostate cancer progression and recurrence,[12],[13],[14],[15] despite significant interobserver reproducibility among pathologists.[16],[17],[18] A more recently adapted grading system stratifies patients into five prognostic grade groups[19] based on their Gleason patterns: grade Group 1 (Gleason ≤ 3 + 3 = 6), Grade Group 2 (Gleason 3 + 4 = 7), Grade Group 3 (Gleason 4 + 3 = 7), Grade Group 4 (Gleason 4 + 4 = 8, 3 + 5 = 8, and 5 + 3 = 8), and Grade Group 5 (Gleason 4 + 5 = 9, 5 + 4 = 9, and 5 + 5 = 10). [Figure 1] shows an example of Gigapixel WSI with different Gleason patterns. The greenframed patch contains Gleason pattern 3; the blueframed patch contains Gleason pattern 4; and the redframed patch contains Gleason pattern 5. In this study, we conducted experiments on public prostate cancer dataset using different feature quantification methods and recurrence analysis using different survival models. Histopathological image features were quantified through texture methods and neural networkbased approaches. We focused on the prostate cancer grade groups of 1–4. The bRFS was applied as the time to recurrence for prostate cancer progression analysis.{Figure 1} Materials and Methods Materials In this study, we used the prostate dataset from the Genomic Data Commons (GDC).[20] The dataset included wholeslide histopathological images from patients and their corresponding clinical reports, including the primary and secondary Gleason pattern, patients' PSA value, age, and tumor stage. All the image data, annotations of Gleason score, and clinical information were publicly available. We selected the patients with lowrisk (Gleason score 3 + 3), intermediaterisk (Gleason score 3 + 4 or 4 + 3), and highrisk prostate cancer (Gleason score 4 + 4) because those patient populations show a reasonable range of prognoses for our analysis. We excluded patients with Gleason Grade Group 5 patients in this study due to poor prognosis of their disease.[21] Considering the high computational cost on the Gigapixel tissue WSIs, existing WSIs classification and recurrence analysis approaches were focused on effectively utilizing the cropped patches from region of interests.[22],[23],[24],[25],[26],[27] For image preparation, we adopted the twostep cropping–selecting process. First, original patches were automatically generated within each WSI under ×40 with a patch size of 4096 × 4096. Second, the patches with the tissue accounting for at least 20% of the whole area were selected for our experiments. The number of WSIs and cropped patches under different Gleason scores is shown in [Table 1].{Table 1} Methods Initially, we utilized various quantification methods to extract image features from WSIs. Next, the recurrence analysis was performed on the combination of image features and clinical factors utilizing different survival models, as shown in [Figure 2]. Hazard ratios using different survival models were calculated to indicate the correlation between image features (or in context of clinical factors) and recurrence; the higher the hazard ratio, the higher the correlations.{Figure 2} Image feature quantification We adopted five approaches for the purpose of feature quantification including unsupervised and supervised methods. The unsupervised texture methods consisted of speededup robust features (SURFs),[28] histogram of oriented gradients (HOGs),[29] and local binary pattern (LBP).[30] The two supervised methods are based on convolutional neural networks (CNNs). For supervised methods, we randomly selected 20% of the cases as testing set, 10% as validation set, and the remaining as training set. Texture features We chose three texture methods for prostate cancer histopathological image analysis. They were rotation, translation, and scale and intensityinvariant which were suitable for descriptions of the texture features within WSIs. The SURF[28] is partly inspired by the scaleinvariant feature transform (SIFT) descriptors. The standard version of SURF is several times faster than SIFT and more robust against different image transformations than SIFT. The image is transformed into coordinates, using the multiresolution pyramid technique, to copy the original image with a pyramidal Gaussian or Laplacian pyramid shape to obtain an image with the same size but with reduced bandwidth. The HOG[29] counts occurrences of gradient orientation in a local region of an image. It is similar to that of edgeorientation histograms, SIFT descriptors, and shape contexts but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy. The LBP[30] is used to model the image local features in texture spectrum units in a multiresolution grayscale mode. It is based on recognizing local binary unit patterns for any quantization of the angular space and spatial resolution. The image features for each patch were generated using a bagofwords approach[31] from the texture features of different texture methods. By treating image features as words, a bag of words is a sparse vector of occurrence counts (histogram) of a vocabulary of local image features. In the bagofword approach, it converts vectorrepresented texture features to codewords, which also produce a codebook. The image features are mapped to certain codewords through the clustering process, and the image is then represented by the histogram of the codewords. Empirically, we use 100 as the number of cluster centers to report the best performance for texture features. To select the texture features for WSIs, we apply principal component analysis (PCA)[32] of the image features for all patches within a WSI due to correlations among the patches. Convolutional neural networkbased features In recent years, with the advances of deep learning, studies using CNNs have demonstrated significant improvement on histopathological image classification[27],[33],[34],[35],[36] and segmentation.[33],[34],[37],[38] For the WSIs, applications based on CNNs have been widely developed.[39],[40],[41] In our study, we adopted two approaches to obtain CNNbased features. The first one was using the neural network to obtain image features for each patch, and then the features for WSIs were obtained by utilizing PCA on all patches. The CNN employed in the study is shown in [Table 2]. The input to the network was the cropped patches from prostate pathological WSIs. The activations from the second to the last layer were considered as the image features of the input samples. To train the network with patches, we assigned Gleason pattern as the ground truth annotation for the patch. The GDC WSIs have been previously graded with the primary and secondary patterns, as well as the final Gleason score given.{Table 2} To model variations among Gleason patterns within a WSI, we used the multitask architecture to enable the network to learn as much information about the Gleason patterns from the patches of a WSI as possible. During the training process, we assigned the primary pattern and the sum of primary pattern and secondary pattern (Gleason score) as labels for each patch and use the following multitask loss function: [INLINE:1] where for the ith image within the batch of N images,[INSIDE:1] and [INSIDE:2]encoded the Gleason grading for the primary pattern and the sum score and [INSIDE:3]and [INSIDE:4]encoded the predicted grading of the model. The onehot encoding is a process by which categorical variables are converted into a form that could be provided to CNN to do a better job in classification. The results suggested that using the primary Gleason pattern and the Gleason score together achieved the best estimate of risk of recurrence by capturing local and global image feature distribution more efficiently than using either one alone. For the second approach, we treated the cropped patches from the WSI as an image sequence and used one type of recurrent neural network (RNN) called longshortterm memory (LSTM) to explore the longterm dynamic information of the patch spatial sequence within the WSI. We denoted the method as CNN features with LSTM (CNN + LSTM). The LSTM could fully leverage the patch spatial sequence within a WSI to get the representative features that model the global Gleason score of the WSI and the distribution of the Gleason patterns among the WSI. Recently, the LSTM model has been successfully used in speech recognition,[42],[43] language translation models,[44] image captioning,[45] and video classification.[46] Compared with the traditional RNNs, LSTM is more effectively in longrange and shortterm spatial sequence modeling. In general, given an input feature sequence (x1, x2,…, xT), the LSTM outputs the output sequence (y1, y2,…, yT). The hidden layer of LSTM is computed recursively from t = 1 to t = T with the following equations: [INLINE:2] [INLINE:3] [INLINE:4] [INLINE:5] [INLINE:6] where xi is the network activations of the ith patch, ht is the hidden vector, it, ct, ft, and ot are, respectively, the activation vectors of the input gate, memory cell, forget gate, and output gate. W terms denoted the weight matrices connecting different units, b terms denoted the bias vectors, and σ is the logistic sigmoid function. From the above equations, we can see the memory cell ci in LSTM having two inputs: the weighted sum of the current inputs and the previous memory cell units ct − 1, which enables the model to learn when to forget the old information and when to consider new information. The output gate ot controls the propagation of information to the following step. Since we utilized the spatial characteristic encoded features from CNN, the training process of LSTM of patches within WSIs was formed in a spatial format instead of time sequential manner. As shown in [Figure 3], we used the image coordinates to indicate the location of each patch in the patch spatial sequence. In this way, we considered both the unique characteristics of each patch and the finegrained variations between patches. For one prostate WSI, the patches were fed into the network to get the activations from the second to the last layer. Then, we utilized a onelayer LSTM to recursively map the activations of each patch to a feature vector. In addition, the average pooling layer was applied on top of the network to get a feature vector as the computational image features for the WSI. The number of hidden units for each LSTM is 1024. During the training process, we applied the multitask loss and assign the primary pattern and the Gleason score for the WSIs.{Figure 3} Survival models To evaluate the performance of various survival models using different image features quantified by textural and CNNbased methods on patients with prostate cancer, we used the bRFS since their initial treatment as a timetorecurrence variable for survival models. Using survival models, we assessed the image features related to recurrence hazard risk scores in the context of other clinical prognostic factors, including the primary and the secondary Gleason patterns, PSA, age, and clinical tumor stage. The hazard risk scores of image features in the context of clinical mean a measure of prostate cancer recurrence risk ratio, commonly in timetoevent analysis or survival analysis. The survival models evaluated in our study include multivariate Cox proportionalhazards model,[47] Cox regression by an elastic net penalty (COXEN),[48] parametric proportionalhazard model (PHEX),[49] parametric proportionalhazard model with lognormal distance (PHLogN),[49] and parametric proportionalhazard model with loglogistic distance (PHLogL).[49] For the highdimensional data, univariate Cox regression was applied to the computational image features. Only those with Wald test, P < 0.05 is selected in conjunction with clinical factors as inputs of the survival models. The Cox proportionalhazards model is a popular regression model for the analysis of survival data. It is a semiparametric method for adjusting survival rate estimates to quantify the effect of predictor variables. In contrast with parametric models, it makes no assumptions about the shape of the socalled baseline hazard function. It represents the effects of explanatory variables as a multiplier of a common baseline hazard function H0. Given the patients (ti, li, xi), where i = 1, 2.,N, we have the ti as the patient's recurrence time for individual i; li is the label of the censored data that equals 1 if the recurrence occurred at that time and 0 if the patient has been censored; and Xi as the vector of covariates of the selected image features and clinical factors. The hazard function is the nonparametric part of the Cox proportionalhazards regression function corresponding to [INLINE:7] Here, xij is the image features j for patient i, where j = 1, 2, …p and βi is the Cox regression parameter for each patient. The hazard ratio is derived from [INSIDE:5], representing the relative risk of instant failure for patients having the predictive value Xi compared to the ones having the baseline values. Here, di is weighting parameters for each patient. [INLINE:8] For the COXEN, the elastic net penalty [INSIDE:6]is given in the equation below. It is a mixture of the L1 (Lasso) and L2 (ridge regression) penalty. Here, is the ratio between L1 and L2 for elastic net. [INLINE:9] where [INLINE:10] Based on the assumption that the effect of the covariates is to increase or decrease the hazard by a proportionate amount at all durations, the parametric proportionalhazard model is a locationscale model for arbitrary transform of the time variable ti, leading to accelerated failure time model with different penalty distance functions. The distance functions we use for parametric proportionalhazard models are exponential transformation (PHEX), lognormal (PHLogN), and loglogistic (PHLogL) distances. The survival model fitting to different image features were quantified by Akaike information criteria (AIC).[50] AIC = −2log (likelihood) +2K(11) where likelihood is a measureofmodel fitness and K represents the number of model parameters. The smaller value of the AIC, the better the goodness of fit of the survival models. Experimental Results In this section, we conducted the experiments on the public prostate cancer dataset to make statistical analysis on various survival models using different histopathological image feature quantification methods. Implementation details For the CNNbased approaches to extract image features, we first used the patches to train the CNN with multitask loss. Each patch was resized as 256 × 256 and assigned two labels according to the Gleason grading of the WSI: one being the primary pattern and another being the Gleason score. The CNN was trained with minibatch stochastic gradient descent. The momentum is 0.9, and weight decay was 5 × 10−5. The initial learning rate is 10−3 and annealed by 0.1 after 104 iterations. To train the LSTM, we set the same momentum, the weight decay, and the initial learning rate. The learning rate is annealed by 0.1 after 2 × 103 iterations. The implementation is based on the Caffe toolbox.[51] Comparison of image features First, only using image features from tissue specimens, including clinical Gleason primary and secondary patterns and the quantified image features from various image methods, their Cox hazard ratios are shown in [Table 3]. CNN achieved better results than texture methods, including SURF,[28] HOG,[29] and LBP.[30] Using CNN with LSTM to model the spatial relation of patches achieved the highest Cox hazard ratio, which indicated the best recurrence correlation for prostate cancer patients' recurrence data. On the other hand, the image features obtained from texturebased methods and CNN approaches achieved higher Cox hazard ratios as compared to utilizing primary and secondary patterns alone.{Table 3} Second, in addition to the image features, PSA levels, ages, and clinical tumor stages were included in the Cox survival model, besides the primary and the secondary Gleason patterns. The results of combining clinical factors and image features are shown in [Table 4], demonstrating that the image features generated from CNNbased approaches were more representative than the texture features by having higher values of hazard ratio. In addition, those features were more representative than clinical prognostic factors. We also calculated the AIC values, as shown in [Table 4]. The smaller AIC value encodes the better goodness of fit of the survival model. CNN + LSTM achieved the best fitness on the Cox regression model compared to other image features quantification methods.{Table 4} Finally, without any image features, we showed the Cox hazard ratios of the clinical factors, as shown in [Table 5]. From the results of [Table 3], [Table 4], [Table 5], we can see that primary Gleason patterns have higher Cox hazard ratios than the ones of other clinical factors, which was consistent with its high prediction power for prostate cancers.[1],[4]{Table 5} Ablation study on training strategies Furthermore, considering the multiple Gleason patterns within WSIs, we designed two training strategies to train the CNNbased approaches. The first one was to use multitask loss to learn both the primary Gleason pattern and the sum of the primary and secondary patterns (namely, the Gleason score). The second one was to use the primary Gleason pattern or the Gleason score alone to learn the patterns within the patches or WSIs. The performance of two CNNbased approaches on patient recurrence analysis was compared using different training strategies. The results are shown in [Table 6]. We can see that the multitask architecture achieved better correlation with patients' recurrence than training label using the primary Gleason pattern or Gleason score alone as it has much higher recurrence hazard ratios and lower AIC values. This is because the primary Gleason pattern and the Gleason score together could better reflect the local and global image features in the WSIs than use each alone.{Table 6} Comparison of survival models In this section, we performed statistical analysis on various survival models, including COXEN,[48] PHEX,[50] PHLogN,[50] and PHLogL,[50] using prostate images with Gleason score 6–8 and clinical factors. The Cox proportionalhazards model does not need an assumption of a particular survival distribution of the patients' survival data. The only assumption in the model is about the proportional hazards. Unlike the Cox proportionalhazards model, parametric models with different penalty distance functions (such as exponential, lognormal, and loglogistic) need to specify the hazard functions.[52],[53] Studies have indicated that under certain circumstances, such as strong effect or strong time trend in covariates or followup depending on covariates, the parametric models are good alternatives to the Cox regression model.[53] We assessed different survival models and show the hazard ratios of image features and patients' clinical prognostic factors, as shown in [Table 7]. Based on these results, first, we can see that the image features quantified from WSIs outperformed other clinical factors in all texture and CNNbased approaches. Second, CNNbased approaches achieved a better correlation with patients' recurrence due to their higher hazard ratios than other texture methods for all survival models. Third, by comparing with [Table 4], COXEN achieved the lowest AIC value with image features obtained from CNN + LSTM, proving that the model was more suitable for recurrence analysis for prostate patients with low, intermediate, and high risk than other survival models.{Table 7} Discussion and Conclusions In this paper, we presented three unsupervised texture methods (SURF, HOG, and LBP) and two supervised CNNbased methods to quantify the features from histopathological images. Five survival models were assessed on those image features along with prostate cancer clinical prognostic factors, including the primary and the secondary Gleason patterns, PSA, age, and clinical tumor stage to perform bPFS analyses. Based on the statistical comparisons among different image feature quantification methods with survival models, the CNNLSTM provided the highest hazard ratio of prostate cancer recurrence under COXEN. COXEN outperforms other image quantification methods with other survival models, respectively. In our approach, patient outcomes were better correlated with their histopathological image features. Due to the limited size of the public prostate dataset, the results achieved from our experiments were preliminary. To further validate its generalizability of our approach, more prostate images from local institutions are needed to perform extensive experiments. In the future, besides using tissue WSIs for patients' bRFS analysis, integrating patients' genomic information and tissue histopathology images will be investigated as a means for providing additional predictive power. Doing so would provide a more quantitative and accurate clinical decisionmaking support system for patients with prostate cancer. Financial support and sponsorship This research was funded, in part, by grants from NIH contracts 4R01LM00923908, 4R01CA16137505, 1UG3CA22502101, and P30CA072720. Conflicts of interest Dr. Singer is the principal investigator on an investigatorinitiated clinical trial that is funded by Astellas/Medivation (NCT02885649) (http://cinj.org/clinicaltrials/index&?show=trial&p=081604). The other authors declare that they have no competing interests. References


