Journal of Pathology Informatics

Year: 2014 | Volume: 5 | Issue: 1 | Page: 28

Automated quantification of aligned collagen for human breast carcinoma prognosis

Jeremy S Bredfeldt1, Yuming Liu2, Matthew W Conklin3, Patricia J Keely3, Thomas R Mackie1, Kevin W Eliceiri4
1 Laboratory for Optical and Computational Instrumentation; Morgridge Institute for Research, Madison, WI 53715, USA
2 Laboratory for Optical and Computational Instrumentation, Madison, USA
3 Laboratory for Optical and Computational Instrumentation; Laboratory for Cell and Molecular Biology, University of Wisconsin at Madison, Madison, WI 53706, USA
4 Laboratory for Optical and Computational Instrumentation; Morgridge Institute for Research, Madison, WI 53715; Laboratory for Cell and Molecular Biology, University of Wisconsin at Madison, Madison, WI 53706, USA

Correspondence Address:
Kevin W Eliceiri
Laboratory for Optical and Computational Instrumentation; Morgridge Institute for Research, Madison, WI 53715; Laboratory for Cell and Molecular Biology, University of Wisconsin at Madison, WI 53706


Background: Mortality in cancer patients is directly attributable to the ability of cancer cells to metastasize to distant sites from the primary tumor. This migration of tumor cells begins with a remodeling of the local tumor microenvironment, including changes to the extracellular matrix and the recruitment of stromal cells, both of which facilitate invasion of tumor cells into the bloodstream. In breast cancer, it has been proposed that the alignment of collagen fibers surrounding tumor epithelial cells can serve as a quantitative image-based biomarker for survival of invasive ductal carcinoma patients. Specific types of collagen alignment have been identified for their prognostic value, and these tumor-associated collagen signatures (TACS) are now central to several clinical specimen imaging trials. Here, we implement the semi-automated acquisition and analysis of this TACS candidate biomarker and demonstrate a protocol that will allow consistent scoring to be performed throughout large patient cohorts. Methods: Using large field of view high resolution microscopy techniques, image processing and supervised learning methods, we are able to quantify and score features of collagen fiber alignment with respect to adjacent tumor-stromal boundaries. Results: Our semi-automated technique produced scores that have statistically significant correlation with scores generated by a panel of three human observers. In addition, our system generated classification scores that accurately predicted survival in a cohort of 196 breast cancer patients. Feature rank analysis reveals that TACS positive fibers are better aligned with each other, are of generally lower density, and terminate within or near groups of epithelial cells at larger angles of interaction. Conclusion: These results demonstrate the utility of a supervised learning protocol for streamlining the analysis of collagen alignment with respect to tumor-stromal boundaries.

How to cite this article:
Bredfeldt JS, Liu Y, Conklin MW, Keely PJ, Mackie TR, Eliceiri KW. Automated quantification of aligned collagen for human breast carcinoma prognosis. J Pathol Inform 2014;5:28-28

How to cite this URL:
Bredfeldt JS, Liu Y, Conklin MW, Keely PJ, Mackie TR, Eliceiri KW. Automated quantification of aligned collagen for human breast carcinoma prognosis. J Pathol Inform [serial online] 2014 [cited 2020 Sep 27];5:28-28



Breast cancer diagnosis and staging have been revolutionized by new molecular screening assays based on immunohistochemistry, [1] fluorescence in situ hybridization, [2] and reverse transcription polymerase chain reaction, [3] which are all used to personalize care. These tools are helping patients live longer and receive better treatment than ever before. However, there remains a significant group of breast cancer patients for whom these new techniques ultimately fail, due to several factors including varying patient genotype and primary or acquired resistance to drugs such as the HER2/neu receptor targeting drug trastuzumab (trade name Herceptin). [4] In addition, molecular screens are confounded by the high degree of intratumor genetic diversity and often require extra tissue sections to be cut, stained and evaluated on top of the standard hematoxylin and eosin (H&E) preparation. New assays that predict patient outcome and response to treatment are therefore critically needed if we are to continue improving breast cancer treatment and prevention. One promising area of development is image-based assays, which leverage high-content imaging hardware and image analysis software to classify biological samples. [5],[6],[7] In many cases, image-based analysis does not require more than the standard histopathology H&E stained slides prepared as part of the normal clinical workflow. In this paper, we demonstrate the use of a new image-based assay for predicting patient outcome using information about tumor-stromal interactions from standard H&E stained histopathology specimens.

Aberrant tumor-stromal interactions have been shown to accelerate tumorigenesis in breast cancer. [8],[9],[10] The importance of stromal collagen in breast cancer is highlighted by the link between breast cancer, breast density, and the increased deposition of stromal collagen. [11],[12],[13],[14],[15] Interestingly, although mammographic density, which is attributable to collagen content, is one of the largest risk factors for the development of breast tumors, there is currently no clinical intervention based on mammographic density alone. This is due in part to the lack of a clear correlation observed between increased mammographic density and patient outcome. Most of the work to date [16],[17],[18],[19] has defined mammographic density as an etiological factor and not as a prognostic factor. Recently, Cil et al. [20] explored mammographic density as a predictor of local breast cancer recurrence. They reported that women with intermediate and high breast density had a significantly elevated risk of developing a local breast cancer recurrence. However, follow-up clinical trials that incorporate additional risk factors such as obesity are needed to examine the possible prognostic value of mammographic density in large and diverse patient cohorts before using density as a possible clinical target. Recently, there has also been an effort to investigate the underlying contributor to mammographic density, focusing on one of the largest components present in the dense stroma, collagen. Several studies have shown a link between collagen remodeling and the invasion and progression of mammary cancer in mouse models. [21],[22],[23] Furthermore, a link was observed between collagen morphology, particularly collagen alignment, and breast cancer patient outcome. [24] Provenzano et al. [21] first introduced the so-called tumor associated collagen signature (TACS) nomenclature to describe collagen alignment patterns.
The TACS phenotypes are currently classified into three groups. TACS-1 describes the standard desmoplastic response of increased collagen deposition surrounding initiating tumor cells. TACS-2 is observed as straightened fibers aligned tangentially around developing tumors, while TACS-3 is seen as radially aligned fibers that facilitate local invasion. [25] Conklin et al. [24] qualitatively searched for these patterns in human breast cancer samples and, through extensive manual analysis, found that the presence of the TACS-3 alignment phenotype was a prognostic indicator for disease free and disease specific survival (DFS and DSS, respectively) for invasive breast cancer patients. Our quantitative study, presented here, computationally builds on this previous work by defining an algorithmic model for TACS-3 and applying this model to the same cohort of patients. [24]

Previous collagen alignment studies have largely been facilitated by the development of second harmonic generation (SHG) microscopy techniques, which have the ability to capture high contrast images of the collagen fiber extracellular matrix without the need for exogenous stains. [26],[27],[28],[29] The application of SHG imaging in cancer research is growing rapidly. For example, changes in the ratio of the forward SHG (FSHG) to backward propagating SHG signal have been recently linked to breast tumor progression [30] and positive lymph node status. [31] SHG directionality was also used by Ajeti et al. to quantify the collagen composition in breast cancer models, [32] while Ambekar et al. used Fourier transform and polarization-resolved SHG imaging to differentiate malignant from benign tissues in breast biopsies. [33]

In addition, many new computational techniques are being developed to quantify patterns observed in SHG images. For example, a directional gradient method developed by Altendorf et al. [34] provides three-dimensional orientation and radius information about fibers in SHG images. Due to the fibrous nature of the collagen matrix, SHG images are particularly well-suited for the curvelet transform (CT), which is a multiscale, orientation sensitive version of the wavelet transform. The CT [35] and combined fiber tracking methods [36] have been applied to extract fiber orientation, length, curvature and radius from SHG images of collagen. One key feature that is missing from all of the available image analysis techniques, however, is the ability to incorporate cellular information into the analysis. The interaction between tumor cells and collagen fibers cannot be fully assessed without integration of information about cellular morphology and associated collagen morphology. This information is also critical for finding regions of interest (ROI) with TACS, an essential task for any type of high-throughput screening where manual searching is not practical. Herein, we describe a computational protocol that achieves this goal by integrating information about collagen fibers from SHG images with information about cells captured through bright field imaging of standard H&E stained slides to perform highly automated, prognostic TACS-3 scoring.

In order for TACS to become a useful and fully validated biomarker, it must be screened for in several large studies containing many patients and diverse populations. Besides screening in heterogeneous populations, it ideally needs to be screened in diverse sample types to account for possible subtle differences in surgery, pathology or sample preparation that could negatively impact sample consistency. This ability to rapidly screen many sample types from large diverse populations would also open the door for TACS to be explored in other cancer types such as pancreatic and renal cancer. Heretofore, there has not been a method that automates enough of the process to enable such large scale adaptation. In previous studies, collagen fiber angles were measured by hand, one at a time, using ImageJ ROI marking tools. [21],[23] These experiments used information gathered a priori or from autofluorescence to identify tumor-stromal boundaries. In addition, imaging locations were chosen manually. Conklin et al. manually captured each individual image, used bright field images to manually identify tumor-stromal boundaries, and manually estimated collagen fiber angles. [24] In each of these cases, many subjective decisions were made while identifying which areas to image, which fibers to measure and what should be considered a tumor-stromal boundary. There has been progress made in automating the fiber angle analysis steps of this task. [35],[36],[37],[38],[39] However, none of these methods can automate all four steps of the TACS analysis process, which are: (1) image capture, (2) fiber angle measurement, (3) tumor-stromal boundary identification, and (4) relative angle measurement between fiber and boundary. In this paper, we use image analysis and supervised learning techniques to enable the automation of each of these tasks. The block diagram of our imaging and analysis protocol is shown in [Figure 1].
Starting with the previously-imaged invasive breast cancer tissue microarray (TMA), we captured registered, whole-slide SHG and bright field images, extracted fibers from the SHG images, identified tumor-stromal boundaries from the bright field images, and measured relative angles, all in a scripted pipelined process that requires little human intervention. We believe that this method will allow significantly larger scale studies to be performed in order to validate TACS-3 as a prognostic biomarker in breast cancer and potentially other cancer types, and to investigate if TACS-3 can be used to predict patient response to targeted therapies.{Figure 1}


Human Breast Carcinoma Tissue Microarray

The TMA used here was the same as that used by Conklin et al. [24] for the manual collagen alignment analysis. The clinical profiles of all patients whose tissue was included in this TMA have been described in a previous study. [39] All tissue and patient information used in this study were acquired following Institutional Review Board approval. Tumor tissues from 353 patients diagnosed with invasive carcinoma were resected by the same surgeon between 1981 and 1995. Pieces of each resected tumor were embedded in paraffin according to standard histopathology protocols. After tumors smaller than 5 mm and severely damaged samples were excluded, 196 patients remained for analysis. Sections of 4 μm thickness were cut from archived TMA blocks containing 1.0 mm diameter tissue cores, placed on glass slides, stained with H&E and mounted under a glass coverslip. Patients were followed for a median of 6.2 years, ranging from 1 month to 18.6 years.

Imaging System

All samples in this study were imaged with the custom-built integrated FSHG/bright field imaging system shown in [Figure 2]. A MIRA 900 Ti:Sapphire laser (Coherent, Santa Clara, CA) tuned to 780 nm, with a pulse length of approximately 100 fs, was directed through a Pockels cell (ConOptics, Danbury, CT, USA), half and quarter waveplates (ThorLabs, Newton, NJ, USA), a beam expander (ThorLabs), a 3 mm galvanometer driven mirror pair (Cambridge, Bedford, MA), a scan/tube lens pair (ThorLabs), through a dichroic beam splitter (Semrock, Rochester, NY) and focused by a 20X/0.75NA objective (Nikon, Melville, NY). SHG light was collected in the forward direction with a 0.54 NA condenser (ThorLabs) and filtered with an interference filter centered at 390 nm with a full width at half maximum bandwidth of 22.4 nm (Semrock). The back aperture of the condenser lens was imaged onto the 5 mm aperture of a 7422-40P photomultiplier tube (Hamamatsu, Hamamatsu, Japan), the signal from which was amplified with a C7319 integrating amplifier (Hamamatsu) and sampled with an analog to digital converter (Innovative Integration, Simi Valley, CA). Timing between the galvo scanners, signal acquisition, and motorized stage positioning was achieved using our custom software called WiscScan. [41] The Rapid Automated Modular Microscope system (Applied Scientific Instrumentation, Eugene, OR) served as our microscope base and we used ASI motorized translation stages for x, y, and z motion control. The SHG light source was verified to be circularly polarized at the sample using the protocol of Chen et al. [29] SHG images were captured as stacks of three images spaced 3 μm apart, then z-projected to improve field flatness. Bright field images were captured with the same system using a MCWHL2 white LED lamp (ThorLabs) set up for Köhler illumination.
White light from this lamp was separated from SHG light traveling through the condenser assembly using a short pass dichroic mirror with a cutoff at 670 nm (Semrock). An RGB camera (QImaging, Surrey, BC, Canada) was used to capture bright field images through WiscScan to allow for acquisition within a single application. Both SHG and white light images were tiled with 10% overlap using automation provided by WiscScan. Stage positions for individual images and pixel size data were stored in Bio-Formats image metadata [42] and this was then used by the grid/collection stitching ImageJ plugin [43] to reassemble a high-resolution large field of view image of the entire TMA. When capturing large field of view images, the sample plane often walks out of the in-focus imaging plane as the stage is translated over large distances in x or y. We alleviated this issue using the Continuous Reflection Interface Sampling and Positioning autofocus system (Applied Scientific Instrumentation), which maintained an accurate distance between the coverslip and the objective throughout the whole slide stitched image capture. This allowed for a single bright field image to be captured at each location rather than a z-stack, improving capture speed, reconstruction speed, and reducing production of unnecessary data. After SHG and white light images were captured and stitched, the two modalities were manually registered with the landmark correspondences ImageJ plugin using five control points per image. The image of the entire TMA was registered in a single step, and then each individual TMA core was cropped out of the full TMA image, producing 196 images. The resulting TMA core images were each 2048 × 2048 pixels, consisting of four eight-bit channels. The first three channels represented the red, green and blue planes of the white light image, while the fourth channel contained the SHG information.{Figure 2}
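The tiled acquisition described above can be illustrated with a minimal sketch that computes stage positions for a rectangular scan with 10% overlap between adjacent tiles. The function names, the row-major ordering and the example dimensions are assumptions made for illustration; they are not taken from WiscScan.

```python
def tile_origins(extent_um, fov_um, overlap=0.10):
    """Return stage origins along one axis so tiles of width fov_um cover extent_um.

    Adjacent tiles share `overlap`, expressed as a fraction of the field of view.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    step = fov_um * (1.0 - overlap)
    origins = [0.0]
    # Add tiles until the last one reaches the far edge of the scan extent.
    while origins[-1] + fov_um < extent_um:
        origins.append(origins[-1] + step)
    return origins


def tile_grid(width_um, height_um, fov_um, overlap=0.10):
    """Return (x, y) stage positions in row-major order for a rectangular scan."""
    xs = tile_origins(width_um, fov_um, overlap)
    ys = tile_origins(height_um, fov_um, overlap)
    return [(x, y) for y in ys for x in xs]


if __name__ == "__main__":
    # Hypothetical 1000 x 1000 um area imaged with a 500 um field of view.
    for position in tile_grid(1000.0, 1000.0, 500.0):
        print(position)
```

In a real acquisition the positions would be stored with each tile (here, in Bio-Formats metadata) so that a stitching tool can reassemble the full field of view.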

Tumor Associated Collagen Signatures-3 Model

Our TACS-3 model was based on previously published observations relating collagen structure to breast cancer progression and survival. In these studies, the first step in the TACS-3 scoring process was the identification of groups of straightened, aligned collagen fibers. The second step was to determine if those fibers terminate at or near regions of epithelial cells at steep angles. If a fiber met both of these criteria, then it was considered TACS-3 positive. If one or more TACS-3 positive fibers were found in a sub-region of an image, then that region was scored TACS-3 positive. The number of regions with TACS-3 positive scores was then used to score the entire image. There were many details in these steps, and defining parameters to account for each step would have produced a potentially fragile model. Instead, we have implemented a supervised learning approach that allows the data to define the model. We performed this task computationally using a series of cascaded classifiers. The first classifier was trained to find epithelial regions in the images using a small training set of annotated ROIs. The resultant epithelial cell model was then used to segment epithelial cell regions within the entire cohort of images. Features describing the epithelial regions were then combined with features derived from our fiber extraction algorithm and were fed into a second classifier, which was trained to score each image as being TACS-3 positive or negative based on a training set of annotated images. TACS-3 scores were then fed into a Cox proportional hazards model to regress to censored survival data.


Fiber Extraction

We applied a technique called CT-FIRE [36] to the SHG images to enhance, trace and extract a network of collagen fibers for each SHG image. CT-FIRE combines the advantage of the CT [44] for denoising the image and enhancing the fiber ridge features with the advantage of a fiber tracing algorithm [45] for automatic fiber extraction, and is capable of extracting geometric information such as the length, angle, width, and curvature of each fiber. We applied the fast discrete CT (FDCT) to capture a collection of coefficients $C^D$ in curvelet space, which are defined as the inner product of the input SHG image channel with each of the curvelet basis functions,

$$C^D(j,k,l) = \sum_{x,y} f[x,y]\,\overline{\varphi^D_{j,k,l}[x,y]}$$

where $f$ is the input SHG image channel, $\varphi^D_{j,k,l}$ is the digital curvelet waveform and $j$, $k$, $l$ represent the scale, orientation, and location indices, respectively. We used the open source FDCT MATLAB (The Mathworks, Natick, MA, USA) [44] library and specifically the "wrapping" version of the FDCT due to its simplicity. To denoise the image, we set to zero all curvelet coefficients that fall below a user defined threshold $T$, giving the thresholded coefficients

$$\tilde{C}^D(j,k,l) = \begin{cases} C^D(j,k,l), & |C^D(j,k,l)| \ge T \\ 0, & \text{otherwise} \end{cases}$$

This threshold was determined empirically on a small subset of SHG images to determine the appropriate level of noise reduction. The inverse FDCT was then applied to reconstruct an edge enhanced, noise reduced version of the SHG image. After reconstruction, CT-FIRE traced fibers, using the method of Stein et al., [45] by first finding local maxima in the result of the smoothed distance transform. The distance transform computed the distance from each foreground pixel to the nearest background pixel. Fiber branches were formed by creating a region surrounding each local maximum, the size of which was defined by the result of the distance transform at the location of the local maximum point. The edges of this region were then searched for further local maxima. This process was repeated until no new local maxima were found, indicating the end of a fiber branch. Short branches were then pruned from the network and closely spaced, similarly oriented fibers were merged. Fiber width (FW) was quantified for each extracted fiber by averaging the fiber widths ($2R_i$) at the $n$ points that were used to form the fiber,

$$\mathrm{FW} = \frac{1}{n}\sum_{i=1}^{n} 2R_i$$

where $R_i$ is the fiber radius at the $i$th point, estimated by the result of the distance transform at that location. Fiber straightness (FS) was quantified for each extracted fiber by dividing the distance between the end points of the fiber ($d_n$) by the distance along the path of the fiber ($d_0$),

$$\mathrm{FS} = \frac{d_n}{d_0}$$

Thus, for perfectly straight fibers FS = 1.0 and for wavy fibers FS < 1.0. After fiber objects were extracted from each of the images, we next segmented epithelial cell regions.
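The coefficient thresholding and the two per-fiber measurements described in this section reduce to a few lines of arithmetic. The sketch below is a minimal illustration in plain Python and is not the CT-FIRE implementation: it assumes the curvelet coefficients are already available as a flat list, and that each fiber is given as an ordered list of (x, y) points with one radius per point. All function names are hypothetical.

```python
import math


def hard_threshold(coefficients, threshold):
    """Zero every coefficient whose magnitude is below the threshold T."""
    return [c if abs(c) >= threshold else 0 for c in coefficients]


def fiber_width(radii):
    """Mean fiber width: the average of 2 * R_i over the n points on the fiber."""
    if not radii:
        raise ValueError("a fiber needs at least one point")
    return sum(2.0 * r for r in radii) / len(radii)


def fiber_straightness(points):
    """End-to-end distance divided by path length (1.0 for a straight fiber)."""
    if len(points) < 2:
        raise ValueError("a fiber needs at least two points")
    # Path length is the sum of the distances between consecutive points.
    path = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    if path == 0:
        raise ValueError("fiber has zero length")
    return math.dist(points[0], points[-1]) / path


if __name__ == "__main__":
    print(hard_threshold([0.2, -1.5, 0.9, 3.0], 1.0))
    print(fiber_width([1.0, 2.0, 3.0]))
    print(fiber_straightness([(0, 0), (1, 1), (2, 0)]))
```

Because `abs` also works on complex numbers, the same thresholding rule applies if the transform returns complex-valued coefficients.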

Epithelial Cell Segmentation

The TACS-3 phenotype consists of straightened aligned collagen fibers that terminate near regions of epithelial cells such that the angles of the collagen fibers appear perpendicular to the epithelial stromal boundary. Detecting this TACS-3 phenotype requires knowledge of the locations of epithelial cells within the sample. We must then identify regions of epithelial cell clusters and identify a boundary between the epithelial cells and surrounding stroma. This task was performed in two steps outlined here. Step 1 used the Trainable Weka Segmentation ImageJ plugin [46] to find epithelial cell nuclei and step 2 applied a cascaded matched filter, threshold operation to identify clusters and boundaries. The details of these steps are given below.


For the second step in the segmentation process, the epithelial class probability map was filtered with a Gaussian filter matched to the average width of the epithelial cell nuclei (three microns) and thresholded such that the top 80% of resulting pixels were retained. The resulting image was then filtered with a Gaussian filter matched to the width of the average sized epithelial cell cluster (25 microns), then finally thresholded such that, again, the top 80% of resulting pixels were retained. Following the final threshold step, regions smaller than 50 pixels in area were discarded and a mask was generated with epithelial cell clusters in the foreground and all else in the background. Epithelial mask pixels are denoted here as $e_i$, while epithelial region boundary pixels, which were identified using an eight-connected neighborhood, are denoted as $b_i$.
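The thresholding and mask clean-up operations in this step can be sketched as follows. This is an illustrative plain-Python version operating on small nested lists, not the script used in the study; the Gaussian filtering is omitted. Two details are assumptions: regions are grouped with eight-connectivity, and a boundary pixel is taken to be a foreground pixel with at least one background pixel among its eight neighbors.

```python
from collections import deque

# All eight neighbor offsets (row, column) around a pixel.
NEIGHBORS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]


def keep_top_fraction(image, fraction=0.8):
    """Binary mask that retains the brightest `fraction` of pixels.

    Tied values at the cutoff are all retained, so slightly more than
    `fraction` of the pixels can survive.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    values = sorted(v for row in image for v in row)
    kept = max(1, round(len(values) * fraction))
    cut = values[len(values) - kept]
    return [[v >= cut for v in row] for row in image]


def remove_small_regions(mask, min_area=50):
    """Drop eight-connected foreground regions smaller than min_area pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    cleaned = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill collects one connected region.
            region, queue = [], deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for dr, dc in NEIGHBORS_8:
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and mask[rr][cc] and not seen[rr][cc]):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            if len(region) >= min_area:
                for r, c in region:
                    cleaned[r][c] = True
    return cleaned


def boundary_pixels(mask):
    """Foreground pixels with a background pixel among their eight neighbors."""
    rows, cols = len(mask), len(mask[0])
    boundary = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr, dc in NEIGHBORS_8:
                rr, cc = r + dr, c + dc
                # Pixels outside the image count as background.
                if not (0 <= rr < rows and 0 <= cc < cols) or not mask[rr][cc]:
                    boundary.append((r, c))
                    break
    return boundary
```

On full-size 2048 × 2048 images an array library would be far faster; the nested-list form is used here only to keep the logic explicit.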

Mask images were saved as tiff files and read, along with the extracted fiber data, into the custom, open source CurveAlign software, described more below, for fiber/epithelial region feature extraction. Outlines of the resulting mask files were overlaid onto the original white light images to qualitatively validate the segmentation accuracy of the applied epithelial region model.

Combined Fiber-Epithelial Features and Fiber Classification

In the sections above, we described our methods for epithelial cluster segmentation and collagen fiber extraction. With these two pieces of information, we associated fibers with epithelial cell clusters and measured the interaction between the two using the features described here. This task was performed by an open source, MATLAB based tool called CurveAlign. [35] This tool started by reading in a fiber database file (generated by CT-FIRE) and an epithelial mask file (generated by our epithelial segmentation script). A feature vector $p_i$ was then built for each fiber endpoint $v_i \in \mathbb{R}^2$ in the image. The feature vector was populated directly with features derived above in the fiber extraction section, including fiber length, curvature, radius and grey level. Both endpoints were given the same values for these single fiber derived features. The rest of the features were unique to each fiber endpoint. All features used in TACS-3 fiber classification are listed in [Table 1]. Many of the features in this section rely heavily on the nearest neighbor search routine which is formulated here as{Table 1}


where bres indicates a modified Bresenham algorithm, [50] which is used to find all pixels along a line between two points. The term $q = [r\exp(\theta_v)]$ is the offset from $v$ in the x and y directions. For this last feature, three values of $r$ (50, 100 and 200 μm) were used. These three lengths corresponded to 5, 10, and 20 times the diameter of a typical epithelial cell and were selected based on estimates of intercellular signaling distances. [51] If no intersection was found, then the feature value was set to zero. The angle of the tumor-stromal boundary line was estimated by fitting a quadratic to nine contiguous points on the boundary surrounding the intersection point (or nearest boundary point in the previous feature) and computing the tangent angle of the line fit at the midpoint. The steps in the process of relative angle feature extraction are diagrammed in [Figure 3].{Figure 3}
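The line search from a fiber endpoint toward the boundary can be sketched with the standard Bresenham line algorithm. The study used a modified variant whose details are not given here, so the version below is the textbook integer algorithm, and the helper names, the pixel-unit reach and the folding of the relative angle into 0 to 90 degrees are assumptions made for illustration.

```python
import math


def bresenham(x0, y0, x1, y1):
    """All integer pixels on the line from (x0, y0) to (x1, y1), inclusive."""
    pixels = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            return pixels
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def first_boundary_hit(endpoint, angle_deg, reach_px, boundary):
    """Walk from a fiber endpoint along the fiber angle.

    `boundary` is a set of (x, y) boundary pixels. Returns the first boundary
    pixel on the line, or None if the line never meets the boundary.
    """
    x0, y0 = endpoint
    theta = math.radians(angle_deg)
    # Offset of length reach_px along the fiber direction, rounded to pixels.
    x1 = x0 + round(reach_px * math.cos(theta))
    y1 = y0 + round(reach_px * math.sin(theta))
    for pixel in bresenham(x0, y0, x1, y1):
        if pixel in boundary:
            return pixel
    return None


def relative_angle(fiber_deg, boundary_deg):
    """Angle between two undirected lines, folded into [0, 90] degrees."""
    diff = abs(fiber_deg - boundary_deg) % 180.0
    return min(diff, 180.0 - diff)
```

The distances of 50, 100 and 200 μm would first have to be converted to pixels using the image pixel size before being passed as `reach_px`.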

Each of these fiber level features $p_i$ was calculated for every fiber endpoint $v_i$ in the cohort. Fiber level features were then averaged among all fibers in a given image and training was performed with a subset of 16 images $I_t \in I_i$ that had been manually annotated as being TACS-3 positive or negative. A linear support vector machine (SVM) was used to build a model, which was then applied to all 196 images in the cohort for classifying each image as being TACS-3 positive or negative.
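The image-level step described above has two parts: averaging the per-endpoint feature vectors into a single vector per image, and applying a linear decision function to that vector. The sketch below shows only these two operations, assuming the SVM weights and bias have already been learned elsewhere; the function names and the example numbers are hypothetical.

```python
def mean_feature_vector(fiber_features):
    """Average each feature over all fiber endpoints in one image."""
    if not fiber_features:
        raise ValueError("image has no fibers")
    n = len(fiber_features)
    # zip(*rows) iterates over the feature columns.
    return [sum(column) / n for column in zip(*fiber_features)]


def linear_svm_label(features, weights, bias):
    """Sign of the linear decision function w . x + b.

    Returns True when the image falls on the positive side of the hyperplane.
    """
    score = sum(w * x for w, x in zip(weights, features))
    return score + bias > 0


if __name__ == "__main__":
    # Two endpoints with two features each (made-up values).
    image_vector = mean_feature_vector([[1.0, 10.0], [3.0, 30.0]])
    print(image_vector, linear_svm_label(image_vector, [0.5, -0.1], 0.5))
```

Training the weights themselves would normally be delegated to an SVM library; features are also usually standardized first so that weights are comparable across features.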

Classification and Survival Analysis

The TACS-3 scores were correlated with DFS and DSS data using the Cox proportional hazards regression method. [52] DFS was defined as the time from date of diagnosis to the first date of recurrence and DSS was defined as the time from diagnosis to death from breast cancer or date of last follow-up evaluation. In both cases, all other events were censored. The Kaplan-Meier method was used to compare DFS and DSS between TACS-3 negative and TACS-3 positive patients. Hazard ratios were computed using a log-rank test. Correlations between manual and computationally generated TACS-3 scores were assessed using Pearson's linear correlation coefficient.
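For readers unfamiliar with the Kaplan-Meier method, the product-limit estimate can be written in a few lines. The sketch below is a generic illustration with made-up follow-up data, not the statistical software used in the study; the function name and input layout are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event was observed.

    times  -- follow-up time for each patient
    events -- True if the event (recurrence or death) was observed,
              False if the patient was censored at that time
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        observed = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)
        if observed:
            # Product-limit update: multiply by the fraction surviving time t.
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        # Everyone whose follow-up ends at t leaves the risk set afterwards.
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up times in years; False marks a censored patient.
    print(kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False]))
```

Computing one such curve per TACS-3 group gives the two curves that a log-rank test then compares.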


Registered SHG and bright field images of a subsample of the TMA are shown in [Figure 4] along with two zoomed versions of regions within the image. SHG information is added as an alpha channel on top of the raw RGB bright field image and pseudo colored yellow. The fully zoomed panel shows the detail available in the full resolution images captured with the 20X, 0.75 NA lens and shows a region with a positive TACS-3 signature. Three more TACS-3 positive and three TACS-3 negative regions were cropped out of the TMA images and are shown in [Figure 5]. These images illustrate the features that are common to the TACS-3 signature, including straightened, aligned fibers terminating in or near regions of epithelial cells at near perpendicular angles with respect to the epithelial region border. In addition, the TACS-3 negative cases show wavy fibers, fibers that terminate at adipose tissue, and a curved fiber encapsulating an epithelial cell cluster ([Figure 5]d-f, respectively).{Figure 4}{Figure 5}

Samples of our fiber extraction and epithelial region segmentation results are shown in [Figure 6] and [Figure 7], respectively. In both cases, epithelial region segmentation and fiber extraction were observed to accurately represent the data. The orientations of the epithelial cell region boundaries were compared to collagen fiber angles derived from the results of the fiber object extraction algorithm CT-FIRE, which has been shown to perform well in comparison to other techniques. [36] A representative sample of the results produced by this algorithm is shown in [Figure 6]. The intermediate product after the CT denoising step is shown in [Figure 6]a, while the extracted fiber network is shown overlaid on the original SHG image in [Figure 6]b. Although some fibers are over- or under-segmented (annotated by green arrows), most of the extracted fibers properly represent the data. [Figure 7] clearly demonstrates the ability of our epithelial cell segmentation algorithm to properly classify many of the regions of epithelial cells as positive. However, a few small regions of stromal fibroblasts and endothelial cells are included in the epithelial cell regions (annotated by green arrows). Although these errors occurred occasionally throughout the cohort, the noise they generated did not overcome the TACS-3 signal. Another feature evident in [Figure 7]d is the smoothness of the epithelial region boundaries. The boundary smoothness was dependent on the selection of our filter widths and binary mask thresholds. These parameters were selected to accurately represent the boundary orientation at the spatial scale of the epithelial cell regions.{Figure 6}{Figure 7}

Although correlation with survival is our ultimate goal, automated TACS-3 scores should also correlate with manual scores for each of the images. The Pearson linear correlation coefficient was used to determine this correlation, the results of which are tabulated in [Table 2]. The manual analysis performed by Conklin et al. produced three scores. Score 1 was the number of TACS-3 positive regions divided by the total number of regions analyzed, score 2 was the average number of TACS-3 positive votes per region among three observers, and score 3 indicated if one or more regions received a TACS-3 positive rating. [Table 2] shows positive correlation between all manual scoring methods and the computational scoring system presented here, with the highest correlation observed with manual score 2.{Table 2}
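The three manual scores and the correlation measure can be made concrete with a short sketch. The text does not state how three observer votes were combined into a single region rating, so the majority rule used below is an assumption, as are the function names and the example votes.

```python
import math


def manual_scores(votes):
    """Three per-image summary scores from per-region observer votes.

    votes -- one list per region, each holding one boolean per observer
    """
    if not votes:
        raise ValueError("at least one region is required")
    # Assumption: a region is positive when most observers voted positive.
    positive_regions = [sum(v) * 2 > len(v) for v in votes]
    score1 = sum(positive_regions) / len(votes)       # fraction of positive regions
    score2 = sum(sum(v) for v in votes) / len(votes)  # mean positive votes per region
    score3 = any(positive_regions)                    # any positive region at all
    return score1, score2, score3


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    votes = [[True, True, False], [False, False, False],
             [True, True, True], [False, True, False]]
    print(manual_scores(votes))
```

Correlating a list of automated scores against the corresponding manual scores, one pair per image, then yields one coefficient per manual scoring method.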

The Kaplan-Meier curves in [Figure 8] demonstrate the prognostic potential of our TACS-3 scoring system. TACS-3 negative patients showed significantly better disease-free and disease-specific survival compared to TACS-3 positive patients. In addition, Cox proportional hazard regression showed significant correlation between our computationally generated TACS-3 scores and survival as listed in [Table 3]. We also correlated scores created by individual fiber feature metrics with survival. Although fiber features alone were correlated with survival, the highest correlation was observed when the TACS-3 scores were composed of multiple integrated fiber/epithelial features. This result shows that a multimodality imaging and analysis approach that combines features of both collagen fibers and cellular structures, rather than collagen fibers alone, is most likely to succeed in predicting survival.{Figure 8}{Table 3}

[Table 4] lists the 14 most informative features in the TACS-3 scoring process ranked according to their weight produced by the linear SVM algorithm. The SVM weight was used to assess which features were more or less informative in the classification. Of particular interest are the features labeled as "nearest distance to boundary" and "inside epithelial region". These features indicate the proximity between fibers and epithelial cell regions and were highly important in the TACS-3 classification. In addition, the difference in mean feature scores $d_f = f_p - f_n$ for the training set is shown in [Table 4] for each of the ranked features. If $d_f > 0$, then the TACS-3 positive images had larger values for those features, and if $d_f < 0$, then the TACS-3 negative images had larger values. For example, the density features resulted in lower $d_f$ values in the TACS-3 positive cases, indicating that the TACS-3 positive images had lower density collagen fibers. On the other hand, $d_f$ was positive for the alignment features, indicating that TACS-3 positive images tended to have more aligned fibers. Interestingly, relative boundary angle was not as highly informative as many other features; however, it still ranked within the top 14 of 27 features.{Table 4}
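The feature ranking and the mean difference described above can be sketched as follows. Ranking by the absolute value of the weight is an assumption (the text says only that features were ranked according to their SVM weight), and the feature names and numbers in the example are invented for illustration.

```python
def rank_by_weight(names, weights):
    """Feature names ordered from largest to smallest absolute SVM weight."""
    pairs = sorted(zip(names, weights), key=lambda p: abs(p[1]), reverse=True)
    return [name for name, _ in pairs]


def mean_difference(positive_values, negative_values):
    """d_f = f_p - f_n: mean feature value in positive minus negative images."""
    if not positive_values or not negative_values:
        raise ValueError("both groups need at least one value")
    f_p = sum(positive_values) / len(positive_values)
    f_n = sum(negative_values) / len(negative_values)
    return f_p - f_n


if __name__ == "__main__":
    # Hypothetical weights for three features.
    print(rank_by_weight(["density", "alignment", "angle"], [-0.9, 0.4, 0.1]))
    # A negative d_f means the negative images had the larger mean value.
    print(mean_difference([1.0, 3.0], [4.0, 6.0]))
```

Weights from a linear model are only comparable across features when the features share a common scale, so standardizing the inputs before training is the usual precaution.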

 Discussion and Conclusions

The search for new prognostic and predictive breast cancer biomarkers is motivated by the need to improve patient outcome. A significant number of patients present with none of the currently available markers. In addition, survival and treatment response are often heterogeneous among patients within current biomarker classifications. The discovery and validation of new biomarkers will help to further improve breast cancer diagnosis and treatment planning. These new biomarkers need to be quantifiable and scalable, and should ideally correlate with both disease outcome and treatment specific response. The candidate biomarker we focus on in this study (TACS-3) measures collagen alignment relative to tumor-stromal boundaries. It has been associated with progression in mouse models and has been shown to predict disease recurrence and survival in human patients. Here, we demonstrate a protocol for using large field of view imaging techniques, image analysis and supervised learning to automate and quantify all of the steps in the process of TACS-3 scoring. These advances provide the tools for increasing the scale of TACS-3 investigations and applying TACS-3 scoring to cancers in other tissues such as ovarian [53] and pancreatic cancer [54],[55] where collagen fiber characteristics are predicted to correlate with prognosis. These techniques could also be used to characterize other TACS, both current and yet to be identified, to see if they have research value in animal models or prognostic value in clinical specimens.

Analysis of tumor associated collagen signatures (TACS) requires the simultaneous analysis of information about epithelial cells and extracellular collagen. The interactions between collagen and cells can only be assessed computationally if the cellular information is carefully registered with images of the collagen. We have therefore optimized our imaging system for highly automated capture of large field of view, registered SHG and bright field images of stained microscope slides, with the purpose of analyzing collagen angle with respect to cell cluster boundaries. For this paper, we originally planned to use the same SHG and bright field images captured by Conklin et al. [24] since these were already manually annotated. Unfortunately, these images contained artifacts which, although trivial for the human visual system to overcome, were extremely difficult for our computational systems to handle effectively. For example, SHG images were originally captured in the backward direction with elliptically polarized light, causing two artifacts. The first was simply a low signal to noise ratio due to few SHG photons traveling in the backward direction from the thin tissue sections. [56],[57] The second artifact was observed as a larger relative SHG signal from fibers oriented parallel to the long axis of the laser polarization ellipse. [58] Artifacts in the bright field images included significant vignetting at field edges and a low signal to noise ratio due to short exposure times. These artifacts were easily overcome by the human observers making TACS-3 assessments in a previous study, [24] but they are particularly difficult for a computer vision based approach to handle. We therefore decided to develop an optimized imaging system and protocol that would avoid many of these artifacts and allow for more consistent automated imaging.
Similar image quality and consistency can be achieved with other SHG microscopes, including commercial systems with the appropriate hardware, but our analysis identified the need for a rigorous acquisition protocol, which is best achieved with our new automated SHG microscope. In general, the system should allow for forward SHG (FSHG) and bright field imaging with a field of view as large and flat as possible, a numerical aperture of at least 0.75, automated xyz motion control with appropriate position logging, circular polarization at the sample for SHG imaging, autofocus, and automated switching between SHG and bright field imaging.
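To make the notion of collagen angle with respect to a cell cluster boundary concrete, the following minimal sketch computes the relative angle between a fiber orientation and a local boundary orientation once the two image channels are registered. The function names and the two-point boundary approximation are assumptions made for illustration; they do not reproduce the measurement code used in this study.

```python
import math


def segment_orientation(p0, p1):
    """Orientation in degrees, in [0, 180), of the line through p0 and p1."""
    angle = math.degrees(math.atan2(p1[1] - p0[1], p1[0] - p0[0]))
    return angle % 180.0


def relative_angle(fiber_deg, boundary_deg):
    """Acute angle (0 to 90 degrees) between two undirected orientations."""
    diff = abs(fiber_deg - boundary_deg) % 180.0
    return min(diff, 180.0 - diff)


# Hypothetical example: the boundary is approximated by two neighbouring
# boundary pixels, and the fiber orientation is given in image coordinates.
boundary = segment_orientation((10, 10), (14, 10))  # horizontal boundary
angle = relative_angle(80.0, boundary)              # fiber nearly perpendicular
```

Because fibers and boundaries have no direction, orientations are folded into a 180 degree range before the acute angle is taken, so a fiber at 170 degrees and a boundary at 10 degrees are 20 degrees apart rather than 160.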

Our system of imaging and analysis to produce prognostic TACS-3 scores uses standard histopathology H&E slide preparations. The technique is therefore completely compatible with routine clinical protocols and is intended to augment currently available diagnostic tests. The current process requires no changes to current clinical protocol and the sample is returned to the clinician unmodified. We present a system that uses SHG imaging to capture collagen fiber images; however, wide field polarization sensitive techniques [59] such as LC-PolScope [60] or Picrosirius red staining [61],[62] might alternatively be used to capture images of collagen fibers. One advantage of using SHG is that it does not require additional stains and can capture three-dimensional fiber information in thick, unstained tissue samples. Unfortunately, when imaging in thick unstained tissue, the identification of epithelial regions can be difficult; however, techniques using autofluorescence and fluorescence lifetime imaging have been shown to be capable of this task. [63],[64] As implemented here, our TACS-3 scoring algorithm is necessarily two-dimensional, since we rely on H&E stained slides for our epithelial cell information. However, fiber extraction, epithelial region segmentation and relative angle measurements can be extended to three dimensions without significant alteration of our general protocol. In addition, although our current TACS-3 scoring protocol is able to process standard H&E stains, staining for epithelial cells with, for example, pan-cytokeratin conjugated stains may simplify and improve epithelial cell segmentation. Future methods may also be adapted to segment clusters of fibroblasts, macrophages and other stromal cells, whose proximity and relative morphological structure with respect to surrounding collagen fibers may further improve correlation with survival or metastatic potential.

Collagen alignment related image features are interesting not only because they have been shown to be prognostic, but also because they have been shown to be directly linked to cancer biology. Researchers have found that cells are more likely to invade along parallel, aligned collagen fibers, [25],[65] features that our system measures directly. Access to the breadth of fiber data available with our techniques could lead to advances in our understanding of these biological phenomena. Relevant feature sets are not always available with other machine vision systems developed for biological image classification. For example, although WNDCHRM [5] is an extremely powerful image classification tool, its informative image features often do not relate to the biology at hand. In the case of our TACS-3 analysis system, biological observations have driven the image analysis model; therefore, features are more easily linked back to biological functions, potentially revealing new insights.

High mammographic density is one of the largest risk factors for the development of breast cancer and has been associated with increased epithelial cellularity and increased collagen density. [12],[14],[19] Increased collagen density has been observed to promote tumor progression in a mouse tumor model [23] and in node positive breast cancer, [31] leading one to potentially conclude that collagen density causes elevated risk. However, Maller et al. [66] observed that high density, nonfibrillar collagen protected against tumor progression and, alternatively, that linearized collagen fibers induced invasive cellular behavior. In agreement with these recent findings, we observe here that TACS-3 fibers are more commonly present in regions of lower fiber density and are more likely to be thinner, more linearized fibers. Regions of thick, curvy and denser collagen fibers are unlikely to contain TACS-3 fibers and are observed to be associated with a better prognosis. These observations support the hypothesis that collagen fiber shape and organization are key aspects of the invasive extracellular matrix (ECM) phenotype.

The imaging instrumentation presented here consists of a relatively compact and highly automated multiphoton microscope with an integrated bright field slide scanner. The system has been optimized to capture registered whole slide bright field and SHG images of histopathology slides by imaging each small ×20 field of view and automatically aligning and stitching the images together. Capturing large fields of view in this manner allows for more thorough and consistent data collection, potentially reducing sampling bias and supporting pipelined computational image analysis. The registration of cellular with extracellular collagen information provided by our system allows for the quantitative analysis of key relative structural features between collagen fibers and cancer cell clusters. In addition to SHG, our multiphoton system is capable of imaging endogenous fluorophores such as nicotinamide adenine dinucleotide (NADH) or flavin adenine dinucleotide (FAD) as well as any of the routine exogenous multiphoton probes used to stain tissue.
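The tiled acquisition described above can be pictured with a short sketch that lays out the stage positions needed to cover a region with small square fields of view. This is an illustration only, not the acquisition software's logic: the function name `tile_origins`, the square field of view and the default 10% overlap between neighbouring tiles are all assumptions (the overlap value is not taken from this study).

```python
import math


def tile_origins(width, height, fov, overlap=0.1):
    """Top-left stage coordinates of tiles covering a width x height region.

    fov     : side length of one square field of view (same units as width)
    overlap : fraction of each tile shared with its neighbour (assumed value),
              which gives a stitching step of fov * (1 - overlap)
    """
    step = fov * (1.0 - overlap)
    nx = max(1, math.ceil((width - fov) / step) + 1)
    ny = max(1, math.ceil((height - fov) / step) + 1)
    # Row-major order: scan along x, then advance in y
    return [(ix * step, iy * step) for iy in range(ny) for ix in range(nx)]


# Hypothetical region of 2000 x 1000 units imaged with a 500-unit field of view
positions = tile_origins(2000, 1000, 500)
```

Neighbouring tiles share a strip of image content, which is what allows a stitching step to refine their relative positions before the mosaic is assembled.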


We present an imaging and analysis protocol that uses high content imaging techniques coupled with supervised learning to perform semi-automated TACS-3 scoring of slide mounted biopsy samples. We apply our technique to a previously annotated TMA containing tissue from 207 patients with invasive breast cancer. The resulting scores are shown to positively correlate with manual annotations and to predict patient outcome with statistical significance. Future work will attempt to validate this technique on larger cohorts of breast cancer patients, to study ECM targeted drug responses in animal models, and to study collagen alignment in other cancers. Future work will also focus on improving the clinical application of these techniques so that they can be run by untrained clinical personnel and at the time of acquisition, to find ROIs and the TACS within those regions automatically. With more automation, TACS screening has great potential as a clinical diagnostic tool that can provide relevant prognostic information from large numbers of tissue samples.


We would like to acknowledge assistance from and valuable discussions with members of the Keely, Medical Devices and LOCI groups. We also acknowledge funding from NIH R01 Grants CA114462 and CA136590 to P.J.K. and K.W.E., T32 CA009206 to J.S.B., and the Morgridge Institute for Research. The authors are grateful to Johannes Schindelin, Curtis Rueden, Andreas Velten, Ilya Goldberg, and Paolo Provenzano for their helpful technical suggestions and to Andreas Friedl for providing access to the TMA. We also wish to acknowledge the WiscScan programmers at LOCI, Ajeet Vivekanandan, David Mayer and Mohit Chainani, for their assistance in adding support for new functionality in WiscScan and debugging current features.


1. Allred DC, Harvey JM, Berardo M, Clark GM. Prognostic and predictive factors in breast cancer by immunohistochemical analysis. Mod Pathol 1998;11:155-68.
2. Press M, Slamon D, Cobleigh M, Vogel C, Zhou JY, Anderson S, et al. Improved clinical outcomes for herceptin (R)-treated patients selected by fluorescence in situ hybridization (FISH). Mod Pathol 2002;15:47A.
3. Habel LA, Shak S, Jacobs MK, Capra A, Alexander C, Pho M, et al. A population-based study of tumor gene expression and risk of breast cancer death among lymph node-negative patients. Breast Cancer Res 2006;8:R25.
4. Nahta R, Esteva FJ. HER2 therapy: Molecular mechanisms of trastuzumab resistance. Breast Cancer Res 2006;8:215.
5. Shamir L, Delaney JD, Orlov N, Eckley DM, Goldberg IG. Pattern recognition software and techniques for biological image analysis. PLoS Comput Biol 2010;6:e1000974.
6. Madabhushi A, Agner S, Basavanhally A, Doyle S, Lee G. Computer-aided prognosis: Predicting patient and disease outcome via quantitative fusion of multi-scale, multi-modal data. Comput Med Imaging Graph 2011;35:506-14.
7. Myers G. Why bioimage informatics matters. Nat Methods 2012;9:659-60.
8. Rønnov-Jessen L, Petersen OW, Koteliansky VE, Bissell MJ. The origin of the myofibroblasts in breast cancer. Recapitulation of tumor environment in culture unravels diversity and implicates converted fibroblasts and recruited smooth muscle cells. J Clin Invest 1995;95:859-73.
9. Elenbaas B, Spirio L, Koerner F, Fleming MD, Zimonjic DB, Donaher JL, et al. Human breast cancer cells generated by oncogenic transformation of primary mammary epithelial cells. Genes Dev 2001;15:50-65.
10. Tlsty TD, Hein PW. Know thy neighbor: Stromal cells can contribute oncogenic signals. Curr Opin Genet Dev 2001;11:54-9.
11. Boyd NF, Martin LJ, Yaffe MJ, Minkin S. Mammographic density and breast cancer risk: Current understanding and future prospects. Breast Cancer Res 2011;13:223.
12. Guo YP, Martin LJ, Hanna W, Banerjee D, Miller N, Fishell E, et al. Growth factors and stromal matrix proteins associated with mammographic densities. Cancer Epidemiol Biomarkers Prev 2001;10:243-8.
13. Boyd NF, Martin LJ, Sun L, Guo H, Chiarelli A, Hislop G, et al. Body size, mammographic density, and breast cancer risk. Cancer Epidemiol Biomarkers Prev 2006;15:2086-92.
14. Boyd NF, Guo H, Martin LJ, Sun L, Stone J, Fishell E, et al. Mammographic density and the risk and detection of breast cancer. N Engl J Med 2007;356:227-36.
15. Boyd NF, Martin LJ, Bronskill M, Yaffe MJ, Duric N, Minkin S. Breast tissue composition and susceptibility to breast cancer. J Natl Cancer Inst 2010;102:1224-37.
16. Thurfjell E. Breast density and the risk of breast cancer. N Engl J Med 2002;347:866.
17. Habel LA, Dignam JJ, Land SR, Salane M, Capra AM, Julian TB. Mammographic density and breast cancer after ductal carcinoma in situ. J Natl Cancer Inst 2004;96:1467-72.
18. Boyd NF, Rommens JM, Vogt K, Lee V, Hopper JL, Yaffe MJ, et al. Mammographic breast density as an intermediate phenotype for breast cancer. Lancet Oncol 2005;6:798-80.
19. McCormack VA, dos Santos Silva I. Breast density and parenchymal patterns as markers of breast cancer risk: A meta-analysis. Cancer Epidemiol Biomarkers Prev 2006;15:1159-69.
20. Cil T, Fishell E, Hanna W, Sun P, Rawlinson E, Narod SA, et al. Mammographic density and the risk of breast cancer recurrence after breast-conserving surgery. Cancer 2009;115:5780-7.
21. Provenzano PP, Eliceiri KW, Campbell JM, Inman DR, White JG, Keely PJ. Collagen reorganization at the tumor-stromal interface facilitates local invasion. BMC Med 2006;4:38.
22. Provenzano PP, Eliceiri KW, Yan L, Ada-Nguema A, Conklin MW, Inman DR, et al. Nonlinear optical imaging of cellular processes in breast cancer. Microsc Microanal 2008;14:532-48.
23. Provenzano PP, Inman DR, Eliceiri KW, Knittel JG, Yan L, Rueden CT, et al. Collagen density promotes mammary tumor initiation and progression. BMC Med 2008;6:11.
24. Conklin MW, Eickhoff JC, Riching KM, Pehlke CA, Eliceiri KW, Provenzano PP, et al. Aligned collagen is a prognostic signature for survival in human breast carcinoma. Am J Pathol 2011;178:1221-32.
25. Provenzano PP, Inman DR, Eliceiri KW, Trier SM, Keely PJ. Contact guidance mediated three-dimensional cell migration is regulated by Rho/ROCK-dependent matrix reorganization. Biophys J 2008;95:5374-84.
26. Zipfel WR, Williams RM, Christie R, Nikitin AY, Hyman BT, Webb WW. Live tissue intrinsic emission microscopy using multiphoton-excited native fluorescence and second harmonic generation. Proc Natl Acad Sci U S A 2003;100:7075-80.
27. Zipfel WR, Williams RM, Webb WW. Nonlinear magic: Multiphoton microscopy in the biosciences. Nat Biotechnol 2003;21:1369-77.
28. Williams RM, Zipfel WR, Webb WW. Interpreting second-harmonic generation images of collagen I fibrils. Biophys J 2005;88:1377-86.
29. Chen X, Nadiarnykh O, Plotnikov S, Campagnola PJ. Second harmonic generation microscopy for quantitative analysis of collagen fibrillar structure. Nat Protoc 2012;7:654-69.
30. Burke K, Tang P, Brown E. Second harmonic generation reveals matrix alterations during breast tumor progression. J Biomed Opt 2013;18:31106.
31. Kakkad SM, Solaiyappan M, Argani P, Sukumar S, Jacobs LK, Leibfritz D, et al. Collagen I fiber density increases in lymph node positive breast cancers: Pilot study. J Biomed Opt 2012;17:116017.
32. Ajeti V, Nadiarnykh O, Ponik SM, Keely PJ, Eliceiri KW, Campagnola PJ. Structural changes in mixed Col I/Col V collagen gels probed by SHG microscopy: Implications for probing stromal alterations in human breast cancer. Biomed Opt Express 2011;2:2307-16.
33. Ambekar R, Lau TY, Walsh M, Bhargava R, Toussaint KC Jr. Quantifying collagen structure in breast biopsies using second-harmonic generation imaging. Biomed Opt Express 2012;3:2021-35.
34. Altendorf H, Decencière E, Jeulin D, De sa Peixoto P, Deniset-Besseau A, Angelini E, et al. Imaging and 3D morphological analysis of collagen fibrils. J Microsc 2012;247:161-75.
35. Pehlke C, Bredfeldt JS, Doot J, Sung KE, Provenzano P, Riching K, et al. Quantification of collagen architecture using the curvelet transform. Integrative Biology, in Review; January 2014.
36. Bredfeldt JS, Liu Y, Pehlke CA, Conklin MW, Szulczewski JM, Inman DR, et al. Computational segmentation of collagen fibers from second-harmonic generation images of breast cancer. J Biomed Opt 2014;19:16007.
37. Falzon G, Pearson S, Murison R. Analysis of collagen fibre shape changes in breast cancer. Phys Med Biol 2008;53:6641-52.
38. Rubbens MP, Driessen-Mol A, Boerboom RA, Koppert MM, van Assen HC, TerHaar Romeny BM, et al. Quantification of the temporal evolution of collagen orientation in mechanically conditioned engineered cardiovascular tissues. Ann Biomed Eng 2009;37:1263-72.
39. Bayan C, Levitt JM, Miller E, Kaplan D, Georgakoudi I. Fully automated, quantitative, noninvasive assessment of collagen fiber content and organization in thick collagen gels. J Appl Phys 2009;105:102042.
40. Baba F, Swartz K, van Buren R, Eickhoff J, Zhang Y, Wolberg W, et al. Syndecan-1 and syndecan-4 are overexpressed in an estrogen receptor-negative, highly proliferative breast carcinoma subtype. Breast Cancer Res Treat 2006;98:91-8.
41. Eliceiri K, Nazir M. WiscScan, 2012. Available: [2012 Apr 04]
42. Linkert M, Rueden CT, Allan C, Burel JM, Moore W, Patterson A, et al. Metadata matters: Access to image data in the real world. J Cell Biol 2010;189:777-82.
43. Preibisch S, Saalfeld S, Tomancak P. Globally optimal stitching of tiled 3D microscopic image acquisitions. Bioinformatics 2009;25:1463-5.
44. Candes E, Demanet L, Donoho D, Ying L. Fast discrete curvelet transforms. Multiscale Model Simul 2006;5:861-99.
45. Stein AM, Vader DA, Jawerth LM, Weitz DA, Sander LM. An algorithm for extracting the network geometry of three-dimensional collagen gels. J Microsc 2008;232:463-75.
46. Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, et al. Fiji: An open-source platform for biological-image analysis. Nat Methods 2012;9:676-82.
47. Ignacio AC, Kaynig V, Schindelin J. Trainable Weka Segmentation. Available: [2013 Oct 25].
48. Breiman L. Random forests. Mach Learn 2001;45:5-32.
49. Criminisi A, Shotton J, Konukoglu E. Decision forests: A unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Found Trends Comput Graph Vision 2011;7:81-227.
50. Bresenham JE. Algorithm for computer control of a digital plotter. IBM Syst J 1965;4:25-30.
51. Francis K, Palsson BO. Effective intercellular communication distances are determined by the relative time constants for cyto/chemokine secretion and diffusion. Proc Natl Acad Sci U S A 1997;94:12258-62.
52. Cox DR. Regression models and life-tables. J R Stat Soc Series B Stat Methodol 1972;34:187.
53. Nadiarnykh O, LaComb RB, Brewer MA, Campagnola PJ. Alterations of the extracellular matrix in ovarian cancer studied by Second Harmonic Generation imaging microscopy. BMC Cancer 2010;10:94.
54. Drifka CR, Eliceiri KW, Weber SM, Kao WJ. A bioengineered heterotypic stroma-cancer microenvironment model to study pancreatic ductal adenocarcinoma. Lab Chip 2013;13:3965-75.
55. Hu W, Zhao G, Wang C, Zhang J, Fu L. Nonlinear optical microscopy for histology of fresh normal and cancerous pancreatic tissues. PLoS One 2012;7:e37962.
56. Cox G, Kable E, Jones A, Fraser I, Manconi F, Gorrell MD. 3-dimensional imaging of collagen using second harmonic generation. J Struct Biol 2003;141:53-62.
57. Lacomb R, Nadiarnykh O, Townsend SS, Campagnola PJ. Phase Matching considerations in Second Harmonic Generation from tissues: Effects on emission directionality, conversion efficiency and observed morphology. Opt Commun 2008;281:1823-32.
58. Stoller P, Kim BM, Rubenchik AM, Reiser KM, Da Silva LB. Polarization-dependent optical second-harmonic imaging of a rat-tail tendon. J Biomed Opt 2002;7:205-14.
59. Kliger DS, Lewis JW, Randall CE. Polarized Light in Optics and Spectroscopy. New York: Academic Press; 1990.
60. Oldenbourg R. Polarization Microscopy with the LC-PolScope. Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press; 2005.
61. Junqueira LC, Bignolas G, Brentani RR. Picrosirius staining plus polarization microscopy, a specific method for collagen detection in tissue sections. Histochem J 1979;11:447-55.
62. Whittaker P, Kloner RA, Boughner DR, Pickering JG. Quantitative assessment of myocardial collagen with picrosirius red staining and circularly polarized light. Basic Res Cardiol 1994;89:397-410.
63. Rueden CT, Conklin MW, Provenzano PP, Keely PJ, Eliceiri KW. Nonlinear optical microscopy and computational analysis of intrinsic signatures in breast cancer. Conf Proc IEEE Eng Med Biol Soc 2009;2009:4077-80.
64. Conklin MW, Provenzano PP, Eliceiri KW, Sullivan R, Keely PJ. Fluorescence lifetime imaging of endogenous fluorophores in histopathology sections reveals differences between normal and tumor epithelium in carcinoma in situ of the breast. Cell Biochem Biophys 2009;53:145-57.
65. Wang W, Wyckoff JB, Goswami S, Wang Y, Sidani M, Segall JE, et al. Coordinated regulation of pathways for enhanced cell motility and chemotaxis is conserved in rat and mouse mammary tumors. Cancer Res 2007;67:3505-11.
66. Maller O, Hansen KC, Lyons TR, Acerbi I, Weaver VM, Prekeris R, et al. Collagen architecture in pregnancy-induced protection from breast cancer. J Cell Sci 2013;126:4108-10.