A multisite validation of whole slide imaging for primary diagnosis using standardized data collection and analysis
Katy Wack1, Laura Drogowski2, Murray Treloar3, Andrew Evans4, Jonhan Ho5, Anil Parwani6, Michael C Montalto7
1 Western Oncolytics, LLC, Pittsburgh, PA 15238, USA; work performed while at Omnyx, LLC, Pittsburgh, PA 15222, USA
2 Work performed while at Omnyx, LLC, Pittsburgh, PA 15222, USA
3 Dynacare, Bowmanville, Ontario L1C 3K5, Canada
4 University Health Network, Toronto General Hospital, Toronto, Ontario M5G 2C4, Canada
5 Department of Dermatology, University of Pittsburgh, Pittsburgh, PA 15213, USA
6 The Ohio State University Wexner Medical Center, Columbus, OH 43210, USA
7 Work performed while at Omnyx, LLC, Pittsburgh, PA 15222, USA; Department of Translational Medicine, Bristol-Myers Squibb, Princeton, NJ 08543, USA
Correspondence Address:
Katy Wack, Western Oncolytics, LLC, Pittsburgh, PA 15238, USA; work performed while at Omnyx, LLC, Pittsburgh, PA 15222, USA
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/2153-3539.194841
Context: Text-based reporting and manual arbitration for whole slide imaging (WSI) validation studies are labor intensive and do not allow for consistent, scalable, and repeatable data collection or analysis.

Objective: The objective of this study was to establish a method of data capture and analysis using standardized codified checklists and predetermined synoptic discordance tables and to use these methods in a pilot multisite validation study.

Methods and Study Design: Fifteen case report form checklists were generated from the College of American Pathologists cancer protocols. Prior to data collection, all hypothetical pairwise comparisons were generated, and a level of harm was determined for each possible discordance. Four sites with four pathologists each generated 264 independent reads of 33 cases. Preestablished discordance tables were applied to determine site-by-site and pooled accuracy, intrareader/intramodality, and interreader intramodality error rates.

Results: Over 10,000 hypothetical pairwise comparisons were evaluated and assigned harm in discordance tables. The average difference in error rates between WSI and glass, as compared to ground truth, was 0.75% with an upper bound of 3.23% (95% confidence interval). Major discordances occurred on challenging cases, regardless of modality. The average interreader agreement across sites for glass was 76.5% (weighted kappa of 0.68) and for digital it was 79.1% (weighted kappa of 0.72).

Conclusion: These results demonstrate the feasibility and utility of employing standardized synoptic checklists and predetermined discordance tables to gather consistent, comprehensive diagnostic data for WSI validation studies. This method of data capture and analysis can be applied in large-scale multisite WSI validations.
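To make the data-capture approach concrete, the sketch below (not the authors' code; the checklist options, harm assignments, and reads are hypothetical) shows how every pairwise comparison of responses to one codified checklist field could be enumerated and assigned a level of harm before any data are collected, and how two reads of the same case would then be scored against that predetermined discordance table.

```python
"""Illustrative sketch of a predetermined discordance table for one
codified checklist field. All options and harm levels are hypothetical."""
from itertools import combinations

# Hypothetical response options for one synoptic checklist field
options = ["pT1", "pT2", "pT3", "pT4"]

# Predetermined discordance table: every unordered pair of differing
# responses is assigned a level of harm prior to data collection.
# (The harm rule here is invented purely for illustration.)
discordance_table = {}
for a, b in combinations(options, 2):
    harm = "major" if abs(options.index(a) - options.index(b)) > 1 else "minor"
    discordance_table[frozenset((a, b))] = harm

def score_pair(read_1: str, read_2: str) -> str:
    """Return 'concordant', 'minor', or 'major' for two reads of one case."""
    if read_1 == read_2:
        return "concordant"
    return discordance_table[frozenset((read_1, read_2))]

# Example: comparing a whole-slide-image read against a glass-slide read
print(score_pair("pT2", "pT2"))  # concordant
print(score_pair("pT1", "pT3"))  # major (per this hypothetical table)
```

Agreement statistics such as the error rates and weighted kappa values quoted above would then be computed over pairs scored in this manner; the table contents in the sketch are illustrative only.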