Journal of Pathology Informatics


ORIGINAL ARTICLE
Year : 2016  |  Volume : 7  |  Issue : 1  |  Page : 46

The utility of including pathology reports in improving the computational identification of patients


1 Department of Research and Development, Research Information Solutions and Innovation, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, Ohio 43215, USA
2 Department of Gastroenterology, Nationwide Children's Hospital, 700 Children's Dr, Columbus, Ohio 43205, USA

Correspondence Address:
Wei Chen
Department of Research and Development, Research Information Solutions and Innovation, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, Ohio 43215
USA

Source of Support: None, Conflict of Interest: None


DOI: 10.4103/2153-3539.194838


Background: Celiac disease (CD) is a common autoimmune disorder. Efficient identification of patients may improve chronic management of the disease. Prior studies have shown that searching International Classification of Diseases-9 (ICD-9) codes alone is inaccurate for identifying patients with CD. In this study, we developed automated classification algorithms that leverage pathology reports and other clinical data in Electronic Health Records (EHRs) to refine the patient subset preselected using the ICD-9 code 579.0.

Materials and Methods: EHRs were searched for the established ICD-9 code (579.0) suggesting CD to obtain an initial set of candidate cases. In addition, laboratory results for tissue transglutaminase were extracted. Using natural language processing, we analyzed pathology reports from upper endoscopy. Twelve machine learning classifiers using different combinations of variables related to ICD-9 CD status, laboratory result status, and pathology reports were evaluated to find the best possible CD classifier. Ten-fold cross-validation was used to assess the results.

Results: A total of 1498 patient records were used, including 363 confirmed cases and 1135 false-positive cases that served as controls. A logistic model based on both clinical and pathology report features produced the best results (Kappa of 0.78, F1 of 0.92, and area under the curve [AUC] of 0.94), whereas using ICD-9 only generated poor results (Kappa of 0.28, F1 of 0.75, and AUC of 0.63).

Conclusion: Our automated classification system presents an efficient and reliable way to improve the performance of CD patient identification.
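For readers unfamiliar with the evaluation measures named in the abstract, the following is a minimal Python sketch of how Cohen's Kappa, F1, and AUC can be computed for a binary case/control classifier, together with a simple ten-fold index split. It is not the authors' implementation: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical, and the F1 shown is the positive-class F1, which may differ from the averaging used in the study.

```python
from typing import Iterator, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 = confirmed case."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Positive-class F1: harmonic mean of precision and recall."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Agreement between predictions and labels, corrected for chance agreement."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) for k contiguous, near-equal folds (no shuffling)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


# Hypothetical toy data: 3 confirmed cases and 5 controls with classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

kappa = cohen_kappa(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)

# Fold sizes for a cohort of 1498 records, the total reported in the abstract.
fold_sizes = [len(test) for _, test in k_fold_indices(1498, k=10)]
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when controls outnumber cases roughly three to one, as in this cohort. In practice the records would be shuffled (and often stratified by class) before splitting into folds; the contiguous split above is only meant to show the partitioning.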

