Journal of Pathology Informatics Journal of Pathology Informatics
Contact us | Home | Login   |  Users Online: 456  Print this pageEmail this pageSmall font sizeDefault font sizeIncrease font size 

Year : 2015  |  Volume : 6  |  Issue : 1  |  Page : 51

Support patient search on pathology reports with interactive online learning based data extraction

1 Department of Mathematics and Computer Science, Emory University, Atlanta, GA 30322, USA
2 Department of Pathology and Laboratory Medicine, School of Medicine, Emory University, Atlanta, GA 30322, USA
3 Department of Biomedical Informatics and Computer Science, Stony Brook University, Stony Brook, NY 11794, USA

Correspondence Address:
Fusheng Wang
Department of Biomedical Informatics and Computer Science, Stony Brook University, Stony Brook, NY 11794
James J Lu
Department of Mathematics and Computer Science, Emory University, Atlanta, GA 30322
Login to access the Email id

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/2153-3539.166012

Rights and Permissions

Background: Structural reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entries, information in pathology reports remains primarily in narrative free text form. Extracting data of interest from narrative pathology reports could significantly improve the representation of the information and enable complex structured queries. However, manual extraction is tedious and error-prone, and automated tools are often constructed with a fixed training dataset and not easily adaptable. Our goal is to extract data from pathology reports to support advanced patient search with a highly adaptable semi-automated data extraction system, which can adjust and self-improve by learning from a user's interaction with minimal human effort. Methods : We have developed an online machine learning based information extraction system called IDEAL-X. With its graphical user interface, the system's data extraction engine automatically annotates values for users to review upon loading each report text. The system analyzes users' corrections regarding these annotations with online machine learning, and incrementally enhances and refines the learning model as reports are processed. The system also takes advantage of customized controlled vocabularies, which can be adaptively refined during the online learning process to further assist the data extraction. As the accuracy of automatic annotation improves overtime, the effort of human annotation is gradually reduced. After all reports are processed, a built-in query engine can be applied to conveniently define queries based on extracted structured data. Results: We have evaluated the system with a dataset of anatomic pathology reports from 50 patients. Extracted data elements include demographical data, diagnosis, genetic marker, and procedure. The system achieves F-1 scores of around 95% for the majority of tests. Conclusions: Extracting data from pathology reports could enable more accurate knowledge to support biomedical research and clinical diagnosis. IDEAL-X provides a bridge that takes advantage of online machine learning based data extraction and the knowledge from human's feedback. By combining iterative online learning and adaptive controlled vocabularies, IDEAL-X can deliver highly adaptive and accurate data extraction to support patient search.

Print this article     Email this article
 Next article
 Previous article
 Table of Contents

 Similar in PUBMED
   Search Pubmed for
   Search in Google Scholar for
 Related articles
 Citation Manager
 Access Statistics
 Reader Comments
 Email Alert *
 Add to My List *
 * Requires registration (Free)

 Article Access Statistics
    PDF Downloaded275    
    Comments [Add]    
    Cited by others 2    

Recommend this journal