Journal of Pathology Informatics

RESEARCH ARTICLE
Year : 2010  |  Volume : 1  |  Issue : 1  |  Page : 24-

Automated ancillary cancer history classification for mesothelioma patients from free-text clinical reports


Richard A Wilson1, Wendy W Chapman1, Shawn J DeFries2, Michael J Becich1, Brian E Chapman1 
1 Department of Biomedical Informatics, University of Pittsburgh, 200 Meyran Avenue, Pittsburgh, PA, USA
2 Keller Army Community Hospital, 900 Washington Road, West Point, NY, USA

Correspondence Address:
Richard A Wilson
Department of Biomedical Informatics, University of Pittsburgh, 200 Meyran Avenue, Pittsburgh, PA, USA

Background: Clinical records are often unstructured, free-text documents that create information extraction challenges and costs. Healthcare delivery and research organizations, such as the National Mesothelioma Virtual Bank, require the aggregation of both structured and unstructured data types. Natural language processing offers techniques for automatically extracting information from unstructured, free-text documents.

Methods: Five hundred and eight history and physical reports from mesothelioma patients were split into development (208) and test (300) sets. A reference standard was developed and each report was annotated by experts with regard to the patient's personal history of ancillary cancer and family history of any cancer. The Hx application was developed to process reports, extract relevant features, perform reference resolution and classify the reports with regard to cancer history. Two methods for extracting information, Dynamic-Window and ConText, were evaluated. Hx's classification responses using each of the two methods were measured against the reference standard. The average Cohen's weighted kappa served as the human benchmark in evaluating the system.

Results: Hx had a high overall accuracy, scoring 96.2% with each method. F-measures using the Dynamic-Window and ConText methods were 91.8% and 91.6%, respectively, which were comparable to the human benchmark of 92.8%. For the personal history classification, Dynamic-Window scored highest with 89.2%, and for the family history classification, ConText scored highest with 97.6%; both were comparable to the human benchmarks of 88.3% and 97.2%, respectively.

Conclusion: We evaluated an automated application's performance in classifying a mesothelioma patient's personal and family history of cancer from clinical reports. To do so, the Hx application must process reports, identify cancer concepts, distinguish the known mesothelioma from ancillary cancers, recognize negation, perform reference resolution and determine the experiencer. Results indicated that both information extraction methods tested were dependent on the domain-specific lexicon and negation extraction. We showed that the more general method, ConText, performed as well as our task-specific method. Although Dynamic-Window could be modified to retrieve other concepts, ConText is more robust and performs better on inconclusive concepts. Hx could greatly improve and expedite the process of extracting data from free-text clinical records for a variety of research or healthcare delivery organizations.
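
The steps listed in the conclusion (identify cancer concepts, set aside the known mesothelioma, recognize negation and determine the experiencer) can be illustrated with a small trigger-term sketch. This is not the Hx implementation, nor the Dynamic-Window or ConText algorithm as published: the lexicons, the six-token look-back window, the function names and the two-valued output are all assumptions made for illustration, and reference resolution is omitted entirely.

```python
import re

# Hypothetical toy lexicons; the article's domain-specific lexicon is not reproduced here.
CANCER_TERMS = {"mesothelioma", "carcinoma", "lymphoma", "melanoma", "leukemia", "cancer"}
KNOWN_DIAGNOSIS = "mesothelioma"          # excluded: only ancillary cancers count
NEGATION_TRIGGERS = ["no", "denies", "without", "negative for"]
FAMILY_TRIGGERS = ["mother", "father", "brother", "sister", "family history"]
WINDOW = 6                                 # assumed look-back size, in tokens


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def has_trigger(context, triggers):
    """True if any trigger (single or multi-word) occurs as whole words in context."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", context) for t in triggers)


def classify_sentence(sentence):
    """Return (term, experiencer, negated) for each ancillary cancer mention."""
    tokens = tokenize(sentence)
    findings = []
    for i, tok in enumerate(tokens):
        if tok not in CANCER_TERMS or tok == KNOWN_DIAGNOSIS:
            continue
        # Only the tokens preceding the concept are inspected in this sketch.
        context = " ".join(tokens[max(0, i - WINDOW):i])
        experiencer = "family" if has_trigger(context, FAMILY_TRIGGERS) else "patient"
        negated = has_trigger(context, NEGATION_TRIGGERS)
        findings.append((tok, experiencer, negated))
    return findings


def classify_report(text):
    """Report-level flags: any affirmed ancillary cancer for patient or family."""
    result = {"personal": False, "family": False}
    for sentence in re.split(r"[.!?]+\s*", text):
        for _term, experiencer, negated in classify_sentence(sentence):
            if not negated:
                key = "family" if experiencer == "family" else "personal"
                result[key] = True
    return result


if __name__ == "__main__":
    report = ("Diagnosed with mesothelioma in 2008. "
              "Patient denies any history of melanoma. "
              "Her mother had breast cancer.")
    print(classify_report(report))
```

A fixed look-back window is the simplest possible scoping rule. It misses triggers that follow the concept and cannot represent inconclusive mentions, which is the kind of case where the abstract reports that the more general ConText method performs better.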


How to cite this article:
Wilson RA, Chapman WW, DeFries SJ, Becich MJ, Chapman BE. Automated ancillary cancer history classification for mesothelioma patients from free-text clinical reports. J Pathol Inform 2010;1:24-24


How to cite this URL:
Wilson RA, Chapman WW, DeFries SJ, Becich MJ, Chapman BE. Automated ancillary cancer history classification for mesothelioma patients from free-text clinical reports. J Pathol Inform [serial online] 2010 [cited 2019 Aug 23];1:24-24
Available from: http://www.jpathinformatics.org/article.asp?issn=2153-3539;year=2010;volume=1;issue=1;spage=24;epage=24;aulast=Wilson;type=0