Journal of Pathology Informatics

TECHNICAL NOTE
Year : 2012  |  Volume : 3  |  Issue : 1  |  Page : 23

The feasibility of using natural language processing to extract clinical information from breast pathology reports


Julliette M Buckley1, Suzanne B Coopey1, John Sharko1, Fernanda Polubriaginof1, Brian Drohan1, Ahmet K Belli1, Elizabeth M. H. Kim1, Judy E Garber2, Barbara L Smith1, Michele A Gadd1, Michelle C Specht1, Constance A Roche1, Thomas M Gudewicz3, Kevin S Hughes1 
1 Department of Surgical Oncology, Massachusetts General Hospital, Boston, Massachusetts, USA
2 Department of Surgical Oncology, Dana Farber Cancer Institute, Boston, Massachusetts, USA
3 Department of Surgical Pathology, Massachusetts General Hospital, Boston, Massachusetts, USA

Correspondence Address:
Kevin S Hughes
Department of Surgical Oncology, Massachusetts General Hospital, Boston, Massachusetts
USA

Objective: The opportunity to integrate clinical decision support systems into clinical practice is limited by the lack of structured, machine-readable data in the current format of the electronic health record. Natural language processing is designed to convert free text into machine-readable data. The aim of the current study was to ascertain the feasibility of using natural language processing to extract clinical information from >76,000 breast pathology reports.

Approach and Procedure: Breast pathology reports from three institutions were analyzed using natural language processing software (Clearforest, Waltham, MA) to extract information on a variety of pathologic diagnoses of interest. Data tables were created from the extracted information according to date of surgery, side of surgery, and medical record number. The variety of ways in which each diagnosis could be represented was recorded as a means of demonstrating the complexity of machine interpretation of free text.

Results: There was widespread variation in how pathologists reported common pathologic diagnoses. We report, for example, 124 ways of saying invasive ductal carcinoma and 95 ways of saying invasive lobular carcinoma. There were >4000 ways of saying that invasive ductal carcinoma was not present. Natural language processor sensitivity and specificity were 99.1% and 96.5%, respectively, when compared to expert human coders.

Conclusion: We have demonstrated how a large body of free-text medical information, such as that seen in breast pathology reports, can be converted to a machine-readable format using natural language processing, and we have described the inherent complexities of the task.
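The following is a minimal illustrative sketch, not the Clearforest software or the authors' actual rules. It shows why variation in wording and negation complicates extraction, and how sensitivity and specificity against human coders can be computed. The patterns, the function names (`mentions_idc`, `sensitivity_specificity`), and the sample reports are all hypothetical.

```python
import re

# Hypothetical pattern covering a few surface forms of "invasive ductal
# carcinoma"; the study reports 124 distinct forms, so this is a toy subset.
IDC_PATTERN = re.compile(
    r"\b(?:invasive|infiltrating)\s+(?:ductal|duct)\s+(?:carcinoma|cancer)\b",
    re.IGNORECASE,
)

# Hypothetical negation cues, searched for earlier in the same sentence.
NEGATION_PATTERN = re.compile(
    r"\b(?:no|not|negative for|without)\b", re.IGNORECASE
)


def mentions_idc(report: str) -> bool:
    """Return True if some sentence mentions IDC with no preceding negation cue."""
    for sentence in re.split(r"[.;\n]+", report):
        match = IDC_PATTERN.search(sentence)
        if match and not NEGATION_PATTERN.search(sentence[: match.start()]):
            return True
    return False


def sensitivity_specificity(predicted, reference):
    """Compare extractor output with human coding (assumed gold standard).

    Assumes the reference contains at least one positive and one negative.
    """
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up report snippets with made-up human labels (True = IDC present).
SAMPLE_REPORTS = [
    ("Invasive ductal carcinoma, grade 2.", True),
    ("No evidence of invasive ductal carcinoma.", False),
    ("Infiltrating duct carcinoma is present.", True),
    ("Benign breast tissue. Negative for invasive ductal carcinoma.", False),
    ("Ductal carcinoma in situ; not invasive.", False),
    ("Carcinoma, ductal type, invasive.", True),  # wording the toy pattern misses
]

if __name__ == "__main__":
    predicted = [mentions_idc(text) for text, _ in SAMPLE_REPORTS]
    reference = [label for _, label in SAMPLE_REPORTS]
    sens, spec = sensitivity_specificity(predicted, reference)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

In this toy sample the reordered phrase in the last report is missed, which lowers sensitivity. A practical system would need far broader coverage of both positive and negated wordings than these few patterns provide.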


How to cite this article:
Buckley JM, Coopey SB, Sharko J, Polubriaginof F, Drohan B, Belli AK, Kim EM, Garber JE, Smith BL, Gadd MA, Specht MC, Roche CA, Gudewicz TM, Hughes KS. The feasibility of using natural language processing to extract clinical information from breast pathology reports. J Pathol Inform 2012;3:23-23


How to cite this URL:
Buckley JM, Coopey SB, Sharko J, Polubriaginof F, Drohan B, Belli AK, Kim EM, Garber JE, Smith BL, Gadd MA, Specht MC, Roche CA, Gudewicz TM, Hughes KS. The feasibility of using natural language processing to extract clinical information from breast pathology reports. J Pathol Inform [serial online] 2012 [cited 2019 Dec 9];3:23-23
Available from: http://www.jpathinformatics.org/article.asp?issn=2153-3539;year=2012;volume=3;issue=1;spage=23;epage=23;aulast=Buckley;type=0