Journal of Pathology Informatics

Year : 2013  |  Volume : 4  |  Issue : 1  |  Page : 20

Automated extraction of precise protein expression patterns in lymphoma by text mining abstracts of immunohistochemical studies

1 MU Informatics Institute, University of Missouri, Columbia, USA
2 MU Informatics Institute; Health Management and Informatics, University of Missouri, Columbia, USA
3 MU Informatics Institute; Department of Pathology and Anatomical Sciences, University of Missouri, Columbia, USA

Correspondence Address:
Gerald L Arthur
MU Informatics Institute; Department of Pathology and Anatomical Sciences, University of Missouri, Columbia

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/2153-3539.115880


Background: In general, surgical pathology reviews report protein expression by tumors in a semi-quantitative manner, that is, -, -/+, +/-, +. At the same time, the experimental pathology literature provides multiple examples of precise expression levels determined by immunohistochemical (IHC) tissue examination of populations of tumors. Natural language processing (NLP) techniques enable the automated extraction of such information through text mining. We propose establishing a database linking quantitative protein expression levels with specific tumor classifications through NLP.

Materials and Methods: Our method takes advantage of typical forms of representing experimental findings in terms of percentages of protein expression manifested by the tumor population under study. Characteristically, percentages are represented either directly with the % symbol or as the number of positive findings out of the total population. Such text is readily recognized using regular expressions and templates, which permits extraction of the sentences containing these forms for further analysis using grammatical structures and rule-based algorithms.

Results: Our pilot study is limited to the extraction of such information related to lymphomas. We achieved a satisfactory level of retrieval, as reflected in scores of 69.91% precision and 57.25% recall, with an F-score of 62.95%. In addition, we demonstrate the utility of a web-based curation tool for confirming and correcting our findings.

Conclusions: The experimental pathology literature represents a rich source of pathobiological information that has been relatively underutilized. There has been a combinatorial explosion of knowledge within the pathology domain, as represented by increasing numbers of immunophenotypes and disease subclassifications. NLP techniques support practical text mining for extracting this knowledge and organizing it in forms appropriate for pathology decision support systems.
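
The pattern-matching step described in Materials and Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regular expressions, the naive sentence splitter, and the names `PERCENT`, `RATIO`, `extract_expression_levels`, and `f_score` are all assumptions made for illustration. The sketch recognizes the two forms named in the abstract (an explicit % value, or a count of positives out of a total) and returns the sentences that contain them. The helper `f_score` computes the harmonic mean of precision and recall, which is the usual definition of the F-score and is consistent with the reported figures.

```python
import re

# Hypothetical patterns for illustration only; the abstract does not give
# the authors' actual regular expressions or templates.
# Form 1: an explicit percentage such as "85%" or "69.5 %".
PERCENT = re.compile(r"(?<![\d.])(\d{1,3}(?:\.\d+)?)\s*%")
# Form 2: positives over the population, e.g. "18 of 24", "18 out of 24", "18/24".
RATIO = re.compile(r"\b(\d+)\s*(?:of|out of|/)\s*(\d+)\b")
# Naive sentence boundary: end punctuation, whitespace, then a capital letter.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def extract_expression_levels(text):
    """Return (sentence, percent_positive) pairs for each matched form."""
    results = []
    for sentence in SENTENCE_END.split(text):
        for match in PERCENT.finditer(sentence):
            results.append((sentence, float(match.group(1))))
        for match in RATIO.finditer(sentence):
            positive, total = int(match.group(1)), int(match.group(2))
            # Skip impossible ratios (zero population, more positives than cases).
            if total > 0 and positive <= total:
                results.append((sentence, round(100.0 * positive / total, 2)))
    return results


def f_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented example sentences, not taken from any real abstract.
    sample = (
        "CD20 was expressed in 18 of 24 cases of diffuse large B-cell lymphoma. "
        "BCL2 was positive in 85% of follicular lymphomas. "
        "No staining was seen in controls."
    )
    for sentence, percent in extract_expression_levels(sample):
        print(percent, "|", sentence)
    print(round(f_score(69.91, 57.25), 2))
```

A sketch like this only selects candidate sentences. Linking each number to a specific protein and tumor classification would still require the grammatical and rule-based analysis the abstract mentions, and the ratio pattern will also match unrelated text such as dates or staging fractions.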
