J Pathol Inform 2018;9:35
Network analysis of autopsy diagnoses: Insights into the “cause of death” from unbiased disease clustering
Romulo Celli1, Miguel Divo2, Monica Colunga3, Bartolome Celli2, Kisha Anne Mitchell-Richards4
1 Department of Pathology, Yale School of Medicine, New Haven, USA
2 Department of Pulmonary and Critical Care, Brigham and Women's Hospital, Boston, MA, USA
3 Department of Biostatistics, Yale School of Public Health, New Haven, USA
4 Department of Pathology, Greenwich Hospital, Greenwich, CT, USA
Date of Submission: 28-Mar-2018
Date of Acceptance: 27-Aug-2018
Date of Web Publication: 09-Oct-2018
Dr. Romulo Celli
Department of Pathology, Yale School of Medicine, New Haven, CT
Source of Support: None, Conflict of Interest: None
Abstract
Background: Autopsies usually serve to inform specific “causes of death” and associated mechanisms. However, multiple diseases can co-exist and interact, leading to a final demise. We approached autopsy-produced data using network analysis in an unbiased fashion to characterize interactions among different diseases and identify possible targets of system-level health care. Methods: Reports of 261 full autopsies from one institution between 2012 and 2013 were reviewed. Comorbidities were recorded and their Spearman's association coefficients were calculated. Highly associated comorbidities (P < 0.01) were selected to construct a network in which each disease is represented by a node, and each link between the nodes represents significant co-occurrence. Results: The network comprised 140 diseases connected by 419 links. The mean number of connections per node was 6. The most highly connected nodes (“hubs”) represented infectious processes, whereas less connected nodes represented neoplasms and other chronic diseases. Eight clusters of biologically plausible associated diseases were identified. Conclusions: There is an unbiased relationship among autopsy-identified diseases. There were “hubs” (primarily infectious) with significantly more associations than others that could represent obligatory or important modulators of the final expression of other diseases. Clusters of co-occurring diseases, or “modules,” suggest the presence of clinically relevant presentations of pathobiologically related entities which have until now been considered individual diseases. These modules may occur together prior to death and be amenable to interventions during life.
Keywords: Autopsy network, autopsy pathology, network analysis, pathomorbidome
How to cite this article:
Celli R, Divo M, Colunga M, Celli B, Mitchell-Richards KA. Network analysis of autopsy diagnoses: Insights into the “cause of death” from unbiased disease clustering. J Pathol Inform 2018;9:35
How to cite this URL:
Celli R, Divo M, Colunga M, Celli B, Mitchell-Richards KA. Network analysis of autopsy diagnoses: Insights into the “cause of death” from unbiased disease clustering. J Pathol Inform [serial online] 2018 [cited 2018 Dec 16];9:35. Available from: http://www.jpathinformatics.org/text.asp?2018/9/1/35/242914
Introduction
Until recently, postmortem examination was regarded as an indispensable source of scientific information, critical for the advancement of medical knowledge, and an invaluable tool to sharpen clinical acumen. Autopsy rates continue a well-documented fall, and it has become increasingly difficult for pathology departments to engage clinicians and hospital administrators with meaningful data that provide value to the medical field.
Network science has emerged as a tool for understanding complex systems by mapping the interconnectivity of diverse data, and it is used in numerous disciplines including sociology, economics, and, more recently, health. For example, cell biologists have described the interactions of large numbers of intracellular proteins using network graphs. Several “phenotypic disease networks” have recently been published based on large, population-level datasets. These networks show diseases represented as spherical “nodes” connected to each other by “edges,” lines which represent statistically significant co-occurrence associations. This hypothesis-free approach simultaneously demonstrates associations among numerous data elements (in this case, diseases), allowing for the identification of associations which may not have been expected a priori. The patterns of disease associations may suggest common biological pathways and therapeutic targets. Networks may also provide a platform for the creation of predictive models for disease development.
The diagnoses obtained at autopsies are considered gold standard data based on the findings of anatomic evidence of disease, in the context of other biochemical, hematologic, and/or microbiologic evidence. Further, autopsies frequently detect subclinical disease that otherwise may not be reported or identified in comorbidity studies in the living patient, but which may provide insight into pathophysiologic derangements.
We hypothesized that autopsies provide a unique source of high-quality data which may be used to explore disease connectivity using network analysis. We compared the autopsy disease network to phenotypic disease networks described in living patients. In addition, we related the findings to relevant questions in the context of the current clinical environment.
Methods
After receiving approval from our Institutional Review Board, consecutive autopsy reports of all patients who underwent full examination at the Pathology Department of Yale–New Haven Hospital between January 1, 2012, and December 31, 2013, were retrospectively reviewed. A full autopsy was defined as the examination of at least three of the following body cavities: cranium, thorax, abdomen, and pelvis, in individuals older than 18 years of age. Autopsies were routinely requested on in-hospital deaths and permission was obtained from the appropriate family member. Because this is a teaching hospital, residents in anatomic pathology were intimately involved in every facet of the autopsy procedure for each case. Their responsibilities included, but were not limited to, confirming family consent, evisceration, organ dissection, and drafting the final autopsy report documents. Autopsies were performed free of cost to the patient's family and without consideration of race, religion, or social status. Any cases that were transferred to the jurisdiction of the Office of the Chief Medical Examiner were excluded from the study.
All autopsy results were reported in the electronic record as a final anatomic diagnosis (FAD) which contains all diagnostic entities identified by the pathologist supported by anatomic, biochemical, microbiologic, or hematologic evidence. The FAD includes a cause of death statement, in which an opinion on the underlying cause of death is rendered. FAD reports are accompanied by a clinicopathological summary, written in prose that correlates the findings with the patient's clinical picture. Reports from all 12 attending autopsy pathologists at our institution were included.
Diagnostic entities (henceforth interchangeably referred to as diseases) were extracted from each FAD and tabulated systematically using Harrison's Principles of Internal Medicine as a reference for categorizing diseases. The resulting dataset was independently reviewed by the autopsy director (KMR, forensic and surgical pathologist) and a pulmonary/critical care physician (MD).
Demographics including age, gender, body mass index (BMI) and self-reported race were collected at the time of case review.
Statistics and network analysis
Continuous data were presented as mean and standard deviation. Comparisons among groups were made using Student's t-test or Fisher's exact test according to the type of variable. For the network analysis, a total of 140 distinct diseases were identified as binary variables (present/absent), and from them we calculated their prevalence. Each disease was represented by a specific node. Each node has two attributes: the diameter, which is proportional to the prevalence of the entity in our cohort, and the color, which represents the organ system or general disease category to which the entity belongs (e.g., cardiovascular, infectious, etc.). Significant association between diseases (nodes) was determined using Spearman's rank correlation coefficient for every pair of co-occurring diseases. Any correlation with P < 0.01 was selected for the construction of the network and is represented as an interconnecting line or “edge” connecting “nodes.” A conservative P < 0.01 threshold was chosen to limit the family-wise error rate inherent in testing numerous hypotheses.
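The edge-selection step described above can be sketched in a few lines. This is only an illustrative sketch, not the JMP workflow used in the study: the toy records, the disease names, and the function names are all hypothetical. It relies on the fact that, for two present/absent variables, Spearman's coefficient computed with tied ranks equals the phi coefficient, and it assumes a large-sample chi-square approximation (1 degree of freedom) for the P value, which may differ from the test a statistical package applies.

```python
import math
from itertools import combinations

def prevalence(flags):
    """Fraction of autopsies in which the disease was recorded."""
    return sum(flags) / len(flags)

def spearman_binary(x, y):
    """Spearman's rho for two 0/1 vectors; with tied ranks this equals phi."""
    n = len(x)
    both = sum(a * b for a, b in zip(x, y))
    sx, sy = sum(x), sum(y)
    denom = math.sqrt(sx * (n - sx) * sy * (n - sy))
    if denom == 0:  # disease present in all or in none of the cases
        return 0.0
    return (n * both - sx * sy) / denom

def approx_p_value(rho, n):
    """Two-sided P from chi2 = n * rho**2 with 1 df (an assumed approximation)."""
    return math.erfc(math.sqrt(n * rho * rho / 2.0))

def select_edges(table, alpha=0.01):
    """Return (disease_a, disease_b, rho) for every pair with P < alpha."""
    edges = []
    for a, b in combinations(sorted(table), 2):
        rho = spearman_binary(table[a], table[b])
        if approx_p_value(rho, len(table[a])) < alpha:
            edges.append((a, b, rho))
    return edges

# Hypothetical toy data: 16 autopsies, 1 = disease present.
toy = {
    "cirrhosis": [1] * 8 + [0] * 8,
    "hepatitis_c": [1] * 8 + [0] * 8,
    "hypertension": [1, 0] * 8,
}
edges = select_edges(toy)
```

In this toy table only the cirrhosis and hepatitis C columns are associated, so a single edge is kept. With 140 diseases the same loop would visit 9,730 pairs.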
The spatial layout of the network, which we defined as “The Autopsy Multimorbidity Network,” or “Pathomorbidome,” was determined by a mathematical algorithm based on two variables: the node size (disease prevalence) and the number of edges (connectivity) per node. The result of this algorithm is that nodes with a higher number of connections lie toward the center of the graph. Gephi Graph Visualization and Manipulation software V-0.8.2 beta (open source) was used to create the network. The methodology followed was previously reported by Divo et al. All statistical analyses were performed using SAS JMP Pro® software, version 11.0 (SAS Institute, Cary, NC).
In network analysis, degree refers to the number of edges connecting a particular node. Using our comorbidity network, we identified the diseases with the highest and lowest degrees of connectivity.
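As a minimal sketch of how degree can be tallied from an edge list, assuming an undirected network stored as pairs of node names (the edge list and the pairings below are invented for illustration and are not taken from the study data):

```python
from collections import Counter

def node_degrees(edges):
    """Count how many edges touch each node in an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

# Hypothetical edge list; the pairings are for illustration only.
toy_edges = [
    ("sepsis", "pneumonia"),
    ("sepsis", "urinary tract infection"),
    ("sepsis", "cirrhosis"),
    ("cirrhosis", "hepatitis C"),
]
degree = node_degrees(toy_edges)

# Sort by decreasing degree, breaking ties alphabetically.
ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
most_connected = ranked[0]
least_connected = ranked[-1]
```

Nodes with no edges never appear in an edge list, so a complete ranking would need the full list of diseases as well.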
Computational analysis of the network identifies the existence of modules, or clusters of nodes, which are disease clusters aggregated by the unbiased statistical strength of their associations. The presence of modules was determined using the algorithm proposed by Blondel et al. included in Gephi. This algorithm is based on a network property called modularity, defined as the difference between the number of edges found within a given group of nodes and the expected number if the edges between the same set of nodes were distributed at random. In the “Pathomorbidome,” modules represent clusters of diseases occurring together in a pattern that exceeds that expected by chance alone.
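The modularity score itself can be illustrated with a short sketch. It uses Newman's standard formulation for an undirected, unweighted graph, Q = Σ_c [L_c/m − (d_c/2m)²], where m is the total number of edges, L_c the number of edges inside community c, and d_c the summed degree of its nodes. The sketch only scores a partition that is supplied to it; it does not implement the search procedure of Blondel et al., and the toy graph and function name are hypothetical.

```python
def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    label = {node: i for i, group in enumerate(communities) for node in group}
    inside = [0] * len(communities)      # edges with both ends in community i
    degree_sum = [0] * len(communities)  # total degree of community i
    for a, b in edges:
        degree_sum[label[a]] += 1
        degree_sum[label[b]] += 1
        if label[a] == label[b]:
            inside[label[a]] += 1
    return sum(
        inside[i] / m - (degree_sum[i] / (2 * m)) ** 2
        for i in range(len(communities))
    )

# Hypothetical graph: two triangles joined by a single bridging edge.
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
q_split = modularity(toy_edges, [{"a", "b", "c"}, {"d", "e", "f"}])
q_single = modularity(toy_edges, [{"a", "b", "c", "d", "e", "f"}])
```

Splitting the toy graph into its two triangles gives a positive score, whereas lumping every node into one community gives zero, which is why a module-detection algorithm would prefer the split.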
Results
A total of 508 autopsies were performed between 2012 and 2013 at Yale–New Haven Hospital. We excluded 247 cases based on age (<18 years) or limited autopsy examinations (e.g., brain only). The baseline clinical characteristics of the 261 patients included in the study are summarized in [Table 1]. The mean age at the time of death was 62 years (±15 standard deviation [SD]), and 45% of patients were female. The mean number of comorbidities per individual was 5.9 (±2.8 SD), with no significant difference between sexes (5.7 for females, 6.0 for males, P = 0.381). Men had a significantly higher rate of the following non-gender-specific diseases: aortic aneurysm, B-cell lymphoma, and pneumonia. Women had a higher rate of collagen vascular diseases (8.6% vs. 2.1%, P = 0.021). The distribution of primary causes of death by gender is summarized in [Table 2].
There was an increase in the number of diseases per patient with increasing age, with the expected significantly higher rate of several pathologies (benign prostatic hyperplasia [BPH], hypertension, and chronic obstructive pulmonary disease) in patients over 70 years old. Among patients younger than 50 years old, there was a higher prevalence only of cardiomyopathy.
Autopsy Multimorbidity Network
The Autopsy Multimorbidity Network comprises 140 nodes connected by a total of 419 links, represented in a force-directed layout with heavily connected nodes placed toward the center [Figure 1]. These highly connected nodes are primarily infectious processes. The mean degree, or number of connections per disease, is 6.0 (±3.1 SD), and [Table 3] summarizes the twenty most connected and twenty least connected nodes.
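For an undirected network, the mean degree follows directly from the node and edge counts, because every link adds one to the degree of each of its two endpoints. Assuming each of the 419 links is counted once and all 140 nodes are included, the check is:

```latex
\bar{k} \;=\; \frac{2E}{N} \;=\; \frac{2 \times 419}{140} \;\approx\; 5.99
```

This agrees with the rounded value of 6 connections per node given in the Abstract.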
Figure 1: The Autopsy Multimorbidity Network, “Pathomorbidome.” A total of 140 diseases/nodes are connected by 419 edges. Art: Artery, BPH: Benign prostatic hyperplasia, CHF: Congestive heart failure, DIC/TTP: Disseminated intravascular coagulation/thrombotic thrombocytopenic purpura, Dis: Disease, DVT/PE: Deep venous thrombosis/pulmonary embolism, GI: Gastrointestinal, GIST: Gastrointestinal stromal tumor, HCV: Hepatitis C infection, HPV: Human papillomavirus infection, HTN: Hypertension, IBD: Inflammatory bowel disease, ITP: Immune thrombocytopenic purpura, MGUS: Monoclonal gammopathy of undetermined significance, NASH: Nonalcoholic steatohepatitis, NSCLC: Non-small cell lung carcinoma, NOS: Not otherwise specified, PVD: Peripheral vascular disease
Identification of disease modules
The “Pathomorbidome” contains eight distinct modules of highly interconnected diseases detected in the structure of the network [Figure 2], labeled sequentially by decreasing number of component nodes. Module 1 comprises 27 nodes connected by 39 edges and highlights the relationship of liver cirrhosis with viral hepatitis and alcoholic liver disease. Module 2 comprises 21 nodes connected by 27 edges and shows associations among numerous cardiovascular diseases. Module 3, comprising 13 nodes and 23 edges, demonstrates the relationship of malignancy and hypercoagulability (deep venous thrombosis/pulmonary embolism [DVT/PE]), as well as numerous diseases associated with drug abuse. Module 4 comprises 13 nodes and 23 edges, and its major components are of gastrointestinal origin, including perforation. The remaining modules also show interesting associations, some predictable, some less so [Figure 2].
Figure 2: Disease cluster “Modules.” The modules represent groups of diseases which tend to co-occur. They are labeled sequentially by decreasing number of component diseases
Conclusions
Using network analysis to explore the unbiased relationships among 140 different diseases identified at autopsy, this study had three major findings. First, certain “hub” diseases (primarily infectious) have significantly more associations than others and seem to represent important modulators of the final common expression of other diseases. These “hubs” could be targeted for screening or further intervention. Second, diseases clustered into modules, suggesting the potential for clinically relevant syndromic presentations of pathobiologically related entities that are currently considered individual diseases. Third, unbiased analytic methods add value to the interpretation of the rich autopsy data obtained from the “complex” biology that characterizes death.
Dissecting the “cause” of death and its clinical implications
The first important finding relates to the connectivity of diseases. Examination of the network [Figure 1] reveals numerous associations, some well described in the medical literature and some less so. Identification of known relationships supports the biological plausibility of the methodology behind this analysis and confers potential credibility on the less recognizable associations. Close examination of the diseases at opposite ends of the degree distribution or connectivity [Table 3] reveals a pattern. Infectious conditions are overrepresented among the nodes with the highest connectivity (8/20 = 40%), which are primarily located at the center of the network. There is only one neoplastic entity (gastrointestinal stromal tumor [GIST]) among the twenty diseases with highest degree (highest connectivity). Neoplastic entities account for 35% (7/20) of the twenty least connected diseases, with only two infectious entities represented there (osteomyelitis and human papillomavirus infection). Based on these findings, we hypothesized that the least connected nodes tend to represent diseases with few known predisposing conditions. The highly connected “hubs” are processes that are caused by and simultaneously cause multiple other pathologies and should be special targets for interventions aimed at interrupting their ripple consequences. In keeping with this theory, the top causes of death [Table 2] are nearly all chronic conditions, while “hubs” appear to be the subacute-to-acute complications of those diseases that may truly be the immediate causes of death. This would make sense in a quaternary medical center, where patients with chronic conditions such as cancer often succumb to infectious complications of the underlying disease or treatment.
Based on its connectivity pattern, we can hypothesize that perturbation of networks could be achieved by targeting “hub” diseases, an epidemiologic strategy for which there is precedent. Independent authors have established the characteristics of the sexually transmitted human immunodeficiency virus (HIV) network, where nodes represent carriers and edges represent disease transmission. Trewick created an algorithm to predict the effect of targeting “hubs” in the HIV network to deter the rate of viral spread. Barabási et al. also argued that targeting the treatable “hub” diseases would mitigate their disproportionate effect on the system.
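The intuition behind hub targeting can be shown with a toy sketch that removes a node and measures how much of the network stays connected. This is not the algorithm of any of the cited authors; the graph, the disease pairings, and the function name are hypothetical. In a comorbidity network the edges are statistical associations rather than transmission routes, so removing a node is only a loose analogy for intervening on a disease.

```python
from collections import Counter, defaultdict

def largest_component_size(edges, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a not in removed and b not in removed:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, best = set(), 0
    for start in neighbors:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:  # depth-first walk over one component
            node = stack.pop()
            size += 1
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best

# Hypothetical network with one highly connected node.
toy_edges = [
    ("sepsis", "pneumonia"),
    ("sepsis", "urinary tract infection"),
    ("sepsis", "cirrhosis"),
    ("sepsis", "DVT/PE"),
    ("cirrhosis", "hepatitis C"),
]
degree = Counter(node for edge in toy_edges for node in edge)
hub = degree.most_common(1)[0][0]

intact = largest_component_size(toy_edges)
without_hub = largest_component_size(toy_edges, removed={hub})
without_leaf = largest_component_size(toy_edges, removed={"pneumonia"})
```

In this toy graph, removing the hub fragments the network far more than removing a peripheral node, which is the effect the hub-targeting argument relies on.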
Sepsis, urinary tract infection, and multiple organ infections are all “hub” diseases and are all theoretically preventable and treatable. Other highly connected hubs include DVT/PE and the hypercoagulable state. These acute “hubs” could be targeted for selective monitoring across hospital systems since they seem to offer the best chance of preventing a poor outcome. Other “hubs,” such as HIV/acquired immunodeficiency syndrome and liver cirrhosis, are highly prevalent, highly connected, and preventable diseases. From an epidemiological perspective, increased targeting of these diseases may increase the effectiveness of system-level health care. Eventually, network representation of comorbidity data may help hospital administrations visualize the global landscape of human illness at their respective institutions. The impact of system-level interventions could be measured qualitatively in changes seen in the network topology over time.
The second important finding relates to the interconnected grouping of diseases clustering together as independent modules. Module 1 shows a cluster of diseases with liver-related morbidity. This module bears out the intimate association of alcoholic liver disease and hepatitis C infection with liver cirrhosis. Other modules reveal connections that may relate to epidemiology more than pathophysiology, such as Module 2, which clusters cardiovascular diseases with BPH, likely reflecting the highly prevalent co-occurrence of these conditions in elderly males. We interpreted these logical associations as validation of the method, thereby provoking new hypotheses about the reason for the unexpected associations observed in other modules. The relationship in Module 2, for example, raises the question of whether Meckel diverticulum has a role to play in infection. Similarly, Module 5 demonstrates the associations of inflammatory bowel disease (IBD), primary sclerosing cholangitis, and cholangiocarcinoma, and also raises the specter of an association between IBD and endometriosis. In this context, close examination of the different clusters may offer new avenues for research and intervention. Just as this unbiased analysis confirmed the well-established relationship between liver cirrhosis and gastrointestinal bleeding, the integrative nature of network graphs suggests a potentially credible association of GIST and pancreatic carcinoma, or adrenal atrophy and coagulopathy. While many of these associations are likely spurious, the possibility of true correlations should be explored with further study. These early findings should prompt further investigation of other possible links that emerge from this unbiased clustering of diseases.
Relating autopsy findings to the living patient
A comparison of our findings with those of studies of comorbidities in the living patient shows some similarities and also some differences. The presence of modules that closely resemble those reported by Divo et al., Barabási et al., and Hidalgo et al., among others, supports the validity of our findings. In those studies, the cardiovascular, gastrointestinal, and substance abuse modules were very similar to those we describe. Importantly, none of those studies noted the presence of the “infectious hubs” that we report. While all of those studies were completed in patients in a “stable” state, ours was based on autopsies that were completed on patients in a quaternary medical center, where more patients with severe infections will be encountered. Our findings suggest that increased attention to infection surveillance and targeted monitoring of patients with preventable diseases such as liver cirrhosis may be one cost-effective method to perturb the network and potentially prevent or decrease mortality.
Finally, this study shows that novel approaches to large data analysis may add value to the list of diagnoses obtained at autopsies. To date, this long list has been seen as a set of diseases that generate “noise” around what should be a single cause of death. The unbiased clustering of diseases provides a new dimension to the meaning of those data. The opportunity exists to evaluate the quality of care provided by a system through the change in the topology of “Pathomorbidomes” over time, once “hubs” are identified and intervened upon.
Our study has some limitations. The lack of standardization in reporting is a common issue in comorbidity-based studies. Autopsy reporting does not necessarily lend itself to synoptic reports as used in reporting neoplasia; however, greater efforts to implement standardization are warranted. In addition, we analyzed reports from 12 attending pathologists, each with different subspecialty and research interests and in whom inherent biases may play a role in diagnosing certain conditions. As mitigation, we implemented standard diagnostic terms as used in Harrison's Principles of Internal Medicine, placing each individual diagnostic entity encountered in the autopsy reports into one of these categories. In addition, three independent physicians reviewed all disease categories prior to analysis. While imperfect, our data were extracted and screened by physicians rather than by the automated methodology using diagnostic codes sometimes utilized in other epidemiological studies. Our cohort is predominantly Caucasian, while Hispanics (15%) and Asians (1%) are underrepresented. As such, the cohort imperfectly represents the evolving demographics of the United States of America, and the findings may not necessarily be extrapolated to some settings. Finally, this is a single-center study from a quaternary medical center, where the motivation to seek autopsy on patients with the most complex clinical problems introduces selection bias. We excluded cases from the Office of the Chief Medical Examiner because we wanted to focus on the causes of natural death, and felt that the highest quality of diagnostic data would be procured from an academic medical center. Future studies could incorporate data from multiple medical environments and regions to increase both power and generalizability.
Using network analysis of autopsy data, this study demonstrated that the occurrence of “hubs” may indicate highly influential diseases and/or conditions that may represent targets of interventions. Using these data, cases with seemingly unusually related diseases should be interrogated to elucidate the underlying pathophysiologic mechanisms. We propose that network analysis is a useful technique to apply to autopsy data at hospitals in order to provide meaningful quality improvement metrics and enhance general medical knowledge and patient care.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
References
King LS, Meehan MC. A history of the autopsy. A review. Am J Pathol 1973;73:514-44.
Rosenbaum GE, Burns J, Johnson J, Mitchell C, Robinson M, Truog RD, et al. Autopsy consent practice at US teaching hospitals: Results of a national survey. Arch Intern Med 2000;160:374-80.
Devers KJ. The changing role of the autopsy: A social environmental perspective. Hum Pathol 1990;21:145-53.
Oluwasola OA, Fawole OI, Otegbayo AJ, Ogun GO, Adebamowo CA, Bamigboye AE, et al. The autopsy: Knowledge, attitude, and perceptions of doctors and relatives of the deceased. Arch Pathol Lab Med 2009;133:78-82.
Keys E, Brownlee C, Ruff M, Baxter C, Steele L, Green FH, et al. How well do we communicate autopsy findings to next of kin? Arch Pathol Lab Med 2008;132:66-71.
Barabási AL, Oltvai ZN. Network biology: Understanding the cell's functional organization. Nat Rev Genet 2004;5:101-13.
Hidalgo CA, Blumm N, Barabási AL, Christakis NA. A dynamic network approach for the study of human phenotypes. PLoS Comput Biol 2009;5:e1000353.
Barabási AL, Gulbahce N, Loscalzo J. Network medicine: A network-based approach to human disease. Nat Rev Genet 2011;12:56-68.
Goh KI, Choi IG. Exploring the human diseasome: The human disease network. Brief Funct Genomics 2012;11:533-42.
Divo MJ, Casanova C, Marin JM, Pinto-Plata VM, de-Torres JP, Zulueta JJ, et al. COPD comorbidities network. Eur Respir J 2015;46:640-50.
Longo D, Fauci A, Kasper D, Hauser S, Jameson LJ, editors. Harrison's Principles of Internal Medicine. 18th ed., Vol. 1 and 2. New York: The McGraw-Hill Companies, Inc.; 2011.
Jacomy M, Venturini T, Heymann S, Bastian M. ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software. PLoS One 2014;9:e98679.
Bastian M, Heymann S, Jacomy M. Gephi: An Open Source Software for Exploring and Manipulating Networks. New York: International ICWSM Conference; 2009. p. 361-2.
Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech 2008;2008:P10008.
Newman ME. Modularity and community structure in networks. Proc Natl Acad Sci U S A 2006;103:8577-82.
Latora V, Nyamba A, Simpore J, Sylvette B, Diane S, Sylvére B, et al. Network of sexual contacts and sexually transmitted HIV infection in Burkina Faso. J Med Virol 2006;78:724-9.
Schneeberger A, Mercer CH, Gregson SA, Ferguson NM, Nyamukapa CA, Anderson RM, et al. Scale-free networks and sexually transmitted diseases: A description of observed patterns of sexual contacts in Britain and Zimbabwe. Sex Transm Dis 2004;31:380-7.
Dezso Z, Barabási AL. Halting viruses in scale-free networks. Phys Rev E Stat Nonlin Soft Matter Phys 2002;65:055103.
Amitrano L, Guardascione MA, Brancaccio V, Balzano A. Coagulation disorders in liver disease. Semin Liver Dis 2002;22:83-96.
Biecker E. Portal hypertension and gastrointestinal bleeding: Diagnosis, prevention and management. World J Gastroenterol 2013;19:5035-50.
Odelowo OO, Smoot DT, Kim K. Upper gastrointestinal bleeding in patients with liver cirrhosis. J Natl Med Assoc 2002;94:712-5.