

TECHNICAL NOTE 



J Pathol Inform 2011;2:52
An open-source software program for performing Bonferroni and related corrections for multiple comparisons
Kyle Lesack^{1}, Christopher Naugler^{2}
^{1} Faculty of Medicine, Bachelor of Health Sciences Program, Room G503, O'Brien Centre for the BHSc, 3330 Hospital Drive N.W. Calgary, Alberta T2N 4N1, Canada ^{2} Departments of Pathology and Laboratory Medicine, University of Calgary and Calgary Laboratory Services, C414, Diagnostic and Scientific Centre, 9, 3535 Research Road NW, Calgary AB Canada T2L 2K8, Canada
Date of Submission: 07-Sep-2011
Date of Acceptance: 18-Nov-2011
Date of Web Publication: 26-Dec-2011
Correspondence Address: Christopher Naugler Departments of Pathology and Laboratory Medicine, University of Calgary and Calgary Laboratory Services, C414, Diagnostic and Scientific Centre, 9, 3535 Research Road NW, Calgary AB Canada T2L 2K8 Canada
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/2153-3539.91130
Abstract   
Increased type I error resulting from multiple statistical comparisons remains a common problem in the scientific literature and may result in the reporting and promulgation of spurious findings. One approach to this problem is to correct groups of P-values for "family-wide significance" using a Bonferroni correction or the less conservative Bonferroni-Holm correction, or to correct for the "false discovery rate" with a Benjamini-Hochberg correction. Although several solutions are available for performing these corrections in commercial software, there are no widely available, easy-to-use open-source programs to perform these calculations. In this paper we present an open-source program written in Python 3.2 that performs standard Bonferroni, Bonferroni-Holm, and Benjamini-Hochberg corrections.
Keywords: Bonferroni correction, software program, type I error
How to cite this article: Lesack K, Naugler C. An open-source software program for performing Bonferroni and related corrections for multiple comparisons. J Pathol Inform 2011;2:52
Background   
When multiple hypotheses are tested in a single experiment, the risk of type I error increases, and with it the risk of promulgating spurious "significant" findings. ^{[1],[2],[3]} The probability of obtaining at least one false positive result grows with the number of tests performed. For example, assuming the tests are independent, the probability of obtaining at least one false positive result when performing 10 tests is given by

$$P(\text{at least one false positive}) = 1 - P(A)^{10}$$

where P(A) is the confidence level of each test ($1 - \alpha$). For a confidence level of 0.95, this probability is $1 - 0.95^{10} \approx 0.40$.
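To show how quickly this probability grows as tests are added, the short Python sketch below evaluates $1 - P(A)^n$ for a few values of n. It assumes, as the formula does, that the tests are independent and all run at the same confidence level; the function name is ours and is not part of the Bonferroni Calculator program.

```python
def prob_any_false_positive(n_tests, confidence=0.95):
    """Probability of at least one type I error across n_tests independent
    tests, each run at the given confidence level P(A) = 1 - alpha."""
    # Each test avoids a false positive with probability P(A); for independent
    # tests those probabilities multiply, so the complement is 1 - P(A)**n.
    return 1.0 - confidence ** n_tests


for n in (1, 10, 20):
    print(n, round(prob_any_false_positive(n), 3))
# 1 0.05
# 10 0.401
# 20 0.642
```

With 20 independent tests at a confidence level of 0.95, the chance of at least one spurious "significant" result already exceeds 60%.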
Although the problems associated with multiple testing are well known, numerous studies still fail to correct their reported P-values. For instance, Bennett et al. found that only between 60% and 74% of the neuroimaging articles published in several major journals corrected for multiple comparisons. ^{[4]} Similarly, Austin et al. demonstrated that failing to account for multiple testing produced statistically significant, yet implausible, associations between astrological signs and health outcomes. ^{[5]} In both cases, the spurious results were no longer significant after correction for multiple testing.
The lack of attention paid to this problem in the pathology literature stands in stark contrast to its recognition in other fields such as ecology, where it has attracted intense interest for over two decades since the seminal publication by Rice. ^{[6]} That said, even within ecology the topic still engenders debate. ^{[7]} A systematic exploration of this problem in the pathology literature has not been undertaken; however, we have previously reported on a convenience sample of 800 pathology publications from 2003, of which 37 presented multiple comparisons. Twenty-one of these 37 did not attempt to control for the increased type I error due to multiple comparisons. ^{[8]}
One means of reducing the type I error from multiple testing is the Bonferroni correction, which controls the family-wise error rate (FWER): the probability of making at least one type I error among the entire set of hypotheses tested.
The Bonferroni-corrected P-value is calculated as follows:

$$P_{\text{corrected}} = n \times P$$

where n is the number of hypotheses tested; equivalently, each uncorrected P-value is compared with $\alpha / n$. There is a lack of consensus as to what actually constitutes a "family" of statistical tests; however, it has been suggested that if it is appropriate to place multiple P-values in the same table, it may be appropriate to correct all values in that table for multiple comparisons. ^{[6]}
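As a minimal sketch (not the authors' Bonferroni Calculator source), the Bonferroni correction can be written in a few lines of Python. Capping corrected values at 1 is our assumption, made because a probability cannot exceed 1; the five P-values are invented for illustration.

```python
def bonferroni(p_values):
    """Bonferroni-corrected P-values: each P-value multiplied by the number of
    tests n, capped at 1 because a probability cannot exceed 1."""
    n = len(p_values)
    return [min(1.0, n * p) for p in p_values]


p = [0.013, 0.2, 0.001, 0.035, 0.011]   # illustrative P-values
alpha = 0.05
corrected = bonferroni(p)
print([round(q, 4) for q in corrected])   # [0.065, 1.0, 0.005, 0.175, 0.055]
print([q <= alpha for q in corrected])    # [False, False, True, False, False]
```

Only one of the five tests remains significant at alpha = 0.05, which illustrates how conservative the correction is.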
Because the Bonferroni correction is conservative and sacrifices statistical power, other methods of correcting for multiple testing have been developed. Another method that controls the FWER is the Bonferroni-Holm correction. ^{[9]} The Bonferroni-Holm correction is calculated as follows:

$$P_{(k),\text{corrected}} = (n - k + 1) \times P_{(k)}$$

where n is the number of hypotheses tested, and k is the rank of the uncorrected P-value $P_{(k)}$ when the P-values are ordered from smallest to largest.
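The sketch below applies the Bonferroni-Holm multiplier $(n - k + 1)$ in Python. It is an illustration, not the authors' program: it additionally takes a running maximum over the sorted values (the usual step-down form of Holm-adjusted P-values, so a larger raw P-value never receives a smaller adjusted value) and caps results at 1. Whether the Bonferroni Calculator does the same is not stated in the text. The input P-values are the same illustrative ones used for the Bonferroni example.

```python
def holm(p_values):
    """Holm (Bonferroni-Holm) adjusted P-values, returned in the input order."""
    n = len(p_values)
    # Indices of the P-values sorted from smallest to largest (rank k = 1..n).
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for k, i in enumerate(order, start=1):
        candidate = min(1.0, (n - k + 1) * p_values[i])
        # Step-down monotonicity: an adjusted P-value may not be smaller than
        # the adjusted P-value of any smaller raw P-value.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


p = [0.013, 0.2, 0.001, 0.035, 0.011]   # illustrative P-values
corrected = holm(p)
print([round(q, 4) for q in corrected])   # [0.044, 0.2, 0.005, 0.07, 0.044]
print([q <= 0.05 for q in corrected])     # [True, False, True, False, True]
```

On the same data, Bonferroni-Holm keeps three results significant where the plain Bonferroni correction keeps one, while still controlling the FWER.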
Rather than controlling the probability of one or more type I errors in the entire experiment, some of the more recent approaches to the multiple testing problem control the false discovery rate (FDR): the expected proportion of type I errors among the hypotheses declared significant. Controlling this proportion rather than the FWER further increases statistical power, which makes the approach especially suitable when numerous hypothesis tests are conducted. ^{[10],[11]} The Benjamini-Hochberg method ^{[12]} is a commonly used way to control the FDR of an experiment. It is calculated as follows:

$$P_{(k),\text{corrected}} = \frac{n}{k} \times P_{(k)}$$

where n is the number of hypotheses tested, and k is the rank of the uncorrected P-value $P_{(k)}$ when the P-values are ordered from smallest to largest.
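A minimal Python sketch of the Benjamini-Hochberg multiplier $n / k$ follows; again, this is our illustration rather than the authors' code. Like the usual step-up form of BH-adjusted P-values, it walks from the largest P-value downwards and takes a running minimum, so a smaller raw P-value never receives a larger adjusted value and no result exceeds 1. The input is the same illustrative set as in the earlier examples.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(p_values)
    # Walk from the largest P-value (rank k = n) down to the smallest (k = 1).
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        k = n - position
        candidate = p_values[i] * n / k
        # Step-up monotonicity: an adjusted P-value may not exceed the
        # adjusted P-value of any larger raw P-value (this also caps at 1).
        running_min = min(running_min, candidate)
        adjusted[i] = running_min
    return adjusted


p = [0.013, 0.2, 0.001, 0.035, 0.011]   # illustrative P-values
corrected = benjamini_hochberg(p)
print([round(q, 3) for q in corrected])   # [0.022, 0.2, 0.005, 0.044, 0.022]
print([q <= 0.05 for q in corrected])     # [True, False, True, True, True]
```

Here four of the five tests remain significant at an FDR of 0.05, compared with three under Bonferroni-Holm and one under Bonferroni, reflecting the extra power gained by controlling the FDR instead of the FWER.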
Several commercial statistical software packages can perform one or more of these corrections, as can at least one open-source program (GNU R); however, the cost of the commercial packages and the learning curves involved may discourage researchers from using these programs. Online tools are also available (e.g., http://www.quantitativeskills.com/sisa/calculations/bonfer.htm), but they are limited in scope and options and depend on continued access to the publisher's website.
"Bonferroni Calculator" software
Using the open-source programming language Python 3.2, we developed a program capable of performing Bonferroni, Bonferroni-Holm, and Benjamini-Hochberg corrections for any number of P-values. The user is prompted for a set of P-values and the desired significance (alpha) level. From the main menu, the user may display the results of the desired correction on screen or export the corrected P-values to the hard disk as text and CSV files. The source code is freely available as a supplementary file to this article (which may serve as a literature reference for the program), and a copy may also be obtained by email from the corresponding author. The program requires the free Python 3.2 interpreter, which runs on Microsoft Windows, Mac OS, and Linux/Unix operating systems and may be downloaded from http://www.python.org/getit/releases/3.2/.
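The text does not describe the program's input parsing or the layout of its exported files, so the following is only a hypothetical sketch of that workflow in Python: reading P-values typed as comma- or space-separated numbers and writing one CSV row per test. The function names `parse_p_values` and `write_results_csv` and the column names are our own illustrative choices, not those of the Bonferroni Calculator.

```python
import csv
import io


def parse_p_values(text):
    """Parse P-values typed as comma- or space-separated numbers."""
    values = [float(token) for token in text.replace(",", " ").split()]
    if not values:
        raise ValueError("no P-values entered")
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("P-values must lie between 0 and 1")
    return values


def write_results_csv(stream, p_values, corrected, alpha):
    """Write one CSV row per test: raw P-value, corrected P-value, and
    whether the corrected value is at or below alpha."""
    writer = csv.writer(stream)
    writer.writerow(["p_value", "corrected_p_value", "significant"])
    for p, q in zip(p_values, corrected):
        writer.writerow([p, q, q <= alpha])


p = parse_p_values("0.013, 0.2 0.001")
buffer = io.StringIO()
# Bonferroni-corrected values for these n = 3 tests, typed in for illustration.
write_results_csv(buffer, p, [0.039, 0.6, 0.003], alpha=0.05)
print(buffer.getvalue())
```

To write to disk instead of memory, open a file with `open("results.csv", "w", newline="")` and pass it as `stream`; the `newline=""` argument is what the `csv` module documentation recommends for files it writes.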
The program is available free of charge by emailing the senior author at christopher.naugler@cls.ab.ca. Detailed instructions and a FAQ are available at https://sites.google.com/site/christophernaugler/. To use the Bonferroni Calculator software, place the files "Bonferroni Calculator.py" and "Lesack and Naugler.txt" in a folder on your hard drive. In Windows, double-clicking the "Bonferroni Calculator.py" icon runs the program in a command-line window; however, the preferred method is to right-click the icon, select "Edit with IDLE" from the drop-down list, press F5 to run the software, and then maximize the window. Follow the on-screen instructions. If you choose to save the results to files, they will be written to the same folder as "Bonferroni Calculator.py". The program is also available from the authors as a stand-alone executable file.
References   
1.  Koch G, Gansky M. Statistical considerations for multiplicity in confirmatory protocols. Drug Inf J 1996;30:523-33.
2.  Bender R, Lange S. Adjusting for multiple testing - when and how? J Clin Epidemiol 2001;54:343-9.
3.  Karr A, Young SS. Deming, data and observational studies. Significance 2011;8:116-120.
4.  Bennett CM, Baird AA, Miller MB, Wolford GL. Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: an argument for multiple comparisons correction. J Serendipitous Unexpected Results 2010;1:1-5.
5.  Austin PC, Mamdani MM, Juurlink DN, Hux JE. Testing multiple statistical hypotheses resulted in spurious associations: a study of astrological signs and health. J Clin Epidemiol 2006;59:964-9.
6.  Rice WR. Analyzing tables of statistical tests. Evolution 1989;43:223-5.
7.  Nakagawa S. A farewell to Bonferroni: the problems of low statistical power and publication bias. Behav Ecol 2004;15:1044-5.
8.  Zheng Z, Naugler C. Type I error in pathology papers, prevalence and effect on publication citations. Poster Presentation, Canadian Association of Pathologists Annual Scientific Meeting, Montreal, PQ, Jul 11-15 2010.
9.  Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat 1979;6:65-70.
10.  García LV. Escaping the Bonferroni iron claw in ecological studies. Oikos 2004;105:657-63.
11.  Wit E, McClure J. Statistics for microarrays: Design, Analysis, and Inference. 1st ed. Hoboken, New Jersey: John Wiley and Sons; 2004. p.195.
12.  Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B 1995;57:289-300.