Journal of Pathology Informatics

J Pathol Inform 2017,  8:17

Compromising the security of “Generating unique identifiers from patient identification data using security models”

Sydney Medical School, The University of Sydney, NSW 2006; Informatics Committee, The Royal College of Pathologists of Australasia, NSW 2010, Australia

Date of Submission: 01-Jan-2017
Date of Acceptance: 23-Feb-2017
Date of Web Publication: 10-Apr-2017

Correspondence Address:
Arran Schlosberg
Sydney Medical School, The University of Sydney, NSW 2006

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/jpi.jpi_1_17


How to cite this article:
Schlosberg A. Compromising the security of “Generating unique identifiers from patient identification data using security models”. J Pathol Inform 2017;8:17



I write with respect to the Technical Note “Generating unique identifiers (IDs) from patient identification data using security models,”[1] the authors of which propose a method to “create a unique one-way encrypted ID per patient that can be used for data sharing.” In summary, their method involves concatenation of a patient's date of birth, sex, and surname, utilizing either the MD5 or SHA-1 cryptographic hash of this value as the record ID.
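The ID construction summarized above can be sketched in a few lines. This is a minimal illustration only: the field order, the date format, the upper-casing of the surname, and the function name `patient_id` are assumptions made for the example, not the exact format used in the Technical Note.

```python
import hashlib
from datetime import date


def patient_id(dob: date, sex: str, surname: str, algorithm: str = "md5") -> str:
    """Hash a concatenation of date of birth, sex, and surname.

    The concatenation format (YYYYMMDD + sex + upper-cased surname) is a
    hypothetical choice for illustration; the original format may differ.
    """
    message = f"{dob:%Y%m%d}{sex}{surname.upper()}"
    return hashlib.new(algorithm, message.encode("utf-8")).hexdigest()


# The same three fields always produce the same ID, for either algorithm.
print(patient_id(date(1980, 1, 1), "F", "Smith"))
print(patient_id(date(1980, 1, 1), "F", "Smith", algorithm="sha1"))
```

Because the function is deterministic and takes only three low-entropy fields, anyone who can guess the fields can recompute the ID.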

The authors conclude that this “can be used to share patient electronic medical records between practitioners without revealing patients' identifiable data.” Here, I demonstrate that this is not the case and wish to recommend that the method should not be utilized under circumstances in which the privacy of underlying patient data is required.

The authors state that “the difficulty of coming up with any message having a given MD is on the order of 2^128 operations;” however, even in the absence of known weaknesses in the MD5 algorithm,[2] this assumes an unbounded input space. The proposed methodology is strictly limited by the number of feasible birth dates, names, and sexes: excluding leap days and assuming only binary sexes, the input space for a 100-year period is only 73,000 per surname (365 days × 100 years × 2 sexes).
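The size of the input space can be checked with a short calculation. The sketch below assumes the century 1917–2016 as the 100-year period; the function name `input_space` and its parameters are hypothetical names introduced for the example.

```python
from datetime import date
from math import log2


def input_space(first_year=1917, last_year=2016, sexes=2, surnames=1, leap_days=False):
    """Count the candidate (birth date, sex, surname) combinations."""
    if leap_days:
        # Exact number of calendar days in the inclusive year range.
        days = (date(last_year + 1, 1, 1) - date(first_year, 1, 1)).days
    else:
        # Simplification used in the text: 365 days in every year.
        days = 365 * (last_year - first_year + 1)
    return days * sexes * surnames


print(input_space())  # 73000 candidates per surname

ten_surnames = input_space(surnames=10, leap_days=True)
print(ten_surnames, round(log2(ten_surnames), 1))  # size of the space in bits
```

Even with ten surnames and leap days included, the space is many orders of magnitude smaller than the 2^128 figure quoted for an unbounded input.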

It is thus possible to perform a brute-force, precomputed attack utilizing common surnames. I calculated a table of the proposed IDs, known as a rainbow table, for two sexes, birth dates spanning all of the century 1917–2016 inclusive, and the top ten most common surnames in the 2000 USA census.[3] This approach reduces the search space to < 2^23; performed on my personal laptop, the computation took a mere 8.8 s to compromise the IDs of over 13 million people (based on census counts) for both MD5 and SHA-1. The results of my calculations are available for download at and constitute a reverse-lookup database that fully compromises the security of the proposed method.
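A precomputed reverse lookup of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the published results: the concatenation format, the sex codes, and the three-surname list are hypothetical placeholders, and the table is a plain dictionary keyed by hash, the simplest form of precomputed table.

```python
import hashlib
from datetime import date, timedelta

SEXES = ("M", "F")                          # assumed binary coding
SURNAMES = ("SMITH", "JOHNSON", "WILLIAMS")  # illustrative placeholder list


def candidate_id(dob, sex, surname, algorithm="md5"):
    # Hypothetical input format: YYYYMMDD + sex + surname.
    message = f"{dob:%Y%m%d}{sex}{surname}"
    return hashlib.new(algorithm, message.encode("utf-8")).hexdigest()


def build_lookup(surnames, first=date(1917, 1, 1), last=date(2016, 12, 31),
                 algorithm="md5"):
    """Map every candidate ID back to the fields that produced it."""
    table = {}
    day = first
    while day <= last:
        for sex in SEXES:
            for surname in surnames:
                table[candidate_id(day, sex, surname, algorithm)] = (day, sex, surname)
        day += timedelta(days=1)
    return table


table = build_lookup(SURNAMES)

# An "anonymized" ID is reversed with a single dictionary lookup.
leaked = candidate_id(date(1980, 5, 17), "F", "SMITH")
print(table[leaked])
```

Extending the attack to more surnames or to SHA-1 only changes the arguments passed to `build_lookup`; the cost grows linearly with the number of surnames.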

It is trivial to modify the input format for the precomputed IDs and to extend the rainbow table to cover more surnames; nevertheless, the secrecy of the input format would not contribute to security, under Kerckhoffs' principle (French original;[4] English elucidation [5]). Given the independence between IDs, this brute-force process is known as embarrassingly parallel,[6] allowing for computation to be shared across any number of devices (without modifying code) which results in a decreased time for compromise. A number of other weaknesses exist in the proposed methodology, but I limit myself to detailing the most severe one in the interest of being succinct.
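The independence between IDs means the work can be split by surname with no coordination between workers. The sketch below assumes the same hypothetical input format as before (YYYYMMDD + sex + surname) and uses Python's standard `concurrent.futures` module; the function names are illustrative. The worker function is identical in the sequential and parallel versions, and only the mapping step changes.

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor
from datetime import date, timedelta


def table_for_surname(surname):
    """Worker: every candidate MD5 ID for one surname, independent of all others."""
    table = {}
    day = date(1917, 1, 1)
    while day <= date(2016, 12, 31):
        for sex in "MF":  # assumed binary coding
            message = f"{day:%Y%m%d}{sex}{surname}"  # hypothetical format
            table[hashlib.md5(message.encode("utf-8")).hexdigest()] = message
        day += timedelta(days=1)
    return table


def merge(tables):
    """Combine per-surname tables; no entry depends on any other."""
    merged = {}
    for partial in tables:
        merged.update(partial)
    return merged


def build_sequential(surnames):
    return merge(map(table_for_surname, surnames))


def build_parallel(surnames, workers=4):
    # Same worker, distributed over processes instead of a simple loop.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return merge(pool.map(table_for_surname, surnames))
```

On platforms that start worker processes by spawning a fresh interpreter, `build_parallel` should be called from within an `if __name__ == "__main__":` guard.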

Financial support and sponsorship


Conflicts of interest

There are no conflicts of interest.

References

1. Mohammed EA, Slack JC, Naugler CT. Generating unique IDs from patient identification data using security models. J Pathol Inform 2016;7:55.
2. Wang X, Yu H. In: Advances in Cryptology-EUROCRYPT. Berlin: Springer; 2005. p. 19-35.
3. United States Census Bureau. Frequently Occurring Surnames from the Census 2000; 2000. Available from: [Last accessed on 2017 Feb 19].
4. Kerckhoffs A. La cryptographie militaire. J Sci Mil 1883;IX:5-38.
5. Schlosberg A. Data security in genomics: A review of Australian privacy requirements and their relation to cryptography in data storage. J Pathol Inform 2016;7:6. [doi: 10.4103/2153-3539.175793].
6. Maurice H, Nir S. The Art of Multiprocessor Programming, Revised Reprint. Revised Edition. San Francisco, United States: Elsevier; 2012. p. 14.



