Journal of Pathology Informatics
LETTER TO EDITOR
J Pathol Inform 2017,  8:17

Compromising the security of “Generating unique identifiers from patient identification data using security models”


Sydney Medical School, The University of Sydney, NSW 2006; Informatics Committee, The Royal College of Pathologists of Australasia, NSW 2010, Australia

Date of Submission: 01-Jan-2017
Date of Acceptance: 23-Feb-2017
Date of Web Publication: 10-Apr-2017

Correspondence Address:
Arran Schlosberg
Sydney Medical School, The University of Sydney, NSW 2006
Australia

Source of Support: None, Conflict of Interest: None


DOI: 10.4103/jpi.jpi_1_17


How to cite this article:
Schlosberg A. Compromising the security of “Generating unique identifiers from patient identification data using security models”. J Pathol Inform 2017;8:17

How to cite this URL:
Schlosberg A. Compromising the security of “Generating unique identifiers from patient identification data using security models”. J Pathol Inform [serial online] 2017 [cited 2017 Jun 29];8:17. Available from: http://www.jpathinformatics.org/text.asp?2017/8/1/17/204195



Sir,

I write with respect to the Technical Note “Generating unique identifiers (IDs) from patient identification data using security models,”[1] the authors of which propose a method to “create a unique one-way encrypted ID per patient that can be used for data sharing.” In summary, their method involves concatenation of a patient's date of birth, sex, and surname, utilizing either the MD5 or SHA-1 cryptographic hash of this value as the record ID.
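As a minimal sketch of the scheme summarized above: the exact concatenation format is not reproduced in this letter, so the field order, the `YYYYMMDD` date format, the single-letter sex code, the upper-cased surname, and the function name `patient_id` are all assumptions made for illustration.

```python
import hashlib
from datetime import date


def patient_id(dob: date, sex: str, surname: str, algorithm: str = "md5") -> str:
    """Hash the concatenated identifying fields into a hexadecimal record ID.

    Assumed input format (hypothetical): YYYYMMDD + sex code + upper-cased surname.
    """
    message = f"{dob:%Y%m%d}{sex}{surname.upper()}"
    return hashlib.new(algorithm, message.encode("utf-8")).hexdigest()


print(patient_id(date(1980, 5, 17), "F", "Smith"))          # 32 hex characters (MD5)
print(patient_id(date(1980, 5, 17), "F", "Smith", "sha1"))  # 40 hex characters (SHA-1)
```

The ID is deterministic: anyone who can guess the three input fields can recompute it.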

The authors conclude that this “can be used to share patient electronic medical records between practitioners without revealing patients' identifiable data.” Here, I demonstrate that this is not the case and wish to recommend that the method should not be utilized under circumstances in which the privacy of underlying patient data is required.

The authors state that “the difficulty of coming up with any message having a given MD is on the order of 2^128 operations;” however, even in the absence of known weaknesses in the MD5 algorithm,[2] this assumes an unbounded input space. The proposed methodology is strictly limited by the number of feasible birth dates, names, and sexes: excluding leap days and assuming only binary sexes, the input space for a 100-year period is only 73,000 per surname.
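The arithmetic behind the 73,000 figure can be checked in a few lines. The variable names are illustrative only.

```python
import math

years = 100           # a 100-year span of birth dates
days_per_year = 365   # leap days excluded
sexes = 2             # binary sexes assumed

per_surname = years * days_per_year * sexes
print(per_surname)                        # 73000
print(round(math.log2(per_surname), 1))   # 16.2, i.e. roughly 2^16 candidates per surname
```

Roughly 2^16 candidates per surname is negligible next to the 2^128 operations quoted for an unbounded input space.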

It is thus possible to perform a brute-force, precomputed attack utilizing common surnames. I calculated a table of the proposed IDs, known as a rainbow table, for two sexes, all birth dates in the century 1917–2016 inclusive, and the ten most common surnames in the 2000 USA census.[3] This approach reduces the search space to < 2^23. Performed on my personal laptop, the computation took a mere 8.8 s to compromise the IDs of over 13 million people (based on census counts) for both MD5 and SHA-1. The results of my calculations are available for download at https://goo.gl/xqwphs and constitute a reverse-lookup database that fully compromises the security of the proposed method.
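The precomputation described above might look like the following sketch, which uses a plain dictionary as the reverse-lookup table. The input format (`YYYYMMDD` + sex + upper-cased surname), the three sample surnames, and the helper names `birth_dates` and `build_table` are assumptions for illustration; this is not the code used to produce the published results.

```python
import hashlib
from datetime import date, timedelta

SURNAMES = ["SMITH", "JOHNSON", "WILLIAMS"]  # illustrative sample only
SEXES = ["F", "M"]


def birth_dates(first_year, last_year):
    """Yield every date in the inclusive year range, skipping 29 February."""
    day = date(first_year, 1, 1)
    while day.year <= last_year:
        if (day.month, day.day) != (2, 29):
            yield day
        day += timedelta(days=1)


def build_table(surnames, algorithm="md5"):
    """Map each candidate ID back to the (dob, sex, surname) that produced it."""
    table = {}
    for dob in birth_dates(1917, 2016):
        stamp = dob.strftime("%Y%m%d")
        for sex in SEXES:
            for surname in surnames:
                # Assumed format (hypothetical): YYYYMMDD + sex + SURNAME
                message = f"{stamp}{sex}{surname}".encode("utf-8")
                table[hashlib.new(algorithm, message).hexdigest()] = (dob, sex, surname)
    return table


table = build_table(SURNAMES)
leaked_id = hashlib.md5(b"19800517FSMITH").hexdigest()  # an ID as it might appear in shared data
print(len(table))        # 219000 entries: 36,500 dates x 2 sexes x 3 surnames
print(table[leaked_id])  # recovers the date of birth, sex, and surname
```

Once the table exists, reversing any covered ID is a single dictionary lookup.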

It is trivial to modify the input format for the precomputed IDs and to extend the rainbow table to cover more surnames. Keeping the input format secret would not contribute to security, under Kerckhoffs' principle (French original;[4] English elucidation [5]). Because each ID is computed independently of the others, this brute-force process is embarrassingly parallel,[6] so computation can be shared across any number of devices (without modifying code), further decreasing the time to compromise. A number of other weaknesses exist in the proposed methodology, but I limit myself to detailing the most severe one in the interest of being succinct.
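The independence between IDs can be illustrated by splitting the work into one shard per surname and merging the results afterwards. This is only a sketch: the input format, the sample surnames, and the function name `shard` are assumptions, and a thread pool stands in for the separate processes or devices that would give an actual speed-up.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def shard(surname, first_year=1917, last_year=2016):
    """Build the ID table for one surname; needs nothing from any other shard."""
    out = {}
    day = date(first_year, 1, 1)
    while day.year <= last_year:
        if (day.month, day.day) != (2, 29):  # leap days excluded, as in the letter
            for sex in "FM":
                # Assumed format (hypothetical): YYYYMMDD + sex + SURNAME
                message = f"{day:%Y%m%d}{sex}{surname}"
                out[hashlib.sha1(message.encode("utf-8")).hexdigest()] = message
        day += timedelta(days=1)
    return out


surnames = ["SMITH", "JOHNSON", "WILLIAMS", "BROWN"]  # illustrative sample only

# Threads keep the sketch portable; a process pool or separate machines
# would be the choice for a real speed-up, since the shards share no state.
with ThreadPoolExecutor(max_workers=4) as pool:
    shards = list(pool.map(shard, surnames))

merged = {}
for part in shards:
    merged.update(part)
print(len(merged))  # 292000 = 4 surnames x 73,000 inputs each
```

Because no shard reads another shard's output, adding devices only changes how the surname list is divided.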

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.



 
References

1. Mohammed EA, Slack JC, Naugler CT. Generating unique IDs from patient identification data using security models. J Pathol Inform 2016;7:55.
2. Wang X, Yu H. In: Advances in Cryptology-EUROCRYPT. Berlin: Springer; 2005. p. 19-35.
3. United States Census Bureau. Frequently Occurring Surnames from the Census 2000; 2000. Available from: https://www.census.gov/topics/population/genealogy/data/2000_surnames.html. [Last accessed on 2017 Feb 19].
4. Kerckhoffs A. La cryptographie militaire. J Sci Mil 1883;IX: 5-38.
5. Schlosberg A. Data security in genomics: A review of Australian privacy requirements and their relation to cryptography in data storage. J Pathol Inform 2016;7:6. [doi: 10.4103/2153-3539.175793].
6. Herlihy M, Shavit N. The Art of Multiprocessor Programming, Revised Reprint. Revised Edition. San Francisco, United States: Elsevier; 2012. p. 14.




 

 