RESEARCH ARTICLE
Year: 2022 | Volume: 13 | Issue: 1 | Page: 10
Prediction of tuberculosis using an automated machine learning platform for models trained on synthetic data
Hooman H Rashidi¹, Imran H Khan¹, Luke T Dang¹, Samer Albahra¹, Ujjwal Ratan², Nihir Chadderwala², Wilson To², Prathima Srinivas², Jeffery Wajda³, Nam K Tran¹
¹ Department of Pathology and Laboratory Medicine, University of California, Davis, School of Medicine, Sacramento, California, United States of America
² Amazon Web Services, Seattle, Washington, United States of America
³ UC Davis Health, Sacramento, California, United States of America
Correspondence Address:
Dr. Hooman H Rashidi, Dept. of Pathology and Laboratory Medicine, University of California Davis, 4400 V St., Sacramento 95817, United States of America
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/jpi.jpi_75_21
High-quality medical data is critical to the development and implementation of machine learning (ML) algorithms in healthcare; however, security and privacy concerns continue to limit access. We sought to determine the utility of “synthetic data” in training ML algorithms for the detection of tuberculosis (TB) from inflammatory biomarker profiles. A retrospective dataset (A) comprising 278 patients was used to generate synthetic datasets (B, C, and D) for training models prior to secondary validation on a generalization dataset. ML models trained and validated on Dataset A (real) demonstrated an accuracy of 90%, a sensitivity of 89% (95% CI, 83–94%), and a specificity of 100% (95% CI, 81–100%). Models trained using the optimal synthetic dataset B showed an accuracy of 91%, a sensitivity of 93% (95% CI, 87–96%), and a specificity of 77% (95% CI, 50–93%). Synthetic datasets C and D displayed diminished performance measures (respective accuracies of 71% and 54%). This pilot study highlights the promise of synthetic data as an expedited means for ML algorithm development.
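For readers less familiar with the performance measures quoted above, the following is a minimal sketch of how accuracy, sensitivity, and specificity, each with an approximate 95% confidence interval, can be derived from a binary confusion matrix. It is an illustration only: the function names are hypothetical, the counts in the usage lines are invented and are not the study's data, and the Wilson score interval is used as one common choice because the abstract does not state which interval method the authors applied.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),      # true positive rate
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),      # true negative rate
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Hypothetical counts for illustration only (NOT taken from the study).
metrics = diagnostic_metrics(tp=90, fn=10, tn=18, fp=2)
for name, value in metrics.items():
    print(name, value)
```

Note that a small negative class makes the specificity interval wide, which is consistent with the broad specificity intervals reported in the abstract.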