J Pathol Inform 2017;8:41
Hepatitis C virus Genie: A web 2.0 interpretation and analytics platform for the Versant Hepatitis C virus genotype Line Probe Assay version 2.0
Alex M Dussaq1, Abha Soni1, Christopher Willey2, Seung L Park1, Shuko Harada1
1 Department of Pathology, University of Alabama at Birmingham, Birmingham, AL, USA
2 Department of Radiation Oncology, University of Alabama at Birmingham, Birmingham, AL, USA
Date of Submission: 23-May-2017
Date of Acceptance: 27-Jul-2017
Date of Web Publication: 03-Oct-2017
Correspondence Address: Seung L Park
Senior Vice President and Chief Health Information Officer, Indiana University Health, 1515 N Senate Avenue, Indianapolis, IN 46202
Source of Support: None, Conflict of Interest: None
| Abstract|| |
How to cite this article:
Dussaq AM, Soni A, Willey C, Park SL, Harada S. Hepatitis C virus Genie: A web 2.0 interpretation and analytics platform for the Versant Hepatitis C virus genotype Line Probe Assay version 2.0. J Pathol Inform 2017;8:41
How to cite this URL:
Dussaq AM, Soni A, Willey C, Park SL, Harada S. Hepatitis C virus Genie: A web 2.0 interpretation and analytics platform for the Versant Hepatitis C virus genotype Line Probe Assay version 2.0. J Pathol Inform [serial online] 2017 [cited 2018 Apr 23];8:41. Available from: http://www.jpathinformatics.org/text.asp?2017/8/1/41/215896
| Introduction|| |
Hepatitis C virus (HCV) afflicts approximately 130–150 million people globally with acute and chronic hepatitis infection. If left untreated, the end-stage complications of HCV include cirrhosis and hepatocellular carcinoma. The traditional difficulties in treating chronic HCV may be related to the virus's genetic heterogeneity, which divides it into 11 different genotypes with multiple subtypes within each genotype. Typically, a patient is infected with a single genotype and subtype that mutates rapidly over time, so that each infection becomes a mixture of closely related viral variants known as quasispecies. These genetic variations help the virus quickly escape the body's immune response, advance disease progression, and impede therapeutic intervention. HCV genotype distribution varies geographically, with genotypes 1–3 occurring most commonly in the United States and worldwide. Genotype 4 is more frequently identified in the Middle East and Africa, genotype 5 in South Africa, genotype 6 in Southeast Asia, rare cases of genotypes 7–9 in Central Africa and Vietnam, and rare cases of genotypes 10 and 11 in Indonesia.
The recent review by Burstow et al. highlights the substantial changes in chronic HCV treatment over the past 25 years. Treatment regimens are moving away from genotype-based, less efficacious, and more complicated interferon-based drug combinations and toward pangenotypic therapies that use inhibitors of the HCV RNA polymerase, nonstructural protein 5B. The European Association for the Study of the Liver and the American Association for the Study of Liver Diseases recommended sofosbuvir/ledipasvir for noncirrhotic patients with genotypes 1 and 4–6 and sofosbuvir/velpatasvir for genotypes 2–3. The US Food and Drug Administration has approved sofosbuvir/velpatasvir for the treatment of HCV genotypes 1–6 in noncirrhotic patients, with ribavirin added in the setting of cirrhosis. Although HCV treatment is moving away from requiring genotyping to determine drug choice, treatment duration may still be shortened in a genotype-specific manner as trials continue.
HCV genotyping at our institution is performed using the Versant HCV Genotype 2.0 Line Probe Assay (LiPA). This assay uses reverse hybridization technology to detect genotypes 1 through 6 along with 15 subtypes. Biotinylated, polymerase chain reaction (PCR)-amplified products are hybridized to immobilized oligonucleotide probes and visualized with alkaline phosphatase-labeled streptavidin, producing a visible, genotype-specific line pattern on the strip. Each strip has a control section and multiple parallel DNA probe lines containing sequences specific for genotypes 1 through 6. Genotypes 7 through 11 are assigned to genotype 6, subtypes c to l.
The last step of this procedure is a manual interpretation that compares the bands on each test strip to a physical reference table. This manual process is time consuming and error prone: the posthybridization analysis takes an experienced technologist 13.8 (±0.98) min for 16 samples, and longer for residents or inexperienced technologists.
The aim of this study was to develop an automated system for interpreting HCV genotypes from the Versant HCV Genotype 2.0 Assay in order to (a) minimize interpretation time, (b) reduce error, and therefore (c) increase the quality of patient care delivered through this test methodology. The study consists of two parts: (1) a web-based HCV genotype interpretation platform and (2) a web environment with an analytical step in which an institution can use a scanned LiPA image to generate genotyping results. This study also demonstrates the importance of medical informatics in clinical testing by creating a program that performs the interpretation and rapidly generates results.
| Subjects and Methods|| |
Data collection (four phases)
All patients' results were processed shortly after collection as described in the introduction. The manual method relies on taping strips to a standard sheet of paper [Supplemental Figure 1] and was utilized at some point in every phase.
Phase 1: In September 2014, 200 randomly selected, manually interpreted samples from June to August 2014 were chosen for the original database validation. In the same month, three novice residents and three experienced technologists were each given two new samples to interpret, one using “HCV Genie” and the other manually using the reference chart. The time taken by each participant was recorded and compared.
Phase 2: In January 2016, we collected 47 randomly selected pages (648 patient results) from March to September 2015. All pages were then scanned to a portable document format (PDF) file using one of two flatbed scanners (HP LaserJet M3035xs MFP and HP LaserJet M4345 MFP) at the default “medium” quality and 150 dpi resolution. These samples serve as the demonstration samples for the final tool and were used in the initial building of HCV Genie 2.
Phase 3: In October 2016, we collected 12 pages with 192 patient samples from July to September 2016. All pages were then scanned to a PDF using a flatbed scanner (HP LaserJet M4345 MFP) at three different resolutions: 150, 200, and 300 dpi.
Phase 4: Patient results from January 2017 to March 2017 were processed shortly after collection as described in the introduction. Each test strip, while still wet, was then placed on an 8.5 × 11” sheet of laminated beige paper [Supplemental Figure 2]. The strips were dabbed with an absorptive cloth until they appeared dry and then scanned to a PDF using a flatbed scanner (HP LaserJet M3035xs MFP). We collected 11 such pages for a total of 148 test strips. These were read in parallel using the software suite and the traditional method, and both accuracy and sample preparation/analysis time were compared.
HCV Genie was built in two phases:
Phase 1 was based on traditional server-side tools:
- Server hardware: Dell Precision T3600
- Host virtualization hypervisor: VMware ESXi 4.1.0
- Guest operating system: Ubuntu Linux Server 14.04 LTS 64-bit
- Web server: Nginx 1.7
- Database management system: MariaDB 10.0
- Programming language: PHP-FPM 5.5
- User interface framework: Twitter Bootstrap 3.3
Web Tool Kit
The reference database was populated from Versant's physical reference table, which maps banding patterns to specific HCV subtypes, and was clinically validated against the current manual interpretation method, initially by manual data entry of the 200 samples from Phase 1. Further validation was performed using the image analysis program.
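A reference table of this kind can be modeled as an exact-match lookup from the set of positive probe lines to a genotype call. The sketch below is a minimal illustration under that assumption: the probe-line numbers and genotype labels in `REFERENCE_TABLE` are hypothetical placeholders, not the Versant table, and the function name is ours rather than HCV Genie's.

```python
from typing import Dict, FrozenSet, Iterable

# HYPOTHETICAL placeholder entries: the real Versant LiPA 2.0 reference table
# maps specific probe-line combinations to subtypes. These patterns are
# invented for illustration only and must not be used clinically.
REFERENCE_TABLE: Dict[FrozenSet[int], str] = {
    frozenset({2, 3}): "placeholder genotype A",
    frozenset({2, 4, 5}): "placeholder genotype B",
    frozenset({7, 9}): "placeholder genotype C",
}


def lookup_genotype(positive_lines: Iterable[int],
                    table: Dict[FrozenSet[int], str] = REFERENCE_TABLE) -> str:
    """Return the call for an exact band pattern; unmatched patterns are flagged."""
    pattern = frozenset(positive_lines)  # order of detected bands is irrelevant
    return table.get(pattern, "no match: review manually")


print(lookup_genotype([3, 2]))   # matches {2, 3}
print(lookup_genotype([1, 2]))   # not in the table
```

Using a `frozenset` as the key makes the lookup independent of the order in which bands were detected, and an unmatched pattern is flagged for manual review rather than forced onto the nearest entry.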
Image analysis begins with a scanned PDF: pdf.js renders the page to a canvas object, and RGBA pixel data are extracted. Indicator bands are detected globally by their distance in the LAB color space from “green.” The LAB value for “green” was derived by minimizing the total color distance across forty manually identified indicator bands with a steepest descent algorithm. Initial rectangle sizes are then approximated by finding contiguous “green” regions across the page. Edges are detected using a Sobel operator (minimum edge strength: 10) on a matrix of LAB distances from green. Based on the Sobel output, a novel windowed Hough transformation (https://github.com/adussaq/hcv_genie/blob/gh-pages/js/workers/houghTransformWorker.js) is applied in the region of each rectangular band. The Hough transformation assumes the presence of a rectangle and estimates the dimensions, angle, and center of each indicator band; the result is plotted on a second canvas object overlaying the original image.
The algorithm then uses the rectangle angle, the rectangle center, and the assumption that the lanes run approximately from top to bottom to “walk” down the center of each lane. The walk applies a Sobel operator (minimum edge strength: 0.05, with nonmaximum suppression) to a grayness matrix over a reasonable window around the estimated lane, then moves down the estimated center of the lane. Each location investigated has three possible outcomes:
- No edge detected, continue to move down
- A horizontal edge detected, begin a Hough transform
- A vertical edge detected, reevaluate the center of the lane.
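The two primitives above, a LAB color distance and a Sobel edge test that sorts each walk location into one of the three outcomes, can be sketched as follows. This is a minimal illustration, not the HCV Genie code: we assume 8-bit sRGB input with a D65 white point, plain Euclidean (CIE76) distance in LAB, and a rule that an edge is "horizontal" when the vertical gradient dominates. Nonmaximum suppression and the windowed Hough step are omitted.

```python
import math
from typing import Sequence, Tuple

Lab = Tuple[float, float, float]


def srgb_to_lab(r: int, g: int, b: int) -> Lab:
    """Convert 8-bit sRGB to CIELAB (assumed D65 reference white)."""
    def linearize(c: float) -> float:
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear sRGB -> CIE XYZ
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t: float) -> float:
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def distance_from(reference: Lab, pixel_rgb: Tuple[int, int, int]) -> float:
    """Euclidean (CIE76) distance between a pixel and a reference LAB color."""
    return math.dist(reference, srgb_to_lab(*pixel_rgb))


def sobel_at(grid: Sequence[Sequence[float]], row: int, col: int) -> Tuple[float, float]:
    """Return (gx, gy) Sobel responses at an interior grid position."""
    kx = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # responds to vertical edges
    ky = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # responds to horizontal edges
    gx = gy = 0.0
    for i in range(3):
        for j in range(3):
            v = grid[row + i - 1][col + j - 1]
            gx += kx[i][j] * v
            gy += ky[i][j] * v
    return gx, gy


def walk_outcome(grid: Sequence[Sequence[float]], row: int, col: int,
                 min_strength: float) -> str:
    """Classify one walk location into the three cases listed above."""
    gx, gy = sobel_at(grid, row, col)
    if math.hypot(gx, gy) < min_strength:
        return "none"          # case 1: keep moving down the lane
    # case 2 (band edge) or case 3 (lane edge); the tie-break rule is an assumption
    return "horizontal" if abs(gy) >= abs(gx) else "vertical"
```

A distance-from-green matrix for indicator detection would be built as `[[distance_from(GREEN_LAB, px) for px in row] for row in pixels]`, where `GREEN_LAB` is a hypothetical stand-in for the fitted green value and is not published in the text. The strength thresholds quoted above (10 and 0.05) depend on the scale of the input matrix and are not interchangeable with the toy values used in the tests.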
In case (2), a new Sobel operator is applied without nonmaximum suppression, a Hough transform is applied, and a grey score, G, is calculated:
Where ḡ is the average grayness across the rectangle, g½ is the median grayness across the rectangle, hh is the horizontal Hough transformation total strength, hv is the vertical Hough transformation total strength, and a0, …, a6 are coefficients determined by linear regression (methods: model parameterization). If the score is greater than the input minimum, the rectangle is called a band and plotted, its distance from the indicator band is recorded, the estimated angle and center of the lane are adjusted, and the walk continues.
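The band call itself reduces to computing a score from the rectangle's features and comparing it with a minimum. The sketch below assumes a plain linear combination of the four named features; since the published model uses seven coefficients (a0 to a6), its true form contains terms that are not recoverable here, so both the functional form and the example coefficients are hypothetical.

```python
from statistics import mean, median
from typing import List, Sequence


def grey_features(grayness: Sequence[float], h_h: float, h_v: float) -> List[float]:
    """Features named in the text: mean and median grayness, Hough strengths."""
    return [mean(grayness), median(grayness), h_h, h_v]


def grey_score(features: Sequence[float], coef: Sequence[float]) -> float:
    # HYPOTHETICAL form: G = a0 + a1*g_mean + a2*g_median + a3*h_h + a4*h_v.
    # The published model uses a0..a6, so its true form has extra terms.
    return coef[0] + sum(a * x for a, x in zip(coef[1:], features))


def is_band(score: float, minimum: float) -> bool:
    """A candidate rectangle is called a band when G exceeds the minimum."""
    return score > minimum


feats = grey_features([0.2, 0.4, 0.6], h_h=1.0, h_v=0.5)
score = grey_score(feats, [0.0, 0.5, 0.5, 0.1, -0.1])  # placeholder coefficients
print(is_band(score, minimum=0.2064))
```

The minimum of 0.2064 is the accepted value reported for Equation 1 in the Results; with placeholder coefficients it only illustrates the comparison, not a calibrated call.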
All bands and lane data are collected and band calls, up to the projected 7th band spot, are made based on this model:
Where h is the average Hough transform band height across the page, w is the average Hough transform band width, di is the Euclidean distance of the Hough transform center of the band in question from the corresponding Hough transform center of the indicator band, b0, b1, and b2 are coefficients determined by linear regression (methods: model parameterization), and l6, i is the band location, an integer. These data points provide us with the following additional parameters:
Where n is the total number of bands in the first six locations. Finally, the remaining bands were determined to have locations based on the following equation:
Where li is the band location, an integer, and c0, …, c3 are coefficients determined by linear regression (methods: model parameterization).
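The common step in these location models is turning a measured distance from the indicator band into an integer band position. The sketch below assumes, for illustration only, a two-coefficient linear model on the Euclidean center-to-center distance normalized by the mean band height; the published equations have additional terms that were lost, so the coefficients and model form here are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def band_location(band_center: Point, indicator_center: Point,
                  mean_band_height: float, coef: Sequence[float],
                  max_location: int) -> int:
    """Map a band's distance from the indicator band to an integer location."""
    d = math.dist(band_center, indicator_center)  # d_i: Euclidean center distance
    # HYPOTHETICAL linear model on height-normalized distance
    estimate = coef[0] + coef[1] * (d / mean_band_height)
    # Round to the nearest location and clamp to the valid range of the strip
    return min(max(round(estimate), 1), max_location)


print(band_location((0.0, 30.0), (0.0, 0.0), 10.0, (0.0, 1.0), 20))
```

Normalizing by band height makes the estimate roughly independent of scan resolution, which matters when the same parameters are applied at 150, 200, and 300 dpi. Note that Python's `round` uses round-half-to-even at exact .5 values.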
The resulting image is interactive: bands can be added or removed, and calls can be edited, by clicking on the image itself. An interactive table also allows the band calls to be changed. Finally, a PDF report can be generated using jsPDF and rasterizeHTML.
Varying backgrounds and image qualities require different parameterizations of the coefficients in equations 1, 2, and 4 (methods: image analysis). These are addressed in two ways: (1) a parameter object may be passed as a URL link, and (2) a parameter object can be trained interactively using the “Train” tab of HCVGenie.com.
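Passing a parameter object through a URL amounts to serializing it (for example as JSON) into a query string and parsing it back on load. The sketch below shows that round trip; the query key `params` and the field names in the example object are hypothetical and do not describe the format HCVGenie.com actually accepts.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def params_to_url(base_url: str, params: dict, key: str = "params") -> str:
    # "params" is a HYPOTHETICAL query-string key, not HCV Genie's actual one
    payload = json.dumps(params, separators=(",", ":"))
    return f"{base_url}?{urlencode({key: payload})}"


def params_from_url(url: str, key: str = "params") -> dict:
    query = parse_qs(urlsplit(url).query)
    return json.loads(query[key][0])


example = {"eq2": {"b0": 0.013, "b1": 0.1179, "b2": 1.2547}, "minScore": 0.2064}
link = params_to_url("http://hcvgenie.com", example)
print(params_from_url(link) == example)
```

Because JSON serializes Python floats with round-trip precision, a trained parameterization can be bookmarked and reloaded without drift in the coefficients.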
The training process uses the algorithm described above to generate an interactive image and then allows the user to adjust the band calls. When a PDF is used (recommended), the same page is rendered at three scales (2.0, 2.25, and 2.5) with the pdf.js package. For each analyzed image, the distance from the indicator band and several indicators of band strength are extracted for positive bands and for the negative regions between each set of bands. Linear regression is then used to parameterize the necessary equations. Applying this process to eight images generated our default parameterization values: three from Phase 3 (150, 200, and 300 dpi), one from Phase 4, and four selected images from Phase 2 (methods: data collection).
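The regression step can be illustrated with ordinary least squares solved through the normal equations, together with the R² values reported for each model in the Results. This is a dependency-free sketch of the general technique, not the HCV Genie training code; in practice a numerically sturdier solver (QR or SVD) would be preferable for ill-conditioned features.

```python
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fit y ~ c0 + c1*x1 + ... + ck*xk by least squares (normal equations)."""
    rows = [[1.0, *row] for row in X]  # prepend the intercept column
    k = len(rows[0])
    # Normal equations: (X^T X) c = X^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    coef = [0.0] * k
    for i in reversed(range(k)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, k))) / A[i][i]
    return coef


def r_squared(X: Sequence[Sequence[float]], y: Sequence[float],
              coef: Sequence[float]) -> float:
    """Coefficient of determination of the fitted linear model."""
    preds = [coef[0] + sum(c * x for c, x in zip(coef[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Toy data generated from y = 0.5 + 1.0*x1 - 2.0*x2 (not assay data)
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [0.5, 1.5, -1.5, -0.5, 0.5]
coef = fit_ols(X, y)
print([round(c, 6) for c in coef], round(r_squared(X, y, coef), 6))
```

In a training run, each row of `X` would hold the features extracted from one band or negative region and `y` its label or known location; which features feed which equation is not fully specified in the text.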
Calculation of time statistics
Average time for original method: Because analysis time depended on the number of samples (correlation 0.774, P = 0.0052), the time per sample was calculated; its mean was 0.862 (±0.245) min/sample. The expected time for 16 samples is therefore 16 × 0.862 ≈ 13.8 min, and, treating per-sample times as independent, its standard deviation is 0.245 × √16 = 0.98 min. Average time for new method: This value did not depend on the number of samples (correlation 0.435, P = 0.1816); therefore, the mean and standard deviation were calculated directly.
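The scaling used for the manual method can be checked numerically. The sketch assumes, as above, that per-sample interpretation times are independent, so the mean of a run scales with n and the standard deviation with √n; the 5.0 min digital figure is taken from the Results.

```python
import math


def expected_run_time(n: int, mean_per_sample: float, sd_per_sample: float):
    """Mean and SD of total time for n samples, assuming independent samples."""
    mean_total = n * mean_per_sample
    sd_total = sd_per_sample * math.sqrt(n)
    return mean_total, sd_total


mean16, sd16 = expected_run_time(16, 0.862, 0.245)
print(f"{mean16:.1f} ± {sd16:.2f} min")  # 13.8 ± 0.98 min
print(f"{mean16 - 5.0:.1f} min saved vs. the digital method")  # 8.8 min saved vs. the digital method
```

Multiplying the per-sample standard deviation by n instead (0.245 × 16 ≈ 3.9 min) would overstate the spread, because run-to-run fluctuations in individual samples partly cancel.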
| Results|| |
The final tool is available at http://hcvgenie.com. Given an optional parameterization object, this tool allows a user to load an image file (PDF highly recommended) containing any number of Versant HCV genotype 2.0 LiPA strips from their computer, identify the banding patterns and genotypes, make any needed edits, and export the results as a PDF. If the default parameterization is not sufficient, the user may create a parameterization object with the tool and pass it to the algorithm in a URL for future use.
Database testing and validation
This tool was developed and validated in two major iterations. The first iteration, HCV Genie 1, used the database of banding patterns and a manual entry text box. Its calls were identical to expert human interpretation (n = 200). This automated method decreased the time needed to interpret results by 53% for novice medical personnel (n = 2). However, experienced technologists spent the same amount of time generating results with either the manual or the automated method.
Image analysis results
The fundamental image analysis steps outlined here are visualized in [Figure 1]. The general steps are: (1) detect the location and angle of the green indicator bands, (2) inspect the center of each lane for bands, (3) determine the banding pattern from each band's distance from the indicator band, and (4) determine the genotype. The algorithm and the linear model derivation are described in detail in methods: image analysis. To test the accuracy of the default parameters, we ran the algorithm over 11 pages with 192 test strips at three scanning resolutions: 300, 200, and 150 dpi. On further inspection, the manual method had missed two faint bands and one strong band and had misidentified two bands; none of these resulted in genotyping errors. At 300 dpi, the algorithm missed 14 faint bands identified by the manual method, misidentified no bands, and correctly found and identified 1118/1132 (98.8%) bands. Only one missed band affected the genotype call, and all of these misses could easily be corrected manually. The 200 dpi images gave results identical to the 300 dpi images. At 150 dpi, the algorithm could not separate two lanes on one image; this caused 14 bands to be missed and no genotype to be identified. We therefore recommend scanning at 200 dpi or higher.
Figure 1: Basic outline of the image analysis algorithm. Panel A: Section containing three Line Probe Assay strips. Panel B: Indicator bands are detected by LAB color, then rectangle parameters are determined using an adapted windowed Hough transformation. Panel C: Bands are identified and called based on distance from the indicator band
The final parameterization for the equations described in methods utilized in laboratory testing had the following values:
Equation 1: with a minimum accepted value of 0.2064 and a model R² = 0.9222.
Equation 2: b0 = 0.0130, b1 = 0.1179, b2 = 1.2547, with a model R² = 0.9901.
Equation 4: b0 = 0.9143, b1 = 0.0464, b2 = -0.0138, with a model R² = 0.9998.
These values were based on several randomly selected pages across the testing sets. They may be updated for other institutions and scanners by using the built-in training tool.
HCVGenie.com testing and validation
Over a 2-month period, we tested this algorithm concurrently with a new sample preparation method. Rather than waiting for the strips to dry and taping them to a piece of paper [Supplemental Figure 1], each test strip, while still wet, was placed on a standard sheet of laminated beige paper [Supplemental Figure 2], blotted with an absorptive disposable towel, and then scanned on a flatbed scanner. The resulting image was loaded into hcvgenie.com and checked against the physical strips for consistency. Across these samples, two faint bands had to be added manually and one false band removed. Once the bands were corrected, all genotype calls were accurate. The algorithm also caught two subtypes that the technologist had initially called incorrectly; although the pathologist might have caught these errors on review, they illustrate the utility of automation. As [Figure 2] shows, the algorithm and the new method decreased the total analysis time. From [Figure 2], we make two observations: (1) analysis time is correlated with the number of samples when using the manual method (R = 0.774, P = 0.005) but less so when using the digital method (R = 0.435, P = 0.182), and (2) the digital method reduces the overall analysis time. Taking a 16-strip run as an example, we would expect the manual method to take 13.8 (±0.98) min (see methods for calculation of the standard deviation). No sample-number correction is needed for the digital method, for which we would expect the analysis to take 5.0 (±1.09) min for any number of samples. In addition to saving 8.8 min for 16 samples, the new method creates a digital record of the analysis that can be delivered more easily to the pathologist for verification.
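The reported correlations can be sanity-checked with the usual t statistic for a Pearson correlation, t = r·√(n − 2)/√(1 − r²), with n − 2 degrees of freedom. The sketch assumes both correlations were computed over the n = 11 hybridizations shown in Figure 2; that sample size is our reading of the figure caption, not a value stated alongside the R values.

```python
import math

T_CRIT_DF9 = 2.262  # two-sided 5% critical value of Student's t with 9 df


def correlation_t(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


for label, r in (("manual", 0.774), ("digital", 0.435)):
    t = correlation_t(r, n=11)  # n = 11 hybridizations (assumed, per Figure 2)
    print(label, round(t, 2), t > T_CRIT_DF9)
# manual 3.67 True
# digital 1.45 False
```

With 9 degrees of freedom, t ≈ 3.67 and t ≈ 1.45 are consistent with the reported P values of about 0.005 and 0.18, supporting the interpretation that only the manual method's time depends significantly on run size.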
Figure 2: Analysis time for eleven Line Probe Assay hybridizations. The analysis time in minutes for the manual method is indicated by the purple and green bars, where the green represents the time-consuming process of taping the Line Probe Assay strips to a piece of paper and the purple represents the time needed to read and interpret the banding patterns. The analysis time in minutes for the digital method is indicated by the blue and red bars, where the blue represents the time needed to dry, scan, E-mail, and load the image into the tool at HCVGenie.com and the red represents the time spent checking the band calls against the physical strips
| Discussion|| |
Recognition of the significance of informatics to the practice of pathology is essential for the future of the specialty. At the University of Alabama at Birmingham, residents follow a unique curriculum in informatics. The entire rotation, built on a basic foundation in informatics, is centered on recognizing an immediate need for informatics in the department, building a proposal, and then implementing a program to respond to that need. As exemplified by our program “HCV Genie,” many arduous manual laboratory techniques can be readily automated with proper guidance and supervision.
Our first program provided results identical to the manual workflow, with fewer manual steps and in a time frame similar to that of a well-trained manual interpreter, regardless of the user's experience level. This iteration involved developing lane and band detection algorithms and creating a publicly available tool that eliminates data privacy concerns. This program is a useful and portable tool for cross-training technicians, residents, and physicians in molecular and infectious disease laboratories. Our initial results suggested that more experienced technologists need the same amount of time to generate results with either the manual or the automated method, likely because of their familiarity with the manual technique and uncertainty with the automated one. However, once a band detection algorithm was incorporated, even the most experienced technologists preferred the automated technique and the time savings it provides. We now use the digital method exclusively, and as familiarity with it has increased, it has become the universally preferred method for interpreting HCV genotype results. In addition, we have transitioned from a management-dependent server database architecture to client-side computing. This shift creates an environment in which user data remain secure and the tools can persist with little to no interference.
While the time savings and preferred workflow introduced by HCV Genie 2 are of significant value, the potential for decreasing error should not go unmentioned. In our live testing, HCV Genie 2 found two genotyping errors made by the technicians. Although these might have been corrected on review by the pathologist, they were caught before that review. In addition, in our review of previously processed samples, HCV Genie found five band identification errors. None of these errors altered the genotype calls, so they were not critical for patient care. However, this may not always be the case, and the additional “digital eyes” could lead to fewer genotyping errors in the future.
The authors would like to thank Gina M. Coshatt (Molecular Diagnostics Laboratory Supervisor), Alicia R. Armstead (Molecular Diagnostics Laboratory Technologist), and Jessica A. Levesque, MD (Molecular Genetic Pathology Fellow), for their help in beta-testing the various iterations of HCV Genie. We also thank Israel Ponce-Rodriguez and Timothy Awtrey for server setup during the development of HCV Genie 1.
Financial support and sponsorship
This work was partially supported by NIGMS MSTP T32GM008361.
Conflicts of interest
There are no conflicts of interest.
| References|| |
Swann RE, Cowton VM, Robinson MW, Cole SJ, Barclay ST, Mills PR, et al. Broad anti-hepatitis C virus (HCV) antibody responses are associated with improved clinical disease parameters in chronic HCV infection. J Virol 2016;90:4530-43.
Preciado MV, Valva P, Escobar-Gutierrez A, Rahal P, Ruiz-Tovar K, Yamasaki L, et al. Hepatitis C virus molecular evolution: Transmission, disease progression and antiviral therapy. World J Gastroenterol 2014;20:15992-6013.
Messina JP, Humphreys I, Flaxman A, Brown A, Cooke GS, Pybus OG, et al. Global distribution and prevalence of hepatitis C virus genotypes. Hepatology 2015;61:77-87.
Murphy DG, Sablon E, Chamberland J, Fournier E, Dandavino R, Tremblay CL. Hepatitis C virus genotype 7, a new genotype originating from central Africa. J Clin Microbiol 2015;53:967-72.
Ali S, Ali I, Azam S, Ahmad B. Frequency distribution of HCV genotypes among chronic hepatitis C patients of Khyber Pakhtunkhwa. Virol J 2011;8:193.
Burstow NJ, Mohamed Z, Gomaa AI, Sonderup MW, Cook NA, Waked I, et al. Hepatitis C treatment: Where are we now? Int J Gen Med 2017;10:39-52.
European Association for the Study of the Liver. Electronic address: Easloffice@easloffice.eu. EASL recommendations on treatment of hepatitis C 2016. J Hepatol 2017;66:153-94.
Verbeeck J, Stanley MJ, Shieh J, Celis L, Huyck E, Wollants E, et al. Evaluation of Versant hepatitis C virus genotype assay (LiPA) 2.0. J Clin Microbiol 2008;46:1901-6.
Mark Otto JT, Rebert C, Thilo J, Xhmikos, Fenkart H, Lauke PH. Bootstrap Release 3.3.7; 2016. Available from: https://www.getbootstrap.com. [Last accessed on 2017 Jun 7].
Henricks WH, Karcher DS, Harrison JH, Sinard JH, Riben MW, Boyer PJ, et al. Pathology informatics essentials for residents: A flexible informatics curriculum linked to Accreditation Council for Graduate Medical Education milestones. J Pathol Inform 2016;7:27.
Kennell T, Laufer V, Lorenz R, Park S. This is your brain on informatics: A total-immersion data sciences course for the next generation of informaticists. In: Pathology Informatics Summit. Pittsburgh, PA, USA; 2014.
Agosto-Arroyo E, Coshatt GM, Winokur TS, Harada S, Park SL. Alchemy: A Web 2.0 real-time quality assurance platform for human immunodeficiency virus, hepatitis C virus, and BK virus quantitation assays. J Pathol Inform 2017;8:18.
Park S, Kennell T, Lorenz R. One year of fried brains: Informatics as a driver of cultural change at an NIH medical scientists training program. In: Pathology Informatics Summit. Pittsburgh, PA, USA; 2015.