Laura Rasmussen-Torvik: Thank you so much. I'm Laura Rasmussen-Torvik.
I'm an assistant professor at Northwestern University, and I've been involved with eMERGE
for about four years. And I'm presenting today I think as my role as one of the co-chairs
of the PGx working group along with Dan Roden and Josh Denny. Can I have the next slide,
please?
So, I'm reporting on eMERGE progress in this area of genetic testing today, and my presentation's
going to be nearly entirely about PGx because for eMERGE-II, the vast majority of genomic
testing that has gone on has been in PGx. So, next slide, please.
So, I know we've talked a little bit about PGx in several of the earlier talks, but I
just wanted to give everyone a quick overview again and catch people up who may not have
been on earlier. There are three primary aims for eMERGE PGx. In the first aim, across the
network we're recruiting almost 9,000 people, and the goal is to recruit people we believe
will be prescribed one of the drugs that have a CPIC guideline in the relatively near future.
Then, on all of those nearly 9,000 people, we are doing deep sequencing using a capture
agent I'll talk about in a little bit, and obtaining results that way.
The second aim, the sort of actionable variants aim that you see on screen, has several parts.
One of those parts is that at each of the eMERGE sites, they need to pick variants that
they would like to return to individuals in the electronic health record. Then we need
to generate clinical-grade genotypes so we can do this return. Then we need to figure
out how to get those clinical genotypes into the EHR, and then we also have to develop
clinical decision support to help our providers interpret that EHR information.
And then in the third aim, we're taking all of this genotyping information that's coming
off the PGRNSeq, which is our next-generation sequencing platform, and we're creating a repository
of variants of unknown significance -- again, mainly the rare variants -- and also pairing
that with phenotype information that we've extracted from electronic health records across
all the sites, so that we hope we can initiate studies of function in genotype-phenotype
relationships.
So, those are the official overall aims of PGx.
The reality is that PGx does vary from site to site. We've had to -- every site has had
to tailor it depending on what clinicians at that site are interested in returning,
what the IRB at that site is interested in letting us do. So, you know, at times, I've
wanted to pull my hair out when we're trying to summarize eMERGE -- or, excuse me, PGx
-- because it can look a little bit different from site to site. But in some ways I think
that's really an opportunity for PGx, because since this is a diverse project and it's being
implemented differently, we're having a variety of experiences, and I think it's important
to report on those. Next slide, please.
So, here's the progress of PGx as of mid-January 2014. We've accrued almost 4,000 people with
samples to use on our next-generation sequencing platform, and this includes a mixture of sites,
again, like -- it's going to be a recurring theme, that PGx is implemented differently
across sites. So, some sites are recruiting de novo, other sites are recruiting from their
existing biobank, and some sites actually had clinical samples in their existing biobank
and therefore didn't need to re-recruit people for the sample. Twenty-four hundred of these
people have been sequenced, and you'll notice that the denominators are slightly different
because there are a couple sites that are sequencing more people than they are returning
clinical results to. And then almost 1,400 people have had clinical genotypes obtained
that we can put in the electronic health record. Next slide, please.
So, some details about the PGx platform which we call the PGRNSeq. It's a next-generation
sequencing capture agent, and it was developed by our partners at the PGRN. Eighty-four genes were
selected by a vote of the PGRN community, and the sequence capture included the complete
coding regions and some sequence upstream and downstream. The platform also includes
some known variants that are present on other commercially-available platforms to make meta-analysis
easier. Next slide, please.
Batches of 24 or 48 samples are processed through Illumina flow-cell lanes, and there
have been really, really fabulous results from this platform to date. Thirty-two diverse
HapMap trios were sequenced, and, on average, the depth of coverage per sample was 496x.
And then when you compare those genotypes that were derived from the PGRNSeq data, they
were 99.9 percent concordant with existing SNP data from these
samples in the 1000 Genomes Project. Next slide, please.
So, again, the implementation of this platform across the PGx sites has been diverse. Because
of the way this supplement was funded, there are seven sites that are running samples at
CIDR. Two of those sites are running samples only at CIDR, while the other sites are running
some samples at CIDR and some at other locations. This is complicated, but again,
it also provides opportunities to really understand what it's like to try to implement an NGS
platform across lots of sites. We have some diversity in the machinery being used: one
site is using Ion Torrent and the other ones are using Illumina. Next slide, please.
And here, you can see which sites are running at least some of their samples onsite,
as opposed to others that are sending all of their samples offsite to run PGx. And then, at the
bottom part of the screen, you'll see that two groups, Mayo and Mt. Sinai, are hoping
to return some results directly from PGRNSeq, and that would mean they are actually going
to try to obtain clinical-grade results from PGRNSeq, and I'll talk a little bit more about
this in a second. Can I have the next slide, please?
So, again, when I was talking about the different specific aims for PGx, one of the most important
aims for aim two was to get clinical-grade genotypes so that we could implement things
in the electronic health record. Next slide, please.
And here, I'm going to try to clarify my language, because it can get a little complicated when
we're talking about PGx. PGRNSeq is generally run as a research-grade assay, so in eMERGE,
when we're talking about PGRNSeq, we're generally talking about research-grade
sequencing results. Of course, to return results to the electronic health record, we need
CLIA-validated, clinical-grade results. Often in PGx we refer to these as genotyping,
because in most of the sites they are using more traditional genotyping techniques to
generate these clinical results; however, there is this exception of some sites that
are trying to generate clinical-grade results from PGRNSeq. So, I'll try to be careful about
my language going forward so we all understand what I'm talking about. Next slide, please.
And here's another sort of view of how it's being implemented across sites in terms of
which specific variants are being validated, so that -- or which ones we're generating
clinical-grade results on so we can put them in the electronic health record. These genes
vary across sites. Several of the sites are all genotyping CYP2C19, VKORC1, CYP2C9, and
SLCO1B1, but not all. So again, we have a fair number of sites that are at least doing
three pairs [unintelligible]. Next slide, please.
Then, of course, there's diversity in the way we are clinically validating PGRNSeq.
Six sites are validating some samples at the Johns Hopkins diagnostic laboratory,
using a Sequenom panel. Most of the sites that are doing that are not validating all their
samples this way. Other sites are using Sanger, Illumina ADME, Sequenom ADME, and, again,
many sites are validating at more than one location using more than one method. It is
complicated, but it also provides opportunities to compare across these different measures
and even within the same sites. Next slide, please.
Okay, so what are some things that PGx is really lending to the conversation in genomic
testing? Next slide. And for these next couple slides I really must thank Marylyn Ritchie.
She gave them to me, and she's the one -- she's very actively involved at the Coordinating
Center. So we have several calling pipelines; again, these are required because we're generating
the sequence data at many different sites. So, in order to make sure that we have comparable
information as a group, we're doing several of those cross-site comparisons. So each site
is performing sequencing on 32 HapMap trios along with the eMERGE study samples, and the
Coordinating Center is calculating the concordance for these trios. And so that's the first of
the two concordance checks mentioned at the bottom; the other is that the Coordinating
Center is comparing the VCFs for eMERGE study samples generated by the sequencing facility
with the VCFs generated by the eMERGE Coordinating Center pipeline. Next slide, please.
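[Editor's note: the concordance check she describes -- comparing genotype calls for the same samples from two pipelines -- can be sketched as below. The sample IDs, rsIDs, and genotypes are hypothetical, and real checks would parse full VCFs rather than small dictionaries.]

```python
# Minimal sketch of a cross-pipeline genotype concordance check.
# calls_a / calls_b map (sample_id, variant_id) -> genotype string,
# e.g. ("NA12878", "rs4244285") -> "G/A". All data here is illustrative.

def genotype_concordance(calls_a, calls_b):
    """Fraction of shared (sample, variant) pairs with identical calls."""
    shared = set(calls_a) & set(calls_b)
    if not shared:
        return 0.0
    matches = sum(1 for key in shared if calls_a[key] == calls_b[key])
    return matches / len(shared)

# Hypothetical example: calls from a sequencing-facility VCF versus the
# Coordinating Center pipeline for one HapMap sample.
facility = {("NA12878", "rs4244285"): "G/A", ("NA12878", "rs4986893"): "G/G"}
cc_pipeline = {("NA12878", "rs4244285"): "G/A", ("NA12878", "rs4986893"): "G/A"}
print(genotype_concordance(facility, cc_pipeline))  # 0.5
```

In practice the same function applies to both checks mentioned here: facility calls versus Coordinating Center calls, and parent-offspring consistency within the HapMap trios.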
So here's a cross-site comparison of these HapMap trios across different sites running
PGRNSeq, and as you can see, we have excellent concordance. Next slide, please.
This is, quickly, an overview of the eMERGE Variant Calling Pipeline that has been implemented
at the Coordinating Center; they've used GATK. You can see the different filters that are
used, and then there's two variant calling runs at two different time points: multi-sample
calling is run on the batch sent from the Sequencing Center for each site independently,
and then quarterly, they're running a multi-sample calling run on the entire batch. Next slide.
So, here is the multi-sample calling by site at the CC compared to the single-sample calling
by the Sequencing Center. And as you can see, we have very good data to start with, and
it gets even better with multi-center calling. Next slide, please.
And similarly, here is the multi-sample calling of the entire eMERGE set compared with
single-sample calling by site. Next slide, please.
Another type of QC analysis that we're doing is that we're comparing the research -- and
this is generally, of course, the sequencing results from the PGRNSeq -- and the clinical
pharmacogenetic results, which are generally genotyping on orthogonal platforms. The idea
of this was to evaluate the PGRNSeq research platform. It's complicated by different report
formats. From PGRNSeq we get data in VCF format; VCF is not a file format meant to be read
by humans, but it can be easily manipulated to extract data that's easier to understand.
And then, typically, the clinical-grade results are often coming in the form of star alleles,
particularly for the CYP genes. So there you have to take the star allele, translate it
to a haplotype, then translate it to individual genotypes, and even then when you go to compare
genotypes from that to those pulled out of VCF files, you can have strand orientation
issues. So --
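[Editor's note: the translation chain she describes -- star allele to haplotype to per-SNP genotype, then comparison against VCF-derived calls with possible strand flips -- can be sketched as follows. The star-allele definitions here are hypothetical placeholders, not authoritative PharmVar tables.]

```python
# Sketch of comparing clinical star-allele results to VCF-derived
# genotypes, allowing for strand-orientation differences.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical haplotype definitions: star allele -> {rsid: defining base}.
STAR_DEFS = {
    "*1": {"rs4244285": "G"},  # reference allele
    "*2": {"rs4244285": "A"},  # variant allele
}

def diplotype_to_genotypes(diplotype):
    """Translate e.g. '*1/*2' into per-SNP unordered genotypes."""
    a1, a2 = diplotype.split("/")
    return {rsid: tuple(sorted((STAR_DEFS[a1][rsid], STAR_DEFS[a2][rsid])))
            for rsid in STAR_DEFS[a1]}

def same_call(geno_a, geno_b):
    """True if two genotypes match directly or after a strand flip."""
    direct = tuple(sorted(geno_b))
    flipped = tuple(sorted(COMPLEMENT[base] for base in geno_b))
    return geno_a == direct or geno_a == flipped

clinical = diplotype_to_genotypes("*1/*2")   # {"rs4244285": ("A", "G")}
vcf_call = ("C", "T")  # same genotype reported on the opposite strand
print(same_call(clinical["rs4244285"], vcf_call))  # True
```

The strand-flip branch is what resolves the orientation issues she mentions; without it, the two reports would look discordant even though they describe the same underlying genotype.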
Male Speaker: You're right at about 11 minutes now; we'd
like you to wrap it up.
Laura Rasmussen-Torvik: Okay, so standardization of reports would
benefit the wider community, and this work is also forcing sites to develop policies about
research results that are non-concordant with clinical genotyping. We know our research results are really good
-- obviously, we have to report the clinical results, but it's an interesting situation.
Next slide, please.
And just my final points. There has been a lot of development of systems to integrate
genotypes as computed results; I'll let the EHRI group give you the details. But as someone
mentioned earlier, typically genotype results are imported as PDFs or mentioned in the notes;
they're not computed results. So how do we integrate these results and also document
clinical interpretation as part of these systems? I think that the documentation of clinical
interpretation can be complicated with the computed results, and this is particularly
complicated when you're receiving results from multiple outside laboratories.
And finally, what do we do if this interpretation changes? The CPIC guidelines are fabulous,
but particularly for some of the rare star alleles for some of the CYP genes, the interpretation
is changing, and how do we handle that in the Electronic Health Record and how do we
document it? Next slide. So, this is just a summary of everything I've talked about
today, and I will hand it over to my other panel members.