Male Speaker: Okay. Well, Steve, if you're there, we'll
segue immediately to you.
Steve Leeder: I am here. Can everybody hear me?
Male Speaker: Yes, we can.
Steve Leeder: Good. In retrospect, I guess I should've -- can
we go to the next slide, please? In retrospect, if I had used the smaller font like Howard
did, we could get these all on one slide and look at them at the same time, but they're
broken up into two slides. On this first one, I think you should get the sense by now from
both Marilyn and Debbie's presentations that the consensus of our group was that discovery
research should remain a high priority for the future. And working with the phenotyping
group, one step is obviously to decide what traits or phenotypes are sort of high priority
for the next phase of the network. And then the topic for discussion, and some of this
has been occurring in the sort of online chat, is, you know, whether or not to just go with
existing data and work on improving the analytical tools and methodology to use existing
data, or whether there should be some effort put into [inaudible] data generation by next-gen
sequencing or exon arrays. As sort of a non-genomic person pursuing this more from a pharmaco
perspective, I guess we could also add in that the existing data could also include
the longitudinal phenotypes that several people have alluded to, and using this in the context
of, you know, the [inaudible] contribution to disease progression. This also lends [inaudible]
gene-environment interactions as well. Also the impact of therapeutic intervention on
the trajectory of progression.
The second point that came out of our discussions was this whole issue of not throwing the baby
out with the bath water, and looking at the importance of rare variants. And again, up
for discussion would be the most appropriate platforms, whether it be a genotyping or a
sequencing platform to capture them, but also resources that may need to be put into developing
appropriate tools to detect their effects.
Can I have the next slide? So, this is -- this next point gets at one of Debbie's last points,
and that is considering study designs other than a straight GWAS-type format for discovery
purposes. For example, the example that she gave of looking at extreme
discordant phenotypes, at least for continuous variables, and coupling them with your platform
of interest, and I put a whole genome sequencing here. And, you know, the potential for this
particular approach to be a little bit more efficient in identifying causal variants,
and especially rare causal variants.
And then the last point that our group would propose to the larger group as a whole would
be, again, something that was mentioned by both Marilyn and Debbie, and that is looking
at other sources of genomic material, RNA, or going back into the DNA and looking at
methylation, for example, for these genomic analyses. And then on the EMR side of things,
looking to see how additional data can be captured or parsed to look at environmental
factors and co-morbidities or gene-environment interactions, for example.
So those are the four issues that were raised by our group as being something to pose to
the rest of the group for their thoughts and comments. And I'll toss it back to the chair.
Male Speaker: Okay. The floor is open for comments or questions
on EMR and genomic discovery.
Marc Williams: Marc Williams here. A thought occurred to
me as Debbie was talking, again trying to bridge the tension that we have between discovery
and implementation. This was in the context of the rare variants. I think one of the issues
that we're all going to be dealing with as we receive secondary findings from our genomes,
exomes, and high-density chips, findings that, you know, we're thinking about clinically
returning, is the lack of information that we have on the clinical impact of some of
these rare variants, even in genes that we know quite well. One of the things that we'll
be doing is to try and use our traditional methods of contextualizing that data using
family history and other sorts of things to understand what's the potential impact. To
me, that seems to lend itself to the idea that if we did a rare variant focus, we could
study how we could use electronic health record mining to try and contextualize rare variant
information to add additional information for clinical return and implementation. So
that could be a potential study topic for eMERGE III that would bridge, again, this
discovery and implementation chasm.
Dan Roden: This is Dan Roden. I have to say Roden now
because there's another Dan on the phone. I agree with Marc, but at a practical level,
I think you have to make some attempt to limit the minor allele frequencies down to which
you're willing to go. If you find a rare variant that is one in a million or one in a hundred
thousand, it's going to be very, very tough unless you know something about the biology
to assign any kind of phenotype to that. And so I think the sweet spot for us is probably
minor allele frequencies around 0.1 percent. Plus, variants in disease genes that, you
know, have been implicated. And as Zach [spelled phonetically] said a couple of hours
ago, you find that variants that have been implicated as causes of hypertrophic cardiomyopathy
or channelopathies are actually much more common than you give them credit for when you
start to look across very large populations, and we're finding that along with everyone
else.
So I think that one thought is exactly which rare variants we'll want to focus on. And
I think that the variant of uncertain significance at 1 in 10,000 or 1 in 1,000 is something
that eMERGE is really, really well-suited to attack.
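[As an aside, Dan Roden's "sweet spot" argument can be made concrete with some back-of-the-envelope arithmetic. The sketch below is purely illustrative: the 50,000-person cohort is an assumed figure (a number mentioned later in this discussion), not a description of any actual eMERGE dataset.]

```python
# Expected carrier counts under Hardy-Weinberg equilibrium: why a
# one-in-a-million variant is intractable while a 0.1 percent variant
# is not. The 50,000-person cohort is a hypothetical, assumed figure.

def expected_carriers(maf: float, n_people: int) -> float:
    """Expected number of people carrying at least one copy of the
    minor allele at the given minor allele frequency (MAF)."""
    return n_people * (1.0 - (1.0 - maf) ** 2)

for maf in (1e-6, 1e-5, 1e-4, 1e-3):
    print(f"MAF {maf:g}: ~{expected_carriers(maf, 50_000):.1f} carriers")
```

[At one in a million you would expect a fraction of a single carrier, so there is essentially no phenotype signal to work with; at 0.1 percent you would expect on the order of a hundred carriers, enough to start attaching phenotypes.]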
Female Speaker: So, you know, for capturing sequence data,
it seems to me that we should report all variants, even if we only get 1 in 100,000 people, and
put it in a database because other people are going to be putting that data forward and
annotating those variants. Even if we can't determine ourselves alone if they're pathogenic,
it's going to be really important going forward.
Dan Roden: So, Gail [spelled phonetically], I totally
agree with that, and that's what we're going to be doing in eMERGE PGx. And as data accumulate
worldwide, you can start to make some sense of that. But I think, over the next five years,
a one in a million variant, unless there's some biology around it, is going to be hard
to make sense of. But, yeah, I totally agree that we have to figure out a way of archiving
this worldwide.
Male Speaker: So this is Halcom [spelled phonetically] here.
So, as you know, the new platform from Illumina, the X Ten, which is currently tailored towards
whole genome, is very likely going to be adapted to exome, even though that will
probably take some time. But an exome could probably be sequenced for about $100, sort
of, say, a year, year and a half from now. So in the interim, a strategy would be to sort of
customize a chip with this rare variant content, particularly content with potential
or putative damaging impact, loss of [unintelligible] and so forth, and that can actually be
typed now extremely cost-efficiently across, you know, thousands and thousands of samples for
a relatively low amount of money, even though it's going to cost some money. So,
in the interim, that would potentially be a very, very powerful strategy across the
sites, because that would open up the rare variant content for all phenotypes that we
have, and we don't have that today.
John Harley: This is John Harley [spelled phonetically],
Cincinnati. I just ask the question: when we concentrate on rare variants and we
don't have all of our samples genotyped, we rely on imputation. And as the frequency of
the variants drops, the accuracy of the imputation is disastrous. And so, you know,
we aren't able to take advantage of our huge numbers because the error introduced by imputation
is so big. Is there anyone who has a solution to this problem?
Female Speaker: You need to sequence.
Male Speaker: Right. Debbie's right.
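[John Harley's point about imputation accuracy collapsing at low allele frequencies can be illustrated with a toy simulation. This is a deliberately simplified sketch, not an eMERGE analysis: imputation error is modeled here as an occasional spurious minor-allele call at a fixed per-sample rate, an assumption made only for illustration.]

```python
# Toy simulation: with a fixed per-sample error rate, the squared
# correlation (r^2) between true and imputed genotypes collapses as
# the minor allele frequency (MAF) drops, because the error variance
# swamps the shrinking genotype variance.
import random
random.seed(42)

def imputation_r2(maf: float, n: int = 200_000, fp_rate: float = 0.001) -> float:
    """r^2 between simulated true and imputed allele counts, where
    imputation occasionally adds a spurious minor allele."""
    true, imputed = [], []
    for _ in range(n):
        g = (random.random() < maf) + (random.random() < maf)  # allele count 0/1/2
        i = min(2, g + (random.random() < fp_rate))            # spurious-call model
        true.append(g)
        imputed.append(i)
    mt, mi = sum(true) / n, sum(imputed) / n
    cov = sum((a - mt) * (b - mi) for a, b in zip(true, imputed)) / n
    vt = sum((a - mt) ** 2 for a in true) / n
    vi = sum((b - mi) ** 2 for b in imputed) / n
    return cov * cov / (vt * vi)

for maf in (0.05, 0.005, 0.0005):
    print(f"MAF {maf}: r^2 ~ {imputation_r2(maf):.2f}")
```

[Under this assumed error model, a variant at 5 percent frequency imputes almost perfectly, while one at 0.05 percent loses roughly half its effective information, which is the loss Harley describes.]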
Rex Chisholm: So, this is Rex. I'd like to weigh in. I would
really like to endorse the idea of thinking about environmental factors. We've played
around a little bit with GIS tools, and I think one of the things we could do very well,
which isn't done very well in most cases, given the longitudinal nature of the people
that we're following, is to think about some of these environmental factors. I think there's
going to be increased opportunities to capture some of these bits of data. Marilyn talked
a little bit about Environmental Protection Agency measurements that are being made. So
I think to be able to start to tackle gene-environment interactions using GIS approaches in some
of these environmental measures is also something that would uniquely be possible in an eMERGE
III for us to take a look at.
Chris Chute: Well, this is Chris, and while I find that
idea elegant, I want to make sure we're somewhat cautious and thoughtful about this. For some
populations -- your Chicago population, Rex, might be a superb candidate for this --
it could work well. But other populations are not always, as we say, population based, and
hence, given the density of sampled cases in any environmental geocode, you run into power
problems very quickly with environmental associations, particularly when you're treating
them as a covariate in the substrata.
Marylyn Ritchie: Chris, this is Marylyn Ritchie. One of the
other things that I think folks could think about, and this is something that Marshfield
has done in eMERGE-II, and that is to use the PhenX Toolkit as the mechanism to collect
environmental data. We were awarded a supplement as part of the PhenX RISING program by NHGRI,
and so some of the PhenX Toolkit measures were sent out to the eMERGE participants.
And we've actually started mining that data and we're finding really interesting gene-environment
results for type II diabetes and some for cataracts. And we only implemented a few of
the PhenX Toolkit measures. That's something that other sites could do either electronically
or in paper form. You know, it's something that you could port to an iPad that people could
do in clinics. You could put it on the Web that people could do through their myhealth@Geisinger,
or Vanderbilt, or what have you. And that's another way that, even without relying on
population-based environmental data, you could collect it on the participants in the biobank.
Chris Chute: Well, I certainly agree that would be hugely
more efficient and wouldn't suffer the, you know, broad association problem that you have
with geocoding. And I actually think the PhenX Toolkit would be the appropriate choice for
collecting that type, so I agree with that, Marylyn. That's good feedback.
Male Speaker: That might be another agenda item to put on
the discussion with large health system providers when discussing it with the vendors of health
records because the patient portal is going to become a part of the mandated electronic
medical records eventually, and as they're building them, it would be nice to have patients
uploading various lifestyle things that can be merged with their electronic medical record.
Female Speaker: This is Carrie [spelled phonetically] --
Female Speaker: One question, are there other things that
the coordinating centers should be working on in the future? I mean they've done an awful
lot of work with data cleaning and then the imputation, but are there other things that
would make the dataset more effective for other analyses that would be a good focus
for eMERGE-III?
Male Speaker: So one focus there -- this is Halcom -- is
on copy number variation analysis, because that's another whole dimension. You know,
focusing there from the rare variant standpoint, because most of the data is typed on Illumina,
can open up a very fruitful sort of discovery focus across all of the phenotypes, again,
from a data-mining standpoint. And we have algorithms that can
be applied on these data at the individual sites or jointly, and then the whole thing
just sort of [unintelligible] together.
Teri Manolio: This is Teri. I did want to ask about the
issue of sequencing. When we've approached, sort of, large-scale sequencers, the question
they often ask is, "Well, how many cases of a given disease do you have?" Because they're
very interested in looking at, you know, thousands or tens of thousands of cases of disease X.
And that has not been something that eMERGE has really focused on because we're sort of
phenomic, as it were. So, how do we address that question, other than to say, gee, we've
got so many wonderful phenotypes just as good or better? Gail, do you --
Female Speaker: I think we don't need to have a disease focus.
I'd be really excited to sequence the 56 ACMG genes. We know what those genes do, but we
don't know what the variants in those genes do, and we could look both for variants, annotation,
pathogenic, and informally [spelled phonetically] not pathogenic. Everybody's done that sequence
variants. [inaudible]
[laughter]
And then we could also look for pleiotropic effects of those same genes. So there's a
discovery possibility there too, and then there's lots of implementation questions,
you know, [unintelligible] my health system is very concerned about those 56 genes now
because of the ACMG recommendations. So how do you implement that, how do you get the
business support, how do you educate providers, what do patients want to know, what do they
need to see? You know, I think that really hits all the things that we can do really well,
and having those phenotypes that we have so -- in such depth gives us a unique resource
for that kind of annotation.
And I think even at the pediatric sites, there is really important work. You know, 49, I
think, of the 56 have pediatric phenotypes. Plus, the pediatric sites really could look
into this idea of mandatory return of adult onset findings to children, which has been
a hugely controversial recommendation. They could release -- ask [spelled phonetically]
their families. What do they want? Ask their providers what they want. I think that that
is a space where there is a lot of controversy, a lot of interest of the health system, and
we have a really unique capability, and I don't think [unintelligible]. I would add
a couple more, by the way, but...
[laughter]
You know, I mean, I think that's doable.
Male Speaker: I agree with Gail. I think that there -- this
is Irving [spelled phonetically] -- that one could add to such a panel, I think, which
would be extremely meaningful and something that can be done uniquely in eMERGE, things
like a list of the highly penetrant forms of -- highly-penetrant monogenic forms of
diabetes and others, I think which, you know, it would be very helpful to understand among
sort of common complex diseases what forms are diagnosable on a molecular level, and
to what extent is that -- how frequent that is.
Teri Manolio: So I'd be interested in Steve Leeder and Debbie
Nickerson's comments on that. Let me just ask, both of you had pointed toward non-coding
variation, and here we're really talking about focusing on genes, even though there are some
non-coding regions obviously in the introns. So, Steve or Debbie, any thoughts?
Debbie Nickerson: I think it's great to look broadly at genes,
but I think that different platforms have different outcomes in terms of what you'd
look at. I mean, many people are sequencing whole genome, but they end up looking at only
the coding and that few percent that's well annotated by ENCODE as being highly functional.
But I, you know, I think, broadly, whole genome is an important route to go because you can
look at variants that are difficult to look at, like indels and CNVs, by just sheer capture.
Teri Manolio: Thanks. Steve, what do you think?
Steve Leeder: Well, for me, non-coding really points
more to regulatory regions as being of interest. But you have to understand, I'm coming at
this from a pediatric perspective as well, in that when we were looking at things in
kids, there's so much change that is going on between birth and sort of adulthood that,
you know, you have to look somewhere besides the coding region of the gene for what's changing
as kids grow and develop. And, you know, to some extent we know very little about how this really
works in senescing adults as well, as we move towards a geriatric population. So, for me,
the non-coding stuff really -- I'm really thinking about important regulatory regions
and being able to identify those and characterize them.
*** Weinshilboum: Steve, this is *** Weinshilboum. In all of
our studies of variation in cancer drug response, the majority of the hits that are functionally
important regulate transcription; they're in non-coding regions.
Murray Brilliant: So, this is Murray Brilliant. I think that
it's clear that, from an economic standpoint, we can't do a whole genome sequencing of 50,000
people. But we could look at a smaller number of genes, and since these genes have been,
you know, implicated in human disease, they're reportable, they're actionable. We can look
both at exons and introns. I think that if we focused on this subset of genes, you know,
it would be a proxy for looking at the whole genome. I mean, it would be scalable. And
I think that, you know, the 50 some genes here, maybe, you know, all of us have some
other favorite genes. If we had 100 genes, it's kind of catchy. Instead of 1,000 genomes,
we have 100 genes that are looked at across a large number of people. And again, this
would be a proxy for what we are going to do when we have a large number of whole genome
sequences. This really cuts across all of that.
Eric Green: I would point out that's very reminiscent
of the decision early on with the ENCODE project to tackle 1 percent. I mean, I don't know
if I like this idea or not. That's a separate issue.
Male Speaker: Sure, sure.
Eric Green: But it is reminiscent that the same rationale
went into, like how are we ever going to interpret the whole human genome. So there was a whole
process to pick the 1 percent, which was complicated, but we got there. Everybody stayed with 1
percent until you felt comfortable enough to scale it to the whole genome. So this would
be a similar circumstance, however many genes you pick.
Female Speaker: I think we could also exercise diversity here,
so you have a specific set of genes that there's a particular interest in reporting results
back, but also [unintelligible] data from minority populations for pathogenic --
Male Speaker: [inaudible]
Female Speaker: I know. I think we could do really well in
that avenue.
Debbie Nickerson: So, is the idea to do non-coding as well,
right, of those genes? Because --
Female Speaker: I think if you could find that, yes. So if
you know the regulatory regions of them.
Male Speaker: [inaudible] get somebody to clarify. What,
would you take a gene and you would just go end to end, maybe x number of bases upstream
and downstream and just do the whole segment? As opposed to known -- I mean, what Teri was
implying was sort of known functional non-coding regions.
Debbie Nickerson: Well, I'm just trying to stimulate the conversation
maybe toward exome versus targeted panel. So, what's the difference there? If you could
get exome --
Female Speaker: What's the difference?
Eric Green: These are actually asking very different questions.
If you're only going to do exomes, you're going to make the assumption that that's what
you're going to find. I thought the idea was that you have these sets of genes that are
of interest [inaudible] would be non-coding, so you want to get a complete inventory, deep
and in lots of people.
Debbie Nickerson: See, that's what I wanted you to say, Eric.
[laughter]
Eric Green: I was trying to rearticulate what I thought
I heard. So with that -- what I also heard was a variant of the approach, which was you take
x number of genes and you take all of the exons, but then you take --
Teri Manolio: Well, or any intron --
Eric Green: Any intron --
Female Speaker: -- any regulatory regions you know of.
Eric Green: Any introns, or --
Teri Manolio: I mean, there are some that have regulatory
regions that are identified in other chromosomes, you know, and so maybe look at those as they
become added in. But, you know, does it take two years, Debbie, or more, to develop a targeted
platform like this?
Debbie Nickerson: No, I think it's much easier now than it was.
Teri Manolio: Oh, okay.
Debbie Nickerson: I don't think it's -- I think we have a lot
more experience. And I think that the PGRN has great data with the PGx. They can look
at these questions.
Female Speaker: Debbie, what do you think of [unintelligible]
versus Seq next-gen?
Debbie Nickerson: You know, I think it's a matter of cost and
ease of implementation. I think some can be cheap, but whether they're broadly applicable
to many genes is not known.
Male Speaker: If you put this into an RFA, I would suggest
that we could be agnostic as to the technique and let the people that are putting in proposals
discuss how they would do it, because, as was pointed out, there are a number of us
that are going to be generating large numbers of exomes and genomes, and so that would also
allow for a methodologic comparison of what's the best way to actually do it.
Male Speaker: Okay, and we're going to need to wrap up the
discussion.
Male Speaker: Just a very quick point: aren't there
commercial entities that are trying to make these panels of 2,000 or 3,000 genes? So that
if you have a patient with Marfan's or if you have a patient with hypertrophic, you just
order that set, and then you can just pick and choose and analyze. Because what's happening
is that they're realizing there are many variants that may cause, for example, an aneurysm.
And so I end up ordering a panel of 15 candidate genes, which is like $5,000. And I may still
not get the information because they may just do certain variants. So I think, you know,
that's another, it's not whole exome, but it's like what are the thousand, or hundred,
or 1,500 genes that are most often used in the clinical setting, and perhaps go with
those. And also, it would go back to Debbie's point that if you're using some of these in
the clinical setting, you would have the familiar structure to interpret much more efficiently.
Male Speaker: Okay, so I'd like to thank all of the participants
for -- actually all of our panel this morning for a very rich and thoughtful discussion
for future directions in eMERGE. And we're going to -- I guess we're down to about a
20-minute break for lunch --
Eric Green: Careful because you're not going to get all
of these people down to the cafeteria and through the cafeteria and back up here in
20 minutes.
Male Speaker: We'll do our best. So we'll plan to start
somewhere around half past the hour.
Male Speaker: These folks should get their --
Male Speaker: Yes, everybody out there on the call should
go run for lunch and bring it back.
[laughter]