HGP10 Symposium - Engineering a healthcare system to deliver genomic medicine - Dan Roden

Dan Roden: I changed the title to Engineering Healthcare Systems because I -- I'll tell you a story about the way we're doing it, but the way we're doing it is influencing the way other people are doing it, in part because of NHGRI's support. I will have to start by thanking the organizers for inviting me to be the mouthpiece for this electronic medical records kind of effort. So this is the map that you've seen before, and if you haven't, I don't know what rock you've been hiding under. [laughter] We are -- I'm going to talk about this space right here, genomic predictors of disease susceptibility and drug response, and then I'm going to talk a little bit about engaging the electronic medical record for discovery purposes, not for patient care purposes. And then I'll talk a little bit about engaging the electronic record for delivering genomic medicine. So there are many definitions of genomic medicine. I'm part of a working group at NHGRI that spent an inordinate amount of time working on these words. And these are the words, for better or worse, and when I argue with Eric about changing those words, I get told, "The words are done. Let's work on something else." [laughter] So -- and I think that this is a very reasonable definition. I don't want to say that it's not. I will say that genomic medicine is part of a greater vision of what I think -- what people are calling precision medicine or personalized medicine. I like personalized medicine better than precision, but that's a debate we can have later. I think -- because precision may overpromise, and personalized means you're taking care of a single patient. I'll come back to that theme later. Any clinician who wants to give a talk has to quote or at least acknowledge William Osler. I'm a Canadian, and William Osler was a Canadian, so I have to acknowledge him twice. And so, he'll say that "The good physician treats the disease; the great physician treats the patient who has the disease." 
So it is an important part of what we do, and it has been an important part of medicine all the way along. The addition of genomic information just makes it that much more complicated, but that much more personalized. So here is the -- here's one view of the vision published in the New Yorker in 2000, when that famous press conference happened, and this woman is handing her sequence to a pharmacist, not to a physician, but to a pharmacist, so it emphasizes sort of that this is going to be a team sport. She also has it written down on a piece of paper, so almost certainly that's not the way it's going to work. And he's pretty confused, and that is certainly the way it's going to work. [laughter] I was of two minds whether to show this other part. When Francis Collins was appointed director of NIH, he was asked about a lot of things, but he was asked about pharmacogenomics, and this is what he had to say in one paragraph, because pharmacogenomics is the easy stuff. You can read it, but I'll just say that -- there must be a pointer here somewhere -- I'll say that if everyone's DNA sequence is already in their record, it's simply a click of the mouse to find out all the information you need. It's going to be a lower barrier, and then wonderful things will happen, and it will improve outcomes and reduce adverse events. And obviously, those of us who play in this space really buy into this idea, but I will say this, and I've said it before, I'll say it again, that I disagree with that particular word because it's anything but simple, as those of us who are trying to do it have discovered. So I think the way that this is going to happen is that some institutions, and then more and more institutions, will buy into this idea. And how are they going to implement? I think you have to implement by having excellence in basic science. I'm going to come back to that over and over again. This is not unidirectional. The translation and the implementation feed discovery science. 
There's a commitment to information technology which cannot be overemphasized. We're very fortunate at my place to have a department of biomedical informatics that has 75 faculty members, which is pretty big. And then you put your health care system to work for discovery, and I'll talk a little bit about that. And then once you've discovered something that you think is actionable, then you can start to put it to work for patient care. And the point is that this is an iterative process, and so it goes back and forth. So what we've done for discovery, in a nutshell, is we've created a bio bank. I could talk about the bio bank forever, but as of yesterday, there were samples for 163,941 patients in the bio bank. Those are DNA samples, and it's a pretty large number, and they're coupled to electronic medical records. What can you -- why do you need it so big, and what can you do with it so big? I probably don't have to emphasize that to this audience, but I thought I would just walk you through a question that I was asked by one of our new faculty members. He said, "Do you have any patients who have vitamin D levels? Do you have any patients who have vitamin D levels and GWAS data as well?" So one of the rules of the game is once you do a GWAS on any one of these samples, it comes back to the resource. So I went to a new resource that I'm proud to show off, and I would love to do it in real time, where we just go to a website, and this is what the web-based interface looks like. I type in vitamin D, and I drag and drop, and it asks me what kind of vitamin D level do you want? And I say any lab value, and it turns out that in our electronic medical record, there are 13,847 people who have vitamin D levels. And there is a reason that this number and this number don't agree with each other. It's not just that we add it up differently. There's a great reason, somebody can ask me that afterwards. 
Those of them in the bio bank, that's a subset of the entire electronic record, is 5,497. Then I can ask how many people have had GWAS genotyping? Right now it's more like 20,000, but there are only 10,000 in this particular interface right now, and the intersection is 1,000 people. So that's a big set. And you can get the GWAS data on these people with vitamin D levels for free, essentially. So it's an enabling resource for discovery, but the point is you start with 163,000 to get to this set of 1,000. So if you start with 10,000, you're going to get down to a set of 20 or something like that, some relatively useless number. So I think you have to have big numbers. And what we're doing in BioVU is we're looking for genomic variants that are associated with all the things that you might think of. And then we're doing an inverse experiment called PheWAS, which I'll tell you more about in a second. So rather than dwelling on the triumphs of BioVU itself, I'll just say that we're part of the electronic medical records and genomics network that NHGRI funds. These are the nodes in the current iteration of the network. There are nine nodes, 10 centers, and there's a reason, again, that those -- that math doesn't add up perfectly. And what we do is we define phenotypes within the electronic medical record, and then identify cases and controls for -- to identify genomic variants that drive those phenotypes. One of the things that we have learned over the last five or six years of doing eMERGE-I and eMERGE-II, is that writing the phenotypes and validating the phenotypes to find cases and controls is pretty challenging. We've gotten pretty good at it, I think. We're certainly very good at finding diseases. We're not so good -- when I say "we," I don't mean me, I mean the informatics guys who work on this, I'm just a mouthpiece -- but we've gotten pretty good at that. 
The next challenge is finding people who have a disease, followed by drug exposure, followed by drug response phenotype, and that's a little more challenging, and we're working on that part, as well. All the phenotypes are publicly posted into something called PheKB, or pharmaco -- the phenome knowledge base, and this is what a webshot looks like; basically, all the phenotypes are listed there, who's done them, how validated they are. And what's interesting is what kind of elements go into them, what kind of codes, what kind of natural language processing, medications. There are all kinds of different ways that you can have of identifying and validating a phenotype. And we go through a lot of hand curation to make sure the phenotypes work right. So those are there for anybody who wants to play in the electronic medical records space. I am an electrophysiologist, cardiac electrophysiologist when I'm not doing this for a living. So, I thought I would show you one eMERGE project that came out of an electrophysiology idea. And that is we were looking -- interested in variability in the QRS complex in the electrocardiogram. That's this little ditzel here that tells you how fast conduction is in the heart. There are reasons that we think we should look at that. So the first thing we did was we developed algorithms to find patients who had a normal electrocardiogram, no heart disease, normal electrolytes, no confounding drugs, really, really normal people, and deployed it in the entire electronic record, not just the subset with DNA, and found 30,000 people. Andrea Ramirez, who is in this audience and who's now working on this campus, directed that effort at our place. And this is what the distribution of the QRS complexes looks like. So these are entirely normal individuals, and we're interested in why people are up at this end versus down at this end. So we did our genome-wide association study supported by eMERGE-I, and got no signal. 
Then deployed that algorithm across the eMERGE network, got lots more cases, lots more control -- lots more cases because this is only a case-only study, and this is what the Manhattan plot looks like, and this is a signal in actually a pretty good candidate gene, anyway. So this is the cardiac sodium channel locus that -- this is the cardiac sodium channel; this is a different sodium channel that people have gotten interested in because of this kind of work. And that controls conduction in the heart. So it's all very well and good. So we did -- we then did another experiment to sort of validate this result and what we did was something called a -- we've called a phenome-wide association, or PheWAS. So GWAS, you take a phenotype, and you say yes or no, if it's a discrete thing to type. And you'll look across 500,000, or a million, or 14 million SNPs, and do a test of association at each locus. What we did is we said, "Let's take 13,000 people who have been genotyped at this particular SNP, and say wild-type or variants -- sorry, reference or variant" -- I'm not supposed to say wild-type -- reference or variant, and do a test of association with every single diagnosis that we have in the electronic medical record. There are about 1,000, and we recognize this overlaps across those different phenotypes, but we have ways of making this more and more sophisticated. And this is what the Manhattan plot looks like for this particular SNP that happened to be the top one on the Manhattan plot that I just showed you. And what's interesting is, these two dots here are arrhythmias, arrhythmia diagnoses. So you sort of say, well, he's an arrhythmia guy, and he started with an arrhythmia question, so that's not a big deal. So, at the very least, we rescue the signal that we started with, but remember, we started with people who had normal electrocardiograms. We didn't start with people who had arrhythmias. 
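The phenome-wide scan described above, one SNP tested against every diagnosis in the record, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the BioVU or eMERGE implementation: the function names, the input layout (a carrier flag per patient and a set of diagnosis codes per patient), the Pearson chi-square test, and the Bonferroni threshold are all choices made here for brevity. The talk does not say which test or correction was actually used.

```python
import math


def chi2_p_2x2(a, b, c, d):
    """Pearson chi-square p-value (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # degenerate table: nothing to test
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 d.f., the upper tail of chi-square equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def phewas(carrier, diagnoses, alpha=0.05):
    """Test one variant against every diagnosis code.

    carrier:   {patient_id: True if the patient carries the variant allele}
    diagnoses: {patient_id: set of diagnosis codes seen in the record}
    Returns (code, p_value, passes_bonferroni) tuples, smallest p first.
    """
    codes = set().union(*diagnoses.values())
    if not codes:
        return []
    threshold = alpha / len(codes)  # Bonferroni: one test per diagnosis code
    results = []
    for code in sorted(codes):
        a = b = c = d = 0
        for pid, is_carrier in carrier.items():
            has_dx = code in diagnoses.get(pid, set())
            if is_carrier and has_dx:
                a += 1
            elif is_carrier:
                b += 1
            elif has_dx:
                c += 1
            else:
                d += 1
        p = chi2_p_2x2(a, b, c, d)
        results.append((code, p, p < threshold))
    return sorted(results, key=lambda r: r[1])


if __name__ == "__main__":
    # Invented toy data, for illustration only: 40 patients, half carriers.
    carrier = {pid: pid < 20 for pid in range(40)}
    diagnoses = {pid: set() for pid in range(40)}
    for pid in range(40):
        if pid < 20:
            diagnoses[pid].add("arrhythmia")  # enriched in carriers
        if pid % 2 == 0:
            diagnoses[pid].add("fracture")    # unrelated to carrier status
    for code, p, hit in phewas(carrier, diagnoses):
        print(code, "significant" if hit else "not significant")
```

A real analysis would also have to deal with the overlap between diagnosis codes that the talk mentions, and with covariates such as age, sex, and ancestry, none of which this sketch attempts.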
They get arrhythmias later because we have the electronic record that follows them for years and years and years. So what this says is that when you start out at one end of that distribution, you're more likely to get an arrhythmia, and here's a genomic predictor of that. And then we were asked by the reviewers to look at it over time, and there is a gene dose effect over time with the development of atrial fibrillation. And this is in a gene called SCN10A. SCN10A is -- was originally cloned from dorsal root ganglion, and so the way it affects heart conduction has been pretty controversial. One of the other hats that I wear is we study those kinds of problems in my mouse and fish lab. And so we actually looked at what happens to wild-type myocytes. These are action potentials for mouse myocytes at baseline, and then when you put in a tiny, tiny, tiny concentration of a sodium channel opening toxin called ATX, and that's what happens in wild-type mice. We have generated sodium -- SCN10A knock-out mice, and we actually don't see that arrhythmogenic effect, and, in fact, that's reproduced. So I throw this in just to make sure that people know that I still think about this kind of stuff every so often, and also to make the point that there is this loop that has to be closed. Everybody has said, we have -- now have 3,000 more signals, or 3,000 more loci to look at than we did 10 years ago, and we better start to look at them because maybe this is a drug target, for example. So we've deployed the PheWAS algorithm across the entire GWAS catalog supported by NHGRI, so that's about 1,300 different tests of association. Some of them are with phenotypes that are not well-captured in the electronic record, like do you -- does your urine smell after you eat asparagus? Are you bald? Those are things that the electronic record doesn't capture very well, so we don't pay attention to those in our validation studies. So here's an example of a highly-pleiotropic SNP. 
This is a SNP in IRF4 that determines skin color, but when you do the PheWAS, you get tremendously significant signals for various kinds of skin cancer as well as actinic keratosis. And it turns out that the SNPs that are highly validated and highly replicated in the GWAS catalog, replicate this way as well, and we have about 70 new associations using this approach to discover pleiotropy. This is what eMERGE-II looks like. The number -- don't take the numbers excessively seriously because, for example, this one says 27,000. It's probably like I said, more like 20,000, but we count Immunochip and Metabochip in this as well. So there's 300,000 -- 330,000 or so people. In eMERGE-II, there are about 75,000 with dense genotypic data that we're actually putting together in a very large set. So this highlights, for me, the paradox of personalized medicine. As a clinician, I've one patient in front of me in the office, but what I need to do is be able to treat them differently from the average, and in order to do that, I have to have a very large dataset to convince me that that different treatment is, in fact, justified. So that's the discovery piece. Then the implementation piece. So we've been hearing all day about pharmacogenetics. That's the easy stuff. And that's probably the first thing that's going to be implemented. I've been doing pharmacogenomics and pharmacogenetics my entire career, and we're part of the Pharmacogenomics Research Network, another effort funded by NIH. Eric already alluded to the fact that there are now many, many drugs that have labels that include pharmacogenetic information. This is only for germ-line. The other half of the drugs that have labels are for the tumor germ-line -- or the tumor genome. So, everybody says it's easy, and I like to show this. It's -- it is low-hanging fruit, but it's not so simple. [laughter] I'll just say that. 
So we were tasked by our leadership to come up with a way of starting to deliver pharmacogenomic information in a preemptive way into the electronic medical record in the fourth quarter of 2009. We were given a year to plan. And so this is what the planning looks like, and I'm not going to walk you through any of this. If anything, you should just read down these things to understand that there are multiple communities that you have to engage and excite in order to execute a project like this. This is what we call our PREDICT Project, and this is what PREDICT stands for. So, the notion is, in brief, find patients who are at high risk for getting a drug with one of those actionable pharmacogenetic stories, one of those 58 drugs. And then you genotype them, not on the drug you think they're going to get, but on a bunch of different -- on a multiplex platform that assays many different pharmacogenomic variants. And then you do what I call the easy stuff. You store the genomics, track the outcomes, provide informatics support to clinicians who are prescribing the drug at the appropriate time. So who is at high risk? Well, one group of people who are at high risk are people who populate our internal medicine clinics. We did a study on about 50,000 people in the electronic medical record asking the question, over the course of five years, how many of them are exposed to one or more of those drugs that have FDA labels. The answer was a bit of a surprise. There's 65 percent of them that get at least one drug from that list over the course of five years, and 15 percent that get -- sorry, 15 percent get four or more drugs. So that's one group of high risk people, and we are actually including them in the PREDICT Project. 
The other high-risk group of people are people we can look at, and say, within the next week or two, you're going to get Drug X, and one of the -- the best example of that is that people going to the cath lab at Vanderbilt, we do about 4,000 catheterizations a year; about 1,800 of those patients end up on clopidogrel. And as we were planning this project, the FDA did us a favor, relabeled clopidogrel to include this statement, "Consider alternative treatment or treatment strategies in patients identified as CYP2C19 poor metabolizers." CYP2C19 is the enzyme that bioactivates the prodrug clopidogrel into its biologically active metabolites. And I have to say here that it was Grant Wilkinson, a faculty colleague of mine, who, in the mid-1980s, discovered the fact that CYP2C19 was polymorphic. He was not studying clopidogrel, and I remember me and many other people gave him an incredibly hard time because he was studying this incredibly obscure drug and probably a pointless line of inquiry. And, in fact, it turned out he was right, that it was an important thing to study because now it's the centerpiece of much of what goes on in pharmacogenomic implementation. The other thing we did was we took our BioVU specimens, found a group of people who had gotten a stent after an acute coronary event, looked at 30-day outcomes, and found 200 people with complications and 400 or 500 controls, and replicated the known signal for CYP2C19 and its variant in terms of imposing risk. So over the course of the last two and a half years, we've now studied about 12,521 patients in PREDICT. Bruce Korf, on the video that you just saw, described a program where you might genotype people, and then deploy the information when it became apparent. And this is -- that's exactly what we're doing here. There are 334 homozygotes for CYP2C19*2, and 2,369 heterozygotes. And what's interesting is that most people don't have a common variant. 
We don't actually know how many people have a rare variant. We know that they don't have a common variant. And just to show you that -- give you a sense of the fact that this is actually, although it's complicated, it's even more complicated than you think. It's not just *2 that are the hypometabolizers, there's *3 and *4 and you could be *3/*4. There's also a *17 that nobody knows what to do with. And so the heterozygotes and homozygotes you saw on the pie chart are actually multiple genotypes, and how to translate from a genotype to a diplotype to a predicted phenotype is one of the challenges in this area. When a patient who has this information in their electronic medical record has an electronic prescription written for clopidogrel, and is a poor metabolizer, this is the point-of-care decision support that pops up that suggests two alternative drugs, and we track how many times physicians look at this, how many times they change their minds, and we're just learning about responses to this kind of program. And I thought I should show this picture because, again, it's about personalizing medicine. This is one of our interventional cardiologists, and when we started the program in September 2010, we were really eager to find the first *2/*2 patient, and this is she. So she is being taken care of by him, and he knows a little bit more about her now. And that's personalizing medicine. But he's taken care of her for a long time so he knows a lot about her and her attitudes and her other diseases and her other medications, so that's what's important. And there's another quote from William Osler, which, in the interest of time, I think I'll let you read, and now you've read it or not. [laughter] So we've now deployed five drug-gene pairs, clopidogrel, simvastatin, warfarin, thiopurines, and tacrolimus, and those are displayed on the electronic medical record. This is a screenshot of what our electronic medical record looks like. 
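The translation from diplotype to predicted phenotype, and the prescription-time alert for CYP2C19 poor metabolizers described above, can be sketched as a small lookup. This is an illustrative sketch under stated assumptions, not the PREDICT rule set: the function names and phenotype labels are hypothetical, the "intermediate" label for one loss-of-function allele follows common pharmacogenetic convention rather than anything said in the talk, and *17 is simply flagged as undetermined to reflect the speaker's remark that nobody knows what to do with it.

```python
# Alleles named in the talk as loss-of-function for CYP2C19.
LOSS_OF_FUNCTION = {"*2", "*3", "*4"}
# Allele the talk describes as unresolved; flagged rather than interpreted (assumption).
UNCERTAIN = {"*17"}


def predict_phenotype(diplotype):
    """Map a CYP2C19 diplotype such as '*1/*2' to a predicted metabolizer phenotype.

    Any allele not listed above is treated as functional. That mirrors the caveat
    in the talk: not having a common variant does not rule out a rare one.
    """
    allele_1, allele_2 = diplotype.split("/")
    alleles = (allele_1.strip(), allele_2.strip())
    if any(a in UNCERTAIN for a in alleles):
        return "undetermined"
    lof_count = sum(a in LOSS_OF_FUNCTION for a in alleles)
    return {0: "normal", 1: "intermediate", 2: "poor"}[lof_count]


def needs_clopidogrel_alert(drug, diplotype):
    """Hypothetical point-of-care rule: alert when clopidogrel meets a poor metabolizer."""
    return drug.strip().lower() == "clopidogrel" and predict_phenotype(diplotype) == "poor"


if __name__ == "__main__":
    for dip in ("*1/*1", "*1/*2", "*2/*2", "*3/*4", "*1/*17"):
        print(dip, predict_phenotype(dip), needs_clopidogrel_alert("clopidogrel", dip))
```

Keeping the allele tables separate from the rule makes it easier to revise the mapping as evidence changes, which the talk later identifies as a recurring need.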
I blacked out all the identifiers, except for this one here because I want to make sure people know that I still see patients. And so these are the genetic variants that belong to this particular patient. These are a partial list of his medications that actually go down here. And what's interesting to me is that he's been on warfarin for a long time, so we actually didn't use the warfarin information to tailor his dose, but he is a poor -- he has a loss-of-function allele in CYP2C9, and he does take a remarkably low dose of warfarin, only 3 milligrams a day, so that's probably the explanation, and had we known that at the beginning, we would have started him on 3 milligrams a day. We also, I have to say, use this kind of technology to display variants in tumors in an effort very, very similar to, but probably smaller than, the one you just heard about from Dr. Garraway, in our personalized cancer initiative. So -- and that was a BRAF mutation. The other thing, after you've deployed five drug-gene pairs, you can start to ask the question, how many people have variants in one or more of these pathways. And what's interesting is that the number that don't have anything is now getting smaller. And, of course, every time we deploy a new drug-gene pair, the number has to get smaller. And one way to think, I mean, it's obvious, and it's almost trivial to sort of highlight it, except to point out that as -- if you're going to do multiplex testing, what you find out is that everybody in this audience is abnormal for something. And we all, if you live in genome-land, like we all do, we all recognize that. But these are real data that speak to that. And so when I speak to lay audiences, when I speak to non-genomics audiences, it comes as a surprise to them sometimes that we're all abnormal for something, you just don't know what it is. 
So this multiplexed approach, I think, is really the only way to go, and I think that many people in this room have thought that problem through and understand that. We also engage patients, so we have a website called My Health at Vanderbilt where you can go and look at pieces of your record. You can make appointments with your doctor. When you look at pieces of your record, you can look at genes that affect My Medicines, and then you can see your report for the drugs that we've targeted. Those are all sort of works in progress, and you can see we're sort of still figuring out exactly how to deliver that information. We steal from -- sorry, we adapt 23andMe information to think about this because they do a reasonable job of explaining this. And we think that that's something that's going to be very, very important to engage patients in all this. So we have PREDICT at Vanderbilt, and I'm part of the PGRN. We're also part of the eMERGE network, and so I was having a conversation with Teri Manolio, who directs the genomic medicine initiatives at NHGRI, and she said, "Well, why don't we take the PGRN's next-generation sequencing platform for all those important pharmacogenes that you guys are working on and stick them in eMERGE in a PREDICT kind of algorithm." So that's sort of -- that was an idea born, you know, at the end of a long day, and we actually are doing that right now. And that project is underway, so it takes advantage of the expertise and capabilities of two separate networks that I think ought to be more closely aligned than they are, and I'm working on that. So I want to just summarize by highlighting in words what the lessons are. So, first of all, I think that we have not finished discovering. So those of us who focus on implementation are working on trying to deliver some of that to patients. But that doesn't mean that we know everything we need to know. We need to know a lot more, and you've heard that all day today. 
The low-hanging fruit of pharmacogenomics is much more complicated than we think, but I think learning the lessons that we're learning along the way in this space will make us smarter in terms of delivering genomic information in the course of health care more generally. Some of the problems I've highlighted, this business of rare variants is going to be a real problem in pharmacogenomics and everywhere else. And then there's the problem of ancestries, which we're only beginning to think about now. This is Team Science. It's interdisciplinary, and you have to engage lots and lots of people, not just get their grudging approval, but get their enthusiastic engagement. There are huge educational needs in every constituency that you can think about. The evidence changes, even in pharmacogenomics where you sort of say, well, CYP2C19 does this, and then a year later you think, well, maybe it does this in some other context. So you really have to be attuned to the fact that the advice you deliver today may not apply perfectly tomorrow. It goes without saying, but I have to say it, that an Illumina run might be 99 percent accurate. I have no idea what that number really is, but that's not good enough for a clinician because it has to be 100 percent accurate. And the reason is, I'll just say it, for those of you who are clinicians, you'll understand me. I walk out of the wards, and the nurse or the resident says to me, "This patient has renal failure. Their creatinine is eight today." And I say, "Well, what does their electrocardiogram look like, how do they feel?" And they feel fine, and their electrocardiogram is normal. I say, "Well, that's a lab mistake." Any clinician would tell you that it must be a lab mistake. But if somebody says, "This person is a poor metabolizer," I have no context for that. So, I have to be sure that the data I get is correct. I think this is only going to happen in an electronic medical record environment. 
We're thinking about ways in which to deliver this kind of thing to people who have less advanced electronic medical record systems than we do, but I think that's going to happen. The only way this really happens is with institutional will. So I'll just close by talking about the teams. I've acknowledged these teams up here. These are the individuals at Vanderbilt. I can't walk my way through all of them, but there are geneticists, informaticists, ethicists, lab people, fellows, translational scientists, and these three guys down here, who are the institutional leadership. So thank you very much, again, for the opportunity to participate. [applause]

Mark Guyer: So our next speaker, before he sits down, is David Botstein from Princeton. And David is going to tell us about the Fruits of the Genome Sequences for Society.