Dan Roden: I changed the title to Engineering Healthcare
Systems because I -- I'll tell you a story about the way we're doing it, but the way
we're doing it is influencing the way other people are doing it, in part because of NHGRI's
support. I will have to start by thanking the organizers for inviting me to be the mouthpiece
for this electronic medical records kind of effort.
So this is the map that you've seen before, and if you haven't, I don't know what rock
you've been hiding under.
We are -- I'm going to talk about this space right here, genomic predictors of disease
susceptibility and drug response, and then I'm going to talk a little bit about engaging
the electronic medical record for discovery purposes, not for patient care purposes. And
then I'll talk a little bit about engaging the electronic record for delivering genomic
So there are many definitions of genomic medicine. I'm part of a working group at NHGRI that
spent an inordinate amount of time working on these words. And these are the words, for
better or worse, and when I argue with Eric about changing those words, I get told, "The
words are done. Let's work on something else."
So -- and I think that this is a very reasonable definition. I don't want to say that it's
not. I will say that genomic medicine is part of a greater vision of what I think -- what
people are calling precision medicine or personalized medicine. I like personalized medicine better
than precision, but that's a debate we can have later. I think -- because precision may
overpromise, and personalized means you're taking care of a single patient. I'll come
back to that theme later. Any clinician who wants to give a talk has to quote or at least
acknowledge William Osler. I'm a Canadian, and William Osler was a Canadian, so I have
to acknowledge him twice. And so, he'll say that "The good physician treats the disease;
the great physician treats the patient who has the disease." So it is an important part
of what we do, and it has been an important part of medicine all the way along. The addition
of genomic information just makes it that much more complicated, but that much more
So here is the -- here's one view of the vision published in the New Yorker in 2000, when
that famous press conference happened, and this woman is handing her sequence to a pharmacist,
not to a physician, but to a pharmacist, so it emphasizes sort of that this is going to
be team sport. She also has it written down on a piece of paper, so almost certainly that's
not the way it's going to work. And he's pretty confused, and that is certainly the way it's
going to work.
I was of two minds whether to show this other part. When Francis Collins was appointed director
of NIH, he was asked about a lot of things, but he was asked about pharmacogenomics, and
this is what he had to say in one paragraph, because pharmacogenomics is the easy stuff.
You can read it, but I'll just say that -- there must be a pointer here somewhere -- I'll say
that if everyone's DNA sequence is already in their record, it's simply a click of the
mouse to find out all the information you need. It's going to be a lower barrier, and
then wonderful things will happen, and it will improve outcomes and reduce adverse
events. And obviously, those of us who play in this space really buy into that idea, but I
will say this, and I've said it before, I'll say it again, that I disagree with that particular
word because it's anything but simple, as those of us who are trying to do it have discovered.
So I think the way that this is going to happen is that some institutions, and then more and
more institutions, will buy into this idea. And how are they going to implement? I think
you have to implement by having excellence in basic science. I'm going to come back to
that over and over again. This is not unidirectional. The translation and the implementation feed
discovery science. There's a commitment to information technology which cannot be overemphasized.
We're very fortunate at my place to have a department of biomedical informatics that
has 75 faculty members, which is pretty big. And then you put your health care system to
work for discovery, and I'll talk a little bit about that. And then once you've discovered
something that you think is actionable, then you can start to put it to work for patient
care. And the point is that this is an iterative process, and so it goes back and forth.
So what we've done for discovery, in a nutshell, is we've created a bio bank. I could talk
about the bio bank forever, but as of yesterday, there were samples for 163,941 patients in
the bio bank. Those are DNA samples, and it's a pretty large number, and they're coupled
to electronic medical records. What can you -- why do you need it so big, and what can
you do with it so big? I probably don't have to emphasize that to this audience, but I
thought I would just walk you through a question that I was asked by one of our new faculty
members. He said, "Do you have any patients who have vitamin D levels? Do you have any
patients who have vitamin D levels and GWAS data as well?" So one of the rules of the
game is once you do a GWAS on any one of these samples, it comes back to the resource.
So I went to a new resource that I'm proud to show off, and I would love to do it in
real time, where we just go to a website, and this is what the web-based interface looks
like. I type in vitamin D, and I drag and drop, and it asks me what kind of vitamin
D level do you want? And I say any lab value, and it turns out that in our electronic medical
record, there are 13,847 people who have vitamin D levels. And there is a reason that this
number and this number don't agree with each other. It's not just that we add it up differently.
There's a great reason, somebody can ask me that afterwards.
Those of them in the bio bank, that's a subset of the entire electronic record, is 5,497.
Then I can ask how many people have had GWAS genotyping? Right now it's more like 20,000,
but there are only 10,000 in this particular interface right now, and the intersection
is 1,000 people. So that's a big set. And you can get the GWAS data on these people
with vitamin D levels for free, essentially. So it's an enabling resource for discovery,
but the point is you start with 163,000 to get to this set of 1,000. So if you start
with 10,000, you're going to get down to a set of 20 or something like that, some relatively
useless number. So I think you have to have big numbers. And what we're doing in BioVU
is we're looking for genomic variants that are associated with all the things that you
might think of. And then we're doing an inverse experiment called PheWAS, which I'll tell
you more about in a second.
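
The cohort query walked through above amounts to intersecting sets of patient identifiers, one set per criterion. A minimal sketch, assuming toy identifiers and a hypothetical helper named cohort_funnel; none of these names or values come from the actual web interface:

```python
# Toy patient identifiers; every value here is invented for illustration.
has_vitamin_d = {"p01", "p02", "p03", "p04", "p05", "p06"}
in_biobank = {"p02", "p03", "p04", "p07", "p08"}
has_gwas = {"p03", "p04", "p08", "p09"}


def cohort_funnel(*criteria):
    """Apply each criterion in turn; return the final cohort and the size after each step."""
    cohort = None
    sizes = []
    for members in criteria:
        cohort = set(members) if cohort is None else cohort & members
        sizes.append(len(cohort))
    return cohort, sizes


cohort, sizes = cohort_funnel(has_vitamin_d, in_biobank, has_gwas)
print(sizes)           # cohort size after each successive filter
print(sorted(cohort))  # patients meeting all three criteria
```

Each added criterion can only shrink the cohort, which is the reason the starting resource has to be large.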
So rather than dwelling on the triumphs of BioVU itself, I'll just say that we're part
of the electronic medical records and genomics network that NHGRI funds. These are the nodes
in the current iteration of the network. There are nine nodes, 10 centers, and there's a
reason, again, that those -- that math doesn't add up perfectly. And what we do is we define
phenotypes within the electronic medical record, and then identify cases and controls for -- to
identify genomic variants that drive those phenotypes.
One of the things that we have learned over the last five or six years of doing eMERGE-I
and eMERGE-II, is that writing the phenotypes and validating the phenotypes to find cases
and controls is pretty challenging. We've gotten pretty good at it, I think. We're certainly
very good at finding diseases. We're not so good -- when I say "we," I don't mean me,
I mean the informatics guys who work on this, I'm just a mouthpiece -- but we've gotten
pretty good at that. The next challenge is finding people who have a disease, followed
by drug exposure, followed by drug response phenotype, and that's a little more challenging,
and we're working on that part, as well.
All the phenotypes are publicly posted into something called PheKB, or pharmaco -- the
phenome knowledge base, and this is what a webshot looks like; basically, all the phenotypes
are listed there, who's done them, how validated they are. And what's interesting is what kind
of elements go into them, what kind of codes, what kind of natural language processing,
medications. There are all kinds of different ways that you can have of identifying and
validating a phenotype. And we go through a lot of hand curation to make sure the phenotypes
work right. So those are there for anybody who wants to play in the electronic medical
I am an electrophysiologist, cardiac electrophysiologist when I'm not doing this for a living. So,
I thought I would show you one eMERGE project that came out of an electrophysiology idea.
And that is we were looking -- interested in variability in the QRS complex in the electrocardiogram.
That's this little ditzel here that tells you how fast conduction is in the heart. There's
reasons that we think we should look at that. So the first thing we did was we developed
algorithms to find patients who had a normal electrocardiogram, no heart disease, normal
electrolytes, no confounding drugs, really, really normal people, and deployed it in the
entire electronic record, not just the subset with DNA, and found 30,000 people. Andrea
Ramirez, who is in this audience and who's now working on this campus, directed that
effort at our place. And this is what the distribution of the QRS complexes looks like.
So these are entirely normal individuals, and we're interested in why people are up
at this end versus down at this end. So we did our genome-wide association study supported
by eMERGE-I, and got no signal. Then deployed that algorithm across the eMERGE network,
got lots more cases, lots more control -- lots more cases because this is only a case-only
study, and this is what the Manhattan plot looks like, and this is a signal in actually
a pretty good candidate gene, anyway. So this is the cardiac sodium channel locus that -- this
is the cardiac sodium channel; this is a different sodium channel that people have gotten interested
in because of this kind of work. And that controls conduction in the heart. So it's
all very well and good.
So we did -- we then did another experiment to sort of validate this result and what we
did was something called a -- we've called a PHEnome-wide association. So GWAS, you take
a phenotype, and you say yes or no, if it's a discrete thing to type. And you'll look
across 500,000, or a million, or 14 million SNPs, and do a test of association at each
locus. What we did is we said, "Let's take 13,000 people who have been genotyped at this
particular SNP, and say wild-type or variants -- sorry, reference or variant" -- I'm not
supposed to say wild-type -- reference or variant, and do a test of association with every single
diagnosis that we have in the electronic medical record. There are about 1,000, and we recognize
this overlaps across those different phenotypes, but we have ways of making this more and more
And this is what the Manhattan plot looks like for this particular SNP that happened
to be the top one on the Manhattan plot that I just showed you. And what's interesting
is, these two dots here are arrhythmias, arrhythmia diagnoses. So you sort of say, well, he's
an arrhythmia guy, and he started with an arrhythmia question, so that's not a big deal.
So, at the very least, we rescue the signal that we started with, but remember, we started
with people who had normal electrocardiograms. We didn't start with people who had arrhythmias.
They get arrhythmias later because we have the electronic record that follows them for
years and years and years.
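
The phenome-wide scan described above can be sketched as one test of association per diagnosis for a single variant. This is only an illustration under stated assumptions: the function names, the toy data, the plain chi-square test on a 2x2 table, and the Bonferroni threshold are all choices made here, not the analysis used in the study. Real analyses typically adjust for covariates such as age and sex, which this sketch omits.

```python
import math


def chi2_p_value(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom) on the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # degenerate table: no evidence either way
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def phewas(carriers, diagnoses, patients):
    """Test one variant against every diagnosis code.

    carriers:  set of patient ids carrying the variant allele
    diagnoses: dict mapping diagnosis code -> set of patient ids with that code
    patients:  set of all genotyped patient ids
    """
    threshold = 0.05 / len(diagnoses)  # Bonferroni correction (an assumption)
    results = []
    for code, affected in diagnoses.items():
        a = len(carriers & affected)               # carrier, diagnosed
        b = len(carriers - affected)               # carrier, not diagnosed
        c = len((patients - carriers) & affected)  # non-carrier, diagnosed
        d = len(patients - carriers - affected)    # non-carrier, not diagnosed
        p = chi2_p_value(a, b, c, d)
        results.append((code, p, p < threshold))
    return sorted(results, key=lambda r: r[1])


# Invented toy data: 200 genotyped patients, the first 100 carry the variant.
patients = set(range(200))
carriers = set(range(100))
diagnoses = {
    "arrhythmia": set(range(40)) | set(range(100, 110)),  # enriched in carriers
    "fracture": set(range(20)) | set(range(100, 120)),    # same rate in both groups
}
results = phewas(carriers, diagnoses, patients)
for code, p, significant in results:
    print(code, p, significant)
```

Plotting the negative log of each p-value by diagnosis gives the per-SNP Manhattan plot the talk refers to.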
So what this says is that when you start out at one end of that distribution, you're more
likely to get an arrhythmia, and here's a genomic predictor of that. And then we were
asked by the reviewers to look at it over time, and there is a gene dose effect over
time with a development of atrial fibrillation. And this is in a gene called SCN10A. SCN10A
is -- was originally cloned from dorsal root ganglion, and so the way it affects heart
conduction has been pretty controversial. One of the other hats that I wear is we study
those kinds of problems in my mouse and fish lab. And so we actually looked at what happens
to wild-type myocytes. These are action potentials for mouse myocytes at baseline, and then when
you put in a tiny, tiny, tiny concentration of a sodium channel opening toxin called ATX,
and that's what happens in wild-type mice. We have generated sodium -- SCN10A knock-out
mice, and we actually don't see that arrhythmogenic effect, and, in fact, that's reproduced.
So I throw this in just to make sure that people know that I still think about this
kind of stuff every so often, and also to make the point that there is this loop that
has to be closed. Everybody has said, we have -- now have 3,000 more signals, or 3,000 more
loci to look at than we did 10 years ago, and we better start to look at them because
maybe this is a drug target, for example.
So we've deployed the PheWAS algorithm across the entire GWAS catalog supported by NHGRI,
so that's about 1,300 different tests of association. Some of them are with phenotypes that are
not well-captured in the electronic record, like do you -- does your urine smell after
you eat asparagus? Are you bald? Those are things that the electronic record doesn't
capture very well, so we don't pay attention to those in our validation studies. So here's
an example of a highly-pleiotropic SNP. This is a SNP in IRF4 that determines skin color,
but when you do the PheWAS, you get tremendously significant signals for various kinds of skin
cancer as well as actinic keratosis. And it turns out that the SNPs that are highly validated
and highly replicated in the GWAS catalog, replicate this way as well, and we have about
70 new associations using this approach to discover pleiotropy.
This is what eMERGE-II looks like. The number -- don't take the numbers excessively seriously
because, for example, this one says 27,000. It's probably like I said, more like 20,000,
but we count Immunochip and Metabochip in this as well. So there's 300,000 -- 330,000
or so people. In eMERGE-II, there are about 75,000 with dense genotypic data that we're
actually putting together in a very large set. So this highlights, for me, the paradox
of personalized medicine. As a clinician, I've one patient in front of me in the office,
but what I need to do is be able to treat them differently from the average, and in
order to do that, I have to have a very large dataset to convince me that that different
treatment is, in fact, justified. So that's the discovery piece.
Then the implementation piece. So we've been hearing all day about pharmacogenetics. That's
the easy stuff. And that's probably the first thing that's going to be implemented. I've
been doing pharmacogenomics and pharmacogenetics my entire career, and we're part of the Pharmacogenomics
Research Network, another effort funded by NIH. Eric already alluded to the fact that
there are now many, many drugs that have labels that include pharmacogenetic information.
This is only for germ-line. The other half of the drugs that have labels are for the
tumor germ-line -- or the tumor genome. So, everybody says it's easy, and I like to show
this. It's -- it is low hanging fruit, but it's not so simple.
I'll just say that. So we were tasked by our leadership to come up with a way of starting
to deliver pharmacogenomic information in a preemptive way into the electronic medical
record in the fourth quarter of 2009. We were given a year to plan. And so this is what
the planning looks like, and I'm not going to walk you through any of this. If anything,
you should just read down these things to understand that there are multiple communities
that you have to engage and excite in order to execute a project like this. This is what
we call our PREDICT Project, and this is what PREDICT stands for.
So, the notion is, in brief, find patients who are at high risk for getting a drug with
one of those actionable pharmacogenetic stories, one of those 58 drugs. And then you genotype
them, not on the drug you think they're going to get, but on a bunch of different -- on
a multiplex platform that assays many different pharmacogenomic variants. And then you do
what I call the easy stuff. You store the genomics, track the outcomes, provide informatics
support to clinicians who are prescribing the drug at the appropriate time.
So who is at high risk? Well, one group of people who are at high risk are people who
populate our internal medicine clinics. We did a study on about 50,000 people in the
electronic medical record asking the question, over the course of five years, how many of
them are exposed to one or more of those drugs that have FDA labels. The answer was a bit
of a surprise. There's 65 percent of them that get at least one drug from that list
over the course of five years, and 15 percent that get -- sorry, 15 percent get four or
more drugs. So that's one group of high risk people, and we are actually including them
in the PREDICT Project.
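
The exposure question above reduces to counting, for each patient, the distinct drugs from the labeled list they received over the window. A minimal sketch, assuming placeholder drug names, invented exposure records, and a hypothetical function named exposure_fractions; the real list of labeled drugs is not reproduced here:

```python
from collections import defaultdict

# Placeholder names standing in for drugs with pharmacogenetic labeling.
LABELED_DRUGS = {"drug_a", "drug_b", "drug_c", "drug_d", "drug_e"}

# (patient id, drug) exposure records over the observation window; invented.
exposures = [
    ("p1", "drug_a"), ("p1", "drug_b"), ("p1", "drug_c"), ("p1", "drug_d"),
    ("p2", "drug_a"), ("p2", "drug_a"),  # repeat prescriptions count once
    ("p3", "other_drug"),                # not on the labeled list
    ("p4", "drug_e"),
]
all_patients = {"p1", "p2", "p3", "p4", "p5"}


def exposure_fractions(records, patients, labeled, thresholds=(1, 4)):
    """Fraction of patients exposed to at least k distinct labeled drugs, for each k."""
    drugs_by_patient = defaultdict(set)
    for patient, drug in records:
        if drug in labeled:
            drugs_by_patient[patient].add(drug)
    return {
        k: sum(len(drugs_by_patient[p]) >= k for p in patients) / len(patients)
        for k in thresholds
    }


fractions = exposure_fractions(exposures, all_patients, LABELED_DRUGS)
print(fractions)
```

The thresholds of one and four mirror the two figures quoted in the talk; the toy data are not meant to reproduce them.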
The other high-risk group of people are people we can look at, and say, within the next week
or two, you're going to get Drug X, and one of the -- the best example of that is that
people going to the cath lab at Vanderbilt, we do about 4,000 catheterizations a year;
about 1,800 of those patients end up on clopidogrel. And as we were planning this project, the
FDA did us a favor, relabeled clopidogrel to include this statement, "Consider alternative
treatment or treatment strategies in patients identified as CYP2C19 poor metabolizers."
CYP2C19 is the enzyme that bioactivates the prodrug clopidogrel into its biologically
And I have to say here that it was Grant Wilkinson, a faculty colleague of mine, who, in the mid-1980s,
discovered the fact that CYP2C19 was polymorphic. He was not studying clopidogrel, and I remember
me and many other people gave him an incredibly hard time because he was studying this incredibly
obscure drug and probably a pointless line of inquiry. And, in fact, it turned out he
was right, that it was an important thing to study because now it's the centerpiece
of much of what goes on in pharmacogenomic implementation.
The other thing we did was we took our BioVU specimens, found a group of people who had
gotten the stent after an acute coronary event, looked at 30-day outcomes, and found 200 people
with complications and 400 or 500 controls, and replicated the known signal for CYP2C19
and its variant in terms of imposing risk.
So over the course of the last two and a half years, we've now studied about 12,521 patients
in PREDICT. Bruce Korf, on the video that you just saw, described a program where you
might genotype people, and then deploy the information when it became apparent. And this
is -- that's exactly what we're doing here.
There are 334 homozygotes for CYP2C19*2, and 2,369 heterozygotes. And what's interesting
is that most people don't have a common variant. We don't actually know how many people have
a rare variant. We know that they don't have a common variant. And just to show you that
-- give you a sense of the fact that this is actually, although it's complicated, it's
even more complicated than you think. It's not just *2 that are the hypo metabolizers,
there's *3 and *4 and you could be *3/*4. There's also a *17 that nobody knows what
to do with. And so the heterozygotes and homozygotes you saw on the pie chart are actually multiple
genotypes, and how to translate from a genotype to a diplotype to a predicted phenotype is
one of the challenges in this area.
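
The step from star alleles to a diplotype to a predicted phenotype can be sketched as a lookup table. This is an illustration, not the translation table used in PREDICT: the allele assignments follow the talk (*2, *3 and *4 reduce metabolism, *17 is left unresolved), while the function name, the "intermediate" label for carriers of one reduced-function allele, and the "indeterminate" fallback are assumptions made here.

```python
# Allele function as characterized in the talk; *1 is the reference allele.
ALLELE_FUNCTION = {
    "*1": "normal",
    "*2": "reduced",
    "*3": "reduced",
    "*4": "reduced",
    "*17": "uncertain",  # the talk leaves this allele unresolved
}


def predict_phenotype(allele_a, allele_b):
    """Map a CYP2C19 diplotype (two star alleles) to a coarse predicted phenotype."""
    functions = [ALLELE_FUNCTION.get(allele_a), ALLELE_FUNCTION.get(allele_b)]
    if None in functions or "uncertain" in functions:
        return "indeterminate"  # unknown or unresolved allele: make no call
    reduced = functions.count("reduced")
    return {
        0: "normal metabolizer",
        1: "intermediate metabolizer",
        2: "poor metabolizer",
    }[reduced]


for diplotype in [("*1", "*1"), ("*1", "*2"), ("*2", "*2"), ("*3", "*4"), ("*1", "*17")]:
    print("/".join(diplotype), "->", predict_phenotype(*diplotype))
```

Note that *2/*2 and *3/*4 land in the same phenotype bucket, which is why a single slice of the pie chart can contain multiple genotypes.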
When a patient who has this information in their electronic medical record has an electronic
prescription written for clopidogrel, and are a poor metabolizer, this is the point-of-care
decision support that pops up that suggests two alternative drugs, and we track how many
times physicians look at this, how many times they change their minds, and we're just learning
about responses to this kind of program.
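
The point-of-care check described above can be sketched as a rule lookup fired when a prescription is written, with every alert logged so that prescriber responses can be counted later. Everything named here (Patient, RULES, check_prescription, prescribe, audit_log) is hypothetical, and the advisory text is a placeholder; the actual alert and the two alternative drugs it suggests are not reproduced.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    patient_id: str
    # Predicted phenotypes keyed by gene, as stored in the record.
    phenotypes: dict = field(default_factory=dict)


# Hypothetical rule table: (drug, gene, phenotype) -> advisory text.
RULES = {
    ("clopidogrel", "CYP2C19", "poor metabolizer"):
        "Predicted CYP2C19 poor metabolizer: consider an alternative treatment.",
}

audit_log = []  # one entry per alert shown, with the prescriber's response


def check_prescription(patient, drug):
    """Return advisory text if a stored phenotype is relevant to this drug, else None."""
    for (rule_drug, gene, phenotype), message in RULES.items():
        if drug == rule_drug and patient.phenotypes.get(gene) == phenotype:
            return message
    return None


def prescribe(patient, drug, prescriber_changes_drug=False):
    """Run the check at order entry and record what the prescriber did with the alert."""
    alert = check_prescription(patient, drug)
    if alert is not None:
        audit_log.append({
            "patient": patient.patient_id,
            "drug": drug,
            "changed": prescriber_changes_drug,
        })
    return alert


poor = Patient("demo-1", {"CYP2C19": "poor metabolizer"})
other = Patient("demo-2", {"CYP2C19": "normal metabolizer"})
print(prescribe(poor, "clopidogrel", prescriber_changes_drug=True))
print(prescribe(other, "clopidogrel"))
print(audit_log)
```

Keeping the rules in a table separate from the checking logic means a newly deployed drug-gene pair is a new row, not new code.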
And I thought I should show this picture because, again, it's about personalizing medicine.
This is one of our interventional cardiologists, and when we started the program in September
2010, we were really eager to find the first *2/*2 patient, and this is she. So she is
being taken care of by him, and he knows a little bit more about her now. And that's
personalizing medicine. But he's taken care of her for a long time so he knows a lot about
her and her attitudes and her other diseases and her other medications, so that's what's
important. And there's another quote from William Osler, which, in the interest of time,
I think I'll let you read, and now you've read it or not.
So we've now deployed five drug-gene pairs, clopidogrel, simvastatin, warfarin, thiopurines,
and tacrolimus, and those are displayed on the electronic medical record. This is a screenshot
of what our electronic medical record looks like. I blacked out all the identifiers, except
for this one here because I want to make sure people know that I still see patients. And
so these are the genetic variants that belong to this particular patient. These are a partial
list of his medications that actually go down here. And what's interesting to me is that
he's been on warfarin for a long time, so we actually didn't use the warfarin information
to tailor his dose, but he is a poor -- he has a loss-of-function allele in CYP2C9, and
he does take a remarkably low dose of warfarin, only 3 milligrams a day, so that's probably
the explanation, and had we known that at the beginning, we would have started him on
3 milligrams a day. We also have to say, use this kind of technology to display variants
in tumors in an effort very, very, similar to, but probably smaller than the one you
just heard from Dr. Garraway in our personalized cancer initiative. So -- and that was a BRAF
The other thing, after you've deployed five drug-gene pairs, you can start to ask the
question, how many people have variants in one or more of these pathways. And what's
interesting is that the number that don't have anything is now getting smaller. And,
of course, every time we deploy a new drug-gene pair, the number has to get smaller. And one
way to think, I mean, it's obvious, and it's almost trivial to sort of highlight it, except
to point out that as -- if you're going to do multiplex testing, what you find out is
that everybody in this audience is abnormal for something. And we all, if you live in
genome-land, like we all do, we all recognize that. But these are real data that speak to
that. And so when I speak to lay audiences, when I speak to non-genomics audiences, it
comes as a surprise to them sometimes that we're all abnormal for something, you just
don't know what it is. So this multiplexed approach, I think, is really the only way
to go, and I think that many people in this room have thought that problem through and
We also engage patients, so we have a website called My Health at Vanderbilt where you can
go and look at pieces of your record. You can make appointments with your doctor. When
you look at pieces of your record, you can look at genes that affect My Medicines, and
then you can see your report for the drugs that we've targeted. Those are all sort of
works in progress, and you can see we're sort of still figuring out exactly how to deliver
that information. We steal from -- sorry, we adapt 23andMe information to think about
this because they do a reasonable job of explaining this. And we think that that's something that's
going to be very, very important to engage patients in all this.
So we have PREDICT at Vanderbilt, and I'm part of the PGRN. We're also part of the eMERGE
network, and so I was having a conversation with Teri Manolio, who directs the genomic
medicine initiatives at NHGRI, and she said, "Well, why don't we take the PGRN's next-generation
sequencing platform for all those important pharmacogenes that you guys are working on
and stick them in eMERGE in a PREDICT kind of algorithm." So that's sort of -- that was
an idea born, you know, at the end of a long day, and we actually are doing that right
now. And that project is underway, so it takes advantage of the expertise and capabilities
of two separate networks that I think ought to be more closely aligned than they are, and I'm
working on that.
So I want to just summarize by highlighting in words what the lessons are. So, first of
all, I think that we have not finished discovering. So those of us who focus on implementation
are working on trying to deliver some of that to patients. But that doesn't mean that we
know everything we need to know. We need to know a lot more, and you've heard that all
day today. The low-hanging fruit of pharmacogenomics is much more complicated than we think, but
I think learning the lessons that we're learning along the way in this space will make us smarter
in terms of delivering genomic information in the course of health care more generally.
Some of the problems I've highlighted, this business of rare variants is going to be a
real problem in pharmacogenomics and everywhere else. And then there's the problem of ancestries,
which we're only beginning to think about now.
This is Team Science. It's interdisciplinary, and you have to engage lots and lots of people,
not just get their grudging approval, but get their enthusiastic engagement. There are
huge educational needs in every constituency that you can think about. The evidence changes,
even in pharmacogenomics where you sort of say, well, CYP2C19 does this, and then a year
later you think, well, maybe it does this in some other context. So you really have
to be attuned to the fact that the advice you deliver today may not apply perfectly
It goes without saying, but I have to say it, that an Illumina run might be 99 percent
accurate. I have no idea what that number really is, but that's not good enough for
a clinician because it has to be 100 percent accurate. And the reason is, I'll just say
it, for those of you who are clinicians, you'll understand me. I walk out of the wards, and
the nurse or the resident says to me, "This patient has renal failure. Their creatinine
is eight today." And I say, "Well, what does their electrocardiogram look like, how do
they feel?" And they feel fine, and their electrocardiogram is normal. I say, "Well,
that's a lab mistake." Any clinician would tell you that it must be a lab mistake. But
if somebody says, "This person is a poor metabolizer," I have no context for that. So, I have to
be sure that the data I get is correct. I think this is only going to happen in an electronic
medical record environment. We're thinking about ways in which to deliver this kind of
thing to people who have less advanced electronic medical record systems than we do, but I think
that's going to happen. The only way this really happens is with institutional will.
So I'll just close by talking about the teams. I've acknowledged these teams up here. These
are the individuals at Vanderbilt. I can't walk my way through all of them, but there
are geneticists, informaticists, ethicists, lab people, fellows, translational scientists,
and these three guys down here, who are the institutional leadership. So thank you very
much, again, for the opportunity to participate.
Mark Guyer: So our next speaker, before he sits down,
is David Botstein from Princeton. And David is going to tell us about the Fruits of the
Genome Sequences for Society.