Male Speaker: Okay. Well, Steve, if you're there, we'll
segue immediately to you.
Steve Leeder: I am here. Can everybody hear me?
Male Speaker: Yes, we can.
Steve Leeder: Good. In retrospect, I guess I should've -- can
we go to the next slide, please? In retrospect, if I had used the smaller font like Howard
did, we could get these all on one slide and look at them at the same time, but they're
broken up into two slides. On this first one, I think you should get the sense by now from
both Marilyn and Debbie's presentations that the consensus of our group was that discovery
research should remain a high priority for the future. And working with the phenotyping
group, one step is obviously to decide what traits or phenotypes are sort of high priority
for the next phase of the network. And then the topic for discussion, and some of this
has been occurring in the sort of online chat, is, you know, whether or not to just go with
existing data and work on improving the analytical tools and methodology to use existing
data, or whether there should be some effort put into [inaudible] data generation by next-gen
sequencing or exon arrays. As sort of a non-genomic person pursuing this more from a pharmaco
perspective, I guess we could also add in that the existing data could also include
the longitudinal phenotypes that several people have alluded to, and using this in the context
of, you know, the [inaudible] contribution to disease progression. This also lends [inaudible]
gene-environment interactions as well. Also the impact of therapeutic intervention on
the trajectory of progression.
The second point that came out of our discussions was this whole issue of not throwing the baby
out with the bath water, and looking at the importance of rare variants. And again, up
for discussion would be the most appropriate platforms, whether it be a genotyping or a
sequencing platform to capture them, but also resources that may need to be put into developing
appropriate tools to detect their effects.
Can I have the next slide? So, this is -- this next point gets at one of Debbie's last points,
and that is considering study designs other than a straight GWAS-type format for discovery
purposes. For example, the example that she gave of looking at extreme
discordant phenotypes, at least for continuous variables, and coupling them with your platform
of interest, and I put a whole genome sequencing here. And, you know, the potential for this
particular approach to be a little bit more efficient in identifying causal variants,
and especially rare causal variants.
And then the last point that our group would propose to the larger group as a whole would
be, again, something that was mentioned by both Marilyn and Debbie, and that is looking
at other sources of genomic material, RNA, or going back into the DNA and looking at
methylation, for example, for these genomic analyses. And then on the EMR side of things,
looking to see how additional data can be captured or parsed to look at environmental
factors and co-morbidities or gene-environment interactions, for example.
So those are the four issues that were raised by our group as being something to pose to
the rest of the group for their thoughts and comments. And I'll toss it back to the chair.
Male Speaker: Okay. The floor is open for comments or questions
on EMR and genomic discovery.
Marc Williams: Marc Williams here. A thought occurred to
me as Debbie was talking, again trying to bridge the tension that we have between discovery
and implementation. This was in the context of the rare variants. I think one of the issues
that we're all going to be dealing with as we receive secondary findings from our genomes,
exomes, and high-density chips, findings that, you know, we're thinking about clinically
returning, is the lack of information that we have on the clinical impact of some of
these rare variants, even in genes that we know quite well. One of the things that we'll
be doing is to try and use our traditional methods of contextualizing that data using
family history and other sorts of things to understand what's the potential impact. To
me, that seems to lend itself to the idea that if we did a rare variant focus, we could
study how we could use electronic health record mining to try and contextualize rare variant
information to add additional information for clinical return and implementation. So
that could be a potential study topic for eMERGE III that would bridge, again, this
discovery and implementation chasm.
Dan Roden: This is Dan Roden. I have to say Roden now
because there's another Dan on the phone. I agree with Marc, but at a practical level,
I think you have to make some attempt to limit the minor allele frequencies down to which
you're willing to go. If you find a rare variant that is one in a million or one in a hundred
thousand, it's going to be very, very tough unless you know something about the biology
to assign any kind of phenotype to that. And so I think the sweet spot for us is probably
minor allele frequencies around 0.1 percent. Plus, variants in disease genes that, you
know, have been implicated. And as Zach [spelled phonetically] said a couple of hours
ago, you find that variants that have been implicated as causes of hypertrophic cardiomyopathy
or channelopathies are actually much more common than you give them credit for when you
start to look across very large populations, and we're finding that along with everyone
else.
So I think that one thought is exactly which rare variants we'll want to focus on. And
I think that the variant of uncertain significance at 1 in 10,000 or 1 in 1,000 is something
that eMERGE is really, really well-suited to attack.
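[As an aside, Dan Roden's "sweet spot" argument can be made concrete with some back-of-the-envelope arithmetic. The sketch below is purely illustrative: the 50,000-person cohort is an assumed figure (a number mentioned later in this discussion), not a description of any actual eMERGE dataset.]

```python
# Expected carrier counts under Hardy-Weinberg equilibrium: why a
# one-in-a-million variant is intractable while a 0.1 percent variant
# is not. The 50,000-person cohort is a hypothetical, assumed figure.

def expected_carriers(maf: float, n_people: int) -> float:
    """Expected number of people carrying at least one copy of the
    minor allele at the given minor allele frequency (MAF)."""
    return n_people * (1.0 - (1.0 - maf) ** 2)

for maf in (1e-6, 1e-5, 1e-4, 1e-3):
    print(f"MAF {maf:g}: ~{expected_carriers(maf, 50_000):.1f} carriers")
```

[At one in a million you would expect a fraction of a single carrier, so there is essentially no phenotype signal to work with; at 0.1 percent you would expect on the order of a hundred carriers, enough to start attaching phenotypes.]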
Female Speaker: So, you know, for capturing sequence data,
it seems to me that we should report all variants, even if we only get 1 in 100,000 people, and
put it in a database because other people are going to be putting that data forward and
annotating those variants. Even if we can't determine ourselves alone if they're pathogenic,
it's going to be really important going forward.
Dan Roden: So, Gail [spelled phonetically], I totally
agree with that, and that's what we're going to be doing in eMERGE PGx. And as data accumulate
worldwide, you can start to make some sense of that. But I think, over the next five years,
a one in a million variant, unless there's some biology around it, is going to be hard
to make sense of. But, yeah, I totally agree that we have to figure out a way of archiving
this worldwide.
Male Speaker: So this is Halcom [spelled phonetically] here.
So, as you know, the new platform from Illumina, the X Ten, which is currently tailored towards
whole genome, is very likely going to be adapted to exome, even though that will
probably take some time. But an exome could probably be sequenced for about $100, sort
of, say, a year, year and a half from now. So in the interim, a strategy would be to sort of
customize a chip with this rare variant content, particularly content with potential
or putative damaging impact, loss of [unintelligible] and so forth, and that can actually be
typed now extremely cost-efficiently across, you know, thousands and thousands of samples for
a relatively low amount of money, even though it's going to cost some money. So,
in the interim, that would potentially be a very, very powerful strategy across the
sites, because that would open up the rare variant content for all phenotypes that we
have, and we don't have that today.
John Harley: This is John Harley [spelled phonetically],
Cincinnati. I just ask the question: when we concentrate on rare variants and we
don't have all of our samples genotyped, we rely on imputation. And as the frequency of
the variants drops, the accuracy of the imputation is disastrous. And so, you know,
we aren't able to take advantage of our huge numbers because the error introduced by imputation
is so big. Is there anyone who has a solution to this problem?
Female Speaker: You need to sequence.
Male Speaker: Right. Debbie's right.
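[John Harley's point about imputation accuracy collapsing at low allele frequencies can be illustrated with a toy simulation. This is a deliberately simplified sketch, not an eMERGE analysis: imputation error is modeled here as an occasional spurious minor-allele call at a fixed per-sample rate, an assumption made only for illustration.]

```python
# Toy simulation: with a fixed per-sample error rate, the squared
# correlation (r^2) between true and imputed genotypes collapses as
# the minor allele frequency (MAF) drops, because the error variance
# swamps the shrinking genotype variance.
import random
random.seed(42)

def imputation_r2(maf: float, n: int = 200_000, fp_rate: float = 0.001) -> float:
    """r^2 between simulated true and imputed allele counts, where
    imputation occasionally adds a spurious minor allele."""
    true, imputed = [], []
    for _ in range(n):
        g = (random.random() < maf) + (random.random() < maf)  # allele count 0/1/2
        i = min(2, g + (random.random() < fp_rate))            # spurious-call model
        true.append(g)
        imputed.append(i)
    mt, mi = sum(true) / n, sum(imputed) / n
    cov = sum((a - mt) * (b - mi) for a, b in zip(true, imputed)) / n
    vt = sum((a - mt) ** 2 for a in true) / n
    vi = sum((b - mi) ** 2 for b in imputed) / n
    return cov * cov / (vt * vi)

for maf in (0.05, 0.005, 0.0005):
    print(f"MAF {maf}: r^2 ~ {imputation_r2(maf):.2f}")
```

[Under this assumed error model, a variant at 5 percent frequency imputes almost perfectly, while one at 0.05 percent loses roughly half its effective information, which is the loss Harley describes.]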
Rex Chisholm: So, this is Rex. I'd like to weigh in. I would
really like to endorse the idea of thinking about environmental factors. We've played
around a little bit with GIS tools, and I think one of the things we could do very well,
which isn't done very well in most cases, given the longitudinal nature of the people
that we're following, is to think about some of these environmental factors. I think there's
going to be increased opportunities to capture some of these bits of data. Marilyn talked
a little bit about Environmental Protection Agency measurements that are being made. So
I think to be able to start to tackle gene-environment interactions using GIS approaches in some
of these environmental measures is also something that would uniquely be possible in an eMERGE
III for us to take a look at.
Chris Chute: Well, this is Chris, and while I find that
idea elegant, I want to make sure we're somewhat cautious and thoughtful about this. For some
populations -- your Chicago population, Rex, might be a superb candidate for this --
it could work well. But other populations are not always, as we say, population based, and
hence, given the density of sampled cases in any environmental geocode, you run into power
problems very quickly with environmental associations, particularly when you're treating
them as a covariate in the substrata.
Marylyn Ritchie: Chris, this is Marylyn Ritchie. One of the
other things that I think folks could think about, and this is something that Marshfield
has done in eMERGE-II, and that is to use the PhenX Toolkit as the mechanism to collect
environmental data. We were awarded a supplement as part of the PhenX RISING program by NHGRI,
and so some of the PhenX Toolkit measures were sent out to the eMERGE participants.
And we've actually started mining that data and we're finding really interesting gene-environment
results for type II diabetes and some for cataracts. And we only implemented a few of
the PhenX Toolkit measures. That's something that other sites could do either electronically
or in paper form. You know, it's something that you could port to an iPad that people could
do in clinics. You could put it on the Web that people could do through their myhealth@Geisinger,
or Vanderbilt, or what have you. And that's another way that, even without relying on
population-based environmental data, you could collect it on the participants in the biobank.
Chris Chute: Well, I certainly agree that would be hugely
more efficient and wouldn't suffer the, you know, broad association problem that you have
with geocoding. And I actually think the PhenX Toolkit would be the appropriate choice for
collecting that type, so I agree with that, Marylyn. That's good feedback.
Male Speaker: That might be another agenda item to put on
the discussion with large health system providers when discussing it with the vendors of health
records because the patient portal is going to become a part of the mandated electronic
medical records eventually, and as they're building them, it would be nice to have patients
uploading various lifestyle things that can be merged with their electronic medical record.
Female Speaker: This is Carrie [spelled phonetically] --
Female Speaker: One question, are there other things that
the coordinating centers should be working on in the future? I mean they've done an awful
lot of work with data cleaning and then the imputation, but are there other things that
would make the dataset more effective for other analyses that would be a good focus
for eMERGE-III?
Male Speaker: So one focus there -- this is Halcom -- is
on copy number variation analysis, because that's another whole dimension. You know,
focusing there from the rare variant standpoint, because most of the data is typed on Illumina,
can open up a very fruitful sort of discovery focus across all of the phenotypes, again,
from a data-mining standpoint. And we have algorithms that can
be applied on these data at the individual sites or jointly, and then the whole thing
just sort of [unintelligible] together.
Teri Manolio: This is Teri. I did want to ask about the
issue of sequencing. When we've approached, sort of, large-scale sequencers, the question
they often ask is, "Well, how many cases of a given disease do you have?" Because they're
very interested in looking at, you know, thousands or tens of thousands of cases of disease X.
And that has not been something that eMERGE has really focused on because we're sort of
phenomic, as it were. So, how do we address that question, other than to say, gee, we've
got so many wonderful phenotypes just as good or better? Gail, do you --
Female Speaker: I think we don't need to have a disease focus.
I'd be really excited to sequence the 56 ACMG genes. We know what those genes do, but we
don't know what the variants in those genes do, and we could look both for variants, annotation,
pathogenic, and informally [spelled phonetically] not pathogenic. Everybody's done that sequence
variants. [inaudible]
[laughter]
And then we could also look for pleiotropic effects of those same genes. So there's a
discovery possibility there too, and then there's lots of implementation questions,
you know, [unintelligible] my health system is very concerned about those 56 genes now
because of the ACMG recommendations. So how do you implement that, how do you get the
business support, how do you educate providers, what do patients want to know, what do they
need to see? You know, I think that really hits all the things that we can do really well,
and having those phenotypes that we have so -- in such depth gives us a unique resource
for that kind of annotation.
And I think even at the pediatric sites, there is really important work. You know, 49, I
think, of the 56 have pediatric phenotypes. Plus, the pediatric sites really could look
into this idea of mandatory return of adult onset findings to children, which has been
a hugely controversial recommendation. They could release -- ask [spelled phonetically]
their families. What do they want? Ask their providers what they want. I think that that
is a space where there is a lot of controversy, a lot of interest of the health system, and
we have a really unique capability, and I don't think [unintelligible]. I would add
a couple more, by the way, but...
[laughter]
You know, I mean, I think that's doable.
Male Speaker: I agree with Gail. I think that there -- this
is Irving [spelled phonetically] -- that one could add to such a panel, I think, which
would be extremely meaningful and something that can be done uniquely in eMERGE, things
like a list of the highly penetrant forms of -- highly-penetrant monogenic forms of
diabetes and others, I think which, you know, it would be very helpful to understand among
sort of common complex diseases what forms are diagnosable on a molecular level, and
to what extent is that -- how frequent that is.
Teri Manolio: So I'd be interested in Steve Leeder and Debbie
Nickerson's comments on that. Let me just ask, both of you had pointed toward non-coding
variation, and here we're really talking about focusing on genes, even though there are some
non-coding regions obviously in the introns. So, Steve or Debbie, any thoughts?
Debbie Nickerson: I think it's great to look broadly at genes,
but I think that different platforms have different outcomes in terms of what you'd
look at. I mean, many people are sequencing whole genome, but they end up looking at only
the coding and that few percent that's well annotated by ENCODE as being highly functional.
But I, you know, I think, broadly, whole genome is an important route to go because you can
look at variants that are difficult to look at, like indels and CNVs, by just sheer capture.
Teri Manolio: Thanks. Steve, what do you think?
Steve Leeder: Well, for me, non-coding really points
more to regulatory regions as being of interest. But you have to understand, I'm coming at
this from a pediatric perspective as well, in that when we were looking at things in
kids, there's so much change that is going on between birth and sort of adulthood that,
you know, you have to look somewhere besides the coding region of the gene for what's changing
as kids grow and develop. And, you know, to some extent we know very little about how this really
works in senescing adults as well, as we move towards a geriatric population. So, for me,
the non-coding stuff really -- I'm really thinking about important regulatory regions
and being able to identify those and characterize them.
*** Weinshilboum: Steve, this is *** Weinshilboum. In all of
our studies of variation in cancer drug response, the majority of the hits that are functionally
important regulate transcription; they're in non-coding regions.
Murray Brilliant: So, this is Murray Brilliant. I think that
it's clear that, from an economic standpoint, we can't do a whole genome sequencing of 50,000
people. But we could look at a smaller number of genes, and since these genes have been,
you know, implicated in human disease, they're reportable, they're actionable. We can look
both at exons and introns. I think that if we focused on this subset of genes, you know,
it would be a proxy for looking at the whole genome. I mean, it would be scalable. And
I think that, you know, the 50 some genes here, maybe, you know, all of us have some
other favorite genes. If we had 100 genes, it's kind of catchy. Instead of 1,000 genomes,
we have 100 genes that are looked at across a large number of people. And again, this
would be a proxy for what we are going to do when we have a large number of whole genome
sequences. This really cuts across all of that.
Eric Green: I would point out that's very reminiscent
of the decision early on with the ENCODE project to tackle 1 percent. I mean, I don't know
if I like this idea or not. That's a separate issue.
Male Speaker: Sure, sure.
Eric Green: But it is reminiscent that the same rationale
went into, like how are we ever going to interpret the whole human genome. So there was a whole
process to pick the 1 percent, which was complicated, but we got there. Everybody stayed with 1
percent until you felt comfortable enough to scale it to the whole genome. So this would
be a similar circumstance, however many genes you pick.
Female Speaker: I think we could also exercise diversity here,
so you have a specific set of genes that there's a particular interest in reporting results
back, but also [unintelligible] data from minority populations for pathogenic --
Male Speaker: [inaudible]
Female Speaker: I know. I think we could do really well in
that avenue.
Debbie Nickerson: So, is the idea to do non-coding as well,
right, of those genes? Because --
Female Speaker: I think if you could find that, yes. So if
you know the regulatory regions of them.
Male Speaker: [inaudible] get somebody to clarify. What,
would you take a gene and you would just go end to end, maybe x number of bases upstream
and downstream and just do the whole segment? As opposed to known -- I mean, what Teri was
implying was sort of known functional non-coding regions.
Debbie Nickerson: Well, I'm just trying to stimulate the conversation
maybe toward exome versus targeted panel. So, what's the difference there? If you could
get exome --
Female Speaker: What's the difference?
Eric Green: These are actually asking very different questions.
If you're only going to do exomes, you're going to make the assumption that that's what
you're going to find. I thought the idea was that you have these sets of genes that are
of interest [inaudible] would be non-coding, so you want to get a complete inventory, deep
and in lots of people.
Debbie Nickerson: See, that's what I wanted you to say, Eric.
[laughter]
Eric Green: I was trying to rearticulate what I thought
I heard. So with that -- what I also heard was a variant of the approach, which was you take
x number of genes and you take all of the exons, but then you take --
Teri Manolio: Well, or any intron --
Eric Green: Any intron --
Female Speaker: -- any regulatory regions you know of.
Eric Green: Any introns, or --
Teri Manolio: I mean, there are some that have regulatory
regions that are identified in other chromosomes, you know, and so maybe look at those as they
become added in. But, you know, does it take two years, Debbie, or more, to develop a targeted
platform like this?
Debbie Nickerson: No, I think it's much easier now than it was.
Teri Manolio: Oh, okay.
Debbie Nickerson: I don't think it's -- I think we have a lot
more experience. And I think that the PGRN has great data with the PGx. They can look
at these questions.
Female Speaker: Debbie, what do you think of [unintelligible]
versus Seq next-gen?
Debbie Nickerson: You know, I think it's a matter of cost and
ease of implementation. I think some can be cheap, but whether they're broadly applicable
to many genes is not known.
Male Speaker: If you put this into an RFA, I would suggest
that we could be agnostic as to the technique and let the people that are putting in proposals
discuss how they would do it, because, as was pointed out, there are a number of us
that are going to be generating large numbers of exomes and genomes, and so that would also
allow for a methodologic comparison of what's the best way to actually do it.
Male Speaker: Okay, and we're going to need to wrap up the
discussion.
Male Speaker: Just a very quick point: aren't there
commercial entities that are trying to make these panels of 2,000 or 3,000 genes? So that
if you have a patient with Marfan's or if you have a patient with hypertrophic, you just
order that set, and then you can just pick and choose and analyze. Because what's happening
is that they're realizing there are many variants that may cause, for example, an aneurysm.
And so I end up ordering a panel of 15 candidate genes, which is like $5,000. And I may still
not get the information because they may just do certain variants. So I think, you know,
that's another, it's not whole exome, but it's like what are the thousand, or hundred,
or 1,500 genes that are most often used in the clinical setting, and perhaps go with
those. And also, it would go back to Debbie's point that if you're using some of these in
the clinical setting, you would have the familiar structure to interpret much more efficiently.
Male Speaker: Okay, so I'd like to thank all of the participants
for -- actually all of our panel this morning for a very rich and thoughtful discussion
for future directions in eMERGE. And we're going to -- I guess we're down to about a
20-minute break for lunch --
Eric Green: Careful because you're not going to get all
of these people down to the cafeteria and through the cafeteria and back up here in
20 minutes.
Male Speaker: We'll do our best. So we'll plan to start
somewhere around half past the hour.
Male Speaker: These folks should get their --
Male Speaker: Yes, everybody out there on the call should
go run for lunch and bring it back.
[laughter]