Male Speaker: Last, but certainly not least of our topics, is Genomic Medicine in Pediatric Patients. Hakon, are you ready to go?
Hakon Hakonarson: Yes, I'm good to go. Can you hear me?
Male Speaker: Yes, we can.
Hakon Hakonarson: Okay. Great. So this presentation is focused mostly on obstacles, but I am going to address a few of the successes as well. So if I can have the next slide.
And there are really three areas that I was going to focus on collectively for the pediatric sites: a review of current pediatric projects, largely focusing on phenotyping and sequencing -- Kyle Brothers has worked on the consent form, but there's a separate section on that, so I wasn't going to go into it in any detail -- then new approaches to analyzing existing data, and finally prospective directions, where we felt that an inexpensive custom phenotyping chip focusing on functional variants or [unintelligible] functional variants would be of interest.
Next one. And go on.
So this slide addresses some of the phenotype algorithms that we have been working on across the pediatric sites at Boston, Cincinnati, and CHOP. We have basically cross-validated several between pediatric sites and across pediatric and adult sites, and we validated adult algorithms as well. I apologize for using CAG -- this stands for the Center for Applied Genomics -- but it really should say CHOP there. And we still have a few that are in development, but both the atopic dermatitis and ADHD algorithms have been validated on our end. We have tested them across all the sites, and they were felt to exclude too many cases, so we are adjusting the algorithms before we take them forward. Next one.
And this is just a representative example, the asthma algorithm. We were quite conservative: for example, here we have an asthma diagnosis in our EPIC database for about 15,000 cases, but we restricted that down to about 4,500, and that also impacts the other sites. These are really carefully confirmed and documented asthma cases, and this is now under analysis, as are many of the other phenotype algorithms that I listed earlier. Next.
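The conservative case definition described here amounts to a multi-criteria EHR filter: start from all coded asthma diagnoses and keep only patients whose records support the label. A minimal sketch of that kind of filter follows; the column names, diagnosis codes, medication classes, and thresholds are hypothetical illustrations, not the actual eMERGE/CHOP algorithm.

```python
# Hypothetical sketch of a conservative EHR case-selection filter.
# Column names, codes, and thresholds are illustrative only; this is not
# the actual eMERGE/CHOP asthma phenotyping algorithm.
import pandas as pd

def select_confirmed_asthma_cases(encounters: pd.DataFrame,
                                  meds: pd.DataFrame,
                                  exclusions: pd.DataFrame) -> pd.Index:
    """Return patient IDs meeting a conservative asthma case definition."""
    # Require at least two separate encounters coded for asthma (ICD-9 493.x).
    asthma_counts = (encounters[encounters["icd9"].str.startswith("493")]
                     .groupby("patient_id")["encounter_date"].nunique())
    multi_dx = asthma_counts[asthma_counts >= 2].index

    # Require at least one asthma medication order (e.g., controller therapy).
    medicated = meds.loc[meds["drug_class"].isin(
        ["inhaled_corticosteroid", "beta_agonist"]), "patient_id"].unique()

    # Exclude patients with diagnoses that confound an asthma label,
    # e.g., cystic fibrosis (277.0) or chronic perinatal lung disease (770.7).
    confound = (exclusions["icd9"].str.startswith("277.0") |
                exclusions["icd9"].str.startswith("770.7"))
    excluded = exclusions.loc[confound, "patient_id"].unique()

    return multi_dx.intersection(pd.Index(medicated)).difference(pd.Index(excluded))
```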
And on the adult side, I mean, we have done what we can to address some of the adult phenotypes, but this is one of the obstacles that we have. We don't have a lot of pediatric patients who have deep vein thrombosis and/or diverticulosis or zoster, for obvious reasons. So if you take the next slide. And for those obvious reasons, the pediatric numbers are obviously going to be limited in the context of several of the adult phenotypes. But on the other hand, multiple of the algorithms we are taking forward in pediatrics, as I showed you, cross the pediatric and adult fields, so we can obviously address phenotypes longitudinally from pediatric ages up to old age. Next one.
So the obvious obstacle in that setting is the fact that many of the adult algorithms, as I said, don't really apply well in pediatrics and vice versa. When we look at developmental traits -- pediatric language development, cognition, or motor skills -- there are not going to be big numbers on the adult front to address those either, but, you know, the goal here, of course, is to optimize this as best we can. So, next slide. And the options that we have in that setting are to proceed as we are doing on a case-by-case basis, and not necessarily worry about the fact that there are obviously going to be diseases that have no overlap, such as Alzheimer disease and dementia and other things on the adult front, and the developmental phenotypes, as I mentioned, in pediatrics. What we do have, though, is the cross-validation of several of the phenotype algorithms that we have come up with in pediatrics and validated at adult sites and vice versa. So that's actually very nice to see. Next slide.
And as part of the sequencing program, obviously some of the gene variants that are targeted there are much more applicable to the adult diseases. We have now done a preliminary analysis of the first 280 or so sequenced samples, and we have another 140 going through, about half of them done. And it's actually quite interesting that a lot of novel variants, absent from all databases, have come up from the panel. This is in the middle of analysis, and we can talk about it in the meeting tomorrow or Friday. Next one.
And this part focuses on the approaches to analyzing existing data, the new methods.
Next slide.
And, as I mentioned briefly earlier, copy number variation is obviously a whole other domain, but it can be leveraged across all the sites. We all have SNP data from which we can derive the log R ratio and the B allele frequency, and we have an algorithm -- other sites may have algorithms as well -- that we can work on optimizing, but the same algorithm should be utilized across all the sites and the data then meta-analyzed.
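For background on the log R ratio (LRR) and B allele frequency (BAF) approach: production CNV callers such as PennCNV model both signals jointly with a hidden Markov model across probes. The sketch below is only a simplified, hypothetical illustration that flags windows of probes with a shifted median LRR, run here on simulated data.

```python
# Simplified illustration of deriving CNV candidates from SNP-array signals.
# Real callers (e.g., HMM-based tools such as PennCNV) model LRR and BAF
# jointly; this toy version only flags probe windows with shifted median LRR.
import numpy as np

def naive_cnv_segments(lrr: np.ndarray, positions: np.ndarray,
                       window: int = 25, del_thr: float = -0.35,
                       dup_thr: float = 0.25):
    """Yield (start_bp, end_bp, state) for windows whose median LRR is shifted."""
    for i in range(0, len(lrr) - window, window):
        med = np.median(lrr[i:i + window])
        if med < del_thr:
            yield positions[i], positions[i + window - 1], "deletion"
        elif med > dup_thr:
            yield positions[i], positions[i + window - 1], "duplication"

# Example with simulated data: a 50-probe deletion embedded in diploid signal.
rng = np.random.default_rng(0)
lrr = rng.normal(0.0, 0.15, 1000)
lrr[400:450] -= 0.5                      # simulated heterozygous deletion
pos = np.arange(1000) * 3000             # ~3 kb probe spacing, illustrative
print(list(naive_cnv_segments(lrr, pos)))
```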
Another potential way of moving forward is to impute loss-of-function variants and drug-gene interaction variants, and I'll show you an example of that. And then, of course, CNV analysis across sequencing data, which we have developed, as have many others, is progressing nicely and getting to the stage of a GWAS-style CNV analysis, picking up, obviously, much, much smaller CNVs than we could with the arrays.
Now, I'm mentioning here also the high-sensitivity GWAS. This is really an algorithm that was implemented in the ASSET R package and focuses on subset analysis. We have applied it, for example, across multiple different autoimmune diseases, and we have enriched our genome-wide significant findings many-fold by using this algorithm. For each SNP or each locus it picks the most optimal disease model and then, from a common-control analysis, derives the most informative subset-based result. It's actually quite a powerful way of enriching for significance, and we could definitely leverage it.
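The subset-based idea can be shown with a toy version: for a single SNP, search over subsets of diseases and keep the subset whose combined meta-analysis Z-score is largest. The real ASSET package additionally corrects the p-value for the subset search and accounts for shared controls; the sketch below, with made-up numbers, omits both.

```python
# Toy illustration of subset-based association testing in the spirit of the
# ASSET approach: for one SNP, exhaustively search over disease subsets and
# keep the subset with the largest combined meta-analysis Z-score.
# The real ASSET R package also adjusts the p-value for the subset search
# and handles shared controls; this sketch omits both.
from itertools import combinations
from math import sqrt

def best_subset(z_scores, weights):
    """Return (best_subset_indices, combined_z) over all non-empty subsets."""
    diseases = range(len(z_scores))
    best = (None, 0.0)
    for k in range(1, len(z_scores) + 1):
        for subset in combinations(diseases, k):
            num = sum(weights[i] * z_scores[i] for i in subset)
            den = sqrt(sum(weights[i] ** 2 for i in subset))
            z = num / den
            if abs(z) > abs(best[1]):
                best = (subset, z)
    return best

# Example: per-disease Z-scores for one SNP across five autoimmune diseases,
# weighted by effective sample size (illustrative numbers only).
z = [2.1, 0.3, 2.8, -0.4, 1.9]
w = [900, 400, 1200, 500, 800]
print(best_subset(z, w))
```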
And then there are various functional biological annotations that obviously can be applied to optimize genome-wide marginally significant hits, et cetera. Gene-based association testing obviously cuts down the multiple-testing issues, and various tissue-specific and cell-specific assays that can effectively be integrated into GWAS data are actually often very informative in pinpointing the specific cells that drive the significant signals across related diseases, et cetera. And then, you know, pathway or protein interaction analysis has been mentioned before, and here are some of the newer tools that are used. Next one.
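As a simple illustration of why gene-based testing cuts down the multiple-testing burden: the per-SNP p-values within a gene can be collapsed into one gene-level statistic, so the correction scales with roughly 20,000 genes rather than millions of SNPs. The sketch below uses Fisher's method and assumes independent SNPs; real tools such as MAGMA, VEGAS, or SKAT additionally account for LD between SNPs, which this sketch does not.

```python
# Minimal sketch of gene-based association testing: collapse the per-SNP
# p-values in a gene into one gene-level p-value with Fisher's method.
# Assumes independent SNPs; LD-aware tools (MAGMA, VEGAS, SKAT) do not.
from math import log
from scipy.stats import chi2

def fisher_gene_p(snp_pvalues):
    """Combine per-SNP p-values for one gene with Fisher's method."""
    stat = -2.0 * sum(log(p) for p in snp_pvalues)
    df = 2 * len(snp_pvalues)
    return chi2.sf(stat, df)

# Example with illustrative per-SNP p-values for a single gene.
print(fisher_gene_p([0.01, 0.2, 0.004, 0.6]))
```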
And for the copy number variants, as we mentioned, this is really an untapped resource within eMERGE across the sites, and there are the 56,000 genotyped samples. This could be a very, very valuable approach, and obviously these variants are very common, even though the interest is more in the rare variant forms, and that's really what the Illumina platform is optimized to pick up.
Next one.
And this is just the schematic of the different approaches depending on the array platform or sequencing platform you have, and it's fairly self-explanatory. Obviously, the array data, where you have both the B allele frequency and the intensity data, in my view at least, is better powered than any of the other methods that rely on intensity alone.
Next one.
Male Speaker: So you're at 11 minutes now so it'd be good
to wrap up.
Hakon Hakonarson: Okay. So the opportunity exists here to basically run this genome-wide across the sites and do a meta-analysis.
Next one.
So, the pathogenicity annotation in DGV, the Database of Genomic Variants, I mean, this is obviously not optimal, and so we could do a much better job, I think, in the network -- next slide -- by figuring out the proper controls and the proper way of doing this.
Next one. And this is just to demonstrate an example of haplotype imputation for TPMT.
Next one.
And this shows -- you can just roll through these slides -- the four variants that we used to impute into the 87,000 samples.
Next one. And this shows, you know, that there are actually ethnic differences in the prevalence of these variants; the homozygous state is obviously much rarer, but heterozygosity, which is still influential, is significant.
Next one. And this summarizes the data set in terms of the individuals we picked out, and these are rare variants. And if you take the next slide, it shows the accuracy of the imputation against the variants that were directly typed, which is 99.8 percent, and the Sanger sequencing covers a full plate of 94 samples.
Next slide. And you can see the accuracy better there: for the homozygote state, I mean, it's not perfect. There were a few individuals predicted to be heterozygous, and one was missed from each standpoint.
Next slide. I think we're coming to the end here. So, the accuracy of imputation here is obviously not perfect, but still, you capture the vast majority of the patients.
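The accuracy figure described here comes down to a concordance check between imputed calls and Sanger-confirmed calls on a validation plate. A small sketch of that kind of check follows; the genotype vectors are made up for illustration and are not the actual validation data.

```python
# Sketch of a genotype concordance check: compare imputed calls against
# Sanger-confirmed calls for a validation plate and report accuracy overall
# and per genotype class. The sample data below are invented.
from collections import Counter

def concordance(imputed, sanger):
    """Return overall accuracy and per-class accuracy (keyed by true genotype)."""
    per_class_total, per_class_hit = Counter(), Counter()
    for imp, truth in zip(imputed, sanger):
        per_class_total[truth] += 1
        per_class_hit[truth] += int(imp == truth)
    overall = sum(per_class_hit.values()) / len(sanger)
    by_class = {g: per_class_hit[g] / n for g, n in per_class_total.items()}
    return overall, by_class

# Illustrative 94-sample plate: 0 = wild type, 1 = heterozygote, 2 = homozygote.
sanger  = [0] * 80 + [1] * 12 + [2] * 2
imputed = [0] * 80 + [1] * 11 + [0] + [1, 2]   # one het missed, one hom called het
print(concordance(imputed, sanger))
```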
Next slide. And for the prospective direction -- next slide -- what we were going to propose was this cost-effective, inexpensive custom genotyping.
Next slide. And we have, you know, done this for a couple of projects before and have proposed it for a third project that may or may not go forward.
Next slide. And this is the content of the organ transplant chip. This is basically taking all putative damaging variants, all copy number variants, GWAS loci, and other content that is available from the public domain and from other sites where we could access data, and typing it across 25,000 to 30,000 samples.
Next slide. And so this would be the proposal for a future eMERGE chip in that setting, which would obviously allow us to integrate every single sample by typing them all on the same chip, with informative content that can go as low as 0.1 percent frequency, as was discussed before.
And the last slide, again, just illustrates how this allows all of the coordinated effort to take place. So I'll stop here. Thanks.