Male Speaker: Last, but certainly not least of our topics, is Genomic Medicine in Pediatric Patients. Hakon, are you ready to go?
Hakon Hakonarson: Yes, I'm good to go. Can you hear me?
Male Speaker: Yes, we can.
Hakon Hakonarson: Okay. Great. So this presentation is focused mostly on obstacles, but I am going to address a few of the successes as well. So if I can have the next slide.
And there are really three areas that I was going to focus on collectively for the pediatric sites: a review of current pediatric projects, largely focusing on phenotyping and sequencing -- Kyle Brothers has worked on the consent form, but there's a separate section on that, so I wasn't going to go into it in any detail -- then new approaches to analyzing existing data, and finally prospective directions, where we felt that an inexpensive custom phenotyping chip focusing on functional variants or [unintelligible] functional variants would be of interest.
Next one. And go on.
So this slide addresses some of the phenotype algorithms that we have been working on across the pediatric sites at Boston, Cincinnati, and CHOP. We have basically cross-validated several between pediatric sites and across pediatric and adult sites, and we validated adult algorithms as well. I apologize for using CAG -- this stands for the Center for Applied Genomics -- but it really should say CHOP there. And we still have a few that are in development, but both the atopic dermatitis and ADHD algorithms have been validated on our end. We have tested them across all the sites, and they were felt to exclude too many cases, so we are adjusting the algorithms before we take them forward. Next one.
And this is just a representative example, the asthma algorithm. We were quite conservative: for example, here we have an asthma diagnosis in our EPIC database for about 15,000 cases, but we restricted that down to about 4,500, and that also impacts the other sites. These are really carefully confirmed and documented asthma cases, and this is now under analysis, as are many of the other phenotype algorithms that I listed earlier. Next.
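The conservative case definition described here amounts to a multi-criteria EHR filter: start from all coded asthma diagnoses and keep only patients whose records support the label. A minimal sketch of that kind of filter follows; the column names, diagnosis codes, medication classes, and thresholds are hypothetical illustrations, not the actual eMERGE/CHOP algorithm.

```python
# Hypothetical sketch of a conservative EHR case-selection filter.
# Column names, codes, and thresholds are illustrative only; this is not
# the actual eMERGE/CHOP asthma phenotyping algorithm.
import pandas as pd

def select_confirmed_asthma_cases(encounters: pd.DataFrame,
                                  meds: pd.DataFrame,
                                  exclusions: pd.DataFrame) -> pd.Index:
    """Return patient IDs meeting a conservative asthma case definition."""
    # Require at least two separate encounters coded for asthma (ICD-9 493.x).
    asthma_counts = (encounters[encounters["icd9"].str.startswith("493")]
                     .groupby("patient_id")["encounter_date"].nunique())
    multi_dx = asthma_counts[asthma_counts >= 2].index

    # Require at least one asthma medication order (e.g., controller therapy).
    medicated = meds.loc[meds["drug_class"].isin(
        ["inhaled_corticosteroid", "beta_agonist"]), "patient_id"].unique()

    # Exclude patients with diagnoses that confound an asthma label,
    # e.g., cystic fibrosis (277.0) or chronic perinatal lung disease (770.7).
    confound = (exclusions["icd9"].str.startswith("277.0") |
                exclusions["icd9"].str.startswith("770.7"))
    excluded = exclusions.loc[confound, "patient_id"].unique()

    return multi_dx.intersection(pd.Index(medicated)).difference(pd.Index(excluded))
```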
And on the adult side, I mean, we have done what we can to address some of the adult phenotypes, but this is one of the obstacles that we have. We don't have a lot of pediatric patients who have deep vein thrombosis and/or diverticulosis or zoster, for obvious reasons. So if you take the next slide. And for those obvious reasons, the pediatric numbers are obviously going to be limited in the context of several of the adult phenotypes. But on the other hand, multiple of the algorithms we are taking forward in pediatrics, as I showed you, cross the pediatric and adult fields, so we can obviously address phenotypes longitudinally from pediatric ages up to old age. Next one.
So the obvious obstacle in that setting is the fact that many of the adult algorithms, as I said, don't really apply well in pediatrics and vice versa. When we look at developmental traits -- pediatric language development, cognition, or motor skills -- there are not going to be big numbers on the adult front to address those either, but, you know, the goal here, of course, is to optimize this as best we can. So, next slide. And the options that we have in that setting are to proceed as we are doing on a case-by-case basis, and not necessarily worry about the fact that there are obviously going to be diseases that have no overlap, such as Alzheimer disease and dementia and other things on the adult front, and the developmental phenotypes, as I mentioned, in pediatrics. What we do have, though, is the cross-validation of several of the phenotype algorithms that we have come up with in pediatrics and validated at adult sites and vice versa. So that's actually very nice to see. Next slide.
And as part of the sequencing program, obviously some of the gene variants that are targeted there are much more applicable to the adult diseases. We have now done a preliminary analysis of the first 280 or so sequenced samples, and we have another 140 going through, about half of them done. And it's actually quite interesting that a lot of novel variants, absent from all databases, have come up from the panel. This is in the middle of analysis, and we can talk about it in the meeting tomorrow or Friday. Next one.
And this part focuses on the approaches to analyzing existing data, the new methods.
Next slide.
And, as I mentioned briefly earlier, copy number variation is obviously a whole other domain, but it can be leveraged across all the sites. We all have SNP data from which we can derive the log R ratio and the B allele frequency, and we have an algorithm -- other sites may have algorithms as well -- that we can work on optimizing, but the same algorithm should be utilized across all the sites and the data then meta-analyzed.
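For background on the log R ratio (LRR) and B allele frequency (BAF) approach: production CNV callers such as PennCNV model both signals jointly with a hidden Markov model across probes. The sketch below is only a simplified, hypothetical illustration that flags windows of probes with a shifted median LRR, run here on simulated data.

```python
# Simplified illustration of deriving CNV candidates from SNP-array signals.
# Real callers (e.g., HMM-based tools such as PennCNV) model LRR and BAF
# jointly; this toy version only flags probe windows with shifted median LRR.
import numpy as np

def naive_cnv_segments(lrr: np.ndarray, positions: np.ndarray,
                       window: int = 25, del_thr: float = -0.35,
                       dup_thr: float = 0.25):
    """Yield (start_bp, end_bp, state) for windows whose median LRR is shifted."""
    for i in range(0, len(lrr) - window, window):
        med = np.median(lrr[i:i + window])
        if med < del_thr:
            yield positions[i], positions[i + window - 1], "deletion"
        elif med > dup_thr:
            yield positions[i], positions[i + window - 1], "duplication"

# Example with simulated data: a 50-probe deletion embedded in diploid signal.
rng = np.random.default_rng(0)
lrr = rng.normal(0.0, 0.15, 1000)
lrr[400:450] -= 0.5                      # simulated heterozygous deletion
pos = np.arange(1000) * 3000             # ~3 kb probe spacing, illustrative
print(list(naive_cnv_segments(lrr, pos)))
```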
Another potential way of moving forward is to impute loss-of-function variants and drug-gene interaction variants, and I'll show you an example of that. And then, of course, CNV analysis across sequencing data, which we have developed, as have many others, is progressing nicely and getting to the stage of a GWAS-style CNV analysis, picking up, obviously, much, much smaller CNVs than we could with the arrays.
Now, I'm mentioning here also the high-sensitivity GWAS. This is really an algorithm that was implemented in the ASSET R package and focuses on subset analysis. We have applied it, for example, across multiple different autoimmune diseases, and we have enriched our genome-wide significant findings many-fold by using this algorithm. For each SNP or each locus it picks the most optimal disease model and then, from a common-control analysis, derives the most informative subset-based result. It's actually quite a powerful way of enriching for significance, and we could definitely leverage it.
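The subset-based idea can be shown with a toy version: for a single SNP, search over subsets of diseases and keep the subset whose combined meta-analysis Z-score is largest. The real ASSET package additionally corrects the p-value for the subset search and accounts for shared controls; the sketch below, with made-up numbers, omits both.

```python
# Toy illustration of subset-based association testing in the spirit of the
# ASSET approach: for one SNP, exhaustively search over disease subsets and
# keep the subset with the largest combined meta-analysis Z-score.
# The real ASSET R package also adjusts the p-value for the subset search
# and handles shared controls; this sketch omits both.
from itertools import combinations
from math import sqrt

def best_subset(z_scores, weights):
    """Return (best_subset_indices, combined_z) over all non-empty subsets."""
    diseases = range(len(z_scores))
    best = (None, 0.0)
    for k in range(1, len(z_scores) + 1):
        for subset in combinations(diseases, k):
            num = sum(weights[i] * z_scores[i] for i in subset)
            den = sqrt(sum(weights[i] ** 2 for i in subset))
            z = num / den
            if abs(z) > abs(best[1]):
                best = (subset, z)
    return best

# Example: per-disease Z-scores for one SNP across five autoimmune diseases,
# weighted by effective sample size (illustrative numbers only).
z = [2.1, 0.3, 2.8, -0.4, 1.9]
w = [900, 400, 1200, 500, 800]
print(best_subset(z, w))
```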
And then there are various functional biological annotations that obviously can be applied to optimize genome-wide marginally significant hits, et cetera. Gene-based association testing obviously cuts down the multiple-testing issues, and various tissue-specific and cell-specific assays that can effectively be integrated into GWAS data are actually often very informative in pinpointing the specific cells that drive the significant signals across related diseases, et cetera. And then, you know, pathway or protein interaction analysis has been mentioned before, and here are some of the newer tools that are used. Next one.
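As a simple illustration of why gene-based testing cuts down the multiple-testing burden: the per-SNP p-values within a gene can be collapsed into one gene-level statistic, so the correction scales with roughly 20,000 genes rather than millions of SNPs. The sketch below uses Fisher's method and assumes independent SNPs; real tools such as MAGMA, VEGAS, or SKAT additionally account for LD between SNPs, which this sketch does not.

```python
# Minimal sketch of gene-based association testing: collapse the per-SNP
# p-values in a gene into one gene-level p-value with Fisher's method.
# Assumes independent SNPs; LD-aware tools (MAGMA, VEGAS, SKAT) do not.
from math import log
from scipy.stats import chi2

def fisher_gene_p(snp_pvalues):
    """Combine per-SNP p-values for one gene with Fisher's method."""
    stat = -2.0 * sum(log(p) for p in snp_pvalues)
    df = 2 * len(snp_pvalues)
    return chi2.sf(stat, df)

# Example with illustrative per-SNP p-values for a single gene.
print(fisher_gene_p([0.01, 0.2, 0.004, 0.6]))
```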
And for the copy number variants, as we mentioned, this is really an untapped resource within eMERGE across the sites, and there are the 56,000 genotyped samples. This could be a very, very valuable approach, and obviously these variants are very common, even though the interest is more in the rare variant forms, and that's really what the Illumina platform is optimized to pick up.
Next one.
And this is just the schematic of the different approaches depending on the array platform or sequencing platform you have, and it's fairly self-explanatory. Obviously, the array data, where you have both the B allele frequency and the intensity data, in my view at least, is better powered than any of the other methods that rely on intensity alone.
Next one.
Male Speaker: So you're at 11 minutes now so it'd be good
to wrap up.
Hakon Hakonarson: Okay. So the opportunity exists here to basically run this genome-wide across the sites and do a meta-analysis.
Next one.
So, the pathogenicity annotation in DGV, the Database of Genomic Variants, I mean, this is obviously not optimal, and so we could do a much better job, I think, in the network -- next slide -- by figuring out the proper controls and the proper way of doing this.
Next one. And this is just to demonstrate an example of haplotype imputation for TPMT.
Next one.
And this shows -- you can just roll through these slides -- the four variants that we used to impute into the 87,000 samples.
Next one. And this shows, you know, that there are actually ethnic differences in the prevalence of these variants; the homozygous state is obviously much rarer, but heterozygosity, which is still influential, is significant.
Next one. And this summarizes the data set in terms of the individuals we picked out, and these are rare variants. And if you take the next slide, it shows the accuracy of the imputation against the variants that were directly typed, which is 99.8 percent, and the Sanger sequencing covers a full plate of 94 samples.
Next slide. And you can see the accuracy better there: for the homozygote state, I mean, it's not perfect. There were a few individuals predicted to be heterozygous, and one was missed from each standpoint.
Next slide. I think we're coming to the end here. So, the accuracy of imputation here is obviously not perfect, but still, you capture the vast majority of the patients.
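The accuracy figure described here comes down to a concordance check between imputed calls and Sanger-confirmed calls on a validation plate. A small sketch of that kind of check follows; the genotype vectors are made up for illustration and are not the actual validation data.

```python
# Sketch of a genotype concordance check: compare imputed calls against
# Sanger-confirmed calls for a validation plate and report accuracy overall
# and per genotype class. The sample data below are invented.
from collections import Counter

def concordance(imputed, sanger):
    """Return overall accuracy and per-class accuracy (keyed by true genotype)."""
    per_class_total, per_class_hit = Counter(), Counter()
    for imp, truth in zip(imputed, sanger):
        per_class_total[truth] += 1
        per_class_hit[truth] += int(imp == truth)
    overall = sum(per_class_hit.values()) / len(sanger)
    by_class = {g: per_class_hit[g] / n for g, n in per_class_total.items()}
    return overall, by_class

# Illustrative 94-sample plate: 0 = wild type, 1 = heterozygote, 2 = homozygote.
sanger  = [0] * 80 + [1] * 12 + [2] * 2
imputed = [0] * 80 + [1] * 11 + [0] + [1, 2]   # one het missed, one hom called het
print(concordance(imputed, sanger))
```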
Next slide. And for the prospective direction -- next slide -- what we were going to propose was this cost-effective, inexpensive custom genotyping.
Next slide. And we have, you know, done this for a couple of projects before and have proposed it for a third project that may or may not go forward.
Next slide. And this is the content of the organ transplant chip. This is basically taking all putative damaging variants, all copy number variants, GWAS loci, and other content that is available from the public domain and from other sites where we could access data, and typing it across 25,000 to 30,000 samples.
Next slide. And so this would be the proposal for a future eMERGE chip in that setting, which would obviously allow us to integrate every single sample by typing them all on the same chip, with informative content that can go as low as 0.1 percent frequency, as was discussed before.
And the last slide, again, just illustrates how this allows all of the coordinated effort to take place. So I'll stop here. Thanks.