Future Clinical Applications of Exome Sequencing Using The Ion Proton™ System

Well, thank you, Rob, and thanks to the organizers for inviting me to speak today. I'm going to tell you a little bit about some of the work we're doing at Sick Kids Hospital. And like the last presenter, who offered a really nice segue into my talk, we are interested in looking at whole exome sequencing on the Ion Proton™ sequencer for detection of medically relevant variants. This is just a legal slide. Just to give some context, I wanted to talk a little bit about where we are right now and where we're going. So this is data from our genome center and our diagnostic clinical lab as well, looking at the number of experiments that we've done over the years. When I arrived as a postdoc with Steve Scherer in 2005, we were really interested in using microarray technology for finding relevant copy number variants in individuals, and this is just the number of high-resolution microarrays, mostly SNP arrays, that we have been running at the genome center over the years, so we're still running around 8,000 or 9,000 a year. This value here is the number of clinical microarrays that are run in the diagnostic lab, and this is with Jim Stavropoulos and Peter Ray. This inflection point here is when the ministry in Ontario repatriated the test and it started to be run back in Canada, and that's largely, or in part at least, due to some of the work that we've done, working closely with the diagnostic lab in translating some of our experience with array results. And so where are we now? You know, we haven't run a lot of samples, but we're running around a thousand a year in whole exome sequencing at the genome center. And with whole genome sequencing, we're not really a large production lab, and it's easier for us to send it out, but as part of many research projects, we're doing a lot of whole genome sequencing as well. And so our goal, obviously, is to translate this type of technology into the clinical space.
So in many research projects we're exploring the utility of these technologies, and in the future we want to use them diagnostically in pediatric cases. And Sick Kids has created this kind of virtual center, the Center for Genetic Medicine, and essentially this is a testing ground for doing these types of things. And so what is that? So the Sick Kids Center for Genetic Medicine, as I said, is sort of a virtual center, and one of the research projects within it is something called the Genome Clinic, and this is really aimed at providing evidence for the responsible introduction of whole genome sequencing into the future clinical care of children at Sick Kids. We're interested in looking at the analytical and future clinical validity, the clinical utility of doing these tests, a lot of the ethical, legal and social implications around these results, and also the health economics. And really what this is, it's a research project that's a testing ground for different technologies that in the future will be diagnostic. And so this is how we're sort of thinking about it, at least from the genome center perspective. As I said, this is a whole genome sequencing project, but right now it's not really feasible to do in-house, and so like many people, we're evaluating lots of different technologies, including whole exome sequencing as sort of a stopgap, with lower cost, faster turnaround time, etc., and this will allow us to compare against the whole genome sequencing that's done as well. So the Ion Proton™ sequencer is obviously an ideal alternative to doing traditional targeted panel sequencing for clinical research. It also maintains the possibility of discovering mutations in genes that were previously not associated with a disorder, in gene-negative cases. And this is sort of how we're thinking about the project.
And so this is just a simple sort of schematic of our workflow, or how we're thinking about things. So there's whole genome sequence or whole exome sequence from an individual that's signed up for our research study. If it's a known gene, you do some sort of in silico panel for whatever the indication is, and I should say that most of the disorders that we're choosing are ones that would typically have a gene panel that one would send out as part of standard clinical care, and in this sense, we're just looking for the known genes using an in silico panel. If it happens to be a positive result, then you might have some sort of genetic variant that is probably related to the disorder, but as I said, most of these individuals are part of a research study, and they also have clinical panel testing as part of standard care, so we can do that kind of comparison. If there isn't any mutation, then obviously you want to look at other known genes that are associated with disorders. Often we see this expansion of phenotype, where in some cases it might be a related disorder, or the gene panel just didn't exist and there are other known genes, so this is sort of a refinement of analysis and/or expansion of phenotype. If it's no, if it's completely novel, then we get into gene discovery mode, looking at novel disorders. So even in this manner, even though probably 50% of the individuals will have something in one of these two categories, there is opportunity for gene discovery as well. And really what we're using here is whole exome sequencing on the Ion Proton as a rapid and low-cost approach to sequence genes quickly. And so I just wanted to give you a couple of examples from this project here, so these are examples of research subjects one, two and three.
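The three-tier workflow described here (in silico panel, then other known disease genes, then gene discovery) can be sketched as a small decision function. This is a minimal illustration, not the pipeline used in the study: the `Variant` record, its `is_candidate` flag (standing in for "rare and predicted damaging"), and the function name `triage` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    is_candidate: bool  # hypothetical flag: rare and predicted damaging


def triage(variants, panel_genes, known_disease_genes):
    """Return the workflow tier in which a candidate variant is first found."""
    candidates = [v for v in variants if v.is_candidate]
    # Tier 1: genes on the in silico panel for the clinical indication.
    if any(v.gene in panel_genes for v in candidates):
        return "in_silico_panel"
    # Tier 2: other known disease genes (refinement / expanded phenotype).
    if any(v.gene in known_disease_genes for v in candidates):
        return "other_known_gene"
    # Tier 3: nothing in known genes, so the case moves to gene discovery.
    return "gene_discovery"
```

A case is only passed to the next tier when the previous one yields no candidate, which mirrors the order in which results would be reviewed with the clinician.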
Here are the different prior clinical diagnoses: one is FSGS, one's a cone-rod dystrophy, and one is ocular albinism. These would be the genes that would be tested as part of normal clinical standard-of-care testing, and these were all sent as part of our research project to Complete Genomics, but we've also sequenced them using Ion Proton™ technology at the Center for Applied Genomics as well. And just to give you some sequencing results and metrics here: we've been evolving as we've been going, obviously, so the first two were TargetSeq™ enrichment, the last one was AmpliSeq™, as you've heard about, and these are all on the P1V2 chip. The total number of mapped bases and average read lengths are here. Total read coverage is around 100X; the AmpliSeq™ one is higher here, since this happened to be one sample on one chip, and now we're routinely running two, of course. These are some of the metrics here, the uniformity of coverage, so 2X to 30X coverage, and I would say that in all of these cases these numbers are a little bit lower because it was sort of the first time that we were running these experiments, especially for the AmpliSeq™, and I'll tell you about that in a second. But for TargetSeq™, generally now, we're somewhere around 10 to 12 gigs of raw data, more than 100X coverage, and 30X coverage is usually above 90%, and that's just with one sample per chip. But we're now running mostly AmpliSeq™, two samples per P1V2 chip, and we typically, again, have greater than 100X coverage, with 30X coverage above 90%. This really low number here was actually because of one of our PCR machines, and there were really unbalanced reads in some of the pools, which was decreasing the uniformity of coverage. So what kind of results are we getting from these samples? So this is done mostly with Torrent Server™ 3.6 and different types of parameters, a total number of calls, etc., so there's nothing too astonishing here.
As you know, there's been a lot of movement, a lot of ground made, and improvements in the way that these samples are called, so we keep recalling them, but these just happen to be the numbers from earlier. So, at this point we were using a custom annotation and filtering pipeline that was developed at the Center for Applied Genomics, with ANNOVAR and SnpEff to annotate the variants. Filtering was mostly based on population allele frequencies, like everybody does, including our in-house database, and on prioritizing damaging mutations in the candidate genes that were in the panel, and so here are some results. For the three different subjects in the research study, again, here are the different diagnoses and the gene panel tests. The first one here was negative. From whole exome sequencing, in those genes that were tested in the panel itself, there were nine variants, but most of these didn't really fit the diagnosis. We did find an obvious candidate that was just outside of what would be tested on this gene panel. This is a phospholipase C gene, and it fit the diagnosis. It was a homozygous stop mutation that had been described before. In this case, cone-rod dystrophy, the gene panel results came back and PROM1 was probably the gene that was causing this disorder. We found it using the whole exome sequencing; these happened to be two different indels and they were easily called. We found 16 variants in this panel on the whole exome sequencing. The PROM1 variants, like I said, were detected. There was also a mutation in the X-linked gene CACNA1F, which is responsible for night blindness and also might be contributing to the phenotype, so this was outside the panel but may also be contributing. In the third case it was negative on the gene panel results. We couldn't find anything based on the panel either; there are some candidates, but nothing explaining the phenotype at this point.
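The filtering logic described above (rare in population and in-house databases, predicted damaging, located in a panel gene) can be sketched as follows. This is a minimal sketch under stated assumptions, not the Center for Applied Genomics pipeline: the dictionary keys, the set of "damaging" effect labels, and the `max_af` default are all hypothetical choices made for illustration.

```python
# Assumed effect labels; real annotators such as SnpEff emit richer terms.
DAMAGING_EFFECTS = {"stop_gained", "frameshift", "missense"}


def filter_variants(variants, panel_genes, max_af=0.01):
    """Keep rare, damaging variants that fall in the panel genes.

    Each variant is a dict with hypothetical keys:
    gene, effect, pop_af (population frequency), inhouse_af (in-house frequency).
    """
    kept = []
    for v in variants:
        # A variant must be rare in both the population and in-house data.
        rare = max(v["pop_af"], v["inhouse_af"]) < max_af
        if rare and v["effect"] in DAMAGING_EFFECTS and v["gene"] in panel_genes:
            kept.append(v)
    return kept
```

Taking the maximum of the two frequencies means a variant that is common in the in-house database is dropped even if public databases have never seen it, which helps remove platform-specific recurrent calls.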
In general there was decent concordance, or good concordance, of calls between our gene panel and the whole exome sequencing. And this is just to show, and it's kind of a little bit hard to see, but this was a shot that I had to put in here relatively quickly. We were able to do some analysis with the new Ion Reporter™, and this is just the subject here. It's hard to see, but what we did was simply put in the genes of interest, there were four different genes, this was for subject two, and we filtered on a minor allele frequency of less than 5%, and it only came up with four mutations, and the two mutations in PROM1 happened to be here for this individual. So instead of going through the sort of custom bioinformatics pipeline that we've developed at TCAG, this was just as easy, obviously. A couple of clicks and you're done. And for the remaining part of my talk, I'll just talk a little bit about how I see a lot of this genomic medicine happening. This is an interesting case. This is a subject with a prior diagnosis of Adams-Oliver syndrome, and he was born to nonconsanguineous European parents, an only child. Adams-Oliver is characterized by a congenital absence of skin, usually limited to the scalp vertex, and there are also transverse limb defects that range in severity. The clinical features are highly variable, and it can also include vascular defects, congenital cardiac malformations, etc. In the literature, at the time that we looked at this, there were three known genes, two dominant and one recessive.
So this was a case that was presented by one of the clinicians at our Thursday case conference, and since these three genes aren't actually offered as a clinical test, we decided to enroll him as a research subject. I thought it would be an interesting sample to sequence as part of our research efforts, and at the time, this was December of last year, we had just had the installation of the four Protons™ in the genome center, so I thought this would be a good test case. So it was sequenced on a P1™ chip with TargetSeq™ enrichment, I think version 1 of the chip. This is the mean depth of coverage, etc. Analysis was with 3.4. Here are the variants in the known genes. We found, you know, a couple of variants in each of the genes, nothing interesting, all of them were over 20% allele frequency, so at this point we thought there was no variant in the known genes that could explain the syndrome. So now what? It was negative for the known genes, and there are some possibilities, obviously, as you start going over it with the clinician. Maybe the variant wasn't detected, so it was either a coverage issue or there was a larger indel or CNV that wasn't detected. Maybe the subject has variants in a new gene for AOS, so this is where we can go into this kind of gene discovery mode. Or maybe the subject doesn't have AOS, but a related disorder. These are all possibilities in speaking with the clinician. And so obviously with any exome or capture sequencing experiment, some exons are not sequenced that well. This just happens to be one of the genes: DOCK6, exon 15. You can see here that there's coverage, but the percent of target base pairs above 15X is about 70%, so you can see there's some of the exon missing. So what we did is went ahead and did some Sanger sequencing, sort of to backfill, just to make sure that we weren't missing anything out of the total of 85 targets.
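The coverage check described here, the percentage of a target's bases at or above 15X and the selection of poorly covered targets for Sanger backfill, can be sketched as below. This is an illustrative sketch only: the per-base depth lists, the function names, and the `min_percent` cutoff for flagging a target are assumptions, not values from the talk.

```python
def percent_bases_at_depth(depths, min_depth=15):
    """Percentage of bases in one target with depth >= min_depth."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


def targets_needing_backfill(target_depths, min_depth=15, min_percent=100.0):
    """Names of targets whose well-covered fraction is below min_percent.

    target_depths maps a target name to its list of per-base depths.
    """
    return sorted(
        name
        for name, depths in target_depths.items()
        if percent_bases_at_depth(depths, min_depth) < min_percent
    )
```

For example, a target with 7 of 10 bases at 20X and the remaining 3 at 5X scores 70%, so it would be flagged for Sanger follow-up under the default cutoff.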
I think we re-sequenced 10 exons, some of them were in the same amplicon, and found only one additional variant. This was in the 5' UTR of DOCK6, and it happened to be a common SNP. We also did high-resolution genotyping using an Illumina Omni2.5M SNP array, and subsequent CNV analysis also found no copy number variants that we could link to the disorder. So obviously it's conceivable that there was a variant that's not detected by this type of technology and you might need something else, but we were pretty confident that it's not one of these genes. And what's interesting is that as we were doing this experiment, a new gene was published in the American Journal of Human Genetics sometime around April. This is a fourth gene, it's called EOGT, and it's another recessive form of the disorder. At this point obviously we could go back, look at our exome data and see if this is the causative gene in this case. We looked. Our subject has a lot of coverage in this region, I think there are only 2 exons in this gene, and there happened to be no variants at all detected. So again, this highlights that whole exome sequencing allows us to expand the gene list as discoveries are made. Our case did not appear to have a variant in any one of the four known genes. So then we're moving on to searching for a novel AOS gene, and as many of you know, when you're doing these types of experiments, you find lots of rare variants. In this particular case, we had over 200 rare coding variants, and 16% of them or so were related to an OMIM Morbid Map disorder, so now we're at the stage of asking, could this be a novel Adams-Oliver gene, or is it perhaps not Adams-Oliver and something different again? Because the mode of inheritance isn't known, dominant or recessive, we sequenced both parents, and this is obviously something that you could have done in the beginning.
Two Life Tech developments were happening around this time. One of them was the AmpliSeq™ whole exome sequencing that we started to use, which is faster, uses less DNA, and is less expensive than TargetSeq™ enrichment, so we were happy to try that, except that since the proband had been sequenced with TargetSeq™, we decided to do the parents with TargetSeq™ as well, just to keep things as consistent as possible. And of course the Ion Reporter™ 4 was coming out with this trio analysis. So we sent the data to Life, and this analysis was done mainly by them, Fiona Hyland and others. This is a screenshot, I think this might be from 1.6, but essentially we were going on the assumption that this is probably a dominant, or a new dominant, mutation, so this is the gene category type here. You can filter on other things like gene symbol, functional impact, location, etc. Assuming a new dominant, it pulled out 12 different variants. This is a list of them here. It just tells you the number of variants fitting that genetic category, with low minor allele frequency. These are all new dominant mutations, or putative dominant mutations. Only one of them actually looked really interesting. It happened to be in a gene known to cause a disorder, and this was a glycine-to-glutamic-acid mutation in the gene ACVR1, which happens to be a bone morphogenetic protein receptor. Looking at the alignments, this looked really good. We were pretty confident that it was a real event. To be sure, obviously, we went on and did Sanger sequencing, and this was confirmed as a de novo variant in this individual. And here's the interesting thing: when you go back and look, mutations in this gene are associated with a disease called fibrodysplasia ossificans progressiva, or FOP for short.
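The "new dominant" trio filter described above can be sketched in a few lines: the proband carries one copy of the alternate allele, neither parent carries it, and the variant is rare. This is a hypothetical illustration and not how Ion Reporter™ implements its trio analysis; genotypes are assumed to be tuples of allele indices (0 = reference), and the `max_maf` default is an assumed cutoff.

```python
def count_alt(genotype):
    """Number of non-reference alleles in a genotype such as (0, 1)."""
    return sum(1 for allele in genotype if allele != 0)


def is_de_novo_dominant_candidate(proband, mother, father, maf, max_maf=0.01):
    """True if the proband is heterozygous, both parents are homozygous
    reference, and the minor allele frequency is at or below max_maf."""
    return (
        count_alt(proband) == 1
        and count_alt(mother) == 0
        and count_alt(father) == 0
        and maf <= max_maf
    )
```

Any call that passes a filter like this still needs orthogonal confirmation, as with the Sanger sequencing of the ACVR1 variant, because low coverage in a parent can make an inherited variant look de novo.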
So this is a rare disorder characterized by physical handicap due to progressive ectopic ossification and malformed big toes that are often monophalangic, so you've got these limb defects and this increasing ossification. Occasional features also include short limbs and fifth-finger clinodactyly, and other things as well, all to do with bones. Scalp baldness was one of them, and also mild intellectual disability. So many of these features overlapped with Adams-Oliver syndrome. We went back to the clinician after we validated this variant and said, what do you think? He started looking through the files and became more and more convinced that this was probably the variant that was causing the phenotype in this individual. So in fact it wasn't Adams-Oliver syndrome, it was a variant of FOP. When you go to OMIM and read about FOP, it's not even close to what this individual has, but sort of buried in the literature, you can see that there are some variants in different domains of this particular gene that can cause a variant form of FOP. And so in this way, really what we're doing is this: individuals have a phenotype, and we're genotyping them quickly, looking for known genes. You've got to interpret and integrate all this type of information and go back to the phenotype all the time, and this is exactly what happened in this case, going back to the clinician in this research project and looking at how this might fit. And this can take, in my experience, several iterations with people, and clinicians, to interpret what variants mean. So genome sequencing often identifies variants that can only be interpreted when going back to the phenotype. So just some summary and observations. The Sick Kids Center for Genetic Medicine has a Genome Clinic that we're using as a research project, and really it is a testing ground, especially to look at different technologies that in the future will be diagnostic.
We're using whole exome sequencing, with lower cost and turnaround time compared to genome sequencing at this point. The Ion Proton™ sequencer is an alternative, really a way to effectively sequence a lot of genes really quickly and get good results. So far the results show good concordance with gene panel testing and offer the ability to find other variants that may be contributing to the phenotype. So we see it, obviously, as everybody else does, as getting sequence as a tool to aid in the future clinical diagnosis of individuals. With that, I'd just like to thank the group at the Hospital for Sick Children, the Center for Applied Genomics, the clinicians and individuals in the diagnostic center that worked on this, and a lot of people at Life Technologies that helped make a lot of this happen. Thank you.