Well, thank you, Rob, and thanks to the organizers for inviting me to speak today. I'm going
to tell you a little bit about some of the work we're doing at Sick Kids Hospital. And
like the last presenter, who offered a really nice segue into my talk, we are interested
in looking at whole exome sequencing from the Ion Proton™ sequencer for detection of
medically relevant variants. This is just a legal slide. Just to give some context,
I wanted to talk a little bit about where we are right now and where we're going. So
this is data from our genome center, and from our diagnostic clinical lab as well, looking at the
number of experiments that we've done over the years. When I arrived as a postdoc with Steve Scherer
in 2005, we were really interested in using microarray technology to find relevant
copy number variants in individuals, and this is just the number of high-resolution microarrays,
mostly SNP arrays, that have been run at the genome center over the years; we're
still running around 8,000 or 9,000 a year. This value here is the number of clinical
microarrays that are run in the diagnostic lab, and this is with Jim Stavropoulos and Peter
Ray. This inflection point here is when the Ontario Ministry of Health repatriated
the test and it started to be run back in Canada, and that's largely, or at least in part,
due to some of the work that we've done, working closely with the diagnostic
lab in translating our experience with array results.
And so where are we now? We haven't run a lot of samples, but we're running around
a thousand whole exomes a year at the genome center. As for whole genome sequencing,
we're not really a large production lab, and it's easier for us to send it out, but as part
of many research projects, we're doing a lot of whole genome sequencing as well. Our goal,
obviously, is to translate this type of technology into the clinical space. So we're exploring
the utility of these technologies in many research projects, and in the future we want to use
them diagnostically in pediatric cases. Sick Kids has created a kind of virtual center, the Center
for Genetic Medicine, and essentially this is a testing ground for doing these types of things.
And so what is that? The Sick Kids Center for Genetic Medicine, as I said, is a sort of virtual
center, and one of the research projects within it is something called the Genome Clinic. This is
really aimed at providing evidence for the responsible introduction of whole genome sequencing into
the future clinical care of children at Sick Kids. We're interested in the analytical validity, the
future clinical validity and the clinical utility of doing these tests, a lot of the ethical, legal
and social implications around these results, and also the health economics. Really, it's a
research project that's a testing ground for different technologies, and in the future it will be
diagnostic. So this is how we're thinking about it, at least from the Genome Center perspective.
As I said, this is a whole genome sequencing project, but right now it's not really feasible to do
in-house, so like many people, we're evaluating lots of different technologies, including whole
exome sequencing as a stopgap: lower cost, faster turnaround time, etc. This will also allow us to
compare against the whole genome sequencing that's done. So the Ion Proton™ sequencer is an ideal
alternative to traditional targeted panel sequencing for clinical research. It also maintains the
possibility of discovering mutations in genes that were not previously associated with a disorder,
or in gene-negative cases. And this is how we're thinking about the project.
This is just a simple schematic of our workflow, or how we're thinking about things. We do whole
genome sequencing and whole exome sequencing on an individual who has signed up for our research
study. If it's a known gene, you do some sort of in silico panel for whatever the indication is.
I should say that most of the disorders that we're choosing are ones that would typically have a
gene panel that one would send out as part of standard clinical care, and in this sense, we're just
looking at the known genes using an in silico panel. If it happens to be a positive result, then
you might have a genetic variant that is probably related to the disorder. But as I said, most of
these individuals are part of a research study, and they also have clinical panel testing as part
of standard care, so we can do that kind of comparison.
If there isn't any mutation, then obviously you want to look at other known genes that
are associated with disorders. We often see an expansion of phenotype here: in some
cases it might be a related disorder, or a gene panel just didn't exist even though there
are other known genes, so this is a refinement of analysis and/or an expansion
of phenotype. If it's negative there too, and completely novel, then we get into gene discovery
mode and look at novel disorders. So even though probably 50% of the
individuals will have something in one of these two categories, there is an opportunity
for gene discovery as well. And really what we're using here is whole exome sequencing
on the Ion Proton as a rapid, low-cost approach to sequencing genes quickly.
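The three-tier triage described above (in silico panel first, then other known disease genes, then gene discovery) can be sketched as a small function. This is a minimal illustration under stated assumptions, not the Genome Center's actual software: the `Variant` type, the `candidate` flag (standing in for whatever upstream rarity and damage filtering is applied), and the tier labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    # Hypothetical flag: True if the variant survived upstream filtering
    # (e.g. rare and predicted damaging); not part of any real tool's schema.
    candidate: bool


def triage(variants, indication_panel, other_known_genes):
    """Assign a case to one of three analysis tiers (hypothetical labels).

    indication_panel: genes a clinical panel for this indication would test.
    other_known_genes: other genes already linked to a disorder.
    Returns (tier, candidate variants supporting that tier).
    """
    candidates = [v for v in variants if v.candidate]

    # Tier 1: candidates in the in silico panel for the referral indication.
    in_panel = [v for v in candidates if v.gene in indication_panel]
    if in_panel:
        return "in_silico_panel", in_panel

    # Tier 2: candidates in other known disease genes
    # (refined analysis or expanded phenotype).
    known = [v for v in candidates if v.gene in other_known_genes]
    if known:
        return "known_genes_expanded", known

    # Tier 3: nothing in known genes, so every candidate feeds gene discovery.
    return "gene_discovery", candidates
```

For example, `triage([Variant("PROM1", True)], {"PROM1"}, set())` returns `("in_silico_panel", [Variant(gene='PROM1', candidate=True)])`. A real workflow would usually still report hits in other known genes when the panel is positive, as in the cone-rod dystrophy case where a variant outside the panel was also noted; this sketch stops at the first positive tier for simplicity.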
And so I just wanted to give you a couple of examples from this project. These are
research subjects one, two and three. Here are their prior clinical diagnoses: FSGS,
cone-rod dystrophy, and ocular albinism. These are the genes that would be tested as part of
normal standard-of-care clinical testing. These samples were all sent to Complete Genomics as
part of our research project, but we've also sequenced them using Ion Proton™ technology at the
Center for Applied Genomics. Just to give you some sequencing results and metrics here: our
methods have obviously been evolving as we go, so the first two were TargetSeq™ enrichment and the
last one was AmpliSeq™, as you've heard about; these were all on the P1V2 chip. The total number
of mapped bases and the average read lengths are here. Total reads, and coverage of around 100X;
the AmpliSeq™ is higher here because this happened to be one sample on one chip, and now we're
routinely running two, of course. These are some of the other metrics here, the uniformity of
coverage and the 2X to 30X coverage, and I would say that in all of these cases these numbers are
a little bit lower because it was the first time that we were running these experiments,
especially for the AmpliSeq™, and I'll tell you about that in a second.
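The coverage metrics mentioned above (mean depth, percentage of target bases at a given depth, uniformity) are simple summaries over per-base read depth in the targeted regions. The sketch below computes them from a plain list of depths. The uniformity definition used here, the percentage of target bases covered at 0.2 times the mean depth or more, is an assumption modeled on how Torrent Suite's coverage report is commonly described; the function name, input format and thresholds are illustrative.

```python
from statistics import mean


def coverage_summary(depths, thresholds=(2, 20, 30)):
    """Summarize per-base read depth over the targeted bases.

    depths: one integer depth per target base (e.g. parsed from a per-base
    depth file restricted to the exome targets; the input format is assumed).
    """
    if not depths:
        raise ValueError("no target bases supplied")
    n = len(depths)
    avg = mean(depths)
    summary = {"mean_depth": avg}
    # Percentage of target bases reaching each depth threshold (e.g. 30X).
    for t in thresholds:
        summary[f"pct_ge_{t}x"] = 100.0 * sum(d >= t for d in depths) / n
    # Assumed uniformity definition: percentage of target bases with
    # depth >= 0.2 * mean depth.
    summary["uniformity_pct"] = 100.0 * sum(d >= 0.2 * avg for d in depths) / n
    return summary
```

For depths `[0, 10, 30, 120, 140]` the mean is 60, 60% of bases reach 30X, and uniformity is also 60% (three bases are at or above 12X). Unbalanced amplicon pools, like the PCR machine problem described in the talk, show up as a drop in uniformity even when mean depth stays high.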
But for TargetSeq™, we're generally now getting around 10 to 12 gigabases of raw data, more than
100X coverage, and usually more than 90% of target bases at 30X, and that's with just one sample
per chip. But we're now running mostly AmpliSeq™, two samples per P1V2 chip, and we again
typically have greater than 100X coverage, with 30X coverage above 90%. This really low number
here was actually because of one of our PCR machines: there were really unbalanced reads
in some of the pools, which decreased the uniformity of coverage. So what kind of
results are we getting from these samples? This was done mostly with Torrent Server™
3.6 and different parameters, giving a total number of calls, etc., so there's nothing
too astonishing here. As you know, there's been a lot of movement and a lot of ground
gained in improving the way these samples are called, so we keep re-calling
them, but these just happen to be the numbers from earlier.
So at this point we were using a custom annotation and filtering pipeline that
was developed at the Center for Applied Genomics; annotation and filtering were mostly done with
this custom pipeline, with ANNOVAR and SnpEff used to annotate the variants.
Filtering was mostly based on population allele frequencies, like everybody does, including
our in-house database, and on prioritizing damaging mutations in the candidate genes that were
in the panel. So here are some results for the three different subjects in the research
study, again with the different diagnoses and the gene panel tests. The first one here was
negative. From whole exome sequencing, in the panel genes that we tested,
there were nine variants, but most of these didn't really fit the diagnosis. We did find
an obvious candidate that was just outside of what would be tested on this gene panel.
This is a fossil IPC gene, and it fit the diagnosis. It was a homozygous stop mutation
that had been described before. In the second case, cone-rod dystrophy, the gene panel results
came back showing that PROM1 was probably the gene causing this disorder. We found it using
whole exome sequencing; these happened to be two different indels, and they were easily
called. We found 16 variants in this panel on whole exome sequencing, and PROM1, like I said,
was detected. There was also a mutation in the X-linked gene CACNA1F, which is responsible for
night blindness; this was outside the panel but may also be contributing to the phenotype. The
third case was negative on the gene panel. We couldn't find anything based on the panel either;
there are some candidates, but nothing explaining the phenotype at this point. In general there
was good concordance between calls from the gene panel and the whole exome sequencing. And this
is just to show, and it's a little bit hard to see because this was a screenshot I had to put in
relatively quickly, that we were able to do some analysis with the new Ion Reporter™ for this
subject. What we did was simply put in the genes of interest, four different genes for subject
two, and filter on a minor allele frequency of less than 5%. It came up with only four mutations,
and two of them were the PROM1 mutations in this individual. So instead of going through the
custom bioinformatics pipeline that we've developed at TCAG, this was obviously just as easy:
a couple of clicks and you're done.
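Both filtering approaches described above, the custom pipeline and the Ion Reporter™ query, reduce to a few predicates: restrict to genes of interest, drop variants above an allele-frequency cutoff, and optionally keep only predicted-damaging consequences. The sketch below assumes variants have already been annotated (for example by ANNOVAR or SnpEff) into a simple record; the `AnnotatedVariant` type, the consequence set, and the in-house frequency field are hypothetical and are not the TCAG pipeline's actual schema. The 5% default matches the minor allele frequency filter mentioned for subject two.

```python
from dataclasses import dataclass
from typing import Optional

# Sequence Ontology consequence terms treated as potentially damaging.
# This set is an illustrative assumption, not any pipeline's real list.
DAMAGING = {
    "stop_gained", "frameshift_variant", "splice_acceptor_variant",
    "splice_donor_variant", "start_lost", "missense_variant",
    "inframe_insertion", "inframe_deletion",
}


@dataclass(frozen=True)
class AnnotatedVariant:
    gene: str
    consequence: str                 # SO term, e.g. as emitted by SnpEff
    population_af: Optional[float]   # None if absent from population databases
    inhouse_af: float = 0.0          # hypothetical in-house cohort frequency


def filter_variants(variants, genes_of_interest, max_af=0.05,
                    max_inhouse_af=0.05, require_damaging=True):
    """Keep rare (and optionally damaging) variants in the genes of interest."""
    kept = []
    for v in variants:
        if v.gene not in genes_of_interest:
            continue
        # Variants never seen in population databases are treated as rare.
        af = v.population_af if v.population_af is not None else 0.0
        if af >= max_af or v.inhouse_af >= max_inhouse_af:
            continue
        if require_damaging and v.consequence not in DAMAGING:
            continue
        kept.append(v)
    return kept
```

Setting `require_damaging=False` approximates the simpler gene-list-plus-MAF query described for Ion Reporter™; the default adds the damaging-consequence prioritization of the custom pipeline.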
And for the remaining part of my talk, I'll talk a little bit about how I see a lot
of this genomic medicine happening. This is an interesting case. This is a subject
with a prior diagnosis of Adams-Oliver syndrome (AOS), born to nonconsanguineous European
parents, an only child. Adams-Oliver syndrome is characterized by congenital absence of skin,
usually limited to the scalp vertex, and by transverse limb defects that range
in severity. The clinical features are highly variable and can also include
vascular defects, congenital cardiac malformations, etc. At the time that we looked at this,
there were three known genes in the literature, two dominant and one recessive. This case was
presented by one of the clinicians at our Thursday case conference, and because these three
genes aren't actually offered as a clinical test, we decided to enroll him in our research
project. I thought it would be an interesting sample to sequence as part of our research efforts,
and at the time, this was December of last year, we had just had the installation
of the four Protons™ in the genome center, so I thought this would be a good test case.
So it was sequenced on a P1™ chip with TargetSeq™ enrichment; I think this was version 1 of the
P1™ chip. These are the mean depth of coverage, etc. Analysis was done with version 3.4.
Here are the variants in the known genes. We found a couple of variants in each of the genes,
nothing interesting; all of them had population allele frequencies over 20%, so at this point
we concluded there was no variant in the known genes that could explain the syndrome.
So now what? It was negative for the known genes, so there are some obvious possibilities
as you start going through this with the clinician. Maybe the variant wasn't detected, because
of a coverage issue or because there was a larger indel or CNV. Maybe the subject has variants in
a new gene for AOS, and this is where we can go into gene discovery mode. Or maybe the subject
doesn't have AOS but a related disorder. These are all possibilities we discussed with the clinician.
And obviously with any exome or capture sequencing experiment, some exons are not
sequenced that well. This just happens to be one of the genes, DOCK6, exon 15. You
can see here that there's coverage, but the percentage of target base pairs above 15X is only
about 70%, so you can see some of the exon is missing. So what we did was go ahead
and do some Sanger sequencing to backfill, just to make sure that we weren't
missing anything. Out of the total of 85 targets, I think we re-sequenced 10 exons, some of
them in the same amplicon, and found only one additional variant; this was in the
5' UTR of DOCK6, and it happened to be a common SNP. We also did high-resolution
genotyping using an Illumina Omni2.5M SNP array, and the subsequent CNV analysis also found no copy
number variants that we could link to the disorder. So obviously it's conceivable that
there was a variant that isn't detectable by this type of technology and you might need
something else, but we were pretty confident that it's not in one of these genes.
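Deciding which exons to backfill with Sanger sequencing, as was done for DOCK6 exon 15 above, amounts to flagging targets where too few bases reach a minimum depth. A minimal sketch, assuming per-base depths are already grouped by exon; the 15X depth and 95% fraction cutoffs and the exon labels are illustrative assumptions, not the thresholds used in the study.

```python
def exons_needing_backfill(per_exon_depths, min_depth=15, min_fraction=0.95):
    """Flag exons whose fraction of bases at >= min_depth is below min_fraction.

    per_exon_depths: dict mapping an exon label (e.g. "DOCK6:exon15", a
    hypothetical naming scheme) to a list of per-base read depths.
    Returns {label: fraction}, worst-covered exons first.
    """
    flagged = {}
    for exon, depths in per_exon_depths.items():
        # An exon with no depth data at all is treated as fully uncovered.
        frac = sum(d >= min_depth for d in depths) / len(depths) if depths else 0.0
        if frac < min_fraction:
            flagged[exon] = frac
    return dict(sorted(flagged.items(), key=lambda item: item[1]))
```

An exon where only 7 of 10 bases reach 15X comes back with a fraction of 0.7, mirroring the roughly 70% figure quoted for DOCK6 exon 15, and would be queued for Sanger sequencing.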
What's interesting is that as we were doing this experiment, a new gene was published
in the American Journal of Human Genetics, sometime around April. This is a fourth gene, called
EOGT, and it's another recessive form of the disorder. At this point we could obviously go back,
look at our exome data, and see if this is the causative gene in this case. We looked: our
subject has a lot of coverage in this region, I think there are only 2 exons in this gene, and
there happened to be no variants detected at all. So again, this highlights that whole exome
sequencing allows us to expand the gene list as discoveries are made. Our case did not appear to
have a variant in any one of the four known genes.
So then we moved on to searching for a novel AOS gene, and as many of you know, when you're doing
these types of experiments you find lots of rare variants. In this particular case there were over
200 rare coding variants, and about 16% of them were in genes related to an OMIM Morbid Map
disorder. So now we're at a stage where we ask: could this be a novel Adams-Oliver gene, or is it
perhaps not Adams-Oliver but something different again? Because the mode of inheritance isn't
known, dominant or recessive, we sequenced both parents, and this is obviously something that you
could have done at the beginning. Two Life Tech developments were happening around this time. One
of them was the AmpliSeq™ whole exome, which we started to use: faster, less DNA, and less
expensive than TargetSeq™ enrichment. We were happy to try that, except that since the proband had
been sequenced with TargetSeq™, we decided to do the parents with TargetSeq™ as well, just to keep
things as consistent as possible.
And of course Ion Reporter™ 4 was coming out around then with trio analysis. So we sent the data
to Life, and this analysis was done mainly by them, Fiona Hyland and others. This is a screenshot,
I think from version 1.6, but essentially we were going on the assumption that this is probably a
dominant, or a new (de novo) dominant, mutation, so this is the genetic category type here. You can
filter on other things like gene symbol, functional impact, location, etc. Assuming a new dominant
mutation, it pulled out 12 different variants; this is a list of them here. It tells you the number
of variants fitting that genetic category, with low minor allele frequency. These are all new
dominant mutations, or putative dominant mutations. Only one of them looked really interesting,
and it happened to be in a gene known to cause a disorder: a glycine to glutamic acid mutation in
ACVR1, which encodes a bone morphogenetic protein receptor. Looking at the alignments, this looked
really good, and we were pretty confident that it was a real event. To be sure, we obviously went
on and did Sanger sequencing, and this was confirmed as a de novo variant in this individual. And
here's the interesting thing: when you go back and look, mutations in this gene are associated
with a disease called fibrodysplasia ossificans progressiva, or FOP for short. This is a rare
disorder characterized by physical handicap due to progressive ectopic ossification, and by
malformed big toes that are often monophalangic, so you've got these limb defects and this
increasing ossification. Occasional features also include short limbs, fifth-finger clinodactyly,
and other things as well, all to do with bones. Scalp baldness was one of them, and also mild
intellectual disability. So many of these features overlap with Adams-Oliver syndrome.
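The "new dominant" (de novo) model applied in the trio analysis above can be expressed as a genotype filter: heterozygous in the proband, homozygous reference in both parents, and rare in the population. The sketch below is a minimal illustration under those assumptions, not Ion Reporter™'s implementation; the `TrioCall` record, the VCF-style genotype strings, the variant identifiers, and the 1% frequency cutoff are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrioCall:
    variant_id: str   # hypothetical identifier, e.g. "chrom:pos:ref:alt"
    gene: str
    child_gt: str     # unphased VCF-style genotypes: "0/0", "0/1", "1/1", "./."
    mother_gt: str
    father_gt: str
    maf: float        # population minor allele frequency


def de_novo_candidates(calls, max_maf=0.01):
    """Keep calls consistent with a new (de novo) dominant model."""
    return [
        c for c in calls
        if c.child_gt == "0/1"          # heterozygous in the proband
        and c.mother_gt == "0/0"        # called reference in the mother
        and c.father_gt == "0/0"        # and in the father; no-calls excluded
        and c.maf < max_maf             # rare in the population
    ]
```

Because an allele missed in a parent (for example from low parental coverage) produces the same genotype pattern, candidates from a filter like this still need orthogonal confirmation, which is why the ACVR1 variant was checked by Sanger sequencing.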
We went back to the clinician after we validated this variant and asked what he thought. He started
looking through the files and became more and more convinced that this was probably
the variant causing the phenotype in this individual. So in fact it wasn't
Adams-Oliver syndrome; it was a variant of FOP. When you go to OMIM and read about FOP,
it's not even close to what this individual has, but buried in the literature,
you can see that there are some variants in different domains of this particular
gene that can cause a variant form of FOP. And so in this way, what we're really doing is this:
individuals have a phenotype, and we're genotyping them quickly, looking for known genes. You've
got to interpret and integrate all this type of information and go back to the phenotype all
the time, and this is exactly what happened in this case, going back to the clinician
in this research project and looking at how this might fit. In my experience, this can take
several iterations with people, and clinicians, to interpret what variants mean. So genome
sequencing often identifies variants that can only be interpreted by going back to the phenotype.
So, just some summary and observations. The Sick Kids Center for Genetic Medicine has a Genome
Clinic that we're using as a research project, and really it is a testing ground, especially for
looking at different technologies that will be diagnostic in the future. We're using whole exome
sequencing for its lower cost and turnaround time compared to genome sequencing at this point, with
the Ion Proton™ sequencer as an alternative, really a way to sequence a lot of genes effectively,
very quickly, and get good results. So far the results show good concordance with gene panel
testing and offer the ability to find other variants that may be contributing to the phenotype. We
see it, obviously, as everybody else does: genome sequencing as a tool to aid in the future clinical
diagnosis of individuals. With that, I'd just like to thank the group at the Hospital for Sick
Children, the Center for Applied Genomics, the clinicians and individuals in the diagnostic center
who worked on this, and a lot of people at Life Technologies who helped make a lot of this happen.
Thank you.