Gad Getz: Hi, there. Thank you, John. As you see, I'm
not Kristin Ardlie, but you all know me so -- and I can go -- how do -- hmm. Where's
the next thing? Okay.
So why do we want to use FFPE samples? Clearly, there are a very large number of sample data
banks and tissue banks in the Biorepositories worldwide. So these are in FFPE. These samples
often have rich, clinical information in histological, pathological, and follow-up clinical data.
So we want to use FFPE. And that could -- if we could use FFPE samples, that could fill
the accrual gap in TCGA, in particular the future of TCGA, and we heard Lou Staudt say,
"We need to get to 10,000 patients per tumor type." So if we need to do that, we need to
go to FFPE, if we don't want a 20-year project.
So another reason we want to move to FFPE is because these remain the standard clinical
practice when you take biopsies and samples out of patients. And if you want to put clinical
sequencing into, you know, the clinic, basically sequencing into the clinic, you want to be
able to work with FFPE.
So there are challenges working with FFPE. Of course, it's difficult to extract DNA
from the samples; you need to deparaffinize them and de-cross-link the protein and DNA.
The physical size of the samples with the paraffin block is sometimes small, and
the yield of DNA that you can get out of them is problematic. There's clearly
poor quality of the DNA that comes out of it, as you can see, from these kind of smeared
gel runs. You can see that the FFPE DNA fragments are broken, compared to what you can see from
a fresh frozen.
So here, for example, the size problem. In TCGA, Nationwide Children's Hospital separated
the blocks, the sample sizes into three categories: the large tissues, the medium, and the small
tissues. And if you go to clinical samples, you can see small tissues, very small tissues,
tiny tissues, and don't know where the tissue is.
[laughter]
So that's -- this is kind of what you could get from clinical samples.
So what data sets do we have? We have four TCGA prostate FFPE tumor samples, and the fresh frozen
tumor, and the blood normal. We call these trios. This is kind of the icon depicting
that. We have the tumor from the frozen, the tumor from the FFPE, and the blood normal.
And when we call mutations, we can compare the tumor to the -- frozen to the normal or
the FFPE tumor to the normal. There's also breast trios, which I actually won't be talking
about today. And there's lung cancer data that we have from the Broad, 17 FFPE tumor
normal pairs. Here we have both the FFPE, both tumor and normal, and the frozen, both
tumor and normal. So those are kind of quartets. So I will talk about both the prostate and
the lung.
So what are the questions we want to answer? There are seven -- six questions I want to
answer in the few minutes that I have. So can we get high quality exome sequencing data
from FFPE samples? Can we detect mutations in FFPE samples and are there artifacts due
to kind of the fixing procedure? Can we detect copy number data based on the exome
sequencing of FFPE samples? Are we finding the same mutations in those trios or quartets
between FFPE and frozen? Can we perform a cancer genome project using FFPE samples?
And finally, can we use those FFPE in the clinic?
So are we getting similar library sizes? So what you can see here is actually the coverage
on FFPE versus the frozen. And in these kind of cases, actually, we sequence deeper in
the FFPE so you can see actually the coverage is a bit higher in the FFPE compared to the
frozen, but in general there are -- they look even better here, the FFPE because we sequenced
them deeper. For library size, actually, the frozen has a slightly, I mean, actually double
the library size, but even a hundred million molecules in library size is well above what
we need in order to sequence deep, so there's no problem in getting a deep enough library
to sequence.
Here we can talk about the coverage on frozen versus FFPE, and you can see these are all
the targets that we try to capture. And these are different -- these are the TCGA prostate
cancers. And you can see that most of the targets that are captured well in the frozen
are also captured well in the FFPE. And the ones that are not captured well in the frozen
are not captured well in the FFPE, so it looks kind of consistent. And this is kind of the
coverage criteria of 14 and 8 reads in the tumor and normal. The same thing we see for
the lung cancer, so the coverage is not an issue.
Then in terms of the number of mutations that we find. So in the four prostate samples,
the frozen -- we do this comparison of the frozen. We find 135 mutations in a total territory
of 130 megabases, and in the FFPE, 137 mutations in 130. So, you know, these are very similar
numbers. In the lung data we find 5,332 mutations in the frozen and 5,013 in the FFPE, and the
territory is similar. So it seems like we're finding a similar number of mutations.
What is the pattern, the spectrum of mutations that we're finding? So here's the prostate
and the lung, and as you can see -- the FFPE is the blue and the frozen is the red -- we
find the same spectrum of mutations. So not only is the number the same, the distribution
of mutations is the same, meaning they are not dominated by some artifact caused by the
fixation because otherwise it wouldn't follow the distribution of real somatic mutations.
So we are happy that there are really no artifacts in FFPE.
Can we find copy number changes? So we are using an algorithm we developed called CapSeg
that generates copy number -- segmented copy number from capture data. And here what
you could see is, actually, it's before segmentation. Every point here is a target exon, and you
can see here the copy number. Here's an example. You can see the frozen and you can see the
FFPE. They look pretty much the same. Actually, there's a region here that doesn't look the
same, and I'll talk in a second about what that means. But there's a -- here's
the frozen and the FFPE. The noise levels look the same so we feel that we could do copy
number from FFPE as well.
So now the big question: are we finding the same mutations? So when we look at those Venn
diagrams, those 3,000 versus 3,300, we -- actually, this is actually from matching 19 lung. So
we see, actually, only 44 percent overlap. So what's going on? Everything looks good
and now we see only 44 percent overlap, so something weird is going on here. We see the
same thing in prostate; actually, it's even worse.
So now we think about it and we come to this fundamental observation that we all need to
understand, and this is the biggest take-home message from this talk, is that when we do
this comparison, we make two different changes. One is FFPE and one is frozen. This is what
we want to compare, but the other difference that exists is this is one part of the tumor
and this is another part of the tumor. And when we compare different parts of the tumor,
they have different purities, and then they have different sub-clonalities of mutations
and things like that. So -- and we can't distinguish between the two. There's FFPE -- so now how
can we analyze it in a more careful way to really do the comparison between the FFPE
and the frozen? And this kind of problem affects any comparison, at the DNA level, RNA level, methylation
level, protein level; any comparison will be affected by these problems.
So just to remind you, this is a slide that you all have seen from me many times. When
you call mutations, the ability to find mutations depends on the allelic fraction
of the mutation, which itself depends on the purity of the sample. And then given the
allelic fraction, the number of reads, the coverage in the sequencing, tells you what's
the probability of finding the mutation. So if you have two different pieces of the tumor,
where one has one purity and the other one has a different purity, you don't have the
same chance of finding the mutation, even if it was clonal,
basically appeared in all cancer cells in one side and the other.
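
As a rough illustration of how purity drives allelic fraction, here is a minimal sketch. It assumes a clonal mutation, a diploid normal, and a known tumor copy number at the site; the function name and its parameters are made up for this example and are not taken from the talk or from ABSOLUTE.

```python
def expected_allelic_fraction(purity, multiplicity=1, tumor_copies=2, normal_copies=2):
    """Expected fraction of reads carrying a clonal mutation.

    purity        -- fraction of cells in the sample that are cancer cells (0..1)
    multiplicity  -- number of mutated copies per cancer cell (assumption: 1)
    tumor_copies  -- total copies of the locus per cancer cell (assumption: 2)
    normal_copies -- total copies of the locus per normal cell (assumption: 2)
    """
    mutant_copies = purity * multiplicity
    all_copies = purity * tumor_copies + (1.0 - purity) * normal_copies
    return mutant_copies / all_copies


# The same clonal heterozygous mutation seen in two pieces of one tumor:
high_purity_piece = expected_allelic_fraction(0.8)  # 0.8 / 2 = 0.40
low_purity_piece = expected_allelic_fraction(0.3)   # 0.3 / 2 = 0.15
print(high_purity_piece, low_purity_piece)
```

So the same mutation is expected at very different allelic fractions in the two pieces, and at equal coverage it is much easier to miss in the low-purity piece.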
And indeed when we look at the samples, the reason that we don't find -- so, for example,
some samples have reasonably the same purity, and we find most of the mutations in both
of them. But some samples have a very different purity between the FFPE and the frozen, and
then we basically don't find any of the mutations in both; either they're all red or they're all blue. And
this would explain -- so we need to look sample by sample, explains those differences. So
how could we do it in a better way?
So there's another problem. Everything could be a clonal mutation, but what happens if there
are sub-clonal mutations? So a sub-clonal mutation could be at a different proportion in one side
or the other, or not even exist in one side or the other. So we need to distinguish
between clonal mutations and sub-clonal mutations. So we have a method called ABSOLUTE that could
actually distinguish between clonal mutations and sub-clonal mutations. And in fact, in
ovarian cancer, half of the mutations are sub-clonal, and it's similar in many other
cancers.
So how can we compare frozen to FFPE? First of all, we don't really need to call the mutations
independently in one of them. So once we find the mutation in FFPE, we could ask, does it
validate in the frozen? And for that we don't need to use the high, strict criteria that
we use when we call mutations because we only test a few sites. So we might need only to
require two reads, or three reads, or a few reads that support the mutation to validate
the mutation. Then we need to correct for the different allelic fraction that exists
between the two different samples. One of frozen and one of FFPE could have different
purities. So we need to fit these lines to these little scatter plots that you saw before.
Once we fit this allelic fraction, we need also to calculate the power to validate based
on the coverage of the sequencing. And we could look at the 80 percent
power line, or the 95, or the 99 power line, and then distinguish between sites
that we could not validate and ones that don't validate. There's a big difference:
cannot validate and don't validate. And then, finally, we need to distinguish between clonal
mutations and sub-clonal mutations.
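
The steps above can be sketched in a few lines. This is only an illustration under stated assumptions: a plain binomial model for read sampling, no sequencing-error term, and an expected allelic fraction that has already been corrected for the purity difference between the two samples. The function names, labels, and thresholds are hypothetical and are not the pipeline used in the talk.

```python
from math import comb


def power_to_validate(depth, expected_af, min_alt_reads=2):
    """Probability of seeing at least min_alt_reads mutant reads at this depth
    (binomial model; an assumption of this sketch)."""
    p_too_few = sum(
        comb(depth, k) * expected_af**k * (1.0 - expected_af) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


def classify_site(alt_reads, depth, expected_af, min_alt_reads=2, power_threshold=0.8):
    """Label a site called in one sample when it is checked in the other sample."""
    if alt_reads >= min_alt_reads:
        return "validated"
    if power_to_validate(depth, expected_af, min_alt_reads) >= power_threshold:
        return "not validated"  # we had the power to see it and did not
    return "no power"           # cannot validate: coverage too low to say


print(classify_site(alt_reads=3, depth=40, expected_af=0.30))
print(classify_site(alt_reads=0, depth=100, expected_af=0.30))
print(classify_site(alt_reads=0, depth=10, expected_af=0.05))
```

The point of the three labels is the distinction made above: a site with too little coverage is set aside rather than counted as a disagreement.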
So if we do all of that, what's the picture? So let me skip those two slides, which tell
you how many reads, actually, you need in order to validate. For example, if you have
a mutation at 5 percent allelic fraction, you need
60 reads to have 80 percent power to validate it, where to validate you need
to see two reads of the mutation in another sample. If you want to see it five
times, you need 135 reads to see the mutation with 80 percent chance. And
if you want to use 95 percent power, you need 93 reads for 5 percent allelic fraction,
or 180 reads to see it five times. So you need deep coverage to really be able to validate.
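
The read counts quoted here are close to what a plain binomial model gives. The sketch below assumes each read independently carries the mutation with probability equal to the allelic fraction, with no sequencing-error term; the model actually behind the slides is not stated in the talk, so the results may differ from the quoted figures by a read or two. Function names are hypothetical.

```python
from math import comb


def detection_power(depth, allelic_fraction, min_alt_reads):
    """P(at least min_alt_reads mutant reads) under a binomial model."""
    p_too_few = sum(
        comb(depth, k) * allelic_fraction**k * (1.0 - allelic_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


def min_depth_for_power(allelic_fraction, min_alt_reads, target_power, max_depth=10000):
    """Smallest depth that reaches target_power, or None if max_depth is not enough."""
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_power(depth, allelic_fraction, min_alt_reads) >= target_power:
            return depth
    return None


for min_alt in (2, 5):
    for target in (0.80, 0.95):
        depth = min_depth_for_power(0.05, min_alt, target)
        print(f"AF=5%, >= {min_alt} mutant reads, power {target:.0%}: {depth} reads")
```

Changing `allelic_fraction` shows how quickly the required depth grows for low-fraction, typically sub-clonal, mutations.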
So what happens when you take all these kind of things into account? The green bar here
represents what we've validated. The red, what we did not validate at sites that had
more than 80 percent power. The grey represents those sites that we didn't have power. So,
previously, we divided the green by the total bar, but now we need to divide the green just
by the red -- green plus the red. And as you can see, when we look at lower and lower allelic
fraction, we actually see that the number of grey, the size of the grey is actually
larger at low allelic fraction. This is if you use all the mutations. If you use only
the clonal mutations, you can see that the green is even higher because those are mutations
that are predicted to be clonal. All these low allelic fraction mutations are actually
sub-clonal mutations that we don't expect to be in the two sides of the samples. If
we do 95 percent, it is even more dramatic and the success rate goes up.
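
The change of denominator described here is simple arithmetic. A minimal sketch follows, using invented toy counts (not data from the talk) and hypothetical label names for the green, red, and grey categories.

```python
from collections import Counter


def validation_rates(site_labels):
    """Return (naive_rate, power_corrected_rate) for a list of site labels.

    Labels are assumed to be "validated" (green), "not validated" (red, had
    power) and "no power" (grey). A rate is None when its denominator is zero.
    """
    counts = Counter(site_labels)
    green = counts["validated"]
    red = counts["not validated"]
    grey = counts["no power"]

    total = green + red + grey
    powered = green + red

    naive = green / total if total else None        # green / whole bar
    corrected = green / powered if powered else None  # green / (green + red)
    return naive, corrected


# Toy example only: 40 green, 10 red, 50 grey sites.
labels = ["validated"] * 40 + ["not validated"] * 10 + ["no power"] * 50
naive, corrected = validation_rates(labels)
print(f"naive overlap: {naive:.0%}, power-corrected: {corrected:.0%}")
```

With these toy counts the naive overlap is 40 percent while the power-corrected rate is 80 percent, which is the kind of gap the grey sites create.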
So here look at this line, which is very faint, but if you take only the clonal mutations
and use the 95 percent power, you see that actually, they are very close to 100 percent
that we find the mutations in both sides. So actually, we can use FFPE samples to
find mutations. It's just that we need to be careful in analyzing them.
Finally, can we use that to do genome sequencing, and run cancer genome project, and run MutSig?
I'll just say that, yes, we find the same list of significant genes. And this is an
old version of MutSig, but we get the same lists when using the frozen and the FFPE.
Can we use it for clinical? Yes, when we look at all these known cancer genes in the lung,
we find them in the FFPE and the frozen, so it could be used for clinical sequencing.
So finally, the conclusions. We could perform exome sequencing in FFPE, we can calculate
the overlap between FFPE and frozen samples, controlling for the clonality and the coverage.
The mutation rates are the same. Sub-clones can contribute to the differences. We can
perform cancer genome projects using FFPE for sequencing, at least, and copy number
based on exome. We could use it in the clinic. And we need to have more samples to really
reach the final conclusions.
There are still more challenges for whole genome sequencing. The low yield is also a problem.
As you saw, there are very small, tiny samples that we need to address. And older blocks
are problematic because roughly 10 years ago, people changed from using an unbuffered to
a buffered formalin. And the unbuffered formalin was kind of problematic to the DNA/RNA so
things from roughly the last 10 years would be better.
So I just want to thank Kristin Ardlie, who is kind of an expert in extracting
these samples and understands all about FFPE. And Petar Stojanov, Andrey, and Scott for
doing the analysis. Thank you.
[applause]
John Weinstein: That's obviously an extremely important question.
I wish we had more time for more of the detail. Is there one quick question? Yeah.
Male Speaker: Yeah, this problem of small clinical samples
is the rule rather than the exception. What -- is there any possibility to amplify, I
mean, and if so, how does that change copy number mutations? I know it's a huge subject.
Can you just say "never do that," or "it's possible"?
Gad Getz: I would prefer to say never do that. We --
Male Speaker: [laughs] That's what I was afraid.
Gad Getz: The reason that I'm saying that is because
we -- the technology to go to lower and lower input DNA exists, and we could sequence down
to lower and lower. And now the standard is under 100 nanogram, but we could go lower
than that, and that's the way we need to invest rather than amplify with kind of weird artifacts
of whole genome amplification that we had before. Yeah. Okay, thank you.