TCGA - Comparative Mutational Analysis in Frozen and FFPE Tumor Samples - Gad Getz

Gad Getz: Hi, there. Thank you, John. As you can see, I'm not Kristin Ardlie, but you all know me, so... where's the next slide? Okay. So why do we want to use FFPE samples? There are a very large number of sample and tissue banks in biorepositories worldwide, and these samples are in FFPE. They often come with rich clinical information: histological, pathological, and follow-up clinical data. If we could use FFPE samples, they could fill the accrual gap in TCGA, in particular for the future of TCGA. We heard Lou Staudt say, "We need to get to 10,000 patients per tumor type." If we need to do that, we need to go to FFPE, unless we want a 20-year project. Another reason to move to FFPE is that it remains the standard clinical practice when you take biopsies and samples out of patients. If you want to bring sequencing into the clinic, you want to be able to work with FFPE. There are challenges working with FFPE. Of course, it's difficult to extract from the samples: you need to deparaffinize them and reverse the cross-links between protein and DNA. The physical size of the sample in the paraffin block is sometimes small, and the yield of DNA you can get out of it is a problem. The DNA that comes out is clearly of poor quality, as you can see from these smeared gel runs: the FFPE DNA fragments are broken compared to what you see from fresh frozen. Here, for example, is the size problem. In TCGA, Nationwide Children's Hospital separated the blocks into three size categories: large, medium, and small tissues. And if you go to clinical samples, you see small tissues, very small tissues, tiny tissues, and ones where you don't know where the tissue is.
[laughter] So this is the kind of thing you get from clinical samples. What data sets do we have? We have four TCGA prostate FFPE tumor samples, together with the fresh frozen tumor and the blood normal. We call these trios, and this icon depicts one: the frozen tumor, the FFPE tumor, and the blood normal. When we call mutations, we can compare either the frozen tumor or the FFPE tumor to the normal. There are also breast trios, which I won't be talking about today. And there are lung cancer data from the Broad: 17 FFPE tumor/normal pairs. Here we have FFPE tumor and normal as well as frozen tumor and normal, so those are quartets. I will talk about both the prostate and the lung. So what questions do we want to answer? There are six questions I want to answer in the few minutes that I have. Can we get high-quality exome sequencing data from FFPE samples? Can we detect mutations in FFPE samples, and are they artifacts of the fixation procedure? Can we detect copy number changes from exome sequencing of FFPE samples? Are we finding the same mutations in the FFPE and frozen samples of those trios and quartets? Can we perform a cancer genome project using FFPE samples? And finally, can we use FFPE in the clinic? First, are we getting similar library sizes and coverage? What you see here is the coverage on FFPE versus frozen. In these cases we actually sequenced the FFPE deeper, so the coverage is a bit higher in the FFPE than in the frozen; the FFPE looks even better here only because we sequenced it deeper.
For library size, the frozen actually has about double the library size, but even a hundred million molecules is well above what we need in order to sequence deeply, so there's no problem getting a deep enough library. Here is the coverage on frozen versus FFPE: these are all the targets that we try to capture, across the different TCGA prostate cancers. Most of the targets that are captured well in the frozen are also captured well in the FFPE, and the ones that are not captured well in the frozen are not captured well in the FFPE either, so it looks consistent. This uses the coverage criteria of 14 reads in the tumor and 8 reads in the normal. We see the same thing for the lung cancers, so coverage is not an issue. Next, the number of mutations that we find. In the four prostate samples, comparing the frozen tumor to the normal, we find 135 mutations in a total territory of 130 megabases, and in the FFPE, 137 mutations in 130 megabases. These are very similar numbers. In the lung data we find 5,332 mutations in the frozen and 5,013 in the FFPE, over similar territory. So we're finding similar numbers of mutations. What about the spectrum of mutations we're finding? Here's the prostate and the lung, with the FFPE in blue and the frozen in red, and we find the same mutation spectra. So not only are the numbers the same, the distributions of mutation types are the same, meaning they are not dominated by some artifact caused by fixation, because such an artifact would not follow the distribution of real somatic mutations. So we are happy that we really see no fixation artifacts in the FFPE. Can we find copy number changes? We are using an algorithm we developed called CapSeg, which generates segmented copy number from capture data.
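As a rough illustration of the comparisons above, the sketch below computes a mutation rate per megabase of callable territory, using the 14-tumor/8-normal read thresholds mentioned in the talk, and a cosine similarity between two mutation-spectrum count vectors. The function names, the per-base depth inputs, and the choice of cosine similarity as the spectrum comparison are assumptions for illustration, not the pipeline actually used for these data.

```python
import math

TUMOR_MIN_READS = 14   # tumor coverage needed for a base to count as callable
NORMAL_MIN_READS = 8   # matched-normal coverage needed as well


def callable_bases(tumor_depth, normal_depth):
    """Count positions covered well enough in both tumor and normal."""
    return sum(
        1
        for t, n in zip(tumor_depth, normal_depth)
        if t >= TUMOR_MIN_READS and n >= NORMAL_MIN_READS
    )


def mutations_per_mb(n_mutations, territory_bases):
    """Somatic mutation rate per megabase of callable territory."""
    return n_mutations / (territory_bases / 1e6)


def spectrum_cosine(a, b):
    """Cosine similarity between two mutation-spectrum count vectors.

    Each vector holds counts per mutation class (hypothetical layout, e.g.
    C>A, C>G, C>T, T>A, T>C, T>G). Values near 1 mean the two samples share
    the same distribution of classes, regardless of total counts.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


if __name__ == "__main__":
    # Prostate numbers quoted in the talk: 130 Mb of territory in both arms.
    frozen = mutations_per_mb(135, 130e6)
    ffpe = mutations_per_mb(137, 130e6)
    print(f"frozen {frozen:.2f}/Mb, FFPE {ffpe:.2f}/Mb")
```

With the prostate numbers this gives about 1.04 and 1.05 mutations per megabase, which is the "very similar numbers" point in rate form.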
What you see here is before segmentation: every point is a targeted exon, and you can see its copy number. Here's an example with the frozen and the FFPE, and they look pretty much the same. There's actually one region here that doesn't look the same, and I'll come back in a second to what that means. Here's the frozen and here's the FFPE: the noise levels look the same, so we feel that we can do copy number from FFPE as well. Now the big question: are we finding the same mutations? When we look at the Venn diagrams of those 3,000 versus 3,300 mutations (this is actually from 19 matched lung samples), we see only 44 percent overlap. So what's going on? Everything looks good, and now we see only 44 percent overlap; something weird is going on here. And prostate is even worse. So we thought about it, and we came to a fundamental observation that we all need to understand, and this is the biggest take-home message from this talk: when we do this comparison, the two samples differ in two ways. One is FFPE and one is frozen, and that's what we want to compare. But the other difference is that one comes from one part of the tumor and the other from another part of the tumor. Different parts of the tumor have different purities, different sub-clonal mutations, and so on, and we can't distinguish between the two sources of difference. So how can we analyze this more carefully to really compare FFPE and frozen? This problem affects any such comparison, at the DNA, RNA, methylation, or protein level. So, just to remind you, here's a slide that you have all seen from me many times.
When you call mutations, your ability to find a mutation depends on its allelic fraction, which in turn depends on the purity of the sample. Given the allelic fraction, the number of reads, meaning the sequencing coverage, tells you the probability of finding the mutation. So if you have two different pieces of the tumor, one with one purity and the other with a different purity, you don't have the same chance of finding the mutation, even if it is clonal, that is, present in all cancer cells on both sides. And indeed, that's what we see when we look at the samples. Some samples have roughly the same purity, and we find most of the mutations in both of them. But some samples have very different purities between the FFPE and the frozen, and then we basically don't find the mutations in both; they're either all red or all blue. So we need to look sample by sample, and that explains those differences. How could we do this better? There's another problem. Everything so far assumed clonal mutations, but what happens if there are sub-clonal mutations? A sub-clonal mutation could be present at different proportions on the two sides, or not exist at all on one side. So we need to distinguish between clonal and sub-clonal mutations, and we have a method called ABSOLUTE that can do that. In fact, in ovarian cancer half of the mutations are sub-clonal, and it's similar in many other cancers. So how can we compare frozen to FFPE? First of all, we don't need to call the mutations independently in both. Once we find a mutation in the FFPE, we can ask whether it validates in the frozen. For that we don't need the high, strict criteria that we use when we call mutations, because we only test a few sites.
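To make the purity argument concrete, here is a minimal sketch of the expected allelic fraction of a mutation given sample purity, and its inverse, a point estimate of the cancer cell fraction (CCF) that separates clonal (CCF near 1) from sub-clonal mutations. By default it assumes a heterozygous mutation on one copy in a region that is diploid in both tumor and normal cells; the function names are illustrative, and ABSOLUTE models purity, ploidy, and multiplicity jointly in a far more involved way.

```python
def expected_allelic_fraction(purity, ccf=1.0, tumor_copies=2, mut_copies=1):
    """Expected fraction of reads carrying the mutation.

    ccf is the cancer cell fraction: 1.0 for a clonal mutation,
    below 1.0 for a sub-clonal one.
    """
    # Mutant allele copies contributed by the cancer cells carrying it.
    mutant = purity * ccf * mut_copies
    # All allele copies at the locus: tumor cells plus diploid normal cells.
    total = purity * tumor_copies + (1 - purity) * 2
    return mutant / total


def cancer_cell_fraction(af, purity, tumor_copies=2, mut_copies=1):
    """Invert expected_allelic_fraction to estimate CCF from an observed AF."""
    total = purity * tumor_copies + (1 - purity) * 2
    return af * total / (purity * mut_copies)


if __name__ == "__main__":
    # The same clonal mutation in two pieces of one tumor with different purity.
    for purity in (0.8, 0.2):
        print(f"purity {purity:.1f}: expected AF {expected_allelic_fraction(purity):.2f}")
    # prints 0.40 for purity 0.8 and 0.10 for purity 0.2
```

At 20 percent purity the same clonal mutation sits at a quarter of the allelic fraction it has at 80 percent purity, so at a fixed depth it is much harder to detect, which matches the all-red or all-blue pattern in samples with very different purities.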
So we might require only two or three reads, just a few reads supporting the mutation, to validate it. Then we need to correct for the different allelic fractions in the two samples, since the frozen and the FFPE sample could have different purities. So we need to fit lines to the little scatter plots that you saw before. Once we have fit the allelic fractions, we also need to calculate the power to validate, based on the sequencing coverage. We can use the 80 percent, 95 percent, or 99 percent power line to distinguish sites that we could not validate from ones that don't validate. There's a big difference between "cannot validate" and "doesn't validate." And finally, we need to distinguish between clonal and sub-clonal mutations. So if we do all of that, what's the picture? Let me skip these two slides, which tell you how many reads you actually need in order to validate. For example, for a mutation at 5 percent allelic fraction, if validation means seeing two mutant reads in the other sample, you need 60 reads to have 80 percent power to validate it. If you want to see the mutation five times, you need 135 reads to see it with 80 percent chance. And if you want 95 percent power, you need 93 reads at 5 percent allelic fraction, or 180 reads to see it five times. So you need deep coverage to really be able to validate. So what happens when you take all these things into account? The green bar here represents what we validated. The red is what we did not validate at sites with more than 80 percent power. The grey represents sites where we didn't have power. Previously we divided the green by the total bar, but now we need to divide the green by just the green plus the red.
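The read counts quoted above can be approximately reproduced with a plain binomial model: each read at the site carries the mutant allele with probability equal to the allelic fraction, and validation means seeing at least `min_alt` mutant reads. This sketch is an assumption about how such numbers can be derived, not necessarily the exact calculation behind the slides; the function names are illustrative.

```python
from math import comb


def validation_power(depth, allelic_fraction, min_alt):
    """P(X >= min_alt) for X ~ Binomial(depth, allelic_fraction)."""
    p = allelic_fraction
    # Probability of seeing fewer than min_alt mutant reads (a miss).
    miss = sum(
        comb(depth, k) * p**k * (1 - p) ** (depth - k)
        for k in range(min(min_alt, depth + 1))
    )
    return 1.0 - miss


def min_depth_for_power(allelic_fraction, min_alt, target_power, max_depth=100_000):
    """Smallest depth whose validation power reaches target_power."""
    # Power only grows with depth, so the first depth that passes is the minimum.
    for depth in range(min_alt, max_depth + 1):
        if validation_power(depth, allelic_fraction, min_alt) >= target_power:
            return depth
    raise ValueError("target power not reachable within max_depth")


if __name__ == "__main__":
    for min_alt in (2, 5):
        for target in (0.80, 0.95):
            depth = min_depth_for_power(0.05, min_alt, target)
            print(f"AF 5%, >= {min_alt} mutant reads, {target:.0%} power: {depth} reads")
```

For a 5 percent allelic fraction this lands within a read or two of the 60, 135, 93, and 180 reads quoted in the talk, and it makes plain why low-allelic-fraction sites are so often underpowered at typical exome depth.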
As you can see, at lower and lower allelic fractions the grey portion gets larger. This is if you use all the mutations. If you use only the clonal mutations, the green is even higher, because these are mutations predicted to be clonal. The low-allelic-fraction mutations are actually sub-clonal mutations that we don't expect to find in both pieces of the tumor. If we use 95 percent power, it's even more dramatic, and the success rate goes up. Look at this line, which is very faint: if you take only the clonal mutations and use 95 percent power, you find very close to 100 percent of the mutations in both samples. So we actually can use FFPE samples to find mutations; we just need to be careful in analyzing them. Finally, can we use this to do genome sequencing, run a cancer genome project, and run MutSig? I'll just say that, yes, we find the same list of significant genes. This is an old version of MutSig, but we get the same list using the frozen and the FFPE. Can we use it in the clinic? Yes: when we look at all these known cancer genes in the lung, we find them in both the FFPE and the frozen, so it could be used for clinical sequencing. So, finally, the conclusions. We can perform exome sequencing on FFPE. We can calculate the overlap between FFPE and frozen samples, controlling for clonality and coverage. The mutation rates are the same. Sub-clones can contribute to the differences. We can perform cancer genome projects using FFPE, at least for sequencing and for copy number based on exomes. We could use it in the clinic. And we need more samples to reach final conclusions. There are still more challenges for whole genome sequencing. Low yield is also a problem: as you saw, there are very small, tiny samples that we need to address.
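The bookkeeping behind the green, red, and grey bars, including the restriction to clonal mutations and the choice of power cutoff, can be sketched as follows. The `Site` fields, the `fit_af_scale` helper (a least-squares line through the origin of the allelic-fraction scatter plot), and the default two-read rule are hypothetical choices for illustration; the clonal flag is assumed to come from a clonality estimate such as ABSOLUTE.

```python
from dataclasses import dataclass
from math import comb


@dataclass
class Site:
    af_discovery: float  # allelic fraction in the sample where it was called
    depth_other: int     # coverage at the site in the other sample
    alt_other: int       # mutant reads observed in the other sample
    clonal: bool         # assumed to come from a clonality estimate


def fit_af_scale(pairs):
    """Least-squares slope through the origin of (discovery AF, other AF) pairs."""
    sxy = sum(x * y for x, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    return sxy / sxx


def power(depth, af, min_alt):
    """P(at least min_alt mutant reads) for Binomial(depth, af)."""
    miss = sum(
        comb(depth, k) * af**k * (1 - af) ** (depth - k)
        for k in range(min(min_alt, depth + 1))
    )
    return 1.0 - miss


def classify(site, af_scale, min_alt=2, power_cutoff=0.80):
    """'validated' (green), 'not_validated' (red), or 'underpowered' (grey)."""
    if site.alt_other >= min_alt:
        return "validated"
    # Rescale the discovery AF to the other sample's purity before asking
    # whether we had enough power to see it there.
    expected_af = min(site.af_discovery * af_scale, 1.0)
    if power(site.depth_other, expected_af, min_alt) >= power_cutoff:
        return "not_validated"
    return "underpowered"


def validation_rate(sites, af_scale, clonal_only=False, **kwargs):
    """green / (green + red); grey sites are left out of the denominator."""
    counts = {"validated": 0, "not_validated": 0, "underpowered": 0}
    for site in sites:
        if clonal_only and not site.clonal:
            continue
        counts[classify(site, af_scale, **kwargs)] += 1
    powered = counts["validated"] + counts["not_validated"]
    rate = counts["validated"] / powered if powered else float("nan")
    return rate, counts
```

Calling `validation_rate(sites, scale, clonal_only=True, power_cutoff=0.95)` mirrors the strictest comparison described in the talk: clonal mutations only, judged only where there was at least 95 percent power.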
Older blocks are also problematic, because roughly 10 years ago people switched from unbuffered to buffered formalin. Unbuffered formalin was problematic for the DNA and RNA, so, roughly, blocks from the last 10 years would be better. I want to thank Kristin Ardlie, who is an expert in extracting these samples and understands all about FFPE, and Petar Stojanov, Andrey, and Scott for doing the analysis. Thank you. [applause]
John Weinstein: That's obviously an extremely important question. I wish we had more time for more of the detail. Is there one quick question? Yeah.
Male Speaker: Yeah, this problem of small clinical samples is the rule rather than the exception. Is there any possibility to amplify, and if so, how does that change copy number and mutations? I know it's a huge subject. Can you just say "never do that," or "it's possible"?
Gad Getz: I would prefer to say never do that. We --
Male Speaker: [laughs] That's what I was afraid of.
Gad Getz: The reason I'm saying that is that the technology to go to lower and lower input DNA exists, and we can sequence from less and less. The standard now is under 100 nanograms, but we could go lower than that, and that's where we need to invest, rather than in amplification with the kind of weird whole genome amplification artifacts that we've had before. Yeah. Okay, thank you.