Adam Ewing: Okay. So I'd like to start by thanking the
organizers for giving me this excellent opportunity to present our call for participation in the
mutation calling benchmark exercise we've put together, Mutation Calling: Benchmark
4. This is the fourth benchmarking exercise that TCGA has carried out. So I'm going to
tell you a bit about how we've gone about setting this up, what the motivation is, and
how to get involved.
So, just briefly to go over the Mutation Calling Benchmark process, in case anyone is unfamiliar,
we start out by selecting pairs of tumor and normal BAMs. BAMs contain short read alignments,
and these BAMs are then distributed to the participants in the benchmarking exercise,
and they call mutations and return them as VCF files. VCF stands for Variant Call Format; it's a standard, widely-used format for expressing all varieties of mutations in a unified way, and it's the format we really want to push people to express their mutations in. The VCFs are then collected and compared for concordance and discordance across somatic calls,
and we want to encourage people to submit germline calls as well. And so at the end
of the day what we get is a picture of sort of where the field of mutation calling in
cancer stands, and that's a really valuable thing.
Just really briefly to give you a little bit of background and make sure we're all on the
same page with the kinds of mutations I'm going to be talking about. I'm sure this is
fundamental, but SNVs, or single nucleotide variants, are single base changes at defined nucleotide positions, and INDELs are short insertions and deletions of less than 100 base pairs. Larger rearrangements like insertions, deletions, duplications, inversions, and transductions are referred to collectively as structural variants, or SVs. And regions where the genome departs from diploid copy number, that is, from an absolute allele count of two, are referred to as copy number variants, or CNVs.
And so since this is Benchmark 4, clearly there are three other benchmarks which occurred
prior to this, and just to go through briefly the history of benchmarking efforts in TCGA.
So Benchmark 1 was single nucleotide variant calling on six pairs of whole genomes. Benchmark
2 was single nucleotide variant calling on 14 tumor-normal pairs of exomes. Similarly,
Benchmark 3, again, single nucleotide variants on 25 pairs of exomes, this time with associated
validation data, so deep sequencing data over selected regions to validate the presence
of mutations.
And so what I'm calling for participation for today is Benchmark 4, so in addition to
single nucleotide variants we're going to take INDELs, SVs, and CNVs into account, and
we're going to do this on whole genomes from -- derived from cell lines.
So why is it important that we do another benchmark? So we've done three, why is it
important to do another? Well, if we're going to accomplish the goal of comprehensively
characterizing cancer genomes, TCGA has to get together and measure and set standards
for the accuracy of mutation calls. And sort of toward this end in this benchmark, we're
being more comprehensive about the variety of mutations that we're considering. So, like
I said, in addition to single nucleotide variants, we want to extend this to INDELs, structural
variants, and copy number variants to get the full spectrum of variation and evaluate
how different mutation calling algorithms are performing across these different types
of somatic variants.
So as I'll talk about on the subsequent slides, Benchmark 4 really is a controlled experiment.
So we have these cell lines. We can take advantage of their clonality to do things like simulate
normal contamination, which -- so Gaddy's talk and Chris Miller's talk were great lead-ins
for this. We can simulate subclonal expansions by using spike-in mutations. Spike-in mutations
also give us the opportunity to evaluate false negative rates. So they give us sort of the
ground truth and that hasn't been possible in previous benchmarking efforts. And since
the cell line genome data is publicly distributable, we can encourage wide participation, both within TCGA and outside of TCGA. So, for instance, we're reaching out to ICGC, and they're participating in this benchmark, and others outside of the cancer genome consortia who have an interest in mutation calling in this sort of tumor-normal context are encouraged to participate.
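To make the ground-truth point concrete: because the spiked-in variants are known in advance, a false negative rate can be estimated by intersecting a participant's calls with the truth set. Below is a minimal sketch of that idea; the file names and the exact matching on chromosome, position, and alleles are illustrative assumptions, not the benchmark's official evaluation.

```python
# Minimal sketch: estimate sensitivity / false negative rate against a
# spike-in truth set. File names and the exact matching on
# (chrom, pos, ref, alt) are illustrative assumptions.

def load_calls(path):
    """Return a set of (chrom, pos, ref, alt) tuples from a simple VCF."""
    calls = set()
    with open(path) as fh:
        for line in fh:
            if line.startswith('#'):
                continue
            fields = line.rstrip('\n').split('\t')
            chrom, pos, _vid, ref, alt = fields[:5]
            calls.add((chrom, int(pos), ref, alt))
    return calls

truth  = load_calls('spiked_in_truth.vcf')     # hypothetical truth VCF
called = load_calls('participant_calls.vcf')   # hypothetical submission

sensitivity = len(truth & called) / float(len(truth))
print('sensitivity: %.3f  false negative rate: %.3f' % (sensitivity, 1.0 - sensitivity))
```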
So further -- let's see, on this theme of why are we doing another benchmark? Well,
so there's still a lot of discordance in the mutation calls that we get. So this is a -- sort
of a representative example from a previous benchmark exercise, and what is shown here on this Venn diagram are the calls made on the same tumor-normal BAM pair by the Broad Institute, by Wash U, and by UCSC. And you can see sort of the concordance and
discordance here in this Venn diagram. So it's sort of a majority of mutations are concordant
between at least two of the centers, but there's still a lot of discordance happening, and
this is important to take into consideration since sort of mutation calling is fundamental
to cancer genomics. Cancer genomics depends on the sort of fidelity of mutation calling
algorithms.
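As a rough illustration of the comparison behind a Venn diagram like this, the per-center call sets can be keyed on chromosome, position, and alleles and intersected; the sketch below is a simplification, and real comparisons, especially for indels and breakends, need fuzzier matching.

```python
# Illustrative concordance count for a Venn diagram of calls from several
# centers. Calls are keyed on (chrom, pos, ref, alt); this is a simplification
# of what a dedicated comparison tool does.

def concordance_counts(center_calls):
    """center_calls: dict mapping center name -> set of variant keys.
    Returns a dict mapping each combination of supporting centers to a count."""
    counts = {}
    for key in set().union(*center_calls.values()):
        supporters = frozenset(name for name, calls in center_calls.items()
                               if key in calls)
        counts[supporters] = counts.get(supporters, 0) + 1
    return counts

# e.g. concordance_counts({'Broad': broad_calls, 'WashU': washu_calls,
#                          'UCSC': ucsc_calls})
```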
So the samples that we are using to derive all of the BAM files that we're distributing
for Benchmark 4 are based on these two pairs of cell lines, HCC1143 and HCC1954. 1143 and 1954 are both derived from breast tumors, and they each have a paired normal sample, which is a cell line derived from blood from the same patient. All of these lines are available through ATCC, and the sequences that we have for this benchmark are at between 50x and 71x coverage, sequenced at the Broad Institute. And that's about all I'll say about the samples themselves. As I mentioned, all of this data is publicly distributed through CGHub.
So this is sort of what we want participants to do. So there's three parts to this mutation
calling exercise. So the first part is pretty straightforward. We just want participants
to compare the tumor cell line full genome BAM to the corresponding normal full genome
BAM for both pairs of cell lines. This will establish a baseline under sort of ideal conditions: you've got sort of higher coverage genomes here, and they're cell lines, so they're presumably clonal.
And so from there we can use sort of this clonal property of the cell lines to do interesting
things, and so these are -- we can simulate normal contamination, so sort of in this row
A here what I'm showing is samples. Each one of these pie charts represents a BAM file
that we've generated for the benchmarking exercise, and so what I'm showing here is
we have mixed the normal and tumor BAMs to yield a 30x coverage BAM file in various proportions.
So over here it's 5 percent, simulates 5 percent normal contamination, and over here we're
simulating 95 percent normal contamination. And as has been alluded to in previous talks,
normal contamination is an important factor in mutation calling fidelity.
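A rough sketch of how such a mixture could be produced by downsampling and merging BAMs with samtools is below; the coverages, seed, and file names are my own illustrative assumptions, not the exact pipeline used to build the benchmark BAMs.

```python
# Sketch: simulate normal contamination by downsampling the tumor and normal
# BAMs and merging them into a ~30x mixture. All numbers and file names are
# illustrative assumptions.
import subprocess

def subsample(in_bam, fraction, out_bam, seed=42):
    # samtools view -s takes SEED.FRACTION, e.g. "-s 42.25" keeps ~25% of reads
    subprocess.check_call(['samtools', 'view', '-b', '-s', str(seed + fraction),
                           '-o', out_bam, in_bam])

def mix(tumor_bam, normal_bam, normal_fraction,
        tumor_cov=60.0, normal_cov=60.0, target_cov=30.0, out_bam='mixed.bam'):
    """Build a ~target_cov BAM in which normal_fraction of the reads are normal."""
    subsample(tumor_bam, (1.0 - normal_fraction) * target_cov / tumor_cov, 'tumor_part.bam')
    subsample(normal_bam, normal_fraction * target_cov / normal_cov, 'normal_part.bam')
    subprocess.check_call(['samtools', 'merge', '-f', out_bam,
                           'tumor_part.bam', 'normal_part.bam'])

# 5 percent simulated normal contamination (placeholder file names):
# mix('HCC1143_tumor.bam', 'HCC1143_normal.bam', normal_fraction=0.05)
```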
And so in addition to simulating normal contamination, we can simulate subclone expansion. And the
way we do this is by taking the original tumor BAM file and spiking in single nucleotide variants and structural variants into a single allele, and we can spike in to a single allele by using results from Scott Carter and Gaddy Getz's group's ABSOLUTE algorithm. So we can selectively spike in to one allele, and by spiking in we get a genetically distinct tumor BAM; we can then mix that back in with some amount of normal contamination and some amount of the original tumor to simulate the presence of a subclone in the tumors. And we've scaled that from a 1 percent subclone, at which it will be difficult, if not impossible, to detect the spiked-in mutations, up to 40 percent, which should be feasible.
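For intuition about why a 1 percent subclone is so hard to detect, here is a back-of-the-envelope calculation of the expected variant allele fraction for a variant spiked into one allele; the 20 percent normal contamination used in the example is an arbitrary assumption.

```python
# Back-of-the-envelope expected variant allele fraction (VAF) for a variant
# spiked into a single allele of a diploid region, present only in the
# simulated subclone. A simplification for intuition only.

def expected_vaf(subclone_fraction, normal_contamination, copies=2):
    """Expected fraction of reads at the locus carrying the spiked-in variant."""
    tumor_fraction = 1.0 - normal_contamination
    return subclone_fraction * tumor_fraction / copies

print(expected_vaf(0.01, 0.20))   # ~0.004: essentially invisible at 30x
print(expected_vaf(0.40, 0.20))   # ~0.16:  should be detectable
```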
And so this sort of normal contamination model scheme and subclone expansion scheme were
generated for both pairs of cell lines, and so, in total, there's -- we're doing six comparisons
here. So this BAM versus the normal, this BAM versus the normal, et cetera. So, in total,
we end up with 28 BAM files, which are distributed publicly via CGHub. So if you navigate to this URL here you can download a public key, and you can use that public key with GeneTorrent
to grab the BAM files for the benchmarking exercise.
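For reference, a download with GeneTorrent looks roughly like the sketch below; the key file name and analysis UUID are placeholders, and the gtdownload flags should be double-checked against the CGHub documentation.

```python
# Sketch of fetching a benchmark BAM from CGHub with GeneTorrent's gtdownload.
# The credential file and analysis UUID are placeholders; confirm the flags
# against the CGHub documentation before use.
import subprocess

def fetch_bam(analysis_uuid, keyfile='cghub_public.key'):
    subprocess.check_call(['gtdownload',
                           '-c', keyfile,         # public key from CGHub
                           '-d', analysis_uuid])  # analysis ID of the BAM

# fetch_bam('00000000-0000-0000-0000-000000000000')  # placeholder UUID
```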
So if -- those of you who attended the CGHub workshop yesterday evening should be familiar
a bit with this process, and many thanks to Chris Wilks and the CGHub team for helping
us to get these BAMs up and dealing with our requests to replace them and so on.
So in addition to providing data whereby we can evaluate the performance of mutation callers
comparatively, Benchmark 4 has also been stimulating the creation of new evaluation tools for VCF
files and BAM files, and so I'm listing some of these here. Just to point out that VCF
is a successful standard for expressing mutation calls. There's a whole bunch of tools out
there for it, including VCFtools, the Genome Analysis Toolkit at the Broad, PyVCF, et cetera,
et cetera.
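As a small example of working with VCFs programmatically, here is a sketch using PyVCF; the file name and the SOMATIC INFO flag are assumptions, since different callers mark somatic status differently.

```python
# Minimal PyVCF sketch: count somatic SNVs and indels in a VCF. The file name
# and the 'SOMATIC' INFO flag are assumptions.
import vcf

reader = vcf.Reader(filename='participant_calls.vcf')

snv_count = indel_count = 0
for record in reader:
    if not record.INFO.get('SOMATIC', False):
        continue
    if record.is_snp:
        snv_count += 1
    elif record.is_indel:
        indel_count += 1

print('somatic SNVs: %d, somatic indels: %d' % (snv_count, indel_count))
```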
So here are some of the tools that we've created, either specifically for Benchmark 4 or related
tools. So BAMSurgeon is the method I'm using for spiking in single nucleotide variants and structural variants into pre-existing BAM files. So if you're interested in that
kind of thing, definitely come talk to me. I'd be happy to have you use it.
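To give a feel for what a spike-in run looks like, here is a hedged sketch of calling BAMSurgeon's addsnv.py; the file names are placeholders, and the variant-file columns and flags should be checked against the BAMSurgeon documentation rather than taken from this example.

```python
# Hedged sketch of a BAMSurgeon spike-in (addsnv.py). File names are
# placeholders; confirm the varfile format and flags against the BAMSurgeon
# documentation.
import subprocess

# One variant per line: chromosome, start, end, target allele fraction
# (my reading of the varfile format).
with open('spikein_snvs.txt', 'w') as fh:
    fh.write('chr1\t1234567\t1234567\t0.5\n')

subprocess.check_call(['addsnv.py',
                       '-v', 'spikein_snvs.txt',           # variants to add
                       '-f', 'HCC1143_tumor.bam',          # input tumor BAM (placeholder)
                       '-r', 'GRCh37.fa',                  # reference FASTA (placeholder)
                       '-o', 'HCC1143_tumor_spiked.bam'])  # BAM with spiked-in SNVs
```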
VCFcomparator is the comparison engine that we're using to evaluate the outcome of the
-- this benchmarking exercise. LeftShiftBreakends is a bit esoteric, but if you're really into accurate calling of structural variant breakends, it's something to consider and something
to talk to me about. And I also want to plug the upcoming VCF to MAF converter. Many thanks --
Audience Member: Has there been discussion among the benchmarking group about the purpose of different pipelines?
Adam Ewing: Yes, we'll be evaluating both sensitivity and specificity. So, the sort of large scale effort you mentioned
would really benefit from improved sensitivity, whereas sort of the clinical efforts would
benefit from improved specificity. So we'll evaluate it sort of both ways and give recommendations
for how to appropriately tune mutation calling pipelines both for sensitivity and for specificity,
depending on what your goals are.
Charles Perou: I have a quick question, which is, when you
make the mixed BAM file between the tumor and normal --
-- see where you want to have it more specific or more sensitive, and I'm sure there will be different criteria that you might think of -- someone would argue the opposite of what you say.
-- clinical settings. Our experience is that we want to open up the calls pretty loose --
-- is Giovanni Ciriello from Memorial Sloan-Kettering. He'll be talking about --