Chris Miller: Thank you. Hi, I'm Chris Miller from Wash
U. I'm going to be talking to you today about tumor heterogeneity, clonal evolution, and
how we're using sequencing to get some insight into both of these phenomena.
So my first statement here will be that tumors are heterogeneous. This was suspected as far
back as the '70s, but it's really taken the advent of high-throughput sequencing before
we were able to dive deep into these tumors and see that they are, in fact, genetically
diverse populations of cells, and that because of that, within these, evolution is occurring
at the cellular level.
And then last year we were able to, you know, view this in action in a case of relapsed
AML, where we sequenced an AML tumor, a matched normal, and a relapse. And using that
we were able to put together a model of exactly how this clonal evolution works, at least
in this case.
It starts off with the hematopoietic stem cell, which gains initiating mutations here,
and then as the tumor expands, some of these cells acquire additional mutations, represented
here in purple, yellow, and orange. And these mutations may expand. And so when we assay
the tumor at diagnosis what we're getting is really a cross-section of this clonal architecture
of the tumor, where some of the cells look a lot like the founding clone, some of them
are this subclone that occurs in about 50 percent of the cells, and others are smaller
fractions of the tumor. As chemotherapy, then, is induced, and treatment goes on, it creates
a population bottleneck, where this population of cells is reduced, and only a few pass through.
And then this expands back into a relapse, then, and acquires additional mutations after
the treatment ceases.
And so what's really interesting in this particular case is that the clonal fraction that actually
went on to form the relapse only appeared in about 5 percent of the cells in the original
tumor, which is a little frightening, to be perfectly honest. And it makes us wonder,
you know, whether we could have missed it if we hadn't looked more carefully. And so
detecting these minor subclones is, we think, crucially important to understanding, you
know, how these tumors are responding to therapy, and to make sure that we get the whole tumor
and not just the major subclone.
I think there are several challenges that remain in detecting these. First of all, the
genomes are sequenced with low coverage. I mean 30x whole genome sequencing is clearly
not enough to detect events that are present only at 1 or 2 percent in the tumor, and even
a 100 or 150x, you know, exome sequencing may not be deep enough. But that at least
seems like a tractable problem. The sequencing costs are dropping rapidly. Perhaps a more
pervasive problem that we're interested in is that algorithms aren't designed to detect
these low frequency events, by and large.
If you look at this power simulation from our SomaticSniper algorithm, which is one
of the kind of first generation of variant callers, you'll see that even with 90x coverage,
our power to detect events at 20 percent variant allele frequency is only 85 percent. And if
we drop down to 10 percent variant allele frequency, it's only 10 percent. So, we're
clearly missing a lot of these low frequency things.
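To make the coverage argument concrete, here is a minimal sketch (it is not the power simulation shown in the talk). It assumes variant reads are sampled binomially at the variant allele frequency, and that a caller needs at least `min_alt_reads` supporting reads, which is a hypothetical threshold chosen for illustration. Sequencing error and mapping artifacts are ignored, so real callers will do worse than this.

```python
from math import comb


def detection_power(depth, vaf, min_alt_reads=3):
    """Probability of seeing at least `min_alt_reads` variant reads.

    Assumes the variant read count follows Binomial(depth, vaf).
    `min_alt_reads` is a hypothetical calling threshold, not a value
    taken from any specific caller.
    """
    # Probability of seeing fewer than the required number of variant reads.
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Compare whole-genome, exome, and deep capture depths for a 2 percent event.
    for depth in (30, 150, 1000):
        print(depth, round(detection_power(depth, 0.02), 3))
```

Under these assumptions, a 2 percent event is almost never supported by three reads at 30x, while deep capture depths recover it reliably.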
And so that spurred us to develop an algorithm called BASSOVAC, Bayesian Scoring of Somatic
Variant read Counts. It's a little convoluted, but it works. And so this incorporates purity,
ploidy, base quality, and a host of other factors into a more complex model. We pull
these all together into a Bayesian framework, and then obtain the probabilities that a particular
single nucleotide variant is either heterozygous or homozygous, given the input data.
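The general idea of scoring read counts in a Bayesian framework can be sketched as follows. This is not BASSOVAC's model, which the talk does not spell out: it is a toy version that assumes a diploid site, a binomial likelihood, a single flat `error_rate`, and made-up prior probabilities. It uses purity only to scale the expected variant fraction and leaves out ploidy, per-base quality, and the matched normal.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def genotype_posteriors(alt_reads, depth, purity=1.0, error_rate=0.01, priors=None):
    """Posterior probability of each hypothesis given tumor read counts.

    All parameter defaults and priors are illustrative assumptions.
    """
    if priors is None:
        priors = {"reference": 0.998, "heterozygous": 0.001, "homozygous": 0.001}
    # True variant allele fraction in the sample under each hypothesis
    # (diploid tumor cells mixed with diploid normal cells).
    true_fraction = {
        "reference": 0.0,
        "heterozygous": 0.5 * purity,
        "homozygous": 1.0 * purity,
    }
    unnormalized = {}
    for name, f in true_fraction.items():
        # Observed fraction after symmetric base-calling errors.
        observed = f * (1 - error_rate) + (1 - f) * error_rate
        unnormalized[name] = priors[name] * binom_pmf(alt_reads, depth, observed)
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


if __name__ == "__main__":
    print(genotype_posteriors(alt_reads=15, depth=30))
```

A subclonal variant would need an extra hypothesis, or a continuous cell-fraction parameter, which this sketch leaves out.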
And so we've tested this against other algorithms. This is kind of a worst-case simulation, but
you can see that even in this difficult environment it's pushing our curve of variant allele frequency
far to the left compared to SomaticSniper and these kind of first-generation callers.
We've also done some real world testing, and I want to tell you about one particularly
cool dataset that we've been working with. This is a quintet of samples, including a
primary breast tumor, a matched normal, and three different metastases, from the spine,
the liver, and the adrenal glands.
And so we whole-genome sequenced all these to 30x, and ran these through our initial
pipeline. This was prior to the development of BASSOVAC. And then capture validation was
performed for all these variants. So we have very deep sequencing read counts for all of
these variants and all of these samples. And so we were able to combine these all, and
then make cool plots that look like this.
So I'm showing you here on the x-axis is the variant allele frequency of single nucleotide
variants in the primary tumor, and on the y-axis you're seeing the frequency of events
in the metastasis. And so several trends emerge from this kind of plot. You can see that in
the -- at about 50 percent you see -- which corresponds to 100 percent of the cells in
the tumor for heterozygous events -- we see the major clone, or the founding clone, which
is also present in about 50 percent of the metastasis as we'd expect. Down here we see
another cluster of a kind of minor clone; that's present at about 25 percent, and that,
again, also passed through to metastasis. And then, in contrast, down here on the x-axis
what we see is a clone that was present in the original tumor, but didn't pass through
to the metastasis. So, this was a separate population of cells that didn't make it through
the population bottleneck, or didn't make it through the metastasis event. Then on the
y-axis what we see are events that happened in the spinal metastasis, presumably after
the split, since they're not present in the original tumor -- or at least that's what
When we zoomed in a little bit close to this y-axis what we can see is that these events
I've highlighted in red actually were present in the tumor, just at a very low frequency.
And so that suggests that maybe they either had a growth advantage in the environment
of the metastasis, or just made up a majority of the cells that split off into the metastasis.
But either way, they're clearly present.
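The mapping used when reading these plots, where a 50 percent variant allele frequency corresponds to 100 percent of the cells for a heterozygous event, can be written down as a small sketch. The formula assumes a single mutated copy in a diploid region by default. The `purity`, `mutant_copies`, and `total_copies` parameters are illustrative names, not taken from the talk.

```python
def expected_vaf(cell_fraction, purity=1.0, mutant_copies=1, total_copies=2):
    """Expected variant allele frequency for a mutation present in
    `cell_fraction` of the tumor cells.

    Normal cells are assumed diploid and to carry no mutant copies.
    """
    tumor_alleles = purity * total_copies
    normal_alleles = (1.0 - purity) * 2
    return purity * cell_fraction * mutant_copies / (tumor_alleles + normal_alleles)


def cell_fraction_from_vaf(vaf, purity=1.0, mutant_copies=1, total_copies=2):
    """Invert expected_vaf: fraction of tumor cells carrying the mutation."""
    tumor_alleles = purity * total_copies
    normal_alleles = (1.0 - purity) * 2
    return vaf * (tumor_alleles + normal_alleles) / (purity * mutant_copies)


if __name__ == "__main__":
    # Founding clone and the minor clone from the plot, assuming a pure sample.
    print(cell_fraction_from_vaf(0.50))
    print(cell_fraction_from_vaf(0.25))
```

With a pure, diploid sample the conversion is just a doubling of the variant allele frequency. Lower purity or copy number changes shift the expected peaks.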
And getting back to our variant calling, then, this gives us a source of very low frequency
variants in this tumor that we know are real, because they're present in the metastasis
as well. And so we use these kind of events to test the sensitivity of our algorithms.
So, this is a comparison between three algorithms: BASSOVAC, our new caller; Sniper, our old
caller; and Strelka which is a caller from Illumina, which purports to do better on these
kind of low variant allele frequency events. What you can see is that BASSOVAC and Sniper
detect a lot of events and they're very comparable performance at kind of high variant allele
frequencies and mid variant allele frequencies. Strelka maybe doesn't do as well, but at about
10 percent there's an inflection point where Sniper just isn't able to detect stuff, Strelka
does a little bit better, but BASSOVAC detects a huge number of these very low frequency
true positive events.
And, you know, in the end even with this kind of biased 30x approach originally, we can
see that 50 percent of the variants present in the metastasis are present at a detectable level
in the tumor, even though we would have expected a much smaller proportion if we hadn't looked
closely and looked deeply with the capture validation. But more importantly what we can
say here is that we can use BASSOVAC to detect these true variants at very low frequencies,
down to and even lower than 2 percent.
So, given this kind of information, then, about these low frequency variants, how can
we put this to use to kind of infer the subclonal architecture of a tumor and find out, you
know, how many clones are present in there? Which variants are present in the different
subclones? And this really requires an integrative approach, where you look at both the variant
allele frequencies of the SNVs, as well as information on copy number calls, purity,
And so we can put it all together into beautiful charts that look like this. I'm going to zoom
in so you can actually see what's going on here. And so we segregate the SNVs according
to copy number, and then we plot the variant allele frequencies along with the depth, just
kind of for our reference on the Y-axis. And you can see here in the 2x plot you get a
clearer indication of the founding clone. And then we overlay it with a kernel density
plot on top here. So you can see this, this clear peak at 50 percent which tells us this
is the major, the founding clone. And then you see these variants down here with a little
bit lower frequency correspond to blips up here that represent subclones. And then over
on the right side you can see events that are copy number neutral, loss of heterozygosity,
up here near 100 percent. And in the copy number 3 regions, what you can see is that instead
of a 50 percent major clone as we expect, we expect peaks at 33 and 66 percent, depending
on whether the wild-type or the mutant allele got amplified. And we do see that indeed in
And so we can build these plots for all of our tumors, and you kind of eyeball it and
say, "This clearly looks like it's a two-clone tumor, a major clone and a minor clone." But
we get very leery of kind of eyeballing plots. We like to do it in a more rigorous fashion.
So we decided to come up with a method that could do this in an automated and kind of
unbiased manner. And so what we ended up doing was creating an algorithm that uses a mixture
model of binomial distributions to kind of model this data, and then use maximum likelihood
expectation to determine what the optimal number of clusters was for any given solution.
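A mixture of binomials with automatic selection of the number of clusters can be sketched as below. The talk does not give the fitting details, so two things here are assumptions: the fit uses a plain expectation-maximization loop, and the number of clusters is picked with the Bayesian information criterion (BIC). All function names are hypothetical, and copy number and purity are ignored.

```python
from math import exp, lgamma, log


def log_binom_pmf(k, n, p):
    """Log of the binomial probability, with p clipped away from 0 and 1."""
    p = min(max(p, 1e-9), 1 - 1e-9)
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def _log_mixture(a, d, weights, vafs):
    """Per-component log terms and their log-sum for one variant."""
    logs = [log(w) + log_binom_pmf(a, d, p) for w, p in zip(weights, vafs)]
    top = max(logs)
    return logs, top + log(sum(exp(x - top) for x in logs))


def fit_binomial_mixture(alt, depth, n_clusters, n_iter=200):
    """EM fit; returns (weights, cluster VAFs, log-likelihood)."""
    observed = [a / d for a, d in zip(alt, depth)]
    lo, hi = min(observed), max(observed)
    # Deterministic start: spread cluster centers evenly over the observed range.
    vafs = [lo + (hi - lo) * (j + 0.5) / n_clusters for j in range(n_clusters)]
    weights = [1.0 / n_clusters] * n_clusters
    for _ in range(n_iter):
        # E-step: responsibility of each cluster for each variant.
        resp = []
        for a, d in zip(alt, depth):
            logs, total = _log_mixture(a, d, weights, vafs)
            resp.append([exp(x - total) for x in logs])
        # M-step: update mixing weights and cluster VAFs.
        for j in range(n_clusters):
            rj = [r[j] for r in resp]
            weights[j] = max(sum(rj) / len(alt), 1e-12)
            reads = sum(r * d for r, d in zip(rj, depth))
            vafs[j] = sum(r * a for r, a in zip(rj, alt)) / max(reads, 1e-12)
    loglik = sum(_log_mixture(a, d, weights, vafs)[1] for a, d in zip(alt, depth))
    return weights, vafs, loglik


def choose_n_clusters(alt, depth, max_clusters=4):
    """Pick the cluster count with the lowest BIC (an assumed criterion)."""
    bic = {}
    for k in range(1, max_clusters + 1):
        _, _, loglik = fit_binomial_mixture(alt, depth, k)
        # k cluster VAFs plus k - 1 free mixing weights.
        bic[k] = -2.0 * loglik + (2 * k - 1) * log(len(alt))
    return min(bic, key=bic.get), bic
```

Calling `choose_n_clusters(alt_counts, depths)` with per-variant read counts returns the selected number of clusters and the BIC for each candidate. The raw counts go into the fit directly, so no smoothing or density estimate is involved.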
And so we can see that indeed this algorithm clusters this into two groups, says there's
a major clone and a minor clone, and we have overlaid the calls here. And this is a bi-clonal sample.
Here is a case where it's a tri-clonal sample that, you know, clearly agrees with our eyeballing
of the data. And there are cases that are a little more -- less intuitive, I guess.
This is a case where maybe if you looked at just the density plot you might say this was
a two-clone tumor, but if you look carefully, you can see that there's a nice peak here,
there's a nice peak here, and then kind of a smear in the middle. And the algorithm does
a very nice job of picking that up, fitting another curve in the middle there, and saying
this is indeed a three-clone tumor.
And then we also have, you know, more messy tumors. This is a multi-clonal sample with
a smear of data, and we don't think we're doing too much overfitting in this kind of
case. We think there really are a variety of clones here, but it's very difficult to
segregate them accurately at this kind of -- with this kind of smear of mutations.
So we've applied this across a large sample set of tumors, looking mostly at AML, breast
cancer, and endometrial cancer, and we can say that most of the tumors in those data
sets have at least one founding clone and one or more subclones. I also want to emphasize
that the numbers I'm showing here, these are going to be lower bound on the number of clones.
First of all, detection sensitivity hurts us, because not all of these calls were made
using BASSOVAC. But more importantly, I think, is that we're unable to distinguish with this
kind of data between two independent clones that both occur at say 20 percent variant
allele frequency. Without, you know, kind of single cell methods, there's no way to
get that from this data.
So, in conclusion, we can detect somatic mutations at very low frequencies using BASSOVAC, our
new caller. And we developed an R package for automatically inferring the subclonal
architecture in tumors. We hope to release beta versions of both of these by the end of
the year. They're not currently available, but will be shortly. And really the overarching
goal of this kind of research is to characterize these minor subclones at diagnosis rather
than discovering their presence at the relapse when it may be already too late to design
So, in conclusion, I'd like to acknowledge a host of people who made this possible. Mike
Wendl has been leading the BASSOVAC project, and Nathan Dees has been pushing the clonality
analysis out the door. A host of people at the Genome Center over here who have contributed
in one way or another, our collaborators who provided data, and expertise, and advice,
and leadership at the Genome Center. And then our funding agencies at the NHGRI and the
NCI, and, of course, The Cancer Genome Atlas. Thanks.
Charles Perou: Lou, you go first.
Lou Staudt: I wonder if there isn't an important implication
in your breast cancer metastasis findings. So if I understand correctly, in the metastasis
you've got both the tumor dominant clone --
Chris Miller: [affirmative]
Lou Staudt: -- with all the 50 percent alleles, and you've
got a tumor subclone.
Chris Miller: [affirmative]
Lou Staudt: So does that predict, then, the metastasis
must not have come from a single cell, but rather a clump of cells that had both the
minor and the major on it, or you're recreating all those mutations in the metastasis --
Chris Miller: Well, so anything -- any mutation that's present
in the founding clone is going to be present in all the subclones as well, but the fact
that it does appear at a lower variant allele frequency does indeed predict that it's not
a single cell that caused that metastasis. That it was a clump of cells containing both
those original ones, and a subset with additional mutations from the subclone, yeah.
Lou Staudt: I'm not an expert in this area, but I know
Chris Miller: I'm not either. [laughs]
Male Speaker: -- there's a lot been done -- a lot said about
individual breast cancer cells being found in the bone marrow and et cetera, and whether
clumps might be more the thing that metastasize.
Chris Miller: Yeah, I don't doubt that single cells may
be capable of that, but in this case it's clearly not one cell.
Lou Staudt: Okay.
Male Speaker: Yes, hi. I have a technical question.
Chris Miller: Sure.
Male Speaker: It seems to me that the selection of the bandwidth
of your kernel density estimate should affect the estimate -- the maximum likelihood estimate
of the number of subclonal populations. Have you looked into that, or how do you choose
Chris Miller: So, we don't actually use the bandwidth when
we're doing the binomial fitting. The bandwidth is purely smoothing for the eye, just to get
the pretty pictures. So we actually just take the raw data and feed it into the algorithm
Male Speaker: I got a question. So, if you were to compare
the clonality analysis coming from exomes versus full genomes, are they -- do they give
you similar answers, the same answer?
Chris Miller: That's very dependent upon the number of variants
that we're finding in these tumors. For example, even some of the whole genomes are, and definitely
in some of the AML exomes where you see very few mutations, it's very hard to cluster
with only 10 mutations, right? It's very hard to know what's going on there. The exomes
-- so we do have to set a minimum threshold on the number of mutations that we have --
Charles Perou: We could probably do that on -- like the breast
cancer data where there's 20 basal genomes fully sequenced and the exomes on those, right?
Chris Miller: Yeah, where we have whole genomes, it's really
easy, because you can include all those tier two and tier three mutations, and get hundreds
of mutations to get much finer resolution on your kind of subclonal architecture. With
just tier one exome stuff, it's a little bit harder, but we can do it, provided that there's
enough mutations in the sample.
Charles Perou: There's enough mutations. Got you.
Chris Miller: Yeah.
Charles Perou: All right. Thank you. So our next speaker
will be Adam Ewing from UC Santa Cruz.