David Neil Hayes: Thank you, Raju. Thanks to the organizers
for the opportunity to talk to you about squamous cell carcinoma of the head and neck from The
Cancer Genome Atlas. Let's see. Okay.
I'm giving this presentation on behalf of many collaborators, the most notable of which
are Dr. Adel El-Naggar from MD Anderson, and Dr. Jennifer Grandis from the University of
Pittsburgh, who are the co-chairs of the Head and Neck Cancer Disease Working Group, along
with myself, and many members of the analysis working group. I'm going to try to give credit
where credit is due as I go through the talk, but that may not be possible in every case
just because there's so many contributors. And actually shown here are participants at
the face-to-face meeting that took place at UNC Chapel Hill in September of this year.
So just to point out that folks who are willing to contribute two or three days of their time
to come and work on this data, and I greatly appreciate that.
Okay, so head and neck cancer is an important cancer. It's the number five cancer, the fifth
most common cancer worldwide; 500,000 cases per year, 200,000 deaths. There are parts
of Asia where it's the most common cancer. Usually, that's in the case of nasopharyngeal
carcinoma. In the United States, it's the sixth most common cancer with 45,000
cases per year, approximately 20,000 deaths per year. The two common risk factors, smoking
with about 80 percent attributable risk, so 80 percent of head and neck cancer is attributable
to smoking, but a rising and well described epidemic of human papillomavirus associated
carcinomas as well.
And with that in mind, I've included a cartoon on the slide, not so much to really go through
the details of how HPV causes head and neck cancer, but just to get some vocabulary out
there, because I'm going to refer to these markers many times. For the most part, we're
talking about HPV type 16. HPV type 16 makes oncoproteins: E6 and E7. E6 targeting p53
and E7 targeting Rb. If you look again at the cartoon, you'll get a sense of some of
the other important players in the HPV infection, and you'll also see these emerge as important
players in head and neck cancer as well. And I'll point particularly to the cyclins, particularly
cyclin D1, and, again, P16. And it's probably worth mentioning that P16 plays an important
role here, both as a biomarker and as part of the pathophysiology. The biomarker role
is due to the fact that HPV-infected cells express high levels of P16 because of reciprocal
signaling, so immunohistochemistry of P16 is one of the most, if not the most common,
diagnostic, clinical diagnostic tests for HPV infection.
The data that I'm talking about today are the 279 samples that are part of the data
freeze. We have a data freeze so that we can actually do the analyses. There will ultimately
be 500 cases of head/neck cancer included. To be a case, the sample had to have exon
sequencing, tumor SNP chips, RNA sequencing, methylation, and microRNA sequencing. I will
say that there's a lot of other data that -- included in the data freeze that will get
used eventually, including RPPA data. So, protein expression data, but it's not available
in absolutely every sample.
Let me describe the demographics of the patient population. The median age for our patients
was 61. This is a little bit older than the SEER median age in the United States, at 57.
Ten percent of the patients are minorities, mostly African Americans. Twenty percent of
the patients were never smokers, which seems a little bit high. That may be some missing
data, but in any case, that's the data that we've got. Seventy-three percent of the patients
were male. That's about right for the United States. In Europe, you'll see up to 90 percent
of head/neck cancer will be in males. Eleven percent of the patients positive for HPV as
defined by the sequencing, and I'll get into that in a couple of minutes. Sixty-two percent
of the cases are from oral cavity, 26 percent larynx, 11 percent oropharynx, and 1 percent
hypopharynx. Most of the patients are advanced stage, with 57 percent having Stage IVa disease.
Head and neck cancer's a little unusual in the staging in that IVa does not mean incurable.
This just means there's a large tumor with a large lymph node or multiple lymph nodes.
Stage IVc is metastatic disease. And about 40 percent of the patients were alive at the
time of the last follow-up.
I will mention one challenge that we struggled with, which is HPV status. Here on the screen
I'm showing seven different ways that a patient could be potentially identified as HPV positive,
and so we're wrestling in the dataset right now with actually defining which case is which,
based on RNA sequencing, DNA sequencing, clinical history, and other factors, and this is important
for reasons that will become clear later.
But I'll just have some conclusions on the cohort that we have. And I think this needs
to be emphasized, that the current data freeze, which is only half of the head/neck cancer
samples that will be available, is already the largest dataset, genomic dataset for head/neck
cancer that I'm aware of that has ever been assembled by a factor of two, for even the
individual components. So these 279 cases is twice the number of expression data that
are available through any other source; more than twice the number of copy number arrays,
et cetera, et cetera, and the data are -- there are clinical data that are available as well,
and, again, the data are all integrated. This is an unbelievable resource. I think of all
the TCGA tumors so far, this is probably the tumor that was the most in need of this contribution,
and we will be hearing about this for a decade or more. This is an incredible resource. There
are some limitations however. This is a surgical cohort. So, the relatively few oropharynx
cases, relatively few HPV positive cases, and a few smaller tumors, so these are the
lower risk tumors. So there are some limitations, but nonetheless a dataset to be quite excited
about.
Now moving on to the DNA data. This is the famous Gaddy Getz figure from the Broad, and
probably everyone at the Broad can make this now, but making an important point, which
is that head/neck cancer has a very high mutation rate, somewhere between one and 10 mutations
per megabase of sequencing. Not quite as high as lung squamous cell carcinoma, but probably
dragged down a little bit by the fact that HPV-positive tumors have lower mutation rates.
This is a fairly mature version of a figure that's really a key deliverable in the marker
paper, and actually, I'm going to go to something that Matthew Meyerson said this morning, which
is that, at this point, this group, the disease working group, there's no way that we can
even begin to scratch the surface on this data. Our goal is to move the TCGA data forward
and into the community, to present the data, to introduce the data, to show what can be
done with the data. Now, we are going to make some novel observations, but I think the main
point is to not get in the way of others analyzing this data, and so that's our goal.
But looking at the significantly mutated gene list, you'll see some -- many common, many
expected players. Number one, CDKN2A, or the gene that generates the P16 protein product.
So that's already interesting, because I've already brought in the HPV story, and P16,
and I'm going to get back to this a couple more times. P53, as expected, some perhaps
unexpected genes, so CASP8 is an interesting target, and I'll talk about this in a minute
as well. And another interesting target, HLA-A, one of the MHC class one proteins, I'll get
back to that as well.
Anyone who has spent much time involved in these large sequencing projects will know
that the significantly mutated gene list is a highly parameterized analysis, which means
that if you tweak the parameters a little bit, you can generate vastly different lists
of mutations, and many of those tweaks can be very reasonable. And if you do that, and
you go through the list a few times and consider some different ways to look at the mutated
gene list, one of the observations you'll see is that the significantly mutated gene list
is highly overlapping with lung squamous cell carcinoma. In fact, of the top mutated genes
in lung squamous cell carcinoma, only PTEN and KEAP1 failed to commonly emerge on the significantly
mutated gene list from head and neck cancer. Although there are KEAP1 and PTEN mutations,
they never rise to the level of significantly mutated.
So, I think this is one of the -- I'm going to pause on one of the early key observations,
which is the HPV negative head and neck cancer looks a lot like lung squamous cell carcinoma,
and that's in terms of its mutational landscape, its copy number landscape, expression patterns,
and pathway activations.
Data that I'm not going to show mostly because it's not my data, but we, because of TCGA,
we've been able to get some early looks at it, is that HPV-positive head and neck cancer
looks a lot like other HPV-positive tumors. I think we will see a little bit of that through
the meeting here. But it does justify even more that we need to start thinking about
these tumors in different ways, and so I've just highlighted one of these thoughts here,
which is the idea that some of the key mutations might be different between HPV-positive and
HPV-negative tumors. Here's one example with PIK3CA showing a mutation rate of 35 percent
in HPV-positive samples and 19 percent in HPV-negative samples. This is assuming that 34 of the tumors
were HPV positive and 254 negative, and you're starting to see why it's so important that
we get our HPV calls correct. I'm also going to show you later why it's challenging to
get these calls correct.
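To make the point about HPV calls concrete, here is a minimal sketch of how a per-group mutation rate shifts when a few samples are reclassified. The group sizes (34 and 254) come from the talk; the mutated-sample counts (12 and 48) are illustrative assumptions chosen only to reproduce the quoted 35 and 19 percent, and the `reclassify` helper is hypothetical.

```python
def mutation_rate(mutated, total):
    """Fraction of samples in a group that carry the mutation."""
    return mutated / total


def reclassify(pos, neg, n_moved, n_moved_mutated):
    """Move n_moved samples from HPV-positive to HPV-negative.

    pos and neg are (mutated, total) pairs; n_moved_mutated of the
    moved samples carry the mutation. Returns the new (pos, neg) pairs.
    """
    pos_mut, pos_total = pos
    neg_mut, neg_total = neg
    new_pos = (pos_mut - n_moved_mutated, pos_total - n_moved)
    new_neg = (neg_mut + n_moved_mutated, neg_total + n_moved)
    return new_pos, new_neg


# Assumed counts: 12/34 is about 35 percent, 48/254 is about 19 percent.
hpv_pos = (12, 34)
hpv_neg = (48, 254)

# Suppose five mutated samples were wrongly called HPV positive.
new_pos, new_neg = reclassify(hpv_pos, hpv_neg, n_moved=5, n_moved_mutated=5)

print(f"before: {mutation_rate(*hpv_pos):.2f} vs {mutation_rate(*hpv_neg):.2f}")
print(f"after:  {mutation_rate(*new_pos):.2f} vs {mutation_rate(*new_neg):.2f}")
```

Because the HPV-positive group is small, a handful of changed calls is enough to move its rate substantially, while the HPV-negative rate barely changes.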
Just one slide on the whole genome analysis, just to remind us that we have it. We've got
some very interesting cases, but we really haven't had the time yet to develop the whole
genome story. So, I'm really not going to talk about these further today, but there
are approximately 30 whole genomes that have been done for head and neck cancer.
Going into the copy number landscape a little bit, I think this is one of the key observations,
and this, I think, will be a figure in the marker paper showing lung squamous cell carcinoma
copy number landscape. So, this is the genome for chromosome [spelled phonetically] one
through all the autosomes, HPV-negative tumors, HPV-positive tumors, and from 10,000 feet,
you can clearly see that these tumors share many of the same copy number alterations,
universal alterations of losses of chromosome 3p, gains of 3q, alterations in chromosome
eight. But there are some differences, and I'll go through these in some of the subsequent
slides.
Looking at the focal amplifications between head and neck cancer and lung cancer, really
very similar patterns of focal gains, but a couple of exceptions. PDGFRA, for example,
the peak for PDGFRA on chromosome four completely absent in head and neck cancer, but otherwise
largely very similar lists.
Comparing HPV-positive tumors to HPV negative tumors, this is an observation I should have
already given some credit to Andy Cherniack, who generated a lot of these figures and has
been a great collaborator on this project, is that in the HPV positive tumors, really
a striking lack of oncogenes other than PIK3CA. Perhaps a little bit in terms of some CCND1,
Cyclin D1 amplifications, but overwhelmingly PIK3CA, compared to a much deeper selection
of oncogenes in the HPV-negative tumors, and I think this is a novel observation. And again,
this gets back to the importance of looking at that mutation rate for PIK3CA in HPV-positive
tumors.
In terms of focal deletions between HPV-positive and negative tumors, one in particular is
striking, and this is a deletion of chromosome 11. So -- and this reminds me -- I'm going
to make this conclusion a couple of times -- that the copy number landscape in head/neck
cancer appears to be very rich in terms of defining its biology, perhaps more so than
the mutation spectrum in some cases. One of the challenges with copy number alterations,
even focal events, is that sometimes three, or four, or 10, or 20 potential oncogenes
occur within the amplicon, and so this is one of our challenges, is to find the key
gene within the amplicon. I will point out that TRAF3 does -- its gene expression, and
its copy number track, and its deletion, the red samples are the HPV-positive samples.
So, it's certainly quite intriguing that this could be the target of the chromosome 11 deletion.
Okay, again -- all right. So, you saw this morning that Chad Creighton showed -- clearly,
they spent time in the renal cell carcinoma paper validating the mutations. They're coming
up with a list of the most credible mutations. In the head and neck cancer project, we've
moved -- we've taken a somewhat different approach, from having multiple centers call
mutations to using the RNA-seq data to validate the mutations. This is a very powerful technique,
because you have an independent sample, an independent sequencing, independent alignment,
and then you're checking the mutations.
And the way to read the figure is every column is a sample. The height of the blue bar is
the total number of mutations from that sample. The height of the yellow bar is the number
of the blue bar that actually had any coverage in RNA. So, was the mutant base covered
with any RNA whatsoever, even a single read. The red bar is the fraction of mutations that,
if the mutation was covered, were validated in RNA. And if you think back to Chad's figure
this morning where essentially all that was happening there was folks using the same DNA
to call mutations, I think you'll see that the RNA confirmation rate compares very favorably
from independent sequencing reactions. And so here we're seeing greater than 80 percent
validation, if the base was covered.
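The three bars described above can be sketched in a few lines. This is a minimal illustration, not the working group's pipeline: the `MutationCall` record, the example read counts, and the `min_alt_reads=1` validation threshold are all assumptions. The talk only specifies that "covered" means at least one RNA read at the mutant base.

```python
from dataclasses import dataclass


@dataclass
class MutationCall:
    """One DNA-called mutation with RNA-seq read counts at the same site."""
    gene: str
    rna_depth: int      # total RNA reads covering the mutant base
    rna_alt_reads: int  # RNA reads supporting the mutant allele


def rna_validation_summary(calls, min_alt_reads=1):
    """Count total, RNA-covered, and RNA-validated mutations for one sample.

    A mutation is 'covered' if at least one RNA read spans the site, and
    'validated' if it is covered and has at least min_alt_reads mutant reads
    (an assumed threshold). The rate is validated / covered, or None when
    nothing is covered.
    """
    covered = [c for c in calls if c.rna_depth >= 1]
    validated = [c for c in covered if c.rna_alt_reads >= min_alt_reads]
    rate = len(validated) / len(covered) if covered else None
    return {
        "total": len(calls),           # blue bar
        "covered": len(covered),       # yellow bar
        "validated": len(validated),
        "validation_rate": rate,       # red bar
    }


# Made-up read counts for a single hypothetical sample.
example_calls = [
    MutationCall("TP53", rna_depth=40, rna_alt_reads=18),
    MutationCall("CDKN2A", rna_depth=12, rna_alt_reads=5),
    MutationCall("CASP8", rna_depth=0, rna_alt_reads=0),   # not expressed: uncovered
    MutationCall("HRAS", rna_depth=25, rna_alt_reads=0),   # covered, not validated
    MutationCall("PIK3CA", rna_depth=8, rna_alt_reads=3),
]

summary = rna_validation_summary(example_calls)
print(summary)
```

Dividing by covered mutations rather than all mutations matters: a mutation in a gene that is not expressed can never be confirmed in RNA, so it is excluded from the denominator instead of being counted as a failure.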
The RNA-seq is an incredibly rich source for structural variants in the transcriptome,
and I'm really not going to have time to get into this much today, other than to give you
a couple of conclusions. One is that at this early point, and this was up for debate and
it needs to be validated, there's really not any convincing evidence that there are recurrent
in-frame gene fusion events. And this is similar to what we saw with lung squamous cell carcinoma.
So these are sort of in-frame oncogenes. However, there's quite convincing evidence that structural
rearrangements in the DNA and the resulting transcripts are functional, more likely in
terms of -- I'm sorry, tumor suppressor gene inactivation and loss, and I think this is
a novel observation that we're going to try to make.
Shifting gears a little bit and thinking about some of the patterns that Lou Staudt showed
us this morning, thinking about the use of expression analysis to identify molecular
subtypes of head and neck cancer, or of any tumor type. I'm going to start with the example
from lung squamous cell carcinoma, the manuscript that was published in September of this year
in Nature, where we described four subtypes of lung squamous cell carcinoma: classical,
primitive, basal, and secretory. There are many stories in these data, but I'll just
pull out one for the illustration today, which is that the classical subtype of lung squamous
cell carcinoma is associated with near universal alterations of KEAP1 and NRF2, and one of the
ways it's going to be identified is by high expression of NFE2L2 in all of the classical
subtypes.
We've performed a similar analysis in head and neck cancer, in samples that were available
from UNC, then validated in TCGA data, and I'll just tell you that we borrowed some of
the names and generated some new names; the names in this case are atypical, classical,
mesenchymal, and basal. Here I'm showing independent validation of the patterns and samples from
UNC and independent TCGA samples. Here I'm showing a centroid validation of these four
subtypes from what's really the marker paper for head and neck cancer subtypes, published
by Christine Chung in 2004. And this analysis performed by Vonn
Walter shows that the subtypes of head and neck cancer correlate
strongly with those same subtypes from lung cancer. So for the basal subtype of head and
neck cancer, and lung cancer, there's a single unified node. The mesenchymal and the secretory
subtype correspond, and the classical subtypes from the two groups correspond.
Finding expression subtype is certainly interesting, but it's just a novelty until you can propose
a model for that subtype, or what the genomic alteration might be. This is a particularly
exciting one where in the atypical subtype, and I'll just, for the sake of time, I'll
also point out that this is the subtype that's associated with HPV-positive infection, and
so the HPV-positive patients almost all fall in the atypical subtype, have completely absent
amplification of chromosome seven, and most notably, no instances of the focal high-level
amplification at the EGFR locus. And this is true both in data from UNC as well as in
The Cancer Genome Atlas data, again, suggesting that the PIK3CA oncogene in these samples
may be the relevant oncogene.
Again, thinking in a pathway manner, looking at expression of NFE2L2, and again, you'll
see samples from UNC as well as from The Cancer Genome Atlas, universal expression of NFE2L2
in the classical subtype as well as the atypical subtype, but absent in the basal and the mesenchymal.
And this is the same story from head and neck cancer.
I mentioned early on mutations of HLA-A, which are reported in lung squamous cell carcinoma,
which we are also seeing in head and neck cancer. It's a very interesting mutation.
It was probably the most unexpected mutation, which is one reason why we didn't comment
on it very much in the lung squamous paper.
So in this instance, what I'm doing is I'm using the tumor subtypes to explore this mutation,
which is otherwise sort of a curious event, and let me walk you through the figure. In
the top of the figure what's being represented is gene expression, and it turns out that
HLA-A, B, and C are all right next to each other on chromosome six. And they share a
very coordinated gene expression. So, I've just collapsed them for the sake of display.
The same thing is true for copy number alteration, and TAP1 and 2 are also on chromosome six
right next to each other. So, they have sort of a coordinated pattern. In the middle of
the figure, I'm showing the copy number. Here I'm showing mutations of HLA-A, B, and C,
TAP1 and 2, and here I'm showing DNA and RNA detection of HPV.
So in the interest of time, I'll just point out a couple of the patterns. Oh, and one
other thing, as a proxy for lymphocyte infiltrates, we've got expression of CD3 and CD8 as markers
of infiltration into the tumors. And what you'll see in the classical subtype, universal
lack of expression of HLA-A, B, and C, and in large part due to deletions of the gene,
but not universally so. So, for the most part, most of the HLA-A, B, and C mutations occur
in the basal subtype, and these are always -- these are mutually exclusive.
So, I don't have time, I guess, to dwell on the figure too much, but this is one of the
early views of helping us to try to understand a mutation which was otherwise quite curious,
now starting to see some signals that actually there might be pathway activation and signaling
in a coordinated way.
Speaking of pathways, for those who have been involved in TCGA and other large sequencing
projects for the last five to seven years, we spent a lot of time thinking about Ras
signaling, Akt, PTEN. Well, one of the great pleasures of working with the current group
is not only do we have new faces, we've also got new expertise, and so we've really expanded
our thought process in terms of some of the pathways and the targets that we should be
looking at. This is a figure generated by Carter Van Waes. I'm
not sure if he's here today, but he has really been contributing greatly to this
project, pulling out really survival and death pathways, which we have not looked at in our
sequencing projects before.
And again, I'm going to go back to Lou Staudt's story this morning, thinking about coordinated
events, those mutations that occur together or in an anti-correlated manner. There's a
lot going on in the slides, so I'm only going to talk you through one of the stories,
but I mentioned earlier on mutations of Caspase-8 and HRAS. So one of the very curious
findings is that Caspase-8 mutations occur only in the basal and the mesenchymal subtype,
and frequently in conjunction with HRAS mutations. When there is a mutation of HRAS or CASP8,
there is never an amplification of CCND1, which is the 11q amplicon, which happens to
also be right next to FADD. It's unclear which of those two genes might be the true target
of the 11q13 amplicon, but the pattern is unmistakable.
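One way to put a number on an anti-correlated pattern like this is a 2x2 table of samples and a one-sided Fisher's exact test. The sketch below is an illustration only: the talk does not say which test the group used, the sample data are invented, and the labels `CASP8_mut` and `CCND1_amp` are hypothetical.

```python
from math import comb


def contingency(samples, a, b):
    """Count samples with both alterations, only a, only b, or neither.

    Each sample is a set of alteration labels.
    """
    both = sum(1 for s in samples if a in s and b in s)
    only_a = sum(1 for s in samples if a in s and b not in s)
    only_b = sum(1 for s in samples if b in s and a not in s)
    neither = len(samples) - both - only_a - only_b
    return both, only_a, only_b, neither


def exclusivity_p_value(both, only_a, only_b, neither):
    """One-sided Fisher's exact test for mutual exclusivity.

    Returns the hypergeometric probability of observing this many
    co-occurrences or fewer if the two alterations were independent.
    """
    n = both + only_a + only_b + neither
    n_a = both + only_a
    n_b = both + only_b
    favorable = sum(
        comb(n_a, k) * comb(n - n_a, n_b - k) for k in range(both + 1)
    )
    return favorable / comb(n, n_b)


# Invented cohort of ten samples: four mutated, four amplified, two neither.
samples = (
    [{"CASP8_mut"}] * 2
    + [{"CASP8_mut", "HRAS_mut"}] * 2
    + [{"CCND1_amp"}] * 4
    + [set()] * 2
)

table = contingency(samples, "CASP8_mut", "CCND1_amp")
p_value = exclusivity_p_value(*table)
print(table, round(p_value, 4))
```

With a cohort this small, even a complete absence of co-occurrence is weak evidence on its own; the same calculation becomes far more informative at the scale of the full data freeze.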
For those patients that have amplifications of CCND1 or FADD, and expression of those oncogenes,
they have universally low expression of a second -- of genes from a second amplicon
on 11q, 11q22, with additional death-related oncogenes, YAP1 and BIRC2.
So, some patterns emerging, you know, I think these are some of the patterns that we're
going to be evaluating as we move forward with this manuscript. In the interest of time,
I really don't have -- this is just not the time to talk about all of the data types.
I will say we've seen some amazing contributions from British Columbia, as we have in other
tumors with identification of tumor subtypes, based on microRNA, and some of the earliest
looks at differential clinical outcomes within these datasets. Similarly, there's -- if you
have time to come by the poster, I'll show you some great examples of coordinated methylation
gene expression data, particularly for P16, a very interesting story, and also a description
of methylation subtypes by the group -- by the methylation genome characterization centers.
Finally, I think this is my last slide. One of the most exciting observations is that
through unbiased sequencing for the first time being able to -- because it's unbiased,
to detect DNA and RNA that we weren't looking for. And in this case, it's viral RNA. So
what I'm showing here, and this is data that's described in detail in a poster by Matt Wilkerson
in the poster session, is the fraction of patients here on the top row, the fractions
of patients that express some HPV type 16 RNA. What's interesting about this is that
this rate, approximately 20 percent, is far higher than the number that had the clinical
diagnosis of HPV infection, and it's also far higher than you would expect based on
the fact that only 11 percent of these patients have oropharynx tumors. In addition, there
are other viruses in the tumor that are also detected at high levels, and again, I'll refer
you to Matt's poster, but most prominently *** virus. And we have near universal coverage
of the *** genome in at least two of the samples.
We'll get some more insight into viral sequencing in a talk that's given tomorrow from Raju's
group. So a final word, thanks to the contributors, and we look forward to getting this data out
into the public.
[applause]
Raju Kucherlapati: Thank you. One question.
Female Speaker: Yeah, this is [unintelligible] from Arkansas.
So, very nice talk, I mean, relating to the molecular [unintelligible]. I'm very interested
in that, and I'd like to talk with you more later on this topic. I have a question related
to the TCGA sample. Like [unintelligible] that when they do for the computational analyses
all with the new technology developer. So, is the TCGA, they save sample for later on,
like for further verification or further new technology. So, I mean TCGA, when they prepare
for the sample, do they save extra sample for later on computational verification?
David Neil Hayes: I think the short answer is sometimes. Sometimes
there's extra sample available. Kenna is shaking her head yes, and when there is, the program
team has been very -- and there's an important question, they have made those samples available.
But the samples are ultimately limited.
Female Speaker: Thank you.
Raju Kucherlapati: Thank you. Thank you, Neil.