TCGA - Genomic characterization of squamous cell carcinoma of the head and neck - David Hayes

David Neil Hayes: Thank you, Raju. Thanks to the organizers for the opportunity to talk to you about squamous cell carcinoma of the head and neck from The Cancer Genome Atlas. Let's see. Okay. I'm giving this presentation on behalf of many collaborators, the most notable of whom are Dr. Adel El-Naggar from MD Anderson and Dr. Jennifer Grandis from the University of Pittsburgh, who are the co-chairs of the Head and Neck Cancer Disease Working Group, along with myself, and many members of the analysis working group. I'm going to try to give credit where credit is due as I go through the talk, but that may not be possible in every case just because there are so many contributors. And actually shown here are participants at the face-to-face meeting that took place at UNC Chapel Hill in September of this year. So just to point out that these are folks who were willing to contribute two or three days of their time to come and work on this data, and I greatly appreciate that.
Okay, so head and neck cancer is an important cancer. It's the fifth most common cancer worldwide, with 500,000 cases and 200,000 deaths per year. There are parts of Asia where it's the most common cancer; usually, that's in the case of nasopharyngeal carcinoma. In the United States, it's the sixth most common cancer, with 45,000 cases and approximately 20,000 deaths per year. There are two common risk factors: smoking, with about 80 percent attributable risk, so 80 percent of head and neck cancer is attributable to smoking, and a rising and well-described epidemic of human papillomavirus-associated carcinomas as well. And with that in mind, I've included a cartoon on the slide, not so much to really go through the details of how HPV causes head and neck cancer, but just to get some vocabulary out there, because I'm going to refer to these markers many times. For the most part, we're talking about HPV type 16. HPV type 16 makes two oncoproteins, E6 and E7, with E6 targeting p53 and E7 targeting Rb.
If you look again at the cartoon, you'll get a sense of some of the other important players in the HPV infection, and you'll also see these emerge as important players in head and neck cancer as well. And I'll point particularly to the cyclins, particularly cyclin D1, and, again, p16. And it's probably worth mentioning that p16 plays an important role here, both as a biomarker and as part of the pathophysiology. The biomarker role is due to the fact that HPV-infected cells express high levels of p16 because of reciprocal signaling, so immunohistochemistry of p16 is one of the most common, if not the most common, clinical diagnostic tests for HPV infection.
The data that I'm talking about today are the 279 samples that are part of the data freeze. We have a data freeze so that we can actually do the analyses. There will ultimately be 500 cases of head and neck cancer included. To be a case, the sample had to have exon sequencing, tumor SNP chips, RNA sequencing, methylation, and microRNA sequencing. I will say that there's a lot of other data included in the data freeze that will get used eventually, including RPPA data, so, protein expression data, but it's not available in absolutely every sample. Let me describe the demographics of the patient population. The median age for our patients was 61. This is a little bit older than the SEER median age in the United States, at 57. Ten percent of the patients are minorities, mostly African Americans. Twenty percent of the patients were never smokers, which seems a little bit high. That may be some missing data, but in any case, that's the data that we've got. Seventy-three percent of the patients were male. That's about right for the United States. In Europe, you'll see up to 90 percent of head and neck cancer will be in males. Eleven percent of the patients were positive for HPV as defined by the sequencing, and I'll get into that in a couple of minutes.
Sixty-two percent of the cases are from the oral cavity, 26 percent larynx, 11 percent oropharynx, and 1 percent hypopharynx. Most of the patients are advanced stage, with 57 percent having Stage IVa disease. Head and neck cancer is a little unusual in its staging in that IVa does not mean incurable. It just means there's a large tumor with a large lymph node or multiple lymph nodes. Stage IVc is metastatic disease. And about 40 percent of the patients were alive at the time of the last follow-up.
I will mention one challenge that we struggled with, which is HPV status. Here on the screen I'm showing seven different ways that a patient could potentially be identified as HPV positive, and so we're wrestling in the dataset right now with actually defining which case is which, based on RNA sequencing, DNA sequencing, clinical history, and other factors, and this is important for reasons that will become clear later. But I'll just offer some conclusions on the cohort that we have. And I think this needs to be emphasized: the current data freeze, which is only half of the head and neck cancer samples that will be available, is already, by a factor of two, the largest genomic dataset for head and neck cancer that I'm aware of ever being assembled, even for the individual components. So these 279 cases are twice the amount of expression data available through any other source, more than twice the number of copy number arrays, et cetera, et cetera, and there are clinical data available as well, and, again, the data are all integrated. This is an unbelievable resource. I think of all the TCGA tumors so far, this is probably the tumor that was the most in need of this contribution, and we will be hearing about this for a decade or more. This is an incredible resource. There are some limitations, however. This is a surgical cohort.
So, there are relatively few oropharynx cases, relatively few HPV-positive cases, and relatively few smaller tumors, and these are the lower-risk tumors. So there are some limitations, but nonetheless it's a dataset to be quite excited about.
Now moving on to the DNA data. This is the famous Gaddy Getz figure from the Broad, and probably everyone at the Broad can make this now, but it makes an important point, which is that head and neck cancer has a very high mutation rate, somewhere between one and 10 mutations per megabase of sequencing. Not quite as high as lung squamous cell carcinoma, but probably dragged down a little bit by the fact that HPV-positive tumors have lower mutation rates. This is a fairly mature version of a figure that's really a key deliverable in the marker paper, and actually, I'm going to go to something that Matthew Meyerson said this morning, which is that, at this point, there's no way that this group, the disease working group, can even begin to scratch the surface on this data. Our goal is to move the TCGA data forward and into the community, to present the data, to introduce the data, to show what can be done with the data. Now, we are going to make some novel observations, but I think the main point is to not get in the way of others analyzing this data, and so that's our goal. But looking at the significantly mutated gene list, you'll see many common, many expected players. Number one, CDKN2A, the gene that generates the p16 protein product. So that's already interesting, because I've already brought in the HPV story and p16, and I'm going to get back to this a couple more times. p53, as expected, and some perhaps unexpected genes, so CASP8 is an interesting target, and I'll talk about this in a minute as well. And another interesting target is HLA-A, one of the MHC class I proteins; I'll get back to that as well.
Anyone who has spent much time involved in these large sequencing projects will know that the significantly mutated gene list is a highly parameterized analysis, which means that if you tweak the parameters a little bit, you can generate vastly different lists of mutations, and many of those tweaks can be very reasonable. And if you do that, and you go through the list a few times and consider some different ways to look at the mutated gene list, one of the observations you'll see is that the list of significantly mutated genes overlaps heavily with that of lung squamous cell carcinoma. In fact, of the top mutated genes in lung squamous cell carcinoma, only PTEN and KEAP1 failed to commonly emerge on the significantly mutated gene list from head and neck cancer. Although there are KEAP1 and PTEN mutations, they never rise to the level of significantly mutated. So, I'm going to pause on one of the early key observations, which is that HPV-negative head and neck cancer looks a lot like lung squamous cell carcinoma, and that's in terms of its mutational landscape, its copy number landscape, expression patterns, and pathway activations. Data that I'm not going to show, mostly because it's not my data, but which, because of TCGA, we've been able to get some early looks at, suggest that HPV-positive head and neck cancer looks a lot like other HPV-positive tumors. I think we will see a little bit of that through the meeting here. But it does justify even more that we need to start thinking about these tumors in different ways, and so I've just highlighted one of these thoughts here, which is the idea that some of the key mutations might be different between HPV-positive and HPV-negative tumors. Here's one example with PIK3CA, showing a mutation rate of 35 percent in HPV-positive samples and 19 percent in HPV-negative samples.
That comparison assumes that 34 of the tumors were HPV positive and 254 negative, and you're starting to see why it's so important that we get our HPV calls correct. I'm also going to show you later why it's challenging to get these calls correct. Just one slide on the whole genome analysis, just to remind us that we have it. We've got some very interesting cases, but we really haven't had the time yet to develop the whole genome story. So, I'm really not going to talk about these further today, but there are approximately 30 whole genomes that have been done for head and neck cancer.
Going into the copy number landscape a little bit, I think this is one of the key observations, and this, I think, will be a figure in the marker paper, showing the lung squamous cell carcinoma copy number landscape alongside HPV-negative and HPV-positive head and neck tumors. So, this is the genome from chromosome one through all the autosomes, and from 10,000 feet, you can clearly see that these tumors share many of the same copy number alterations: near universal losses of chromosome 3p, gains of 3q, and alterations in chromosome eight. But there are some differences, and I'll go through these in some of the subsequent slides. Looking at the focal amplifications between head and neck cancer and lung cancer, there are really very similar patterns of focal gains, with a couple of exceptions. The peak for PDGFRA on chromosome four, for example, is completely absent in head and neck cancer, but otherwise the lists are largely very similar. Comparing HPV-positive tumors to HPV-negative tumors, and I should have already given some credit to Andy Cherniack, who generated a lot of these figures and has been a great collaborator on this project, the observation is that in the HPV-positive tumors there is really a striking lack of amplified oncogenes other than PIK3CA.
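To illustrate why the HPV calls matter for the PIK3CA comparison between HPV-positive and HPV-negative tumors, here is a minimal Python sketch of a two-sided Fisher exact test on a 2x2 table. The group sizes (34 and 254) are the ones quoted in the talk. The mutated counts (12 and 48) are assumptions back-calculated from the quoted 35 and 19 percent, not actual TCGA counts, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x mutated samples in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Assumed counts: 12 of 34 HPV-positive and 48 of 254 HPV-negative tumors mutated.
hpv_pos_mut, hpv_pos_total = 12, 34
hpv_neg_mut, hpv_neg_total = 48, 254

rate_pos = hpv_pos_mut / hpv_pos_total
rate_neg = hpv_neg_mut / hpv_neg_total
p_value = fisher_exact_two_sided(
    hpv_pos_mut, hpv_pos_total - hpv_pos_mut,
    hpv_neg_mut, hpv_neg_total - hpv_neg_mut,
)
print(f"HPV+ {rate_pos:.0%}, HPV- {rate_neg:.0%}, p = {p_value:.3f}")
```

With only 34 tumors in the HPV-positive group, each reclassified sample moves that group's rate by roughly three percentage points, which is one way to see why the HPV calls need to be right.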
There are perhaps a few CCND1 (cyclin D1) amplifications in the HPV-positive tumors, but it is overwhelmingly PIK3CA, compared to a much deeper selection of oncogenes in the HPV-negative tumors, and I think this is a novel observation. And again, it gets back to the importance of looking at that mutation rate for PIK3CA in HPV-positive tumors. In terms of focal deletions between HPV-positive and HPV-negative tumors, one in particular is striking, and this is a deletion of chromosome 11. And this reminds me, and I'm going to make this conclusion a couple of times, that the copy number landscape in head and neck cancer appears to be very rich in terms of defining its biology, perhaps more so than the mutation spectrum in some cases. One of the challenges with copy number alterations, even focal events, is that sometimes three, or four, or 10, or 20 potential oncogenes occur within the amplicon, and so one of our challenges is to find the key gene within the amplicon. I will point out that for TRAF3, its gene expression tracks with its copy number and its deletion; the red samples are the HPV-positive samples. So, it's certainly quite intriguing that this could be the target of the chromosome 11 deletion.
Okay, again, all right. So, you saw this morning what Chad Creighton showed; clearly, they spent time in the renal cell carcinoma paper validating the mutations. They're coming up with a list of the most credible mutations. In the head and neck cancer project, we've taken a somewhat different approach, moving from having multiple centers call mutations to using the RNA-seq data to validate the mutations. This is a very powerful technique, because you have an independent sample, an independent sequencing, an independent alignment, and then you're checking the mutations. And the way to read the figure is that every column is a sample. The height of the blue bar is the total number of mutations from that sample.
The height of the yellow bar is the number of those mutations (the blue bar) that actually had any coverage in RNA, so, was the mutant base covered by any RNA whatsoever, even a single read. The red bar is the fraction of mutations that, if covered, were validated in RNA. And if you think back to Chad's figure this morning, where essentially all that was happening was folks using the same DNA to call mutations, I think you'll see that the RNA confirmation rate from independent sequencing reactions compares very favorably. And so here we're seeing greater than 80 percent validation, if the base was covered. The RNA-seq is an incredibly rich source for structural variants in the transcriptome, and I'm really not going to have time to get into this much today, other than to give you a couple of conclusions. One is that at this early point, and this is up for debate and needs to be validated, there's really not any convincing evidence that there are recurrent in-frame gene fusion events, so, these sort of in-frame oncogenes. And this is similar to what we saw with lung squamous cell carcinoma. However, there's quite convincing evidence that structural rearrangements in the DNA and the resulting transcripts are functional, more likely in terms of tumor suppressor gene inactivation and loss, and I think this is a novel observation that we're going to try to make.
Shifting gears a little bit and thinking about some of the patterns that Lou Staudt showed us this morning, thinking about the use of expression analysis to identify molecular subtypes of head and neck cancer, or of any tumor type. I'm going to start with the example from lung squamous cell carcinoma, the manuscript that was published in September of this year in Nature, where we described four subtypes of lung squamous cell carcinoma: classical, primitive, basal, and secretory.
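Returning to the RNA-seq validation figure, the three bars per sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `MutationCall` record and its field names are hypothetical, "covered" follows the talk's definition of at least one RNA read at the mutant base, and counting a mutation as "validated" when at least one RNA read carries the mutant allele is an assumed threshold that the talk does not specify.

```python
from dataclasses import dataclass


@dataclass
class MutationCall:
    sample: str
    rna_depth: int      # RNA-seq reads overlapping the mutant base
    rna_alt_reads: int  # RNA-seq reads carrying the mutant allele


def validation_summary(calls, min_alt_reads=1):
    """Per-sample totals (blue), RNA-covered calls (yellow), validated fraction (red)."""
    summary = {}
    for call in calls:
        s = summary.setdefault(call.sample, {"total": 0, "covered": 0, "validated": 0})
        s["total"] += 1
        if call.rna_depth >= 1:  # covered by any RNA at all, even a single read
            s["covered"] += 1
            if call.rna_alt_reads >= min_alt_reads:  # assumed validation rule
                s["validated"] += 1
    for s in summary.values():
        # The fraction is undefined when no mutation in the sample was covered.
        s["validated_fraction"] = s["validated"] / s["covered"] if s["covered"] else None
    return summary


# Toy calls for two made-up samples.
toy_calls = [
    MutationCall("S1", 10, 4),
    MutationCall("S1", 0, 0),
    MutationCall("S1", 5, 0),
    MutationCall("S2", 1, 1),
    MutationCall("S2", 0, 0),
]
toy_summary = validation_summary(toy_calls)
```

Keeping the uncovered calls out of the denominator is what makes the red bar a statement about confirmation rather than about expression level.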
There are many stories in these data, but I'll just pull out one for illustration today, which is that the classical subtype of lung squamous cell carcinoma is associated with near universal alterations of KEAP1 and NRF2, and one of the ways it's going to be identified is by high expression of NFE2L2 in all of the classical-subtype tumors. We've performed a similar analysis in head and neck cancer, in samples that were available from UNC, then validated in TCGA data, and I'll just tell you that we borrowed some of the names and generated some new names; the names in this case are atypical, classical, mesenchymal, and basal. Here I'm showing independent validation of the patterns in samples from UNC and in independent TCGA samples. Here I'm showing a centroid validation of these four subtypes from what's really the marker paper for head and neck cancer subtypes, published by Christine Chung in 2004. And this analysis, performed by Vonn Walter, shows that the subtypes of head and neck cancer correlate strongly with those same subtypes from lung cancer. So for the basal subtype of head and neck cancer and of lung cancer, there's a single unified node. The mesenchymal and the secretory subtypes correspond, and the classical subtypes from the two groups correspond. Finding an expression subtype is certainly interesting, but it's just a novelty until you can propose a model for that subtype, or what the genomic alteration might be. This is a particularly exciting one. For the sake of time, I'll also point out that the atypical subtype is the one associated with HPV infection, so the HPV-positive patients almost all fall in the atypical subtype. Tumors in the atypical subtype completely lack amplification of chromosome seven and, most notably, have no instances of the focal high-level amplification at the EGFR locus.
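The centroid validation of the four expression subtypes can be sketched as a nearest-centroid assignment by correlation. This is a toy illustration, not the analysis used for the figures: the subtype names come from the talk, but the four-gene centroids and the sample profiles are invented numbers, and using Pearson correlation as the similarity measure is an assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def nearest_centroid(profile, centroids):
    """Return the subtype whose centroid correlates best with the profile."""
    return max(centroids, key=lambda name: pearson(profile, centroids[name]))


# Invented centroids over four hypothetical genes (same gene order in every vector).
toy_centroids = {
    "atypical":    [2.0, -1.0, -1.0, 0.0],
    "classical":   [-1.0, 2.0, 0.0, -1.0],
    "mesenchymal": [-1.0, 0.0, 2.0, -1.0],
    "basal":       [0.0, -1.0, -1.0, 2.0],
}

call_a = nearest_centroid([1.8, -0.9, -1.2, 0.1], toy_centroids)
call_b = nearest_centroid([-0.8, 0.1, 1.9, -1.1], toy_centroids)
print(call_a, call_b)
```

Applying centroids learned in one cohort to an independent cohort, and checking that the same patterns reappear, is the sense in which the subtypes are described as validated.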
And this absence of EGFR amplification in the atypical subtype is true both in data from UNC and in The Cancer Genome Atlas data, again suggesting that PIK3CA may be the relevant oncogene in these samples. Again, thinking in a pathway manner and looking at expression of NFE2L2, you'll see, in samples from UNC as well as from The Cancer Genome Atlas, universal expression of NFE2L2 in the classical subtype as well as the atypical subtype, but absent in the basal and the mesenchymal. And this is the same story in head and neck cancer.
I mentioned early on mutations of HLA-A, which were reported in lung squamous cell carcinoma and which we are also seeing in head and neck cancer. It's a very interesting mutation. It was probably the most unexpected mutation, which is one reason why we didn't comment on it very much in the lung squamous paper. So in this instance, what I'm doing is using the tumor subtypes to explore this mutation, which is otherwise sort of a curious event, and let me walk you through the figure. In the top of the figure, what's being represented is gene expression, and it turns out that HLA-A, B, and C are all right next to each other on chromosome six. And they share a very coordinated gene expression. So, I've just collapsed them for the sake of display. The same thing is true for copy number alteration, and TAP1 and TAP2 are also on chromosome six right next to each other. So, they have sort of a coordinated pattern. In the middle of the figure, I'm showing the copy number. Here I'm showing mutations of HLA-A, B, and C, and TAP1 and TAP2, and here I'm showing DNA and RNA detection of HPV. So in the interest of time, I'll just point out a couple of the patterns. Oh, and one other thing: as a proxy for lymphocyte infiltrates, we've got expression of CD3 and CD8 as markers of infiltration into the tumors. And what you'll see in the classical subtype is a universal lack of expression of HLA-A, B, and C, in large part due to deletions of the genes, but not universally so.
So, for the most part, most of the HLA-A, B, and C mutations occur in the basal subtype, and these are mutually exclusive. So, I don't have time, I guess, to dwell on the figure too much, but this is one of the early views helping us to try to understand a mutation which was otherwise quite curious, and we're now starting to see some signals that there might actually be pathway activation and signaling in a coordinated way.
Speaking of pathways, those of us who have been involved in TCGA and other large sequencing projects for the last five to seven years have spent a lot of time thinking about Ras signaling, Akt, PTEN. Well, one of the great pleasures of working with the current group is that not only do we have new faces, we've also got new expertise, and so we've really expanded our thought process in terms of some of the pathways and the targets that we should be looking at. This is a figure generated by Carter Van Waes. I'm not sure if he's here today, but he has really been contributing greatly to this project, pulling out survival and death pathways, which we have not looked at in our sequencing projects before. And again, I'm going to go back to Lou Staudt's story this morning, thinking about coordinated events, those mutations that occur together or in an anti-correlated manner. There's a lot going on in the slides, so I'm only going to talk you through one of the stories, but I mentioned earlier on mutations of Caspase-8 and HRAS. So one of the very curious findings is that Caspase-8 mutations occur only in the basal and the mesenchymal subtypes, and frequently in conjunction with HRAS mutations. When there is a mutation of HRAS or CASP8, there is never an amplification of CCND1, which is the 11q13 amplicon, and which happens to also be right next to FADD. It's unclear which of those two genes might be the true target of the 11q13 amplicon, but the pattern is unmistakable.
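The anti-correlated pattern between CASP8 or HRAS mutation and CCND1 amplification can be sketched as a co-occurrence count plus a simple chance calculation. The sample identifiers and set sizes below are invented for illustration and are not TCGA results; the function names are hypothetical, and the hypergeometric model (amplified samples placed at random among all samples) is one assumed way to ask whether zero overlap is surprising.

```python
from math import comb


def cooccurrence_table(samples, event_a, event_b):
    """Counts of (both, only A, only B, neither); assumes both event sets lie within samples."""
    both = len(event_a & event_b)
    only_a = len(event_a - event_b)
    only_b = len(event_b - event_a)
    neither = len(samples) - both - only_a - only_b
    return both, only_a, only_b, neither


def prob_overlap_at_most(n, size_a, size_b, k):
    """P(overlap <= k) if the size_b events fell at random among n samples (hypergeometric)."""
    total = comb(n, size_b)
    return sum(comb(size_a, x) * comb(n - size_a, size_b - x) for x in range(k + 1)) / total


# Invented toy cohort of 20 samples.
toy_samples = {f"S{i}" for i in range(1, 21)}
casp8_or_hras_mutated = {"S1", "S2", "S3", "S4", "S5"}
ccnd1_amplified = {"S6", "S7", "S8", "S9", "S10", "S11"}

both, only_a, only_b, neither = cooccurrence_table(
    toy_samples, casp8_or_hras_mutated, ccnd1_amplified
)
p_no_overlap = prob_overlap_at_most(
    len(toy_samples), len(casp8_or_hras_mutated), len(ccnd1_amplified), both
)
print(both, only_a, only_b, neither, round(p_no_overlap, 3))
```

In a toy cohort this small, zero overlap is not very surprising by chance alone, so the strength of the real pattern depends on how many samples carry each event.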
For those patients that have amplifications of CCND1 or FADD, and expression of those oncogenes, they have universally low expression of genes from a second amplicon on 11q, at 11q22, the one with additional death-related oncogenes, YAP1 and BIRC2. So, some patterns are emerging, and I think these are some of the patterns that we're going to be evaluating as we move forward with this manuscript. In the interest of time, this is just not the time to talk about all of the data types. I will say we've seen some amazing contributions from British Columbia, as we have in other tumors, with identification of tumor subtypes based on microRNA, and some of the earliest looks at differential clinical outcomes within these datasets. Similarly, if you have time to come by the poster, I'll show you some great examples of coordinated methylation and gene expression data, particularly for p16, a very interesting story, and also a description of methylation subtypes by the methylation genome characterization centers.
Finally, I think this is my last slide. One of the most exciting observations is that through unbiased sequencing we are, for the first time, able to detect DNA and RNA that we weren't looking for. And in this case, it's viral RNA. So what I'm showing here, and this is data that's described in detail in a poster by Matt Wilkerson in the poster session, is, on the top row, the fraction of patients that express some HPV type 16 RNA. What's interesting about this is that this rate, approximately 20 percent, is far higher than the number that had the clinical diagnosis of HPV infection, and it's also far higher than you would expect based on the fact that only 11 percent of these patients have oropharynx tumors.
In addition, there are other viruses in the tumor that are also detected at high levels, and again, I'll refer you to Matt's poster, but most prominently *** virus. And we have near universal coverage of the *** genome in at least two of the samples. We'll get some more insight into viral sequencing in a talk that's given tomorrow from Raju's group. So a final word, thanks to the contributors, and we look forward to getting this data out into the public.
[applause]
Raju Kucherlapati: Thank you. One question.
Female Speaker: Yeah, this is [unintelligible] from Arkansas. So, very nice talk, I mean, relating to the molecular [unintelligible]. I'm very interested in that, and I'd like to talk with you more later on this topic. I have a question related to the TCGA sample. Like [unintelligible] that when they do for the computational analyses all with the new technology developed. So, does TCGA save sample for later on, like for further verification or for new technology? So, I mean, when TCGA prepares the sample, do they save extra sample for later computational verification?
David Neil Hayes: I think the short answer is sometimes. Sometimes there's extra sample available. Kenna is shaking her head yes, and when there is, and there's an important question, the program team has made those samples available. But the samples are ultimately limited.
Female Speaker: Thank you.
Raju Kucherlapati: Thank you. Thank you, Neil.