TCGA Project Update - Brad Ozenberger

Rudy Pozzatti: -- have a series of three project updates, the first of which is from Brad Ozenberger, program director at NHGRI, and it's on The Cancer Genome Atlas project.
Brad Ozenberger: Yeah, hi, everybody. I -- in preparation for this, I went back and looked to see when we last gave an update on TCGA for you, and it's been almost three years. I went back and pulled the first slide from last time, which was this, when President Obama visited NIH in September of 2009, just three and a half years ago, to announce biomedical research investment through the American Recovery and Reinvestment Act, and that included naming NIH -- or TCGA an NIH signature project and a big bolus of investment, $175 million. And I guess I'm here now three years later to say I think that was a pretty good investment. I think we've used it well, and I'll try to describe that. So just a quick reminder of what TCGA does. The Cancer Genome Atlas, our goal is to do comprehensive genomic analysis on all of the major tumor types in the U.S. And a real key to this is to take each tumor type, each specimen from each research participant, and do all the analyses we can on that tumor specimen. So, doing exome sequencing, RNA-seq, micro-RNA, methylation, and even more recently added some protein analyses; all on the same tissue samples from these participants. To do this requires quite a pipeline that's been built up over the years, a biospecimen core resource that processes all the samples, collects the clinical data, six genome characterization centers, the three genome sequencing centers supported by NHGRI. We've added genome data analysis centers in a large effort to coordinate these data. This is what it looks like on a map, just to remind you. NHGRI's investment in TCGA in terms of funds is through the large-scale sequencing program, the three big centers, Broad, Baylor, and Rick's center at Wash U, and that's our financial commitment to TCGA, is through their genomic sequence and analysis.
Just to kind of go through a bit of history and where we're going. TCGA, the genesis of TCGA, was from a report from the NCA -- the NCI Advisory Board back in 2005 that proposed actually a number of -- they were predicting a number of technological advances that were in the works, and actually just part of this report was to design and develop a large cancer genome effort with -- it was suggested with NHGRI. And we set this -- set off on this program really in 2007 to pilot this, starting with glioblastoma and ovarian serous carcinoma, and we knew we had to establish the infrastructure, a pipeline, and feasibility. And we started this, Rick's here, with capillary electrophoresis sequencing, but we all knew that we were going to hit this point. The next generation sequencing was going to come on and really make this project feasible. Of course, with the reduction in costs, if you've probably seen this graph before that is on our website, of course also there's a great increase in capabilities. So, although this was coming, we started before next-gen kicked in, and actually this first GBM report was with just a small gene list with capillary electrophoresis sequencing and analysis and still using gene arrays. But then shortly after this the project expanded. We're now about in just past the mid-point of the main TCGA program. We're now up to 25 tumor types in TCGA. Sample acquisition was really beefed up at the NCI at the beginning of the expansion. We added these genome data analysis centers, which weren't part of the pilot phase to -- it was recognized we needed both. A lot more horsepower in terms of analysis to do the integrative analyses of all the data, as well as lead in a lot of innovation in genomic analysis methods. And a major product of TCGA that have really only started recently are the large benchmark papers that I'll go into a little more detail about. 
But the goal here for TCGA has been to achieve greater than 10,000 cases examined, and we fully expect to meet this goal by the end of 2014, in more than 20 different tumor types. In fact, we are now beginning to think about what happens after this big -- the major phase of TCGA, beginning to look what happens after 2014, and I'll touch on that a little bit at the end. So just to point out these, the TCGA network papers. These are each -- really, I think of them as historical datasets. These are deep into each one of these tumor types. Our goal is to describe mutations that are found in these -- each tumor type, down to a frequency of 2 to 3 percent, requiring approximately 500 tumors of each type. Again, we started early on with glioblastoma. Again, this was a pilot phase that did not involve next-gen sequencing, but then that was followed in actually just the summer of 2011 with the integrated analysis of ovarian carcinoma. This was the big shift. This involved -- instead of a small gene list, this involved full exome sequencing, RNA-seq, and hundreds of samples of ovarian cancer, and kind of set the standard, then, for where TCGA has gone since. Last summer we published the colon and rectal cancer work, the colorectal analysis. In September -- in September, the genomic characterization of squamous cell lung cancer. And then the next week, actually, the comprehensive molecular portrait of human breast tumors, the one that Eric mentioned in his director's report. Again, each of these papers really get a lot of deserved attention. These are, in each case, novel discoveries, and what I particularly -- I'll come back to that. What I really want to go into also is some of the clinical ramifications of each of these papers. So here's where we are today in terms of a project-by-project view. On the Y axis are the total qualified cases -- the number of tumor cases that are -- have been analyzed or are in pipelines. Again, our goal is 500.
Breast, we put all the subtypes into a single project and our goal there is a thousand for the breast project. The ones in red have gone through a full data freeze, either they've been published or a number of them have written -- the papers have been written and are currently under review. And then there are a number of other projects now in the pipelines that are -- we have full-fledged analysis groups working, and papers should be coming out later this year. And then there's this tail where accrual is still going on, and these will come -- probably some of these in 2014. A number of projects -- the ones starred here have closed. The accrual has closed because we've exceeded the 500 goal, although we are still accepting African-American specimens to fill out some of the diversity. So all this can be found on a project dashboard I would point you to, on the TCGA website, a project -- we call it the project case overview dashboard. That gives a snapshot of all the data available for each project. Each project is listed. You can see here the number of samples that have been accrued, the number that have qualified and entered analysis, and then for every single data type, all the rows -- this is just a small corner of the dashboard. Every row represents a different data type and you can see in there how much data of each type are available for that tumor project. It's quite a handy overview of TCGA. Just the top line numbers, we have now about 7,500 on our way to the 10,000 -- greater than 10,000 goal. Cases are in the bank, qualified, and most of these are at centers at this point. Greater than 6,000 cases with the full genomic datasets, so this is 6,000 cases with full exome sequencing, RNA-seq, micro RNA-seq, methylation, and clinical data -- as much clinical data as we can get. Again, and I point out also, hundreds of whole genome data files. 
This number continues to grow and, of course, for every case, this is always in cases, for every case for the genomic sequencing, it's both the tumor genome, as well as the normal genome. And right now on this dashboard you'll find there's data available in our database on 25 different tumor types. So TCGA was set -- our goal is to create a community resource dataset. This would be -- data would be released very quickly and then used by the community as it is, of course, but we also have a large TCGA network that also has worked very hard to integrate the data and provide first looks. So things like cancer stratification by gene expression or methylation patterns, you know, every tumor type. There's a list of significantly mutated genes and how those mutations are distributed across the cohort. Whole genome looks -- look at individual whole genome data. This is a particularly scrambled lung squamous cell carcinoma genome. And then all this is then integrated into a look at the pathways involved in each of these tumor types together. So although the goal -- and certainly thousands of people each day are digging in to the TCGA datasets, our own network is doing a lot of work as well, but I think we didn't fully anticipate when we started the program how quickly data would translate to potential clinical utility, and I just want to briefly go through some -- a few examples. There's just such rich data, and as we learn, as the groups learn to integrate all these data types and really build a picture of what's -- of the foundation of the genesis of these cancers, really reveal something that can translate right to the clinic in many cases. So just to go through a few of these quickly. 
In GBM, even that very first paper early on, there was an interesting example of -- many GBMs show hypermethylation of the MGMT locus, and these tumors acquire resistance to standard-of-care therapeutics, and the TCGA data explained how this occurred through shutdown of mismatch repair pathways, and immediately suggested changes to the regimen, treatment regimen, for patients with recurrent GBM tumors. The ovarian work -- it was known in ovarian cancer that the FOXM1 transcription factor network was frequently mutated, altered, but now with the full TCGA with hundreds of cases, this was a very high percentage; 87 percent of tumors showed some alteration in this pathway, not always in the FOXM1 gene itself, but all these peripheral additional nodes that feed into it, suggesting, perhaps, a common target for ovarian cancer. But on the inverse also we -- the TCGA group identified the full spectrum of frequently amplified genes were delineated. These are a number in the dozens, but, of course, each individual tumor has a different gene, or two genes, or three genes that are amplified and would be predicted to help drive the disease, and, you know, really points to the fact that we need a customized treatment for each individual tumor. Colorectal. First, colorectal started as two projects, colon carcinoma and rectal carcinoma, but it was quickly confirmed that, in fact, the molecular genomic underpinnings of these diseases show that it's a single disease, so we immediately merged these into the colorectal project. It's just one disease. And integrative analyses showed, again, similar to the FOXM1 story in ovarian, the prominence of the WNT-signaling alteration and promise of inhibitors in this pathway. Breast. Tumors of the basal subtype were found to have the same genomic signatures, in a large sense, as the ovarian serous tumors. These are poor prognosis, aggressive tumors, and we can see -- and this shows copy number data, ovarian versus the basal over here.
You can see the similarities in copy number, but not just in copy number, but other genomic analyses as well. You can see this similarity. And already, ovarian clinical trials are being adjusted to test these compounds for efficaciousness also in breast basal-type tumors. Importantly, also, in this paper, the clinically-defined HER2-positive tumors. It was known that there's always a substantial proportion of HER2-positive tumors that don't respond to the normal EGFR inhibitors, and, in fact, in closer analysis of the TCGA data, they could easily divide the HER2 positive into two different genomic subtypes, and one that is predicted to respond to the EGFR inhibitors and one which wouldn't, and it shows an important marker that would adjust the therapy for those patients with that marker. Lung squamous. Lung squamous cell carcinomas are over 25 percent of lung tumors in the U.S., but, in fact, it has been very -- rather poorly described genomically. So this was the first real hard look at genomic -- the genome of lung squamous, and identified a number of interesting targets. Importantly also, it identified markers that showed similar underpinnings to lung adenocarcinoma, and in speaking with a clinical trialist in lung cancer, they were immediately going to test some of their compounds from a lung adeno trial in lung squamous that had the appropriate mutations where -- that suggest it might work. A couple of papers that aren't out yet, but will be shortly, kidney clear cell carcinoma. Again, this is one of these cases where it's known that SWI/SNF chromatin remodeling complexes sometimes mutate in these genes, but again, in the TCGA data, we can now show that this is a majority of these tumors, in fact, and there's a lot of interest in therapeutic compounds that modulate this pathway and potentially modulate this disease. 
And then endometrial in a bit of a sort of unexpected finding, 25 percent of endometrial tumors again share this hallmark, these markers of ovarian serous carcinoma. Here now, just like the previous slide, serous ovarian tumors and copy number, the basal breast is mentioned, and also serous -- these endometrial subtype we call serous-like now, and these are associated with an increased risk of recurrence, and now we have a better handle from this work on what the genetic mutations are that drive this. So I just wanted to give those few examples. And it's -- although, again, it wasn't our first goal to get these data right to patients where it might make a difference, but certainly it's happening on a more rapid timeframe than I might have expected. So clearly what TCGA is driving at, you know, of course now if you -- the cancer diagnostics, it's mostly from pathologists reviewing slides. Of course this will still be important, but certainly we're starting to see now the increased emphasis on genomic analysis in oncology companies like Foundation and New York Genome Center and others. TCGA is really providing a lot of the foundation to drive this personalized therapy. I don't like that word, but individualized therapy in cancer. So looking forward, again, we've had a number of papers that came out last fall: colorectal, breast, lung squamous. Coming soon, these are under review: kidney clear cell, endometrial, and AML. And a number of other projects that I would hope would be out before the end of the year, and followed by some big ones, such as prostate and melanoma that'll follow that. TCGA has created this atlas of mutations. You know, really, I think, been successful in understanding -- beginning to understand the biology of cancer through this project, this compendium atlas of mutations that drive these cancers. New drivers have been identified, and, like I said, already changing clinical practice in some of these diseases. 
Also, you know, I don't think anybody would argue that there's now firmly established that we need to think about each patient's tumor as a unique disease. And I'm happy to say all the major pharmaceutical companies have pipelines into the TCGA data now and are using these data on a continual basis to drive therapeutic advances as well. I want to point out that it hasn't just been about the biology of cancer that I think is part of TCGA's success, but also the driving of technology. You know, the pull of the TCGA program has driven the development of cancer genome analysis methods. This is a real flagship project. But many new analysis and informatics tools adopted -- are being adopted to all fields of genomic research, of course not just applicable only to cancer. So, in the next phase, we are just a couple of years out and on a good trajectory, but we do think TCGA will wind down. There will be some final analyses, certainly, for a year or two afterwards. Eric mentioned a workshop that we had at the end of November, I think. NCI and NHGRI are working closely together, and separately, to develop some new initiatives. Certainly we want to continue this approach. There's still -- even with TCGA, there are still many mutations, you know, as we go deeper into these tumors that aren't fully explained, and certainly there still needs to be some atlas development in cancer. And then more importantly, I think, we're looking hard at moving more towards the clinical trial area to begin to investigate now the genetic underpinnings, for example, metastasis and response to therapy, that's going to require us to really get a little closer to the clinical trial areas to get these specimens and get these data. All right. So with that, I'm going to close. Just acknowledge -- Heidi Sofia and Lindsey Lund work with me every day on TCGA, and Mark Guyer, still a real key part of the team.
And I want to acknowledge Jane Peterson and Peter Good, who were involved for many years in the early stage. And this is the NCI team. They have a full office for TCGA led by the dynamo Kenna Shaw, if you've encountered her. And then they have the new Center for Cancer Genomics at NCI that we're working with co-directed by Stephen Chanock and Lou Staudt. With that I'll stop. Any questions? Yeah, Jill.
Female Speaker: Brad, do you want to say anything about how the ICGC project complements TCGA and what they've done so far?
Brad Ozenberger: Yeah. Yeah, I neglected to mention that. TCGA is a major player -- a major part of ICGC. It's the bulk of the data in ICGC. Yeah, we've always -- we were kind of in front of them, of course, but we've been very pleased to see that a lot of large projects in Spain and Italy, of course in the U.K., have been catching up and contributing greatly. We meet at least once a year, and there have been a number of coordinated efforts in certain tumors, prostate as an example, where one group looked at very -- at tumors that only occur in young men and somebody else is looking at tumors that are refractory to therapy, and so we've done a good job of synergizing across that consortium and it continues to be something that -- sorry, continue to be very important. There's a -- they have their own database run out of the University of Toronto -- the Ontario Institute for Cancer Research with Tom Hudson. And we work very closely with getting TCGA data into there. Yeah, Mark.
Mark Guyer: Yeah, I just wanted to add on your point about community involvement, that the analysis groups have become much bigger than any of the -- than the TCGA-funded groups. The project has been really good about bringing in wider participation by the community in the analyses. So I don't know if you want to amplify on that.
Brad Ozenberger: Yeah. So each -- around each of these tumors, of course, a big analysis group forms.
We designate a PI within TCGA to kind of be a leader, and then usually there's a disease, a specific disease expert, too, that they kind of co-chair the analysis. Then we invite experts in each disease to come in and contribute, and so, yeah, if you know of people who are interested in a particular tumor on the list, please have them contact us and we can get them involved. Yeah, Pilar.
Pilar Ossorio: This is just an informational question from somebody who hasn't been keeping up with TCGA. What's the difference in the work between what the analysis centers do and the genome characterization centers? Are the genome characterization centers mostly about structure, is that --
Brad Ozenberger: No. They're data -- so the genome characterization centers are data generators. So, they're doing the RNA analyses, SNP chip arrays, the things that aren't done by bulk genomic sequencing. And the genome data analysis centers are strictly computational. Yeah.
Joseph Ecker: I know that TCGA has had methylation, for example, and that would be an epigenetic mark as part of the program, but I wonder whether or not there were plans or discussion about including, you know, histone modifications. I mean, many of the genes have been identified and a number of them have turned out to be epigenetic modifiers in some way, and I'm wondering if there's any plan -- certainly AACR has been talking about this as a workshop report on trying to gather groups together with an interest in supporting epigenetic analysis of those same tumors, which I think would add another dimension to the data.
Brad Ozenberger: Yeah. There are residual tissues that remain in the bank, and we actually want to try to make those available, although there isn't, you know, a spec sheet on the website on how to do that yet, or we don't really know yet, but we have begun some protein analyses, mostly in the, you know, phosphoprotein chips and that sort of thing.
But yeah, right now the histone modifiers are really not part of the project.
Lon Cardon: So, I'm [inaudible] by these immediate clinical translation findings and I'm wondering, as those are discovered and as they're going presumably to clinical trials with maybe existing therapies but new indication, are you using the infrastructure that you've got for TCGA to analyze pre- and post-tumor, given treatment today? Or is that an opportunity that one could grasp?
Brad Ozenberger: Yeah. I think it's an opportunity. It's -- certainly, NCI is making a lot of movements towards making all their clinical trials genomically enabled, but yeah, really, that's more in NCI's court and they certainly see the value of that, but yeah, we're -- it's kind of making steps, incremental steps towards that. But TCGA itself, those are all de-identified, those samples.
Lon Cardon: No, no. I understand. It was more infrastructure. You've got the teams for analysis for data collection for standards, I presume.
Brad Ozenberger: Yeah. There's a lot -- that's actually a point for looking at the future is, yeah, we realize we've got this big infrastructure built, and so those are some of the sort of things we're looking at now to see if we can build on that and take advantage of it. Davis [spelled phonetically].
Male Speaker: Another forward-looking question. You mentioned metastasis and treatment resistance as possible themes for future phases, and could you just clarify for me what the degree of stratification of the tumors that have already been analyzed is? If you say 500 for a particular cancer type, is that all primary or does that already include a mixture of primary, metastatic, the failed to respond to treatment?
Brad Ozenberger: These are -- TCGA is all primary tumors. There are a few cases where we have additional samples from the same patient, but these are all primary, so we did not design it in a way to really go after those questions.
Male Speaker: And how about the issue of tumor heterogeneity within a primary -- what's the level of multiple analysis of what might be a tumor looking like mini tumor?
Brad Ozenberger: Again, we have a few things. We're actually talking about doing a pilot in that, because we have tissue cases where we can know we're at least millimeters apart, maybe a centimeter, but no, there's been -- we really don't have -- didn't do the accrual in such a way that we can take samples that are far apart geographically or anything like that. So it's -- we actually do the heterogeneity simply through one sample and going deep into the sequence to try to understand it, but that's all.
Richard Wilson: You know, there's more and more of that popping into TCGA as we figure out how to do it. So the AML dataset, for example, there's extensive analysis of heterogeneity in all of those primary tumors. There's also a number of samples, breast, I think, where there are trios or there is primary tumor adjacent, nonmalignant, and a blood normal to get some idea of what we see in the adjacent tissue in terms of new mutations.
Howard McLeod: Field effects.
Richard Wilson: The so-called field effects, right. So it's in there and I think it's maturing along with our ability to really do those kinds of analyses.
Brad Ozenberger: But some of this will have to wait until the next phase.
Howard McLeod: As you're going towards that, I think some -- both the last two questions are heading towards some of the technical challenges. You mentioned there was some technical development, but you know, for example, the -- over the last 11 years we've been sampling blood in a possible [spelled phonetically] tumor from all of the NCI clinical trial studies for the cooperative group I'm involved in. We have blood on almost 90 percent of the patients, about 40,000 patients worth. We have fixed --
Male Speaker: Right.
Howard McLeod: -- on about 30 percent, and we have fresh tumor almost none.
And it's not because of trying, it's because of the culture of way tissue is handled, not necessarily that it's a bad thing, but just that -- so you could either Don Quixote and try to get people to freeze the tumor, and that'll come eventually, or you can really push on the technology for handling the fixed stuff and all that. I know every center has their magic way of doing it, but I'm not sure that any -- I believe any of them, including the ones from our own center.
Brad Ozenberger: We -- TCGA took an attitude of "no platform left behind," so if we don't get good quality RNA from a tissue that tissue doesn't -- that sample doesn't qualify for TCGA. For example, we've done now a lot of work with FFPE tissue and, of course, the sequencers can do a pretty good job of getting exome from those. Sometimes the RNA is much more difficult, but -- so we're looking at now, you know, in the next iteration, you know, maybe sometimes we don't have the RNA data or it's not as good quality and try to do it anyway, but yeah, we really are looking at FFPE tissue as being very important for the future.
Rudy Pozzatti: I probably need to cut this off.
Male Speaker: Sorry, we ran out --
Brad Ozenberger: No, that's okay.
Rudy Pozzatti: Brad, it's a very interesting topic and fabulous work. Brad will be around if people want to follow up. So, Simona, could you please come forward?