Rudy Pozzatti: -- have a series of three project updates,
the first of which is from Brad Ozenberger, program director at NHGRI, and it's on The
Cancer Genome Atlas project.
Brad Ozenberger: Yeah, hi, everybody. In preparation for
this, I went back and looked to see when we last gave you an update on TCGA, and it's
been almost three years. I pulled the first slide from last time, which was
this: President Obama visiting NIH in September of 2009, just three and a half years
ago, to announce biomedical research investment through the American Recovery and Reinvestment
Act. That included naming TCGA an NIH signature project and a big bolus of
investment, $175 million. And I guess I'm here now, three years later, to say I think
that was a pretty good investment. I think we've used it well, and I'll try to describe
that.
So, just a quick reminder of what TCGA does. The goal of The Cancer Genome Atlas is to do
comprehensive genomic analysis of all of the major tumor types in the U.S. A real key
to this is to take each tumor type, each specimen from each research participant, and do all
the analyses we can on that tumor specimen: exome sequencing, RNA-seq, microRNA,
methylation, and, more recently, some protein analyses, all on the same tissue
samples from these participants. This requires quite a pipeline that's been built
up over the years: a biospecimen core resource that processes all the samples and collects the
clinical data, six genome characterization centers, and the three genome sequencing centers
supported by NHGRI. We've also added genome data analysis centers in a large effort to coordinate
these data.
This is what it looks like on a map, just to remind you.
NHGRI's investment in TCGA, in terms of funds, is through the large-scale sequencing program:
the three big centers, Broad, Baylor, and Rick's center at Wash U. Our financial
commitment to TCGA is through their genome sequencing and analysis.
Just to go through a bit of history and where we're going: the genesis of
TCGA was a report from the NCAB, the National Cancer Advisory Board, back in 2005. The report
predicted a number of technological advances that were in the works, and part of it proposed
designing and developing a large cancer genome effort, suggested as a collaboration with NHGRI.
We set off on this program in 2007 to pilot it, starting with glioblastoma and ovarian serous
carcinoma, and we knew we had to establish the infrastructure, a pipeline, and feasibility.
We started this, and Rick's here, with capillary electrophoresis sequencing, but we all knew
we were going to hit this point: next-generation sequencing was going to come
on and really make this project feasible. Along with the reduction in costs, which
you've probably seen in this graph on our website, there has also, of course, been
a great increase in capabilities.
So, although this was coming, we started before next-gen kicked in, and this first
GBM report used just a small gene list, with capillary electrophoresis sequencing,
and still relied on gene arrays. But shortly after this, the project expanded.
We're now just past the midpoint of the main TCGA program, and we're up to
25 tumor types. Sample acquisition at the NCI was really beefed up at the beginning
of the expansion. We added these genome data analysis centers, which weren't part of the
pilot phase, because it was recognized that we needed both a lot more analytical horsepower
to do the integrative analyses of all the data and leadership in innovation
in genomic analysis methods.
A major product of TCGA that has really only started appearing recently is the large benchmark
papers, which I'll go into in a little more detail. The goal for TCGA has been
to examine more than 10,000 cases, and we fully expect to meet this goal by the
end of 2014, in more than 20 different tumor types. In fact, we are now beginning to think
about what happens after this major phase of TCGA, after 2014, and I'll touch on that
a little bit at the end.
So, just to point out the TCGA network papers: I really think
of them as historic datasets. They go deep into each one of these tumor types. Our
goal is to describe the mutations found in each tumor type down to a frequency
of 2 to 3 percent, which requires approximately 500 tumors of each type.
Again, we started early on with glioblastoma. That was a pilot phase that did not
involve next-gen sequencing, but it was followed in the summer of
2011 by the integrated analysis of ovarian carcinoma. This was the big shift: instead
of a small gene list, it involved full exome sequencing, RNA-seq, and hundreds
of ovarian cancer samples, and it set the standard for where TCGA has
gone since.
Last summer we published the colon and rectal cancer work, the colorectal analysis. In September
came the genomic characterization of squamous cell lung cancer, and then, the
next week, the comprehensive molecular portrait of human breast tumors, the one that
Eric mentioned in his director's report. Each of these papers has gotten a lot of deserved
attention, and each contains novel discoveries; I'll come back to that. What I
also want to go into is some of the clinical ramifications
of each of these papers.
So here's where we are today, project by project. On the Y axis are
the total qualified cases, that is, the number of tumor cases that have been analyzed
or are in pipelines. Again, our goal is 500. For breast, we put all the subtypes into a single
project, and our goal there is a thousand. The ones in red have gone
through a full data freeze; either they've been published or
the papers have been written and are currently under review. Then there are a number
of other projects now in the pipelines for which we have full-fledged analysis groups
working, and those papers should be coming out later this year. And then there's this tail where
accrual is still going on; some of these will probably come out in 2014. For a
number of projects, the ones starred here, accrual has closed because
we've exceeded the 500 goal, although we are still accepting African-American specimens
to fill out some of the diversity.
All of this can be found on a dashboard on the TCGA website that I'd point you to;
we call it the project case overview dashboard. It gives a snapshot of all the
data available for each project. Each project is listed, and you can see the number of
samples that have been accrued and the number that have qualified and entered analysis.
This is just a small corner of the dashboard, but every row represents a different data type,
and you can see how much data of each type are available for that tumor project.
It's quite a handy overview of TCGA.
Just the top-line numbers: we now have about 7,500 cases on our way to the goal of more
than 10,000. These cases are in the bank and qualified, and most of them are at the centers
at this point. More than 6,000 cases have the full genomic datasets, meaning full
exome sequencing, RNA-seq, microRNA-seq, methylation, and clinical data, as much
clinical data as we can get. I'd also point out that there are hundreds of whole-genome data
files, and this number continues to grow. And, of course, for every case,
the genomic sequencing covers both the tumor genome and the
normal genome. Right now on this dashboard you'll find data available in our
database on 25 different tumor types.
TCGA's goal is to create a community resource dataset. The
data are released very quickly and then used by the community as is, of course,
but we also have a large TCGA network that has worked very hard to integrate the
data and provide first looks: for every tumor type, things like cancer stratification
by gene expression or methylation patterns; a list of significantly mutated
genes and how those mutations are distributed across the cohort; and looks
at individual whole-genome data, such as this particularly scrambled lung squamous cell
carcinoma genome. All of this is then integrated into a look at the pathways involved
in each of these tumor types.
So, certainly thousands of people each day are digging into the TCGA
datasets, and our own network is doing a lot of work as well, but I think we didn't fully
anticipate, when we started the program, how quickly the data would translate to potential
clinical utility, and I just want to briefly go through a few examples. There's
just such rich data, and as the groups learn to integrate all these data types
and really build a picture of the genesis of these cancers, they are
revealing things that can, in many cases, translate right to the clinic.
So, to go through a few of these quickly. In GBM, even in that very first paper,
there was an interesting example: many GBMs show hypermethylation of the MGMT locus,
and these tumors acquire resistance to standard-of-care therapeutics. The TCGA data explained
how this occurred through shutdown of mismatch repair pathways and immediately suggested
changes to the treatment regimen for patients with recurrent GBM tumors.
In ovarian cancer, it was known that the FOXM1 transcription factor
network was frequently altered, but now, with the full TCGA dataset of hundreds of cases,
we could see that this was a very high percentage: 87 percent of tumors showed some
alteration in this pathway, not always in the FOXM1 gene itself, but in all the peripheral
nodes that feed into it, suggesting, perhaps, a common target for ovarian cancer.
On the other side, the TCGA group also delineated the full spectrum of frequently
amplified genes. These number in the dozens, but, of course, each
individual tumor has a different gene, or two or three genes, that are amplified
and would be predicted to help drive the disease, which really points to the fact that
we need customized treatment for each individual tumor.
Colorectal. Colorectal started as two projects, colon carcinoma and rectal carcinoma,
but it was quickly confirmed that the molecular genomic underpinnings of these
diseases show that it's a single disease, so we immediately merged them into the colorectal
project. It's just one disease. And integrative analyses showed, again similar to the FOXM1
story in ovarian, the prominence of WNT-signaling alterations and the promise of inhibitors
in this pathway.
Breast. Tumors of the basal subtype were found to have, in large part, the same genomic
signatures as the ovarian serous tumors. These are aggressive, poor-prognosis tumors.
This shows copy number data, ovarian versus basal breast over here.
You can see the similarities not just in copy number, but in other genomic
analyses as well. And already, ovarian clinical trials are being
adjusted to test the same compounds for efficacy in basal-type breast tumors.
Also important in this paper were the clinically defined HER2-positive tumors. It was known
that a substantial proportion of HER2-positive tumors don't respond to the standard
EGFR-family inhibitors, and, in fact, on closer analysis of the TCGA data, the group could easily
divide the HER2-positive tumors into two different genomic subtypes, one that is predicted to
respond to those inhibitors and one that wouldn't. That provides an important marker that could
be used to adjust therapy for patients with that marker.
Lung squamous. Lung squamous cell carcinomas make up over 25 percent of lung tumors in the
U.S., but they have been rather poorly described genomically. So this was
the first real hard look at the genome of lung squamous cell carcinoma, and it identified a
number of interesting targets. Importantly, it also identified markers that showed
underpinnings similar to those of lung adenocarcinoma. When I spoke with a clinical trialist in
lung cancer, they were immediately planning to test some of their compounds from a lung adeno
trial in lung squamous tumors that had the appropriate mutations, which suggest the compounds
might work.
A couple of papers that aren't out yet, but will be shortly: first, kidney clear cell carcinoma.
This is again one of those cases where it was known that genes of the SWI/SNF chromatin
remodeling complexes are sometimes mutated, but in the TCGA data we can now show that these
alterations occur in a majority of these tumors, and there's a lot of interest in therapeutic
compounds that modulate this pathway and potentially this disease.
And then endometrial, with a somewhat unexpected finding: 25 percent of endometrial
tumors again share the hallmark markers of ovarian serous carcinoma. Here, just like the
previous slide, you can see the copy number of serous ovarian tumors, the basal breast tumors
I mentioned, and this endometrial subtype that we now call serous-like. These are associated
with an increased risk of recurrence, and from this work we now have a better handle
on the genetic mutations that drive this.
So I just wanted to give those few examples. Although, again, it wasn't our
first goal to get these data right to patients where they might make a difference, it is
certainly happening on a more rapid timeframe than I might have expected.
So, clearly, this is what TCGA is driving at. Right now, cancer diagnostics
mostly comes from pathologists reviewing slides. That will still be important, but
we're starting to see an increased emphasis on genomic analysis in oncology, at companies
like Foundation, at the New York Genome Center, and elsewhere. TCGA is really providing a lot
of the foundation to drive this personalized therapy, or, since I don't like that word,
individualized therapy in cancer.
So, looking forward: we've had a number of papers come out last fall: colorectal,
breast, and lung squamous. Coming soon, and currently under review, are kidney clear cell,
endometrial, and AML. A number of other projects I hope will be out before the end of
the year, followed by some big ones, such as prostate and melanoma.
TCGA has created this atlas of mutations. I think we've really been successful
in beginning to understand the biology of cancer through this project,
this compendium atlas of mutations that drive these cancers. New drivers have been identified,
and, like I said, they're already changing clinical practice in some of these diseases. Also,
I don't think anybody would dispute that it's now firmly established that we
need to think about each patient's tumor as a unique disease. And I'm happy to say that all
the major pharmaceutical companies now have pipelines into the TCGA data and are using these
data on a continual basis to drive therapeutic advances as well.
I want to point out that TCGA's success hasn't just been about the biology of cancer,
but also about driving technology. The pull of the TCGA
program has driven the development of cancer genome analysis methods; this is a real flagship
project. And many of the new analysis and informatics tools are being adopted across all
fields of genomic research; of course, they aren't applicable only to cancer.
So, the next phase. We are just a couple of years out and on a good trajectory, but
we do think TCGA will wind down. There will certainly be some final analyses for a year
or two afterwards. Eric mentioned a workshop that we had at the end of November, I think. NCI
and NHGRI are working closely together, and separately, to develop some new initiatives.
Certainly we want to continue this approach. Even with TCGA, there are
still many mutations, as we go deeper into these tumors, that aren't fully explained,
and there certainly still needs to be some atlas development in cancer.
And then, more importantly, I think, we're looking hard at moving more toward the clinical
trial area to begin to investigate the genetic underpinnings of, for example, metastasis
and response to therapy. That's going to require us to get a little closer to
clinical trials to get these specimens and these data.
All right, so with that, I'm going to close. Just to acknowledge: Heidi Sofia and Lindsey
Lund work with me every day on TCGA, and Mark Guyer is still a real key part of the team.
I also want to acknowledge Jane Peterson and Peter Good, who were involved for many years
in the early stage. And this is the NCI team. They have a full office for TCGA, led by the
dynamo Kenna Shaw, if you've encountered her. And then there's the new Center for Cancer
Genomics at NCI that we're working with, co-directed by Stephen Chanock and Lou Staudt.
With that I'll stop. Any questions? Yeah, Jill.
Female Speaker: Brad, do you want to say anything about how
the ICGC project complements TCGA and what they've done so far?
Brad Ozenberger: Yeah. Yeah, I neglected to mention that. TCGA
is a major part of ICGC; it's the bulk of the data in ICGC. We were
always kind of in front of them, of course, but we've been very pleased to
see a lot of large projects in Spain and Italy, and of course in the U.K.,
catching up and contributing greatly. We meet at least once a year, and there have been
a number of coordinated efforts in certain tumors, prostate as an example, where one
group looked at tumors that only occur in young men and somebody else is looking
at tumors that are refractory to therapy. So we've done a good job of synergizing
across that consortium, and it continues to be very important. They have their own
database, run out of the Ontario Institute for Cancer Research in Toronto with Tom Hudson,
and we work very closely with them on getting TCGA data into it. Yeah, Mark.
Mark Guyer: Yeah, I just wanted to add to your point about
community involvement that the analysis groups have become much bigger than
the TCGA-funded groups. The project has been really good about bringing in wider participation
by the community in the analyses. So I don't know if you want to amplify on that.
Brad Ozenberger: Yeah. So, of course, a big analysis group forms around each of these
tumors. We designate a PI within TCGA to be a leader, and usually there's also a specific
disease expert, and the two co-chair the analysis. Then we invite experts in each disease
to come in and contribute, so if you know of people who are interested in a particular
tumor on the list, please have them contact us and we can get them involved. Yeah, Pilar.
Pilar Ossorio: This is just an informational question from
somebody who hasn't been keeping up with TCGA. What's the difference between the work
the analysis centers do and the work of the genome characterization centers? Are the genome
characterization centers mostly about structure, is that --
Brad Ozenberger: No. The genome characterization
centers are data generators. They're doing the RNA analyses, SNP chip arrays, the things
that aren't done by bulk genomic sequencing. The genome data analysis centers are strictly
computational. Yeah.
Joseph Ecker: I know that TCGA has had methylation, for
example, as an epigenetic mark as part of the program, but I wonder whether
there were plans or discussion about including histone modifications.
Many of the genes that have been identified have turned out
to be epigenetic modifiers in some way, so I'm wondering if there's any plan. Certainly
ACR has been talking about this; there's a workshop report on trying to gather groups together
with an interest in supporting epigenetic analysis of those same tumors, which I think
would add another dimension to the data.
Brad Ozenberger: Yeah. There are residual tissues that remain
in the bank, and we actually want to try to make those available, although there isn't
a spec sheet on the website yet on how to do that, and we don't really know how yet.
We have begun some protein analyses, mostly phosphoprotein chips and
that sort of thing. But right now, histone modifications are really not part of the
project.
Lon Cardon: So, I'm [inaudible] by these immediate clinical
translation findings, and I'm wondering, as those are discovered and presumably go
to clinical trials, maybe with existing therapies but new indications, are you using
the infrastructure that you've got for TCGA to analyze tumors pre- and post-treatment
today? Or is that an opportunity that one could grasp?
Brad Ozenberger: Yeah, I think it's an opportunity. Certainly,
NCI is making a lot of moves toward making all their clinical trials genomically enabled,
but that's really more in NCI's court. They certainly see the value of that,
and they're making incremental steps toward it. But the TCGA
samples themselves are all de-identified.
Lon Cardon: No, no, I understand. It was more about the infrastructure.
You've got the teams for analysis, for data collection, for standards, I presume.
Brad Ozenberger: Yeah. That's actually a point
for looking at the future: we realize we've got this big infrastructure built, and
so those are some of the things we're looking at now, to see if we can build on it
and take advantage of it. Davis [spelled phonetically].
Male Speaker: Another forward-looking question. You mentioned
metastasis and treatment resistance as possible themes for future phases. Could you just
clarify for me the degree of stratification of the tumors that have already been analyzed?
If you say 500 for a particular cancer type, is that all primary, or does that already
include a mixture of primary, metastatic, and tumors that failed to respond to treatment?
Brad Ozenberger: TCGA is all primary tumors. There
are a few cases where we have additional samples from the same patient, but these are all
primary, so we did not design it to really go after those questions.
Male Speaker: And how about the issue of tumor heterogeneity
within a primary tumor? What's the level of multi-region analysis of what might be a tumor
made up of many mini-tumors?
Brad Ozenberger: Again, we have a few things. We're actually
talking about doing a pilot on that, because we have tissue cases where we know the samples
are at least millimeters apart, maybe a centimeter. But no, we really didn't do the accrual
in such a way that we can take samples that are far apart geographically or anything like
that. So we actually address heterogeneity simply through one sample, going deep into the
sequence to try to understand it, but that's all.
Richard Wilson: You know, there's more and more of that popping
into TCGA as we figure out how to do it. In the AML dataset, for example, there's extensive
analysis of heterogeneity in all of those primary tumors. There are also a number of samples,
breast, I think, where there are trios: primary tumor, adjacent nonmalignant tissue,
and a blood normal, to get some idea of what we see in the adjacent tissue in terms of
new mutations.
Howard McLeod: Field effects.
Richard Wilson: The so-called field effects, right. So it's
in there, and I think it's maturing along with our ability to really do those kinds of analyses.
Brad Ozenberger: But some of this will have to wait until the
next phase.
Howard McLeod: As you're going toward that, I think
both of the last two questions are heading toward some of the technical challenges.
You mentioned there was some technical development, but, for example, over the
last 11 years we've been sampling blood and, where possible, tumor from
all of the NCI clinical trial studies for the cooperative group I'm involved in. We
have blood on almost 90 percent of the patients, about 40,000 patients' worth. We have fixed
Male Speaker: Right.
Howard McLeod: -- on about 30 percent, and we have almost no fresh
tumor. And it's not for lack of trying; it's because of the culture of the way
tissue is handled. Not that that's necessarily a bad thing, but you could
either play Don Quixote and try to get people to freeze the tumor, and that'll come eventually,
or you can really push on the technology for handling the fixed material. I know
every center has its magic way of doing it, but I'm not sure I believe
any of them, including the ones from our own center.
Brad Ozenberger: TCGA took an attitude of "no platform
left behind," so if we don't get good-quality RNA from a tissue, that
sample doesn't qualify for TCGA. That said, we've now done a lot of work with FFPE tissue,
and, of course, the sequencers can do a pretty good job of getting exomes from those. Sometimes
the RNA is much more difficult, so we're looking at whether, in the next
iteration, we accept that sometimes we don't have the RNA data, or it's not as good quality,
and do it anyway. But yes, we really are looking at FFPE tissue as being very important
for the future.
Rudy Pozzatti: I probably need to cut this off.
Male Speaker: Sorry, we ran out --
Brad Ozenberger: No, that's okay.
Rudy Pozzatti: Brad, it's a very interesting topic and fabulous
work. Brad will be around if people want to follow up. So, Simona, could you please come
forward?