08 TCGA 112712 Andrew Munqall
Andrew Munqall: Thank you, Richard, and thank the local organizers
for inviting me to talk at this prestigious event. So, I'm actually going to tell you
about a -- essentially, it's a cancer that's already published as far as the marker paper
is concerned, but I will be talking about two new datasets that we've added quite recently,
those of the transcriptome sequencing, and that's both the messenger RNA sequencing and
also microRNA sequencing. Previously, these things had been studied at the microarray
level.
So just a little background about the high-grade serous ovarian cancer cohort that the TCGA
consortium has collected. Most deaths from ovarian cancer are the result of this
advanced-stage, high-grade serous ovarian cancer, about 70 percent of all ovarian cancer patients.
And the TCGA group had published last year this marker paper in which a cohort of 489
tumors were studied primarily at the expression level for messenger RNA and microRNA, DNA
copy number evaluations, as well as DNA methylation. And additionally, you've heard already from
the Broad Institute, this 316 cases of tumor and normal exome sequencing to complement
the dataset.
Fundamental messages coming from that marker paper included that the disease is defined by and
characterized by a very simple mutational spectrum, in which TP53 mutation is predominant, in
almost 96 percent. So, almost all patients have these TP53 mutations, and also a characteristically
high frequency of somatic copy number alterations, both focal gains and focal losses. That was
in stark contrast to the previous glioblastoma multiforme study in which there was very low
copy number.
The aims of this study that I'll tell you about in the next 10 minutes are essentially
the transcriptome sequencing, and to use this transcriptome sequencing, primarily the RNA
sequence, to define subtypes, importantly structural variants that could not be established
well with microarray-based technologies, and alternative spliced transcripts to name but
a few.
So the dataset is described in this slide. We received 490 tumor total RNA samples from
the Biospecimen Core Resource Repository. These samples had been collected from 15 different
tissue source sites across the world, and we were able to generate RNA sequence libraries
for sequencing of 420 of those, all of which have been submitted to the Cancer Genome Hub
and the Data Coordinating Center. Of these 420, 300 were what we deemed high quality
expression datasets that passed very stringent quality control metrics. Those expression
profiles have likewise been submitted to the DCC. A further 485 samples have microRNA sequences,
again, submitted and publicly available. And then the preliminary analyses that we
performed on these datasets are listed at the bottom of the slide here, and include unsupervised
consensus clustering to identify subtypes. I'll talk to you a little about that. The
microRNA anti-correlations with the gene isoform expression. I'll very briefly touch
on that. And then a little more on fusion identification using two platforms, our in-house
Trans-ABySS, and then the University of Chicago's fusion-finder algorithm.
So to touch on subtypes. In this slide I'm showing Figure 2 from the ovarian marker paper
published last year. In this study there were four different subtypes defined, corresponding
here to the differentiated, immunoreactive, mesenchymal, and proliferative subtypes. When we perform
similar unsupervised cluster analysis using our sequence-based expression profiles, from
300 tumors we identify potentially two additional groupings, and this NMF cluster is illustrated
here with both cophenetic scores showing high values here and the average silhouette widths
also supporting that there may be additional clusters to the four previously published.
Of course, if we then look for the correspondence of the samples within this new six-cluster
solution to the existing four, we see four discrete clusters that map almost identically
to those prior published. These are our clusters four, one, two, and five, but then additionally
we see these two slightly smaller clusters, cluster six and cluster three, for which the
samples don't map to a single pre-defined locus. And so this adds some support to the
fact that there may be additional subtypes within this data that we're seeing through
the sequencing work. Those are the two additional ones there.
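The comparison described above, checking how samples in a new cluster solution distribute across the previously published subtypes, can be sketched as a simple cross-tabulation. This is only an illustrative sketch and not the analysis used for the slide: the sample IDs, the cluster names "A" and "B", the assignments, and the 0.8 dominance threshold are all made up for the example.

```python
from collections import Counter, defaultdict

def cross_tabulate(new_labels, published_labels):
    """Count samples for each (new cluster, published subtype) pair."""
    table = defaultdict(Counter)
    for sample, new_cluster in new_labels.items():
        if sample in published_labels:
            table[new_cluster][published_labels[sample]] += 1
    return table

def dominant_subtype(table, min_fraction=0.8):
    """Return the published subtype that dominates each new cluster, or None."""
    result = {}
    for cluster, counts in table.items():
        subtype, n = counts.most_common(1)[0]
        total = sum(counts.values())
        result[cluster] = subtype if n / total >= min_fraction else None
    return result

# Hypothetical toy assignments (not TCGA data).
new_labels = {"s1": "A", "s2": "A", "s3": "A", "s4": "B", "s5": "B", "s6": "B"}
published_labels = {
    "s1": "mesenchymal", "s2": "mesenchymal", "s3": "mesenchymal",
    "s4": "differentiated", "s5": "immunoreactive", "s6": "proliferative",
}

table = cross_tabulate(new_labels, published_labels)
mapping = dominant_subtype(table)
print(mapping)
```

In this toy case cluster "A" maps cleanly onto one published subtype, while cluster "B" draws from several and so gets no single label, which is the pattern described for the two smaller clusters.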
We can do -- perform the same analysis for microRNA sequencing. Again, in the consortium
publication, three robust microRNA clusters were identified. We also see reasonably robust
evidence for six clusters, and here we're putting some of the top driver microRNA signatures
onto each of these clusters. Many of them are familiar to those of you working on pan-cancer
and multiple different tumor types. But in this case, unlike the RNA sequencing data,
we see very little correspondence between the novel cluster solution and those existing
previously, and clearly, we need to dig deeper into these analyses to identify and perhaps
add P values to these Bezier curves to identify whether there are enrichments between certain
clusters.
With these expression signatures in hand, we can turn to ask questions such as the interplay
between microRNA and messenger RNA, and here just to give an example is a relationship
that was actually published by Chad Creighton this last year between this microRNA-29a and
the locus DNMT3A DNA methyltransferase gene. What we're showing here, for each of the
six subclusters that we've identified for RNA sequencing, is the expression
of DNMT3A in each of those clusters, and we can see, for example, in the gray cluster
increasing RNA expression. Conversely, if we look at this bottom plot, this is the expression
profile for the microRNA-29a and we see decreasing expression corresponding, so anti-correlated
with the RNA. But we only see this trend in cases for which the microRNA binding site
is present in our isoform. And this is where the sequencing gives us additional resolution
that may not be captured in microarray experiments. So in the example shown, the three top isoforms
of this gene all contain a microRNA binding site; in the shorter isoform the site is absent, and
that isoform has no expression correlation with that of the microRNA.
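As a rough illustration of the isoform-level anti-correlation described above, the sketch below computes a Pearson correlation between a microRNA's expression and each isoform's expression. The expression values are invented for the example, and Pearson correlation is an assumption here; the talk does not say which statistic was used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical expression values across six samples (not TCGA data).
mirna = [8, 6, 5, 3, 2, 1]                 # e.g. the microRNA
isoform_with_site = [1, 2, 4, 5, 7, 9]     # isoform carrying the binding site
isoform_without_site = [5, 3, 6, 4, 6, 4]  # shorter isoform lacking the site

r_with = pearson(mirna, isoform_with_site)
r_without = pearson(mirna, isoform_without_site)
print(f"with binding site:    r = {r_with:.2f}")
print(f"without binding site: r = {r_without:.2f}")
```

With these toy numbers the isoform carrying the binding site is strongly anti-correlated with the microRNA, and the isoform lacking it shows essentially no relationship.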
Turning now to the gene fusion detection within this cohort. We've applied our in-house assembly
and analysis pipeline, Trans-ABySS, to all 420 cases, and identified about 4,300 candidate
fusions. In the absence of total RNA remaining, total RNA for verification, we've turned to
orthogonal approaches, and we have been working with Kevin White's group at the University
of Chicago. Their group has been running UC-fusion-finder on this same cohort. And looking only at the
intersection, we identify approximately 1,500 such gene fusions called by both platforms.
Of these 1,500, 64 are recurrent; that is, present in two or more cases.
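The filtering logic described above, keeping only fusions called by both tools and then counting those seen in two or more cases, can be sketched as follows. This is a minimal sketch under stated assumptions: the tuple layout (sample, 5' gene, 3' gene), the gene and case names, and the rule that a call must match on sample and both partners are hypothetical and are not taken from either pipeline.

```python
from collections import defaultdict

def recurrent_consensus_fusions(calls_a, calls_b, min_cases=2):
    """Keep fusions called by both tools, then those seen in >= min_cases samples.

    Each call is a (sample_id, five_prime_gene, three_prime_gene) tuple.
    """
    consensus = set(calls_a) & set(calls_b)
    cases = defaultdict(set)
    for sample, gene5, gene3 in consensus:
        cases[(gene5, gene3)].add(sample)
    return {fusion: s for fusion, s in cases.items() if len(s) >= min_cases}

# Hypothetical calls from two callers (placeholder gene and case names).
caller_one = [
    ("case1", "GENE_A", "GENE_B"),
    ("case2", "GENE_A", "GENE_B"),
    ("case3", "GENE_C", "GENE_D"),
    ("case4", "GENE_E", "GENE_F"),
]
caller_two = [
    ("case1", "GENE_A", "GENE_B"),
    ("case2", "GENE_A", "GENE_B"),
    ("case3", "GENE_C", "GENE_D"),
]

recurrent = recurrent_consensus_fusions(caller_one, caller_two)
print(recurrent)
```

Here GENE_E-GENE_F is dropped because only one caller reports it, and GENE_C-GENE_D is dropped because it appears in a single case.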
And the distribution of these is very interesting. So -- and really in stark contrast to other
studies, such as the acute myeloid leukemia study. In ovarian we see a high degree of
duplication, and this is consistent with the findings in the marker paper of copy number
-- focal copy number gains and losses. And so many around this Circos plot, those arcs
that are linked in the same chromosome block are essentially the result of duplication,
and there were very few cases of translocation for ovarian, but you can see the density of
the recurrent gene fusions. That's all that's being plotted here. In contrast, with AML
there are very many more in-frame fusions, indicated by the green color, and the thickness of these
bars corresponds to the level of recurrence. So, many more highly prevalent fusion events
and the result largely of translocations within AML. So very stark differences.
If we tease apart the ovarian fusion events into both in-frame and out-of-frame, we identify
the most recurrent in-frame events in this chart, and the colors here indicate events
that are seen in the Mitelman database of chromosome aberrations in cancer. So those
in purple are known fusion events, where both gene partners have been previously reported.
Those highlighted in green are where a single gene member of that gene fusion partner has
been previously reported. And then the remainder in gray are entirely novel gene fusion constructs
identified through our analysis. To draw your attention to -- there's a single case of TFG-GPR128,
which is a known polymorphism within the database of genomic variants.
So the most highly prevalent gene fusion event we have is -- in-frame is this MECOM, or MDS1
and EVI1 complex locus. And this was observed to be focally amplified in over 20 percent
of the ovarian tumors in the TCGA early report. Of interest, MECOM is a target of a couple
of FDA-approved therapeutic compounds listed here. And like I said, we've identified these
in-frame fusions in approximately 3 percent of this cohort. Primarily, the fusion events
fuse the exon1 of MECOM to an entire transcript of a novel partner. And as cartooned in this
slide, MECOM and the partner genes, as a result of the duplication events, are present on
chromosome three band, q26.2, and we have a fusion between this exon1 of MECOM, and
in this particular case, in which six patient samples contain the fusion, we have the entire
transcript locus for this leucine-rich repeat containing protein.
Of interest, the 5' end of MECOM contains a 12 amino acid signature sequence, which
has previously been shown to recruit MAP kinases, SMAD3, and SUV39H1, and so transcriptional
corepressors and the like.
So we've now taken the gene fusion partners for all 1,500 events and identified pathways
which may be linked to these genes. So, of the 2,500 unique genes, we see an enrichment
within the COSMIC database, 105 of these genes are seen in the cancer census as causally
implicated with cancer. Some of the pathways listed on this slide are familiar to many
of you. If we then remove these 105 genes from the total set, the one remaining
pathway is the ubiquitin-mediated proteolysis, and so certainly this warrants further investigation.
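One common way to ask whether an overlap like 105 of 2,500 genes is more than expected by chance is a hypergeometric upper-tail test, sketched below. The talk does not state which test was used, and the background size (20,000 genes) and census list size (500 genes) are placeholder assumptions chosen only to make the example run; only the 2,500 and 105 come from the talk.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K are on the reference list."""
    upper = min(K, n)
    # Sum exact integer counts first, then divide once.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

# From the talk: 2,500 unique fusion partner genes, 105 of them on the census list.
n_fusion_genes = 2500
n_overlap = 105

# ASSUMPTIONS, not from the talk: background gene count and census list size.
background_genes = 20000
census_genes = 500

expected = n_fusion_genes * census_genes / background_genes
p_value = hypergeom_upper_tail(n_overlap, background_genes, census_genes, n_fusion_genes)
print(f"expected overlap under these assumptions: {expected}")
print(f"upper-tail p-value under these assumptions: {p_value:.3g}")
```

The result depends heavily on the assumed background and list sizes, so the printed p-value should be read as an illustration of the calculation and not as a finding about this cohort.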
So to summarize, we've generated mRNA-seq and microRNA-seq for 420 and 485 of these
TCGA ovarian samples. Unsupervised clustering of the expression profiles identifies potentially
additional sample groupings, and an exploration of putative microRNA and mRNA interactions
identifies significant expression anti-correlations, including the example I provided that was
previously published.
In contrast to other cancers, AML being an example, duplication is the primary rearrangement
leading to gene fusions and is consistent with the TCGA publication. And MECOM fusions
are the most recurrent in-frame events that we've identified within this tumor type.
So ongoing work includes the identification of recurrent partial tandem duplications and
the internal tandem duplications, and my colleague, Lucas Swanson, is here with poster number
106. I encourage you to visit. Further pursuit of this MECOM, especially in light of the
therapeutic target, is warranted, and, of course, differential expression and a discriminatory
gene analysis, and further integration with existing and novel TCGA datasets is in the
pipeline.
So I thank you for your attention. I thank my colleagues at the B.C. Cancer Agency Genome
Sciences Centre, and I'll happily take any questions. Thank you.
[applause]
Richard Gibbs: Time for a quick question or two. So the correlative
observation of the large number of fusions with the overall level of genomic rearrangement
in ovarian, at what point do you say there is a strong causative association, you know,
the genome is rearranged because that disease wants to see more fusions? Where, you know,
where do you [inaudible] --
Andrew Munqall: It's key, I believe, that TP53 is mutated
in almost all, if not all, of these cases, and so genome rearrangement is clearly an
integral part of this disease and quite different to many of the other tumor types we see. Whether
the transcription fusions -- I mean, I think it must be looked at the pathway analysis,
because the highest recurrency we've seen is still relatively low at around 3 percent,
and so whether it's a combinatorial driving of the disease needs further exploration.
Richard Gibbs: Thank you. Well, we'd better move on. We've
got --
[applause]
[end of transcript]
NHGRI/NCI: 08 TCGA 112712 Andrew Munqall 4 12/13/12
Prepared by National Capitol Captioning 200 N. Glebe Rd. #1016
(703) 243-9696 Arlington, VA 22203