Lee Cooper: Good morning. So I'm Lee Cooper; I'm from
the Center for Comprehensive Informatics at Emory University and today I'm going to be
talking about some of our morphometric analysis of glioblastomas and correlates between morphometry,
patient outcome and molecular data. So -- let me get the laser pointer here. So our group
is sort of imaging centric. We are contributors of tissue to TCGA, but we also consume the
data, so our science PI, Dan Bratt [spelled phonetically], is a neuropathologist and we
also have an associate, David Guttmann [spelled phonetically], who leads a group of neuroradiologists
who are examining radiology data, so most of our questions focus on the idea that,
you know, you observe something in an image, how does that relate to patient outcome or
molecular status or response to therapy? Something like that.
So TCGA is really a unique data set in the sense that you have a large number of samples
where you have histology that's linked with patient outcome and linked with molecular
data. So if you're not familiar, we have these new devices; they're proliferating more, so
it's called a slide scanner. And basically, you put 200 or so slides in and overnight
it produces high resolution images for you. So these images typically have, you know,
more than a billion pixels and they're at 20x and you can see the tissue in clear detail
all the way across. So within TCGA, we have scans of frozen tissue, so each chunk, they
take a top slice and a bottom slice and that's done for quality control, so those scans are
out there. There are also scans of diagnostic block permanent sections, and so that's mostly
what we use for our sort of automated image analysis. These are at 20x magnification.
If there's somebody out there from the TCGA, if you're interested in doing 40x, we'd really
like to have that, if that's possible.
So one of the other things that we have besides these images is that, you know, there are teams
of pathologists who basically look at them and rate criteria, so the basic things are
percent tumor nuclei, percent necrosis, but I know in GBM, they rated a lot of histological
criteria, so, is there a presence of gemistocytes, is there an oligo [spelled phonetically] component,
quite a few categories, lymphocytic infiltration, things like that.
So, why would you want to analyze histology in glioblastoma? Well, it turns out that glioblastoma
is very heterogeneous in terms of the way it looks, so there are a lot of sort of discrete
cell types that show up, you have large cells, you have gemistocytic components, but you
know, as another part of the story on the left here, what you can see is that you know,
even though GBM is a grade IV astrocytoma, frequently we see, you know, oligodendrocytic
components and there are also cells that are sort of in between
an astro and an oligo type cell that don't really fit into any kind of discrete category.
So, a lot of this stuff is not understood, while some of these, you know, component-type
cells are linked to specific genetic alterations.
The whole thing is not clearly understood, so what we're getting at here is to try and
see, you know, is there any kind of clustering of the morphology of GBMs. If we can describe
it using some sort of algorithm, do patients cluster in terms of their morphology? And
then the obvious question to ask after that would be if they do cluster, what are the
links between these clusters and outcome, molecular data, et cetera? So this is just a 5,000-foot
view of the sort of pipeline we've come up with. We have several layers involved in this,
but the general idea is that we use image analysis to capture some description of the
cells in a whole slide image that belongs to a patient and from those descriptions,
we calculate a morphology signature for the patient. And then what we do is to cluster
these morphology signatures so that you're essentially clustering the patients into different
groups and once you have those groups, you can do all kinds of correlative analysis,
looking at outcome, look at, you know, significant differences in expression, et cetera.
So I'm going to go into each one of these components in a little more detail. So really,
the core of the analysis is this image analysis component, so Jim Kong [spelled phonetically]
in our group has developed a system that goes into these slides and circles every single
nucleus and then defines, you know -- so the nucleus is circled in red and then he defines
a high confidence area of cytoplasm, since we don't have any kind of membrane marker
and these are glial cells. And then what he does is to describe these cells using a set
of features that capture the shape, the staining characteristics, texture,
things like that, and so each cell gets its own description and these things are all stored
in a database for, you know, ease of use. And then what we do is for each patient, we
calculate a morphology signature by just taking the arithmetic mean of their cells, so basically
what you're looking at is, using these descriptors, what does the patient's average cell look
like?
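The averaging step described above can be sketched in a few lines of Python. This is only an illustration, not the group's actual code: the function name `morphology_signatures` and the input layout (pairs of patient ID and per-cell feature vector) are assumptions, and the real system reads the per-cell descriptions from a database.

```python
def morphology_signatures(cells):
    """Average per-cell feature vectors into one signature per patient.

    `cells` is an iterable of (patient_id, feature_vector) pairs; this
    layout is a hypothetical stand-in for the per-cell database records.
    """
    sums = {}    # patient_id -> running sum of each feature
    counts = {}  # patient_id -> number of cells seen
    for patient_id, features in cells:
        if patient_id not in sums:
            sums[patient_id] = [0.0] * len(features)
            counts[patient_id] = 0
        for i, value in enumerate(features):
            sums[patient_id][i] += value
        counts[patient_id] += 1
    # Arithmetic mean of each feature over the patient's cells.
    return {
        pid: [total / counts[pid] for total in feature_sums]
        for pid, feature_sums in sums.items()
    }


example_cells = [
    ("patient_A", [1.0, 2.0]),
    ("patient_A", [3.0, 4.0]),
    ("patient_B", [5.0, 6.0]),
]
signatures = morphology_signatures(example_cells)
```

Keeping running sums rather than collecting all cells first matters at this scale, since a single slide can contribute on the order of a million nuclei.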
Once we have these patient morphology profiles, we pass them into the clustering engine, so,
you know, because of the nature of processing slides, et cetera, there's normalization
that needs to be applied. We also do feature selection to eliminate redundant features
that are not informative. We use the consensus clustering method, then, to get a really robust
clustering of the patients together and then we can do all kinds of visualization and [unintelligible]
spaces, et cetera.
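As a rough illustration of the consensus clustering idea: the patients are clustered many times on resampled subsets, and for every pair of patients you record how often they land in the same cluster. The talk does not say which base clustering algorithm or resampling scheme was used, so the sketch below takes the per-run labels as given; `consensus_matrix` and its input format are hypothetical names for illustration only.

```python
from itertools import combinations


def consensus_matrix(runs, n_samples):
    """Build a consensus matrix from repeated clustering runs.

    `runs` is a list of dicts mapping sample index -> cluster label; each
    dict covers only the samples drawn in that resampling run. Entry
    [i][j] of the result is the fraction of runs containing both i and j
    in which the two samples were assigned to the same cluster.
    """
    together = [[0] * n_samples for _ in range(n_samples)]
    sampled = [[0] * n_samples for _ in range(n_samples)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    consensus = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        consensus[i][i] = 1.0  # a sample always co-clusters with itself
        for j in range(n_samples):
            if i != j and sampled[i][j] > 0:
                consensus[i][j] = together[i][j] / sampled[i][j]
    return consensus


# Four hypothetical resampling runs over four patients (indices 0..3).
example_runs = [
    {0: 0, 1: 0, 2: 1},
    {0: 1, 1: 1, 3: 0},
    {1: 0, 2: 1, 3: 1},
    {0: 0, 1: 1},
]
matrix = consensus_matrix(example_runs, 4)
```

Final cluster labels are typically obtained by clustering on one minus the consensus value as a distance; pairs with values near 0 or 1 indicate a stable grouping.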
So once we have those cluster labels for the patients that are driven by morphology, what
we can do is just to follow the sort of normal pattern for an integrative type analysis.
We look at survival; we look at the relationship between morphology clusters and molecularly
defined clusters, or, you know, classifications like the [unintelligible] classifications
of GBM or the G-CIMP phenotype. We also want to check against our expert pathologists and
the ratings they've provided and see is there any enrichment of certain components they
can describe, like small cells or gemistocytic components, et cetera, and we check against
a limited set of recognized genetic alterations.
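For the survival comparison between clusters, one common choice is the log-rank test. The talk does not name the test that was used, so the following is only a sketch under that assumption; `logrank_test` is a hypothetical helper comparing one cluster against the rest, with event flags of 1 for an observed death and 0 for a censored patient.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    Returns (chi_square, p_value) on 1 degree of freedom.
    events: 1 = death observed, 0 = censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Deaths occurring exactly at time t, per group.
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Made-up survival times (months) for two hypothetical clusters.
chi_square, p_value = logrank_test(
    [1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
    [10, 11, 12, 13, 14], [1, 1, 1, 1, 1],
)
```

With three clusters, the same comparison can be run for each cluster against the other two, keeping in mind that repeated tests call for a multiple-comparison correction.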
From there we pass into a whole genome analysis where we do, you know, deep analysis looking
across the genome for differences in expression among the clusters, differences in copy number,
methylation, et cetera. So, in an analysis of 200 million nuclei from the TCGA data, from
162 GBMs, we found three clusters and we named these clusters after the functions of genes
that are associated with them, so we have the cell cycle cluster on the left, the chromatin
modifying cluster in the middle and the protein biosynthesis cluster on the right, and so
these groups are prognostically significant. The chromatin modifying cluster has a worse
outcome if you compare it to the other two groups, and the difference is statistically significant.
So the next thing we did, now that we have these clusters, is we need some type of visualization,
so for each patient, we picked their cell that's closest to their morphology signature,
so this is sort of the average looking cell for each patient, and we put these into groups,
and so based on our pathologist's feedback, there are some differences. The cell cycle
cluster is more hyperchromatic, it's darker, it also has a slightly larger size. The chromatin
modifying cluster has more basophilic cytoplasm so it's kind of speckled, has the least intensely
stained nuclei. And then the protein biosynthesis cluster is kind of a mixture of the two, it's
sort of somewhere in between, less distinguished. So we validated this finding in a separate
set of GBMs we obtained from our collaborators at Henry Ford, so we just looked at, you know,
the clustering, again, doing a de novo clustering using the selected features from before and,
you know, immediately we recognized there's a cell cycle cluster and the chromatin modifying
cluster. There's also a third -- the PB cluster doesn't, you know, immediately appear; there's
some kind of mixed component in between. And the survival trends remain the same as they
did in the TCGA data sets, so this is encouraging.
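The visualization step mentioned earlier, picking for each patient the cell closest to their morphology signature, amounts to a nearest-neighbor lookup. The sketch below assumes Euclidean distance on already-normalized features; the talk does not specify the distance measure, and `representative_cell` is a hypothetical name.

```python
import math


def representative_cell(cell_features, signature):
    """Return the index of the cell nearest to the patient's signature.

    Distance is Euclidean, which is an assumption made for illustration.
    """
    def distance(features):
        return math.sqrt(sum((f - s) ** 2 for f, s in zip(features, signature)))

    return min(range(len(cell_features)), key=lambda i: distance(cell_features[i]))


# Three made-up cells for one patient; the signature is their mean.
cells = [[0.0, 0.0], [2.0, 2.0], [5.0, 5.0]]
signature = [7.0 / 3.0, 7.0 / 3.0]
index = representative_cell(cells, signature)
```

Because the signature is an average, no real cell has to match it exactly, so showing the nearest actual cell gives pathologists an image to inspect rather than an abstract feature vector.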
So now, just to go on to the associations -- we looked at, you know, several different
things, so -- we looked at association with molecular subtypes and we found some things
that are mildly significant, but nothing really definitive. The same goes for the ratings
of pathology, so there's some small cell enrichment in the cell cycle cluster, some lymphocyte
enrichment in the chromatin modifying cluster, but it's not really anything that's so significant,
so specific to those things that it's definitive. The same goes for the genetics, so we wanted
to dig a little deeper. This just drives that point home a little more, so you can see,
for each cluster, each cluster as a bar here, you can see the distributions of the [unintelligible]
subtypes among these clusters, so there is some variation but it's pretty close to uniform.
So that doesn't really explain what we're observing here, but when we looked at the
genome-wide analysis, we did find some significant results, so there are quite a few genes that
are differentially expressed between these groups. I mean, we've subjected those to all
kinds of ontology and pathway analysis. One thing I would note is that this chromatin
modifying cluster does have the most hypermethylated samples, so there are 244 genes that are hypermethylated
there compared to the other clusters. So that's a good validation, too.
So one of the interesting things about the GO analysis of these
gene sets is that the nuclear lumen was the most highly enriched term in all of those,
so we're analyzing nuclear morphology and the genes that we pulled out when we compare
these groups, the most highly enriched term is related to nucleus. So other terms that
were enriched were, of course, the names for the clusters; that's where the cluster names
come from, but also things that you would imagine could affect shape, like, you know,
M phase or DNA repair. We also subjected these lists to an IPA analysis, and we found differences
in cancer-related pathways, so in one of the clusters we have ATM and TP53 damage checkpoint
activation differences, and the NF-kappaB pathway, Wnt signaling, PTEN and AKT signaling, so.
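Term enrichment of the kind described above is commonly scored with a hypergeometric (one-sided Fisher) test: given a gene list, how surprising is the number of genes carrying a given annotation? The talk does not state which statistic the tools used, so this is a sketch of the general idea only; `enrichment_p_value` and the example counts are made up for illustration.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """Probability of seeing at least `overlap` annotated genes by chance.

    population: number of genes in the background set
    annotated:  background genes carrying the term (e.g. "nuclear lumen")
    selected:   size of the differentially expressed gene list
    overlap:    genes in the list that carry the term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Made-up counts: 20,000 background genes, 1,500 with the term,
# a list of 300 genes of which 45 carry the term.
p = enrichment_p_value(20000, 1500, 300, 45)
```

In practice many terms are tested at once, so the raw p-values would need a multiple-testing correction before any term is called enriched.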
So our conclusion is that, you know, maybe these clusters are not definitive, but it
seems that there really is signal within these images that relates to molecular status and
also patient outcome. So one of the things we're working on is to develop some more complex
models to account for some of the heterogeneity that's in these samples, always with the risk
of, you know, not wanting to overfit things, so we're developing more complex models so
that we can answer questions better, correlate things better and have more specific results.
And I just want to thank the TCGA for providing a terrific data set.
Here are some of our collaborators in our group. There's Joe [unintelligible], our director,
Dan Bratt is here, David Guttmann will be giving a talk later. If you can stick around,
he'll be doing the radiology portion of this. And I also want to thank our collaborators
at Henry Ford for providing slides for us, so that's Lisa Scarpachi [spelled phonetically]
and Tom Mickleson [spelled phonetically]. I'll take your questions.
[applause]
Male Speaker: Very nice presentation. Comment first. It's
really nice to see a relationship back to traditional pathology and where it's going
in the future. There's a lot of information in the H and E that was discovered hundreds
of years ago --
Lee Cooper: [affirmative]
Male Speaker: -- that we're finally extracting. My question
is, have you looked at the relationship between tumor nuclei and stromal cell components by
different categories? Andy Beck has done some nice work recently in breast cancer --
Lee Cooper: Right.
Male Speaker: -- using that type of analysis. Are you pursuing
the same thing in GBM?
Lee Cooper: So, I'm not a pathologist so I can't really
comment about stroma and its role in GBM specifically, but I'm
familiar with Andrew Beck's work and we are looking at a more, you know, you can say more
complex description of structure, et cetera, instead of just focusing on individual cells,
so it'll be nice to know, you know, who lives close to the blood vessels and, you know,
what's happening around the necrosis, et cetera. So we just need to sort of boost up some of
our algorithms to be able to do those kinds of things.
Male Speaker: Hi. Are you considering a supervised approach,
finding individual features more correlated with prognosis, for example?
Lee Cooper: So, it's interesting. When you correlate these
features with prognosis, any one feature does not, you know, come out as significant, but
when you do clustering analysis and you're in higher dimensions, they do seem to segregate
in a way that provides some prognostic significance, so we'd be interested in other methods where
we can do sort of a more interesting regression type analysis. Maybe those features would
pop out.
Female Speaker: I'll just let you know that we learned from
you guys, so back in February of this year we actually changed the requirement to 40x.
Lee Cooper: Oh great. Thank you. That's good news.
Male Speaker: Okay. Thanks, Lee.
Lee Cooper: Thanks.