Morphologic Analysis of Glioblastoma Identifies Morphology-Driven Clusters and... - Lee Cooper

Lee Cooper: Good morning. I'm Lee Cooper, from the Center for Comprehensive Informatics at Emory University, and today I'm going to talk about our morphometric analysis of glioblastomas and the correlations between morphometry, patient outcome and molecular data. Let me get the laser pointer here. Our group is imaging-centric. We contribute tissue to TCGA, but we also consume the data. Our science PI, Dan Brat, is a neuropathologist, and we also have an associate, David Gutman, who leads a group of neuroradiologists examining radiology data. So most of our questions focus on the idea that you observe something in an image: how does that relate to patient outcome, molecular status or response to therapy? TCGA is a unique data set in the sense that you have a large number of samples where histology is linked with patient outcome and with molecular data. If you're not familiar with them, there is a new class of devices, called slide scanners, that are becoming more and more common. You load 200 or so slides, and overnight the scanner produces high-resolution images for you. These images typically have more than a billion pixels, they're scanned at 20x, and you can see the tissue in clear detail all the way across. Within TCGA, we have scans of frozen tissue: from each chunk, they take a top slice and a bottom slice for quality control, and those scans are available. There are also scans of permanent sections from the diagnostic blocks, and that's mostly what we use for our automated image analysis. These are at 20x magnification. If there's somebody here from TCGA: if it's possible to scan at 40x, we'd really like to have that.
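To put "more than a billion pixels" in perspective, here is a back-of-the-envelope sketch. It assumes about 0.5 micrometres per pixel at 20x (a common figure for slide scanners, but not one given in the talk) and a hypothetical 20 mm x 15 mm tissue region; real sections and scanners vary.

```python
# Back-of-the-envelope pixel count for a whole-slide image.
# ASSUMPTIONS (not from the talk): ~0.5 micrometres per pixel at 20x,
# and a hypothetical 20 mm x 15 mm tissue region.
MICRONS_PER_PIXEL_20X = 0.5


def pixel_count(width_mm: float, height_mm: float, microns_per_pixel: float) -> int:
    """Pixels needed to cover a width_mm x height_mm region at the given sampling."""
    width_px = round(width_mm * 1000 / microns_per_pixel)
    height_px = round(height_mm * 1000 / microns_per_pixel)
    return width_px * height_px


if __name__ == "__main__":
    at_20x = pixel_count(20, 15, MICRONS_PER_PIXEL_20X)
    # Halving the pixel size (roughly what 40x scanning does) quadruples the count.
    at_40x = pixel_count(20, 15, MICRONS_PER_PIXEL_20X / 2)
    print(f"20x: {at_20x:,} pixels")  # 20x: 1,200,000,000 pixels
    print(f"40x: {at_40x:,} pixels")  # 40x: 4,800,000,000 pixels
```

The quadrupling at 40x is one reason the higher-magnification request matters for storage and processing as well as for detail.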
One of the other things we have besides these images is ratings from teams of pathologists, who look at the slides and rate criteria. The basic ones are percent tumor nuclei and percent necrosis, but in GBM they rated a lot of histological criteria: the presence of gemistocytes, an oligodendroglial (oligo) component, lymphocytic infiltration, and quite a few other categories. So why would you want to analyze histology in glioblastoma? It turns out that glioblastoma is very heterogeneous in its appearance. A lot of discrete cell types show up: you have large cells, you have gemistocytic components. As another part of the story, on the left here, even though GBM is a grade IV astrocytoma, we frequently see oligodendroglial components, and there are also cells that fall somewhere between an astrocytic and an oligodendroglial type and don't fit into any discrete category. A lot of this is not understood. Some of these component cell types are linked to specific genetic alterations, but the whole picture is not clear. So what we're getting at here is whether there is any clustering in the morphology of GBMs: if we can describe morphology with an algorithm, do patients cluster in terms of their morphology? The obvious next question is, if they do cluster, how do those clusters relate to outcome, molecular data, et cetera? This is a 5,000-foot view of the pipeline we've come up with. There are several layers involved, but the general idea is that we use image analysis to describe the cells in each patient's whole-slide image, and from those descriptions we calculate a morphology signature for the patient.
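As a minimal sketch of the signature step, assuming each segmented cell has already been reduced to a numeric feature vector, the per-patient signature is the arithmetic mean of that patient's cell vectors (the averaging is described in the talk; the array layout, the function name `patient_signatures` and the example feature values are hypothetical).

```python
import numpy as np


def patient_signatures(cell_features, patient_ids):
    """Average per-cell feature vectors into one morphology signature per patient.

    cell_features: (n_cells, n_features) array, one row per segmented nucleus.
    patient_ids:   length-n_cells labels naming the patient each cell came from.
    Returns (patients, signatures) where signatures[i] is the mean cell of patients[i].
    """
    X = np.asarray(cell_features, dtype=float)
    ids = np.asarray(patient_ids)
    patients = np.unique(ids)  # sorted unique patient labels
    signatures = np.vstack([X[ids == p].mean(axis=0) for p in patients])
    return patients.tolist(), signatures


if __name__ == "__main__":
    # Hypothetical features per cell: [nuclear area, mean stain intensity].
    cells = [[100, 0.6], [120, 0.8], [300, 0.2], [310, 0.4]]
    ids = ["A", "A", "B", "B"]
    patients, sigs = patient_signatures(cells, ids)
    print(patients)  # ['A', 'B']
    print(sigs)      # approximately [[110, 0.7], [305, 0.3]]
```

With hundreds of millions of nuclei the same averaging would normally be done as a streaming or database aggregate rather than in one in-memory array.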
Then we cluster these morphology signatures, which essentially clusters the patients into different groups, and once you have those groups you can do all kinds of correlative analysis: looking at outcome, at significant differences in expression, et cetera. I'm going to go into each of these components in a little more detail. The core of the analysis is the image analysis component. Jun Kong in our group has developed a system that goes into these slides and outlines every single nucleus; here the nuclei are outlined in red. Since we don't have any kind of membrane marker, and these are glial cells, he then defines a high-confidence area of cytoplasm around each nucleus. He describes these cells using a set of features that capture shape, staining characteristics, texture and so on, so each cell gets its own description, and these descriptions are all stored in a database for ease of use. Then, for each patient, we calculate a morphology signature by simply taking the arithmetic mean of their cells, so what you're looking at is, using these descriptors, what does the patient's average cell look like? Once we have these patient morphology profiles, we pass them into the clustering engine. Because of the nature of slide processing, there's normalization that needs to be applied. We also do feature selection to eliminate redundant features that are not informative. We then use the consensus clustering method to get a robust clustering of the patients, and then we can do all kinds of visualization and [unintelligible] spaces, et cetera. Once we have these morphology-driven cluster labels for the patients, we can follow a fairly standard pattern for an integrative analysis.
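The clustering stage described above (normalization, removal of redundant features, consensus clustering) can be sketched as follows. This is not the group's implementation: it assumes z-score normalization across patients, a greedy correlation filter with a hypothetical redundancy threshold of 0.9, and resampling-based consensus clustering built from repeated 80% subsamples clustered with plain k-means, followed by average-linkage clustering of 1 minus the consensus matrix. All function names and parameter values are illustrative.

```python
import numpy as np


def zscore(X):
    """Standardize each feature (column) to zero mean, unit variance across patients."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # constant features become all zeros instead of dividing by zero
    return (X - X.mean(axis=0)) / sd


def drop_redundant(X, threshold=0.9):
    """Greedy filter: keep column j only if |Pearson r| with every kept column <= threshold."""
    corr = np.nan_to_num(np.corrcoef(X, rowvar=False))  # NaN (constant column) -> 0
    kept = []
    for j in range(corr.shape[0]):
        if all(abs(corr[j, k]) <= threshold for k in kept):
            kept.append(j)
    return kept


def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd's k-means; returns one integer label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def consensus_matrix(X, k, n_resamples=100, frac=0.8, seed=0):
    """Fraction of resamples in which two patients, when both drawn, share a k-means cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    m = int(round(frac * n))
    together = np.zeros((n, n))
    drawn = np.zeros((n, n))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=m, replace=False)
        labels = kmeans(X[idx], k, rng)
        block = np.ix_(idx, idx)
        together[block] += labels[:, None] == labels[None, :]
        drawn[block] += 1.0
    return np.divide(together, drawn, out=np.zeros_like(together), where=drawn > 0)


def average_linkage(D, k):
    """Agglomerative clustering with average linkage on a distance matrix, cut at k clusters."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a is unaffected by the pop
    labels = np.empty(len(D), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels


def consensus_cluster(signatures, k, threshold=0.9, **kwargs):
    """Normalize, drop redundant features, then cut the consensus matrix into k groups."""
    Z = zscore(signatures)
    kept = drop_redundant(Z, threshold)
    C = consensus_matrix(Z[:, kept], k, **kwargs)
    return average_linkage(1.0 - C, k), kept
```

Pairs of patients that land together in nearly every resample get consensus values near 1, which is what makes the final partition robust to any single k-means run. With real data one would also compare several values of k, for example by the stability of the consensus matrix, rather than fixing k in advance; the talk does not say how the number of clusters was chosen.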
We look at survival. We look at the relationship between the morphology clusters and molecularly defined clusters or classifications, like the [unintelligible] classifications of GBM or the G-CIMP phenotype. We also want to check against our expert pathologists and the ratings they've provided, to see whether there is any enrichment of components they can describe, like small cells or gemistocytic components, and we check against a limited set of recognized genetic alterations. From there we move into a whole-genome analysis, looking in depth across the genome for differences in expression, copy number, methylation, et cetera, among the clusters. In an analysis of 200 million nuclei from 162 TCGA GBMs, we found three clusters, and we named these clusters after the functions of the genes associated with them: the cell cycle cluster on the left, the chromatin modifying cluster in the middle and the protein biosynthesis cluster on the right. These groups are prognostically significant: the chromatin modifying cluster has a worse outcome than the other two groups, and the difference is statistically significant. The next thing we needed, now that we have these clusters, was some kind of visualization. For each patient, we picked the cell that's closest to their morphology signature, which is sort of the average-looking cell for that patient, and we grouped these cells by cluster. Based on our pathologists' feedback, there are some differences. The cell cycle cluster is more hyperchromatic, that is, darker, and its cells are slightly larger. The chromatin modifying cluster has more basophilic cytoplasm, so it looks kind of speckled, and it has the least intensely stained nuclei. The protein biosynthesis cluster is a mixture of the two; it's somewhere in between and less distinctive.
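Picking the cell "closest to the morphology signature" is a nearest-neighbor search. A minimal sketch follows. The talk does not name the distance, so this assumes Euclidean distance after dividing each feature by its standard deviation over the patient's cells, so that features measured in different units contribute comparably; the function name `representative_cell` and the example values are hypothetical.

```python
import numpy as np


def representative_cell(cell_features, signature, scale=None):
    """Index of the cell nearest to the patient's signature in scaled Euclidean distance.

    cell_features: (n_cells, n_features) array for one patient.
    signature:     length-n_features morphology signature (e.g. the mean cell).
    scale:         per-feature divisors; defaults to the per-feature standard deviation.
    """
    X = np.asarray(cell_features, dtype=float)
    s = np.asarray(signature, dtype=float)
    if scale is None:
        scale = X.std(axis=0)
    scale = np.where(np.asarray(scale) > 0, scale, 1.0)  # avoid dividing by zero
    distances = np.linalg.norm((X - s) / scale, axis=1)
    return int(np.argmin(distances))


if __name__ == "__main__":
    # Hypothetical 2-feature descriptions of four cells from one patient.
    cells = np.array([[0.0, 0.0], [10.0, 10.0], [5.0, 5.0], [4.0, 6.0]])
    signature = cells.mean(axis=0)  # the patient's morphology signature
    print(representative_cell(cells, signature))  # 2
```

Because the selected cell is a real segmented nucleus rather than a synthetic average, its image patch can be shown directly to pathologists, which is what makes the per-cluster galleries described in the talk possible.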
We validated this finding in a separate set of GBMs obtained from our collaborators at Henry Ford. We did a de novo clustering using the previously selected features, and we immediately recognized a cell cycle cluster and a chromatin modifying cluster. The protein biosynthesis (PB) cluster doesn't immediately appear; instead there's a mixed component in between. The survival trends remain the same as in the TCGA data set, so this is encouraging. Now on to the associations. We looked at several different things. For association with molecular subtypes, we found some mildly significant results, but nothing really definitive. The same goes for the pathology ratings: there's some small cell enrichment in the cell cycle cluster and some lymphocyte enrichment in the chromatin modifying cluster, but nothing so significant or so specific that it's definitive. The same goes for the genetics, so we wanted to dig a little deeper. This slide drives that point home: each bar is a cluster, and you can see the distribution of the [unintelligible] subtypes within each cluster. There is some variation, but it's pretty close to uniform, so that doesn't really explain what we're observing. When we looked at the genome-wide analysis, however, we did find significant results: quite a few genes are differentially expressed between these groups, and we've subjected those genes to all kinds of ontology and pathway analyses. One thing I would note is that the chromatin modifying cluster does have the most hypermethylated samples: 244 genes are hypermethylated there compared to the other clusters. So that's a good validation, too.
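One way to ask whether the subtype distribution differs across the morphology clusters, as in the per-cluster bar chart described above, is a chi-square test of independence on the cluster-by-subtype contingency table. The talk does not say which test was used; the sketch below only computes the Pearson statistic and its degrees of freedom with NumPy, and the counts in the example are made up.

```python
import numpy as np


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts.

    Rows are morphology clusters, columns are molecular subtypes (hypothetical layout).
    """
    observed = np.asarray(table, dtype=float)
    # Expected count under independence: row total * column total / grand total.
    expected = (observed.sum(axis=1, keepdims=True)
                * observed.sum(axis=0, keepdims=True) / observed.sum())
    stat = float(((observed - expected) ** 2 / expected).sum())
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return stat, dof


if __name__ == "__main__":
    # Made-up counts: 3 clusters x 4 subtypes, every row in the same proportions.
    proportional = [[10, 10, 10, 10], [5, 5, 5, 5], [8, 8, 8, 8]]
    print(chi_square_independence(proportional))  # (0.0, 6)
```

A p-value follows from the chi-square distribution with `dof` degrees of freedom (for example `scipy.stats.chi2.sf(stat, dof)`); `scipy.stats.chi2_contingency` performs the whole test but applies Yates' continuity correction to 2 x 2 tables by default. With small expected counts, an exact or permutation test is the safer choice.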
One of the interesting things about the GO (Gene Ontology) analysis of these gene sets is that nuclear lumen was the most highly enriched term in all of them. So we're analyzing nuclear morphology, and when we compare these groups, the most highly enriched term among the genes we pulled out is related to the nucleus. Other enriched terms were, of course, the functions the clusters are named after, since that's where the cluster names come from, but also things you might imagine could affect nuclear shape, like M phase or DNA repair. We also subjected these lists to an IPA (Ingenuity Pathway Analysis) analysis and found differences in cancer-related pathways: for one of the clusters, there are differences in ATM and TP53 DNA damage checkpoint activation, the NF-kappaB pathway, Wnt signaling, PTEN and AKT signaling. Our conclusion is that, while these clusters may not be definitive, there really does seem to be signal within these images that relates to molecular status and also to patient outcome. One of the things we're working on is developing more complex models to account for some of the heterogeneity in these samples, while being careful not to overfit, so that we can answer questions better, correlate things better and get more specific results. I just want to thank TCGA for providing a terrific data set. Here are some of our collaborators in our group: there's Joe [unintelligible], our director; Dan Brat is here; and David Gutman will be giving a talk later, so if you can stick around, he'll be covering the radiology portion of this work. I also want to thank our collaborators at Henry Ford, Lisa Scarpace and Tom Mikkelsen, for providing slides. I'll take your questions. [applause] Male Speaker: Very nice presentation. Comment first.
It's really nice to see a connection back to traditional pathology and to where it's going in the future. There's a lot of information in the H&E that was discovered hundreds of years ago -- Lee Cooper: [affirmative] Male Speaker: -- that we're finally extracting. My question is, have you looked at the relationship between tumor nuclei and stromal cell components by different categories? Andy Beck has done some nice work recently in breast cancer -- Lee Cooper: Right. Male Speaker: -- using that type of analysis. Are you pursuing the same thing in GBM? Lee Cooper: I'm not a pathologist, so I can't really comment on stroma and its role in GBM specifically, but I'm familiar with Andrew Beck's work, and we are looking at a more complex description of structure instead of just focusing on individual cells. It would be nice to know who lives close to the blood vessels, what's happening around the necrosis, et cetera. We just need to boost some of our algorithms to be able to do those kinds of things. Male Speaker: Hi. Are you considering a supervised approach, for example finding the individual features most correlated with prognosis? Lee Cooper: It's interesting. When you correlate these features with prognosis individually, no single feature comes out as significant, but when you do the clustering analysis in higher dimensions, the patients do seem to segregate in a way that has some prognostic significance. So we'd be interested in other methods, such as a more sophisticated regression-type analysis, where maybe those features would pop out. Female Speaker: I'll just let you know that we learned from you guys: back in February of this year, we changed the scanning requirement to 40x. Lee Cooper: Oh, great. Thank you. That's good news. Male Speaker: Okay. Thanks, Lee. Lee Cooper: Thanks.