Patient-Specific Pathway Analysis Using PARADIGM Identifies Key Activities... - Josh Stuart

Josh Stuart: I want to thank the organizers for inviting me. So I'm going to talk about our method called PARADIGM, which integrates multiple types of data on patient samples for inferring what's going on in these cancers. So as folks know, in TCGA we generate lots and lots of data, and it's often referred to as a flood. The Broad calls their system Firehose for an appropriate reason. And my point is that when you participate in these projects you often want to do lots of different types of comparisons, from comparing expression and methylation to figure out why something's not expressed to, you know, looking at the copy number and expression and methylation all together. This quickly gets out of control. You have lots of combinations of things you want to look at and it can be overwhelming. More importantly, when you're thinking about a gene and trying to figure out what's going on with that gene, is it active, is it not active, you've got all these different pieces of data telling you different things. You feel like you're at this stop light and you don't know whether to go or not. Well, at least that's how I feel, and this is often how many of us feel [laughs]. This is what it makes you want to do. This is your brain on all these types of data. So our particular approach is to say, let's use a knowledge-based approach, and the analogy I like to use is you're kind of like a detective or a car mechanic in this example, and each patient is a different accident, let's say. Something different went wrong, and some things are more serious than others here. And suppose you tried to do data mining on these car wrecks, and somebody handed you a ream of data: how fast the car was going, the direction, what people were saying. Some of it's relevant; some of it's not relevant. You're going to be better off if you use knowledge about how the system works, and I like Car Talk, so I'm showing Click and Clack here, right.
People call them because they know a lot about cars and can figure out and diagnose the problem. And this cartoon shows a radiator running off and the mechanics looking in the engine and saying, "I know what the problem with the car is: you don't have a radiator." Now you laugh, but with this data set, you know, it took a little bit of knowledge in this case to know what was missing. So in cellular systems, obviously, we have put together at least some of the circuitry and the machines inside cells, and so we should use those. And I'm going to show you a system that defines a computational model to represent these types of systems, and we benefit from all these efforts out there. There are many I didn't list, that's the ellipsis at the end. We've drawn from Reactome, KEGG, BioCarta, NCI PID, many different institutions, and our favorite, of course, combines all of them: Pathway Commons, from Memorial Sloan-Kettering. And so we try to suck in all that data to learn something about what's going on in a cell. So to motivate why we want to do this beyond just data fusion, just think of a simple example. We've got a transcription factor and you're looking at the expression of the transcription factor, and there are, let's say, three different transcription factors shown here. You know, you've got two that have high expression, shown in red, and one that has lower expression. And we know that expression isn't everything, and so it's almost a teleological argument, but how do you figure out whether something's working or not? How would you figure out that an enzyme's working? You know, even if you had magic goggles and you could look inside a cell and see that it's bumping around and moving in a cell and chewing things up, you're going to look at its secondary effects. You're going to look: did it actually metabolize substrate? Or, if it's a kinase, did it actually phosphorylate a target? And for a transcription factor, is it turning on its targets?
Right, and so that secondary evidence tells you something about the activity of the transcription factor, so in this case you assume or you infer that the transcription factor's on, and that might confirm your expression evidence. In another case you might see that, oh, well, the targets aren't doing anything downstream of the factor, and in this case you would think it's off, either post-transcriptionally or even translationally. We didn't activate this protein, or it's not localizing correctly, or there's a mutation that's blocking its function, or its co-activators, right, aren't around. On the reverse, you could have a low level of expression of the factor and yet it's still enough to have potent transcriptional activity. So you want to look around the neighborhood, is the argument here, to figure out what's going on in these things. So that's one piece of the puzzle: look at neighbors. And the other idea too is, you know, in this previous example we inferred that the factor was on because of its downstream targets. But suppose I ask Gaddy to give me GISTIC plots. Now this is a different type of data, copy number data, and just serendipitously, all the targets are amplified now. And so I could explain away that overexpression via amplification, and so I'm less likely now to think that the factor's on. Maybe I still think, you know, over my prior expectation, it's on. But it's not as high anymore, because I have another way to explain the up-regulation of those targets, via cis-regulation type of machinery. So to model those two pieces of information, we're also standing on the shoulders of giants here. There's been lots of development in the '80s and '90s and even currently, with seminal work in the field from Judea Pearl and Heckerman in the early '90s and more recently from Daphne Koller and Nir Friedman and Eran Segal.
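The "explaining away" argument above can be made concrete with a very small Bayesian network. This is only a minimal sketch, not PARADIGM itself: it assumes one transcription factor, one target, binary variables, a noisy-OR model for target overexpression, and made-up probabilities (`P_TF_ACTIVE`, `P_AMPLIFIED`, `LEAK`, `TF_EFFECT`, `AMP_EFFECT` are all hypothetical). The posterior is computed by brute-force enumeration.

```python
from itertools import product

# Hypothetical prior probabilities (illustrative values, not from the talk).
P_TF_ACTIVE = 0.3   # prior belief that the transcription factor is active
P_AMPLIFIED = 0.2   # prior belief that the target gene is amplified

# Hypothetical noisy-OR parameters for "target is overexpressed".
LEAK, TF_EFFECT, AMP_EFFECT = 0.05, 0.8, 0.7


def p_overexpressed(tf_active, amplified):
    """P(target overexpressed | causes) under a noisy-OR model."""
    p_not = (1 - LEAK) * (1 - TF_EFFECT) ** tf_active * (1 - AMP_EFFECT) ** amplified
    return 1 - p_not


def joint(tf_active, amplified, overexpressed):
    """Joint probability of one full assignment of the three binary variables."""
    p = P_TF_ACTIVE if tf_active else 1 - P_TF_ACTIVE
    p *= P_AMPLIFIED if amplified else 1 - P_AMPLIFIED
    p_e = p_overexpressed(tf_active, amplified)
    p *= p_e if overexpressed else 1 - p_e
    return p


def posterior_tf_active(**evidence):
    """P(tf_active = 1 | evidence), summing over all consistent assignments."""
    numerator = denominator = 0.0
    for a, c, e in product((0, 1), repeat=3):
        state = {"tf_active": a, "amplified": c, "overexpressed": e}
        if any(state[name] != value for name, value in evidence.items()):
            continue
        p = joint(a, c, e)
        denominator += p
        numerator += p * a
    return numerator / denominator


expression_only = posterior_tf_active(overexpressed=1)
with_copy_number = posterior_tf_active(overexpressed=1, amplified=1)
print(f"prior={P_TF_ACTIVE:.2f}  target up={expression_only:.2f}  "
      f"target up and amplified={with_copy_number:.2f}")
```

With these illustrative numbers, seeing the target overexpressed raises the belief that the factor is on, and learning that the target is also amplified pulls that belief back down while leaving it above the prior, which is the pattern described in the talk.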
There's lots of people in this list, and I would recommend folks read this really nice review article by Nir Friedman in Science in 2004, so it's getting dated, but it's still a very nice read. So these Bayesian networks and probabilistic graphical models that they describe give us a very nice way of modeling lots of different data and dependencies, and we can learn something from data where we might have had a knowledge bottleneck before. And so just a simple example here: let's go back to the diagram we had from the nice work from Sloan-Kettering and the GBM study, and we have an oncogene, MDM2, that is known to inhibit p53. So there are two parts to this system that we model. One has to do with the regulation of MDM2's activity and the other part has to do with the interaction between it and p53. And just as a quick toy example of the model that we have: when you see our activities for genes, it's actually a little bit more of a rich representation that looks something like the central dogma for a gene, right? You have a certain number of copies in the genome, you can express it, you can have a certain level of protein and a certain activity in that protein, and all these variables are beliefs that you infer from data, and these little black boxes show you constraints that help you infer those beliefs from data or from other beliefs in the system. And you can propagate this information to infer something about a higher-level thing like apoptosis, or activities for these genes. And that's what we use for our downstream analysis. And so the big picture looks like this: we take a cohort of patients and various types of data, run it through our pathway models, and then we produce one matrix that we can now do analysis on. So we don't have to think about all these different modalities anymore. We can just think about whether the gene is active in the sample and provide this new matrix for analysis.
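To make the "central dogma" representation concrete, here is a toy sketch of a single gene modeled as a chain of hidden variables (DNA copies, mRNA, protein, activity), each taking the values -1, 0, or +1. It is an assumption-laden illustration rather than the PARADIGM implementation: the agreement factor, the noise levels `EPS` and `OBS_NOISE`, the function names, and the example cohort are all hypothetical, and inference is done by enumerating every state instead of by message passing on a factor graph.

```python
from itertools import product

STATES = (-1, 0, 1)   # down, unchanged, up
EPS = 0.1             # hypothetical chance that a child differs from its parent
OBS_NOISE = 0.2       # hypothetical chance that a measurement differs from the truth


def agree(child, parent, eps):
    """Factor favouring agreement: 1 - eps if equal, eps / 2 for each other state."""
    return 1 - eps if child == parent else eps / 2


def activity_belief(copy_number_obs=None, expression_obs=None):
    """Posterior over the gene's activity given whichever observations are present."""
    belief = {s: 0.0 for s in STATES}
    for dna, mrna, protein, active in product(STATES, repeat=4):
        weight = (1 / 3) * agree(mrna, dna, EPS) * agree(protein, mrna, EPS) \
            * agree(active, protein, EPS)
        if copy_number_obs is not None:
            weight *= agree(copy_number_obs, dna, OBS_NOISE)
        if expression_obs is not None:
            weight *= agree(expression_obs, mrna, OBS_NOISE)
        belief[active] += weight
    total = sum(belief.values())
    return {s: w / total for s, w in belief.items()}


def expected_activity(**observations):
    """Collapse the belief into a single number between -1 and 1."""
    return sum(s * p for s, p in activity_belief(**observations).items())


# Hypothetical two-patient cohort: several data types in, one activity value out.
cohort = {
    "patient_1": {"copy_number_obs": 1, "expression_obs": 1},
    "patient_2": {"copy_number_obs": 1, "expression_obs": -1},
}
activity_column = {pid: expected_activity(**obs) for pid, obs in cohort.items()}
print(activity_column)
```

Repeating this for every gene and patient would give one gene-by-patient matrix of inferred activities. In this toy chain the expression measurement sits closer to the activity variable than copy number does, so it carries more weight when the two disagree.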
So for the ovarian study, the obvious signature here from the PARADIGM analysis was this FOXM1 signature, so when we zoomed in on this, all the patients pretty much had an up-regulation of this known mitotic regulator, FOXM1. The slightly more interesting story about it is that it has two isoforms, and one part feeds into proliferation, the other part feeds into DNA repair. There are a lot of disruptions in the genome in all the ovarian samples, so they're getting constitutive signaling through, like, ATM and ATR, turning on genes like FOXM1 that, if they're not being spliced correctly, are promoting two very opposite kinds of things in a cell: both, you know, this proliferation switch and this DNA repair switch. So FOXM1 also regulates BRCA2, for example. So, a very interesting story surrounding FOXM1. If you take the pathway activities and you try to define subtypes for the ovarian samples, then the good news there was that we could actually start seeing a delineation of meaningful subtypes, so this purple cluster shows you that they have slightly better survival patterns than the rest of the patients. We've recently worked on the colorectal paper led by Rajinder Kaul [spelled phonetically] and David Wheeler [spelled phonetically], and in this case the story isn't so much FOXM1 but activated MYC throughout, and that's an interesting piece of information. We see in the mutation data and other types of genomic perturbations that TGF-beta signaling pathway genes are mutated, and those all impinge on this mis-regulation of MYC, and that also bears out in the pathway analysis. And so one other type of analysis that we're doing with the pathways is we can take two groups of samples or patients and look for markers of one subtype versus another, say, and then home in on sub-networks that are markers for a particular cohort. And we're working on this for the luminal-basal comparison.
So in the breast cancer model, and just to show you, this is the closest we get to the dreaded hairball, but you can see that, you know, blue is more expressed or more active in luminal. And you can see the expected sort of ER signaling pathways, and then you have some other intriguing pathways among the proliferative ones for basal, shown in red, like HIF1-alpha, for example. So, the way we can use that hairball is to do something like a master regulator analysis, like Andrea Califano likes to do with ARACNE. You can look upstream in this example of a basal marker such as FOXM1, like I showed, and sort of by chain of reasoning, up the regulation hierarchy, you see that there's a polo kinase. And so the prediction there is that basal cells will be more sensitive to a polo kinase inhibitor. And this actually pans out in a cell line model shown in Joe Gray's lab with his cell lines. So this plot here shows you sensitivity to a polo kinase inhibitor for basal and claudin-low lines contrasted against luminal cells. And the reverse is true as well. You can look up a marker for luminal, like a luminal hub, and in this case it was an HDAC. And so the prediction is that luminal cells would be more sensitive to an HDAC inhibitor, and that's what turns out to happen in these cell line models. And you saw a nice example yesterday from Sam Ng. Just to go through that real quick, because I wanted to show you one more result that Sam didn't have time to show. So he's developed a clever method where you can run our pathway analysis twice: once where you connect the gene only to its downstream targets and infer an activity for it, and once where you connect it only to its upstream regulators and infer an activity. And you just look at the difference to get what he calls the discrepancy in the activities that are inferred.
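The two-run idea can be sketched in a few lines. This sketch assumes the two inferred activities per sample are already available and simply takes their difference. The sign convention (downstream minus upstream, so that a negative value suggests loss of function) is an assumption chosen to match the "lower discrepancy corresponds to loss of function" description in the talk, and the sample values and names are hypothetical.

```python
from statistics import mean


def discrepancy(downstream_activity, upstream_activity):
    """Activity implied by the gene's targets minus activity implied by its regulators.

    Negative: targets look less active than the regulators predict (loss-of-function-like).
    Positive: targets look more active than the regulators predict (gain-of-function-like).
    """
    return downstream_activity - upstream_activity


# Hypothetical per-sample results of the two pathway runs for one gene.
samples = {
    "s1": {"upstream": 0.8, "downstream": -0.6, "mutated": True},
    "s2": {"upstream": 0.7, "downstream": -0.4, "mutated": True},
    "s3": {"upstream": 0.6, "downstream": 0.5, "mutated": False},
    "s4": {"upstream": 0.1, "downstream": 0.2, "mutated": False},
}


def mean_discrepancy(samples, mutated):
    """Average discrepancy over the mutated or the non-mutated samples."""
    return mean(
        discrepancy(s["downstream"], s["upstream"])
        for s in samples.values()
        if s["mutated"] == mutated
    )


mutated_shift = mean_discrepancy(samples, mutated=True)
wild_type_shift = mean_discrepancy(samples, mutated=False)
print(f"mutated: {mutated_shift:+.2f}  non-mutated: {wild_type_shift:+.2f}")
```

In this made-up example the mutated samples have a clearly negative average discrepancy while the non-mutated samples sit near zero, which is the kind of separation one would hope to see for a loss-of-function positive control.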
And he showed you an example, sort of a positive control, for Rb. You can see that in the mutated cases he's seeing a lower discrepancy, which corresponds to a loss-of-function event. And he showed you the pathway surrounding these things. So we've tried this for a few positive controls, and he showed you p53. And you can kind of squint and see that for the cases in red around the circle plot (the tick marks are patients, sorry, I didn't mention that) you can see a lower activity being inferred. And so I asked Sam late last night, actually, "Can you please run this for the lung squamous results?" And as you saw before, he was predicting for NFE2L2, this known oncogene, that he's getting a positive discrepancy. And there are 30 mutations in CDKN2A, and consistent with, you know, other deletions, homozygous deletions in CDKN2A, he's predicting loss of function. So that's interesting. But now the power is, and these are sort of the more frequent events, that you can start actually drilling into some of these lower-frequency events, and there are some intriguing stories, I think, in there. And I wanted to just point out that some of the highest-scoring discrepant genes now are not the most frequent, right? So you actually have a HIF, a hypoxia-inducible factor, up here in seven samples. Why would that be? And among these up here are going to be possible new targets that you could go after for your druggable genome, for example. So we even have a MAP kinase kinase up there that might be worthwhile. And on the other end of the spectrum, there are some other loss-of-function events that we might want to pay attention to. So you might ask, "What do you do if you don't have good pathway models for genes? How can you infer activity? Or do these mutations mean anything?" You can plot them against clinical information.
And so this is just sort of an overview: you can show some phenotypic information against these pathway activities and infer a connection between mutations or phenotypes. And just really quickly, since I'm almost out of time, we've piloted this in the colorectal study, and you can cluster the mutations based on these signatures, and you can see that APC and p53 tend to have the same correlations in the colorectal study, for example. And it confirms that APC mutations are correlated with MYC activity, in this case anti-correlated with the repressed targets of MYC. And on the other end of the spectrum you have TGF-beta pathway mutations, so those cluster together. And in the middle you have RTK and PI3 kinase pathway mutations. So, the obvious idea here is, if you have a mutation in gene X and it looks like it's associated with the same activities in possibly different patients, perhaps it's also acting in the same pathway, based on this type of association analysis that Ted's [spelled phonetically] doing. And so I'm basically out of time. I'm going to skip to the end. Obviously we want to use these to look across multiple cancers. The pathway activities give us a way to do that, and we're working on pan-cancer analysis, a basal comparison to ovarian for the breast work, and so on. So I hope I showed you that we have a nice model for integrating a lot of different data sets. We use knowledge about pathways. We're trying to expand that with predicted interactions now. We can stratify patients with that, find predictive sub-networks and so on, and use it to predict hopefully more of these rarer mutations. And the inferences allow us to connect cancers across different data sets. And the last slide that I just skipped there was just trying to make the point that we can connect subtypes together and maybe get a clue about therapies.
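The association analysis described above can be sketched as follows: give each mutated gene a "signature" (its correlation with every pathway activity across patients), then compare signatures between genes. This is a minimal illustration, not the method used in the colorectal study. The patient data, the activity names, and the choice of Pearson correlation plus cosine similarity are all assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def cosine(us, vs):
    """Cosine similarity between two signature vectors."""
    dot = sum(u * v for u, v in zip(us, vs))
    return dot / (sqrt(sum(u * u for u in us)) * sqrt(sum(v * v for v in vs)))


# Hypothetical data for six patients: 1 = gene mutated in that patient.
mutations = {
    "APC":   [1, 1, 1, 0, 0, 0],
    "TP53":  [1, 1, 0, 1, 0, 0],
    "SMAD4": [0, 0, 0, 1, 1, 1],
}
# Hypothetical inferred pathway activities for the same six patients.
activities = {
    "MYC_activity":   [0.9, 0.8, 0.7, 0.2, 0.1, 0.0],
    "TGFB_signaling": [0.1, 0.2, 0.1, -0.7, -0.8, -0.9],
    "RTK_signaling":  [0.5, -0.5, 0.4, -0.4, 0.3, -0.3],
}

# One signature per mutated gene: its correlation with each pathway activity.
signatures = {
    gene: [pearson(calls, values) for values in activities.values()]
    for gene, calls in mutations.items()
}

apc_vs_tp53 = cosine(signatures["APC"], signatures["TP53"])
apc_vs_smad4 = cosine(signatures["APC"], signatures["SMAD4"])
print(f"APC~TP53: {apc_vs_tp53:+.2f}   APC~SMAD4: {apc_vs_smad4:+.2f}")
```

In this toy data APC and TP53 end up with similar signatures and would cluster together, while SMAD4, standing in for a TGF-beta pathway gene, lands at the opposite end. Real data would need many more patients and features, plus a proper clustering step.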
So, I wanted to just say a special thank you to the Broad team here. They've got PARADIGM working in Firehose, and this is not a trivial feat. And a lot of these big network methods, by the way, take a lot of CPU time to run, so it's really nice that this is going to put the results in the hands of the public, actually. And so you don't have to go off and implement these yourself. And this is my group that worked on the integration analysis. I've highlighted the work of the folks circled there, especially Sam Ng, who you saw speak earlier. And this is work in collaboration with David Haussler, who actually heads the whole team, and Chris Benz, and Jane Ju [spelled phonetically], who ran a tutorial yesterday and runs the engineering staff. So thank you, and I'll take any questions. Sorry I went a couple minutes over. [applause]

Male Speaker: Time for one quick question for Josh.

Josh Stuart: Crystal clear.

Male Speaker: No, okay, well, I'm sure he'd be happy to take it up over coffee if something emerges. So thank you, Josh.