Male Speaker: Please.
Sam Ng: Okay. So, first of all, we want to be able
to identify driver mutations, and, in particular, we would like to know which mutations are
gain of function and which are loss of function in order to better understand disease mechanism and
treatment. There are many mutations out there that we don't know much about. They tend to
be lower prevalence mutations, but they're very much recurrent. And I'm going to talk
about a method that I've been developing to infer gain of function and loss of function
using an integrated pathway approach.
As a quick introduction to PARADIGM, what PARADIGM does is it takes in functional genomics
data, such as copy number and expression, as well as sets of pathways in order to infer
gene-level activities. As a quick example, if we were to infer higher activity of, say,
MDM2 from expression and copy number data, we would therefore predict less activity for
p53.
In terms of pathways, what we would expect for a loss of function event is that the
genes regulating the mutated gene might be active in trying to turn up its activity,
but downstream, the gene is not functional, so the downstream
targets are off, and the opposite would be true in the case of a gain of function.
So, my approach is called discrepancy analysis for now, and what it
does is it tries to leverage the difference between the upstream and downstream signal
in order to infer these mutations, and the way I do that is I run PARADIGM in two modes,
one where I run it based on the downstream information, and one with upstream, and then
I infer a discrepancy between the activities.
Now, I'm going to give an example of a real mutation, RB1 in GBM. This is a loss of function
mutation, and this is a circle plot. The center ring shows, with the black ticks, the
samples with mutated RB1, and the rings outside of that show
expression, the inferred upstream activity by PARADIGM, the inferred downstream activity by PARADIGM,
and then the discrepancy score, which is just the difference between the inferred downstream
and upstream activity. And what you will notice is that when RB1 is mutated, PARADIGM is inferring
higher activity in the mutated case from the upstream information. But, downstream, it's
inferred to be inactive, and this leads to a negative discrepancy score, tracking
with the mutations.
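
A minimal sketch of the discrepancy score as described above, taken here as inferred downstream activity minus inferred upstream activity, so that a loss of function comes out negative. The function name, sample names, and activity values are hypothetical and for illustration only; the two PARADIGM runs themselves are not reproduced.

```python
from typing import Dict


def discrepancy_scores(upstream: Dict[str, float],
                       downstream: Dict[str, float]) -> Dict[str, float]:
    """Per-sample discrepancy: downstream activity minus upstream activity.

    Both inputs map a sample ID to the activity inferred for the gene of
    interest in one of the two runs (upstream-only and downstream-only).
    Only samples present in both runs are scored.
    """
    shared = upstream.keys() & downstream.keys()
    return {sample: downstream[sample] - upstream[sample]
            for sample in sorted(shared)}


# Hypothetical inferred activities for two samples.
upstream = {"mutant_1": 0.75, "wildtype_1": -0.25}
downstream = {"mutant_1": -0.5, "wildtype_1": 0.0}

scores = discrepancy_scores(upstream, downstream)
# mutant_1: regulators push activity up, targets are off, so the score is negative.
```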
Now, if you want to go back to the original data, I'm showing the network used to infer
this RB1 mutation. If you look upstream, RB1's activators, such as CDKN2A through C, are more
active in the mutant case, and the inhibitors are less active. So, you can see the more
red tracking CDKN2A with the black, and the opposite for, say, CCND2, and downstream we're
inferring a lower activity.
So, now, for each sample, we have computed a discrepancy score, but we have two populations,
the mutant samples and the non-mutant samples. Showing the discrepancy scores for the
two groups, you can see there's a pretty significant difference in the discrepancy scores between
the mutant and the non-mutant samples. Shown in red are the mutant samples. So, a more
negative discrepancy score is more indicative of a loss of function. I computed a t-statistic
between the two distributions, and that's what I'm calling a signal score.
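
A minimal sketch of a signal score as a t-statistic between the two groups of discrepancy scores. The talk does not say which t-test is used; Welch's unequal-variance form is an assumption here, and the function name and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def signal_score(mutant: Sequence[float], non_mutant: Sequence[float]) -> float:
    """Welch t-statistic of mutant vs. non-mutant discrepancy scores.

    Negative values suggest loss of function (mutants have lower
    discrepancy); positive values suggest gain of function.
    Each group needs at least two samples for a sample variance.
    """
    standard_error = sqrt(variance(mutant) / len(mutant)
                          + variance(non_mutant) / len(non_mutant))
    return (mean(mutant) - mean(non_mutant)) / standard_error


# Hypothetical discrepancy scores for illustration.
mutant_scores = [-1.0, -2.0, -3.0]
non_mutant_scores = [0.0, 1.0, 2.0]
t_stat = signal_score(mutant_scores, non_mutant_scores)
```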
The next thing you want to ask is, now that I've called a possible loss of function, how
significant is this signal score? And, to ask that question, what I do is I have a background
model in which I'm keeping the same network topology but permuting
the genes used to infer the mutation. So, the network is the same except the data provided
is permuted out of all the 20,000 genes in the data set.
And, given this background model, run here 100 times, I
have a background distribution of signal scores, and I compare my observed value
against it, and it's pretty significant as a loss of function call.
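
A minimal sketch of the background model as described: the network positions stay fixed, the genes supplying data to them are redrawn at random from the whole data set, and the signal score is recomputed each time. The callable `score_fn` stands in for rerunning the full analysis and is hypothetical, as are all names here. The talk does not state how significance is computed from the background; the two-sided empirical p-value with a pseudocount is an assumption.

```python
import random
from typing import Callable, List, Sequence


def background_scores(network_genes: Sequence[str],
                      all_genes: Sequence[str],
                      score_fn: Callable[[List[str]], float],
                      n_permutations: int = 100,
                      seed: int = 0) -> List[float]:
    """Signal scores from networks whose gene data has been permuted.

    For each permutation, every network position is assigned a gene drawn
    at random (without replacement) from all genes, and score_fn is asked
    for the resulting signal score.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_permutations):
        replacement = rng.sample(list(all_genes), len(network_genes))
        scores.append(score_fn(replacement))
    return scores


def empirical_p_value(observed: float, background: Sequence[float]) -> float:
    """Share of background scores at least as extreme as the observed one."""
    extreme = sum(1 for b in background if abs(b) >= abs(observed))
    return (extreme + 1) / (len(background) + 1)


# Toy usage: a placeholder score_fn that ignores the data entirely.
toy_genes = [f"GENE{i}" for i in range(50)]
toy_background = background_scores(["RB1", "CDKN2A", "CCND2"], toy_genes,
                                   score_fn=lambda genes: float(len(genes)))
p = empirical_p_value(-5.0, [0.1, -0.2, 0.3, 6.0])
```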
Now, I'm going to show another example, but this time a gain of function, of NFE2L2, or
NRF2, in lung squamous cell cancer, and this time you see that there's higher downstream
activity tracking with the mutations, and that leads to a positive discrepancy score.
Shown here is the NRF2 network. You can see KEAP1, which normally aids in the degradation
of NFE2L2. It's more active in the mutant case, but it's not repressing NRF2, because
you can see that for downstream targets, such as NQO1, the expression readout is quite strong, tracking
with the mutations.
Again, when you look at the difference between the two distributions of discrepancy scores
for the non-mutant and the mutated samples, there's a pretty significant difference between
the two, and, again, with the background model, it's very significant.
So, the next thing I want to do is show that my approach is specific as well as
sensitive, and in order to do this I ran it on a set of potential passenger mutations from
colorectal cancer, and the way I selected those genes was I used genes that had MutSig
Q values greater than 0.5. And it turns out that since those genes aren't that well annotated
in our pathways, there are four genes that had enough pathway information to run my analysis
on, and shown here is one of them, PRKDC, and you can see there isn't a significant
difference between the discrepancy scores of the non-mutated and the mutated samples.
So, to summarize my method, I've shown that discrepancy analysis can differentiate
between gain of function and loss of function mutations using pathway information, and I've
also shown that this works in the case of RB1 in GBM and NRF2 in lung squamous, as
well as showing a negative control.
So, this approach has potential in that we can run my analysis on all mutations
in a cancer cohort, and if we identify novel gain
of function mutations, this could provide insight for drug treatment. And, actually, that went pretty
quick.
[applause]
Male Speaker: We have time for some questions. Chris?
Male Questioner: Hi. This gain of function, loss of function,
there's an interesting subclass, perhaps a minority, with switch of function.
Sam Ng: Right.
Male Questioner: [unintelligible] actually is a very nice example
of that. Can you extend your method to get at those?
Sam Ng: Potentially, that might be able to be
done on a case-by-case basis. It would be difficult to do a switch of function since we run our
analysis on fixed pathways that we have annotations for, so predicting switch of function would
be difficult.
Male Speaker: Please.
Male Questioner: So, a very interesting approach. Did I catch
you right, did you say that all but four of the genes that are annotated in one of
your PARADIGM networks, or the networks that are inputs to PARADIGM, had a Q value
of less than 0.5?
Sam Ng: Greater than. So, our pathways are fairly
cancer-enriched, so, I mean, not well studied genes that have non-significant P-values
in MutSig are less likely to be in our pathways, and we're always expanding our pathways to
include more genes. So, later, our coverage may be greater.
Male Questioner: So, you think that just represents some kind
of literature bias, or --
Sam Ng: Right.
Male Questioner: -- or that's actual biology that you're discovering?
Sam Ng: It's most likely a literature bias. Our pathways
currently have about 25 percent of the genome.
Male Speaker: Linda?
Female Questioner: If I understand you correctly, you are collapsing
all the mutations onto one gene and considering them as a whole in assessing whether there's
a gain or loss of function or a neutral event? Is that true?
Sam Ng: What do you mean exactly by collapsing?
Female Questioner: Well, most of the mutations we see are not
hotspots, you know, always exactly the same --
Sam Ng: Oh, okay, right.
Female Questioner: -- mutation in a gene, so are you collecting
all the mutations that occur on a gene --
Sam Ng: Right.
Female Questioner: -- and treating them as one entity in your analysis?
Sam Ng: Right. So, all the mutations that are not
silent mutations --
Female Questioner: Right.
Sam Ng: -- I am collapsing them together in this analysis,
but you could potentially imagine that I could split up mutations by their different domains
as well and just run them separately. I can treat them as separate mutations.
Female Questioner: I think that would be useful, because there's
certainly a lot of, not a lot, enough functional data to show that different amino acid changes
in the same gene can have opposing effects.
Sam Ng: Right. I completely agree. Thank you.
Male Speaker: Next, Doc [phonetic].
Male Questioner: So, when you are calculating this discrepancy
score, are you differentiating between activating or inhibitory relationship that the gene might
have with its downstream partners?
Sam Ng: Right. So, the PARADIGM model can handle activating
and inhibiting links, so, I mean, the logic is just switched. In
the case of RB1, I was showing that RB1 had inhibitors, and you can see that the logic
is switched for those genes. When RB1 was mutated, those repressors were less active.
Male Questioner: Right. Sam, I wonder if you could comment
on why the distribution of the discrepancy scores in the mutated samples is bimodal?
Sam Ng: In RB1, you noticed that.
Male Questioner: Well, I think in both.
Sam Ng: Oh, okay.
Male Questioner: It was, I think it was in both, but in RB1,
it was more pronounced.
Sam Ng: Okay. One possible explanation, though, is that the
mutation could have been a neutral mutation even though it was not silent.
Male Questioner: Okay. Thank you.
Male Speaker: Okay. Thanks a lot.
Sam Ng: Thank you.
[applause]