Lessons Learned from 24 Completely Sequenced AML Genomes - Timothy Ley

Good morning. I’m speaking on behalf of my colleagues at the Genome Institute, Rick Wilson and Elaine Mardis, and my colleagues in the Genomics of AML PPG, but I will also give you a brief update at the end about the scorecard of the AML project at TCGA. So with AML -- let’s see if this works. Not advancing. Okay, a few brief words about this disease and its genomics. So we don’t know much about it in terms of initiating mutations except for patients who have canonical mutations that I’ll show you in just a minute. For most patients with this disease, when we started this project very, very little was known about initiating mutations. One very nice feature of the disease is that the oncologists control the samples. The tumor tissue is very easy to access, and to access repeatedly, and most of the samples are relatively free of contaminating normal cells without any additional purification. Another feature that we loved as we started sequencing whole genomes was the fact that many of these genomes are diploid. And also, riffing on lessons of the past, low resolution genomic screening, that is, cytogenetics, has been used for 30 years to classify these patients and to make treatment decisions. And all of us as clinicians who take care of these patients use this idea, the very important idea that favorable risk cases who have these canonical mutations, at least three of them, can be treated lightly up front and they’re going to do relatively well; by that I mean five year survivals of 50 percent. Some patients with complex cytogenetics have adverse risk; those patients need to be transplanted in first remission or they will die. But unfortunately about two thirds of our cases have intermediate risk. And these patients we don’t know what to do with. We don’t necessarily need new drugs for these cases but we need to know what to do with them. So we need better classification markers, biomarkers if you will, to separate these people into good and poor risk.
In the first AML genome we sequenced, we found the major classifier of intermediate risk, which is mutations in a gene called DNMT3A. So I’m not going to talk about that because the data has now been validated by many, many centers and it is the most important classifier as a single gene of intermediate risk and it tells us, I think, who to transplant. But I will tell you about some of the conundrums we faced as we’ve sequenced these genomes. In the first couple of genomes that were done, we encountered what I will call the founding conundrum. And that is, there are hundreds of mutations per AML genome. And because we validated all of them with deep digital sequencing, we found out that all the mutations are in all the cells. That’s a problem. It suggests that they may have all arisen simultaneously or that, if they’re important for the tumor, if they arose because they are all needed, you would have to have hundreds of relevant mutations per case. Both of those seem impossible, so how does it happen? This is the model that we prefer and it’s experimentally tractable and I think it’s now been proven. So the idea is that the hematopoietic stem cell, the cell of origin for this tumor, is a cell that lives for your entire life. But it spends most of your life in G0, divides perhaps once a week, once every few weeks. You only have about a hundred thousand of these in your body. These cells accumulate about 14 mutations per year. So as you age, a number of random innocuous mutations accumulate in these cells, until one fateful day when a true initiating mutation causes a cell to have an advantage, probably a very small one. That cell then begins to experiment with other mutations, progression mutations, and once in a while, unfortunately, a progression mutation cooperates with this initiating mutation that causes the disease. This actually explains most of our data.
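As an editorial aside, the accumulation model described above can be put into rough numbers. The sketch below assumes a strictly linear accumulation at the 14 mutations per year quoted in the talk; the function name and the example ages are illustrative assumptions and are not taken from the speaker's analysis.

```python
# Illustrative arithmetic only. A constant, linear accumulation rate is an
# assumption made for this sketch; the talk quotes "about 14 mutations per year".
MUTATIONS_PER_YEAR = 14


def expected_background_mutations(age_years: float,
                                  rate: float = MUTATIONS_PER_YEAR) -> float:
    """Expected number of background mutations carried by a single
    hematopoietic stem cell at a given age, under the linear assumption."""
    if age_years < 0:
        raise ValueError("age_years must be non-negative")
    return rate * age_years


if __name__ == "__main__":
    # Hypothetical ages, chosen only to show how the count scales with age.
    for age in (20, 40, 60):
        print(age, expected_background_mutations(age))
```

Under this simplification the count of background mutations depends only on the age of the cell at the moment it is cloned, which is the property the model relies on.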
It explains why all the mutations have the same read count frequency, why they’re all in the same cells: because they’re simply captured by the act of cloning. The initiating event clones that critical cell and then all the mutations come forward. So when you sequence this genome, not only do you sequence the two relevant mutations, four and five, but the hundreds of mutations that had previously accumulated in that cell prior to its transformation. So, there’s a central question of course in this disease and most cancers. How many mutations does it take to cause the disease? So our approach to this was to take 24 genomes and sequence them completely, but we selected these cases to be of two different types. One, the M3 subtype of AML, which is initiated by a very well-known fusion protein created by a translocation called PML-RARalpha, versus cases with undifferentiated AML, or M1 AML, that have normal karyotype. We know little or nothing about these cases. So the idea is very simple. We have one that is caused by a very well-defined initiating mutation, and we know it’s initiating because you can put it in a mouse and it causes a phenocopy of the disease, and one where you know nothing. So based on this prediction that I just made about how the mutations would arise, we predicted that the total mutations per genome would be the same because most of them arose before the initiating event. They are simply background, benign mutations that are present in the stem cell. That most of the mutations would be random and irrelevant, that the M1 cases would have initiating mutations that would never be seen in M3, and that the cases would have common mutations which would be relevant for progression. Make sense? So what do we find? The key is how many recurring mutations there are per genome; that should give us the answer about how many mutations are needed to initiate M1 and how many are needed to progress.
You can think about your predictions in the next few minutes and I’ll show you the answer at the end. Twenty-four genome pairs completely sequenced. Ten thousand mutations total found, average of 421 per genome, 308 mutations in 286 unique genes. About 10 per genome with translational consequences, about 10 in the exome in each case. We looked at these mutations in 66 additional M1 and 43 additional M3 cases. There are 21 recurrently mutated genes. In M3 there was only one, PML-RARalpha. In M1 we found 10 recurrent mutations and I’ll show you what they are in a minute. And 11 mutations were common to the two subsets. The total number of mutations by tier fit the prediction. They’re exactly the same in tier one, the coding region. Exactly the same for the two subtypes in tier two, the conserved region of the genome with potential regulatory function. And exactly the same in tier three, and exactly the same for total numbers of mutations, which fits the prediction that they had to arise prior to initiation. If you plot them by genome space, they fit exactly as random events that occurred in genome space in tier one, tier two and tier three. The r-values for the M1 and M3 cases are both exactly one. These are random mutations that occurred prior to transformation in the stem cells. One thing we learned with deep digital sequencing is that this disease is clonal. Every AML case -- so I’ll show you more about this at the very end -- has a founding clone where all the mutations occur in every cell. And many cases have subclones that are derived from the founding clone. And this is very, very important as we begin to think about studying relapsed AML, and the number of clones in each of the cases is basically identical. Here are the recurrently mutated genes; you all are accustomed to seeing this, this is the bookkeeping. These are the M3 cases, that’s PML-RARalpha on top. Here it is cooperating with FLT3 mutations; these are the M1 cases.
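As an editorial aside, the founding clone and subclone distinction drawn from deep digital sequencing can be illustrated with read counts. This is a minimal sketch under stated assumptions: every mutation is heterozygous, sits in a diploid region, and the sample has no contaminating normal cells, so the fraction of cells carrying a mutation is about twice its variant allele fraction. The function names, the tolerance, and the read counts are hypothetical and are not the speaker's pipeline.

```python
def variant_allele_fraction(variant_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def estimated_cell_fraction(vaf: float) -> float:
    """Fraction of cells carrying the mutation, assuming a heterozygous
    mutation in a diploid region of a pure tumor sample (an assumption)."""
    return min(1.0, 2.0 * vaf)


def label_clone(vaf: float, tolerance: float = 0.1) -> str:
    """Label a mutation as founding clone if it is in nearly every cell.
    The tolerance is an arbitrary illustrative threshold."""
    if estimated_cell_fraction(vaf) >= 1.0 - tolerance:
        return "founding clone"
    return "subclone"


if __name__ == "__main__":
    # Hypothetical (variant_reads, total_reads) pairs for three mutations.
    read_counts = {"mut_a": (498, 1000), "mut_b": (510, 1000), "mut_c": (120, 1000)}
    for name, (variant, total) in read_counts.items():
        vaf = variant_allele_fraction(variant, total)
        print(name, round(vaf, 3), label_clone(vaf))
```

Mutations whose fractions cluster together near one half would be read as belonging to the founding clone, and a separate cluster at a lower fraction would be read as a subclone derived from it.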
So you can see that there are a large number of mutations. These are the ones that I already spoke of, the 10 that occur only in the M1 cases or very, very rarely in M3. And then these are the mutations that occur basically in both subsets, the so-called progression mutations that could cooperate with the founding or initiating mutations. One interesting hit that we got from this analysis was finding that all four members of the cohesin complex are recurrently mutated in AML. This complex is important for holding sister chromatids together and is to organize during S phase. And every gene that’s a member of this complex is recurrently mutated in AML, only in the M1 variety. So, how many mutations? Well, as I told you, there are the same number of tier one mutations in these two kinds of genomes. If you look at the recurrent mutations with translational consequences in the 24 fully-sequenced cases, the number is zero to six for M1 and one or two for PML-RARalpha. And if we extend this to an additional 107 cases, the number stays zero to seven and one to three. That’s how many mutations it takes to cause these diseases. So in summary, for this part of the talk, PML-RARalpha is the initiating mutation for all of these M3 cases that are sequenced. There is a cluster of mutations that tend to occur together: NPM1, DNMT3A, this classifying mutation I told you about, and IDH1. And then these are the other mutations that appear to contribute to initiation of M1. There are 10 mutations that are held in common in these two subtypes. And these are clearly mutations that are important, not for defining the subtype, but important for progression. So I just want to give a brief scorecard on the AML project that is being done by TCGA. This is just bookkeeping but there are some very interesting things that are about to come forward and there’s an incredibly rich database that is about to be explored experimentally.
So we have 50 whole genome sequenced cases that are complete and completely validated, the ones I just told you about and another 26 with normal karyotype AML from any FAB subtype of the disease. Another 150 cases had exome capture sequencing at the Broad, transcriptomes were sequenced from these same cases in British Columbia by Marco Marra’s group, and 192 methylation arrays have been done from this set by Peter Laird and Tim Triche at USC. We’ve also just finished sequencing from among this set the 50 cases that have primary refractory or early relapse disease, which will add richly to our understanding of this worst of the worst subset of the patients. The cases that we chose represent AML as a disease. As I told you, about half the cases have these intermediate risk findings, about 20 percent have translocations associated with good risk and about 20 percent have these poor risk cytogenetic abnormalities. So it’s a fair sampling of this disease as it occurs in the real world. These are just the data on tier one, two and three mutations in the patients with normal karyotype AML versus the APL or the M3 subtype. You can see, as I told you, the numbers of mutations are about the same. The thing that determines the total number of mutations per genome, you should be able to predict?
Male Speaker: Age.
Tim Ley: Age. There are a number of new, recurrently mutated genes; this is a partial list of the recurrent mutations that are present in up to three percent of cases. Many of the names of these genes are new, not all of them are completely validated -- this work will be done within about a month. And then you can see that there are patterns that begin to develop in terms of mutual exclusivity that are being explored by Ben Raphael, I’ll put a plug in for his talk this afternoon, who will be showing you data about patterns of exclusivity.
There are beautiful data that have come from Vancouver where they’ve used RNA-seq to find a number of well-known translocations for this disease. And remarkably, a number of private translocations that create novel fusion proteins in many genes that are well-known to us but have not been previously identified as translocation partners in AML. Many of these are in-frame fusions that create novel proteins with novel functions and many of them are in genes that have never been seen before mutated in AML, like, for example, DNMT3B. Finally, in work that Tim Triche and Peter Laird have done, they have done a beautiful job of assembling the methylome data for these on the Illumina 450K array. With 192 cases you can see these gorgeous patterns that seem to privatize individual groups of AML cases. I think all of us predicted that there would be very, very significant mutations that would predict these individual patterns of methylation. And in fact, that DNMT3A would be the primary predictor, along with the mutations Peter spoke about in IDH1 and 2 and TET2, which occur in about a quarter of cases of AML. As we went and looked at the classifying mutations, basically none of them classify these methylation phenotypes. There is one cluster of mutations that occurs together with DNMT3A, FLT3, IDH1 and 2 and TET2, the common mutations here, with the hypomethylation phenotype, but if you look up this line, all of these mutations occur in combinations. And none of them predicts this phenotype, so there’s much to learn. The last thing I want to say is that there’s much more to the digital bookkeeping that exists when you look at this kind of data. Deep digital sequencing is a clinical tool. It tells us a great amount about the biology of this disease. By looking at deep digital data, we’ve been able to deduce the clonal evolution of AML at relapse. As I told you, many of these mutations occur in the cell that is transformed.
Mutations in many genes contribute to the initiation and progression, and then subclones arise from these founding clones in most cases of AML. These subclones have different mutations and different behaviors after therapy. Some of them completely disappear with therapy. Others come forward, achieving quantal bursts of mutations that clearly contribute to relapse. Understanding this clonal behavior at relapse will be extremely important in terms of predicting responses to drugs in patients and of course defining new therapeutic approaches, because what we have to do in this disease is remove the founding clone to cure the patients. Because every time we look, the founding clone reemerges. I just want to thank our patients; without them there is no study. Our funders, including Al Siteman who sequenced the -- who funded the sequencing of the first cancer genome while no one else was very interested. And finally Rick, Elaine and Li Ding who lead the work in the Genome Institute at Wash U. And my colleagues on the Genomics of AML program project grant, most particularly John DiPersio who leads our oncology group. Thank you.
[applause]
Male Speaker: Thank you, Tim. We have time for a few questions, Eric.
Male Speaker: Great, great talk. Just fantastic, I had two specific technical questions.
Tim Ley: [affirmative]
Male Speaker: When you say you know how many mutations there are, you mean you have a lower bound?
Tim Ley: Yes.
Male Speaker: Right.
Tim Ley: Exactly.
Male Speaker: Because there could be more --
Tim Ley: There could be more --
Male Speaker: -- than the number that had been looked at so far.
Tim Ley: But this sets the floor.
Male Speaker: This sets a floor, good --
Tim Ley: Okay --
Male Speaker: -- I wanted to check.
And the other was just a tiny thing, I couldn’t help but notice on the list the CUB and the sushi domain protein and the mucins [spelled phonetically] come up --
Tim Ley: That’s right --
Male Speaker: -- and both of those are in this class of late replicating probably --
Tim Ley: Absolutely --
Male Speaker: -- [unintelligible] genes.
Tim Ley: Yep, absolutely correct. But there are many more that aren’t.
Male Speaker: [inaudible]
Tim Ley: Right.
Male Speaker: [inaudible]
Tim Ley: Yeah, absolutely.
Male Speaker: [inaudible]
Tim Ley: Yeah.
[laughter]
Tim Ley: They’re all over the place and so is titin [spelled phonetically], right. What is odd though is that if you simply apply a significantly mutated gene test for these genes, which is very rigorous in terms of things and taking size into account --
Male Speaker: They don’t go away --
Tim Ley: -- they don’t go away. So, I think the reason I leave them on these slides, even though my informatics colleagues tell me, “Take them off, they’re not significant,” is I have an open mind. They don’t occur in every case. They should, based on their size. They sure as hell don’t. A lot of small genes are recurrently mutated and they show up again and again, and a lot of big genes like olfactory receptors [spelled phonetically] never show up in these cases. So it’s hard to know what it all means. Yeah. On big families, that’s what I meant to say. Big, big gene families that you might expect to be recurrently mutated don’t show up at all.
Male Speaker: Matthew.
Male Speaker: Tim, great talk. I was struck by the PML-RARalpha co-mutation with FLT3 --
Tim Ley: [affirmative]
Male Speaker: -- and wondering about the FLT3 wild type set. Whether from gene expression or other evidence you can get any clue for what might be a parallel driver mutation to FLT3.
Tim Ley: Critical question.
We’ve looked very carefully, thinking that there must be other tyrosine kinases that substitute for FLT3 in these cases since the combination is so common. It does not exist. There are clearly other cooperating mutations in these cases that aren’t a member of this class. So there’s distinct heterogeneity even among PML-RARalpha but it does not explain outcomes. We don’t have the answer, but it’s an important question.
Male Speaker: Okay, thank you Tim. We have 15 short minutes for coffee break, so --
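As an editorial aside, the idea raised in the question-and-answer exchange of testing recurrence while taking gene size into account can be sketched with a simple binomial model. The talk does not describe the actual test used, so this is a generic stand-in under stated assumptions: a uniform per-base background mutation rate and independent cases. The rate, gene lengths, and case counts below are hypothetical.

```python
from math import comb


def prob_gene_hit(gene_length_bp: int, rate_per_bp: float) -> float:
    """Probability that a gene carries at least one background mutation in
    one case, assuming a uniform per-base rate (an assumption)."""
    return 1.0 - (1.0 - rate_per_bp) ** gene_length_bp


def binomial_tail(n_cases: int, k_observed: int, p: float) -> float:
    """P(X >= k_observed) for X ~ Binomial(n_cases, p): the chance of seeing
    the gene mutated in at least k_observed cases by background alone."""
    return sum(
        comb(n_cases, i) * p**i * (1.0 - p) ** (n_cases - i)
        for i in range(k_observed, n_cases + 1)
    )


if __name__ == "__main__":
    rate = 1e-6   # hypothetical background rate per base pair per case
    n_cases = 200  # hypothetical cohort size
    k = 5          # hypothetical number of cases with a mutation in the gene
    for label, length in (("small gene", 1_500), ("large gene", 100_000)):
        p_hit = prob_gene_hit(length, rate)
        print(label, binomial_tail(n_cases, k, p_hit))
```

Under this model the same number of mutated cases is much less surprising for a large gene than for a small one, which is why a size-adjusted test can discount large genes that recur only because they present a bigger target.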