Mutations - Great and small

Well, good evening ladies and gentlemen, especially if this is your first visit to the Royal Society, and a particular welcome to this evening's Francis Crick award lecture. I should introduce myself: I'm Jean Thomas, the biological secretary of the Royal Society, and I have one duty to do, which is to ask you to please switch off your mobile phones because the lecture is being webcast. The Francis Crick lecture is given annually. Preference is given to genetics, molecular biology and neurobiology and also to fundamental theoretical work, which is the hallmark of Crick's science. The lectureship was endowed by Francis Crick's friend, Sydney Brenner, also a friend of the Royal Society, and the first lecture was given in 2003. The recipient this year is Dr Matthew Hurles, group leader at the Wellcome Trust Sanger Institute at Hinxton, just outside Cambridge. Matthew was an undergraduate in Oxford, then went to Leicester to do his PhD with Mark Jobling on the population genetics of human Y chromosome polymorphisms, and for those who don't know, the Y chromosome is the male-specific chromosome. During his subsequent work in Cambridge on population genetics and molecular evolution he established the molecular mechanism underlying a recurrent deletion of part of the Y chromosome which causes male infertility. Ten years ago Matt joined the Sanger Institute in Cambridge, where he is currently leading efforts to apply genome – and to understand the factors influencing the rates of mutation. He was chosen to be the recipient of the 2013 Francis Crick award for his outstanding contributions to understanding structural variation in the human genome, the mechanisms that cause this variation and its medical and evolutionary consequences. So ladies and gentlemen, I'm very happy to present Dr Matthew Hurles to deliver his lecture under the interesting title, ‘Mutations: Great and small’. [applause]
Well, thank you very much, Jean.
It is a very great honour to give this talk, and it was also a very great honour to meet Francis Crick, as I did as an undergraduate when he came to give a talk at the Oxford Union. I was writing some very poor articles on science for the student newspaper at the time, and had the pleasure of speaking with him. So before this talk I tried to look through pictures of Francis Crick; most of them involve him looking at a double helical structure. But the photo I remember was this one here, drink in hand, very convivial, being somewhat gossipy about other scientists in the room, and so it was a great pleasure to speak with him, and I was very pleased when I saw that the Oxford newspaper doesn't have any online archives, so I can't show you and you cannot see the rubbish that I wrote about what he spoke. So there are two things I really want to talk to you about today: first, the work that we've been doing on the gains and losses of DNA that we all have in our genomes, and secondly, the mutations that arise as we pass on our DNA from one generation to the next, from parent to child. The human genome can be seen in a bird's eye view down a microscope, and what you see here are the 23 pairs of chromosomes that all of us have, and there are two very special chromosomes down the bottom here, the X chromosome and the rather stubby portion of the genome that I did my PhD on. There are a few things that one can tell just from this view of the human genome. The first is we can tell it is human. There are 46 chromosomes here; our closest relatives, the chimps and gorillas, have 48, and the banding pattern of the chromosomes is recognisably human. The second thing that we can tell is that this is a male, from the presence of the Y chromosome, which determines maleness. All of the men in this room have an X chromosome and a Y chromosome and all of the women have two X chromosomes. The third thing that we can tell about this individual is really from what is not there.
This individual does not have Down syndrome or Edwards syndrome or Patau syndrome, the three human diseases that are the largest mutations that we know about, where there is a whole extra copy of a chromosome, which we would be able to see at this kind of bird's eye view. Those are pretty much the largest mutations that we see, and all that we can really glean about this genome from the bird's eye view. Historically we've had two views, the bird's eye view and the worm's eye view. This is what the 3 billion letters of DNA in the human genome look like close up. There are four of them, A, C, T and G, and it is the order of these that is the code for life. This particular snippet here is not actually a random set of As, Cs, Ts and Gs; this is actually a portion of the gene FOXP2, which was described in a previous Crick lecture, and one thing that's notable about this gene is that even though you are looking at a small portion of it, you can tell that this comes from a human as well, because there are two key spelling differences of single letters that are specific to humans and that other mammalian species don't have. The question exists, what are we missing by looking at the genome from just these two views, the top-down bird's eye view and the worm's eye view? By looking at the analogy of geography we can look at what we might be missing. These two pictures here are pictures of two cities in the UK. One is the city of Cambridge, where I now live and where I did my postdoctoral training, and the other is the city of Leicester, where I did my PhD studies. Cambridge is renowned for its architecture; it is a beautiful city. When I first went to Leicester I read the ‘Let's Go’ guide to Britain, and its description of Leicester's architecture was that it unfortunately suffered – and industrial decline. But that's not immediately obvious from this view here.
Close up, on the left is a snapshot of the building that I did my PhD in and on the right is the one I did my postdoc in. One can hazard a guess from this worm's eye view of geography which of these is the city with the beautiful architecture and which is not. But you may not be very confident about doing that. That's because between these two scales we're missing all of the architecture, how these components are put together, and with these two views we're probably missing the architecture of the genome as well. What I introduced to you in the beginning was the two opposite ends of the scale on which a genome, all the DNA in a cell, can differ between individuals. So on the left-hand side you can see, hopefully in red, there is a single base change that differentiates the top sequence from the bottom sequence. That's the smallest type of change we can have. And on the right-hand side, we've got an addition of an entire chromosome. That's the biggest kind of change, as we see in Down syndrome, but in between there is actually a continuous distribution of variation between these scales. So at the bottom end, there are losses of a few bases, gains of a few bases, and with the worm's eye view that I showed you before, we can see these. What we cannot really see at either of the scales is the kind of variation where there are large segments of DNA that are lost, in this example, or gained, in this example. And this is the type of architecture of DNA that we're missing when we look at it down a microscope, or if we look at the individual sequence of the bases close up. So if we think about any type of variation, we need to think about what is its likely impact? Do we really care if we've lost segments or gained segments of DNA?
Well of course our chromosomes contain a linear combination of genes, and those genes encode the proteins, which are actually the molecules in the cells that do all the work, and this flow of information, unidirectionally from genes to proteins, is what Francis Crick called the central dogma of molecular biology. But these genes do not occupy most of the genome. Actually, there are about 20,000 genes that we're aware of in the human genome that code for proteins and they only occupy between 1 and 2% of the genome. 98 to 99% of the genome does not code for proteins. We're still in the very earliest stages of creeping out of our ignorance of what these sequences that are not genes do. What we do know is that not all genes are turned on in every cell of the body and that's why the cells are different. Each cell expresses a different set of genes, and the sequences that lie between the genes are clearly involved in the regulation of those genes, deciding which genes to turn on or off, when, and in what cells. So if we think a little bit about this form of variation that we haven't really been able to ascertain previously, these large segments that are gained and lost, and about what kind of effect they might have on the genes, well, it is reasonably obvious that if we took a cartoon of a chromosome on the left with three genes in here, then we might have variants that removed an entire copy of a gene, the loss of gene B in this example, or we might have variants that gave us an additional copy of gene B, as in this example. There are other forms of variation which I'm not going to be talking about, which don't change the numbers of genes but change the orientation of genes, and I'm going to be focusing on the gains and losses of DNA that collectively are called copy number variation.
Now, the mechanisms that generate this copy number variation are actually blind to where the genes are in the genome, so I have shown you examples of removals or gains of a copy of a gene, but equally, there could be removals and gains of other portions of the genome. And it is not simply that one can lose or gain existing functions in the genome, but one could potentially generate new genes. For example, here at the bottom is a deletion which is making a new gene, the hybrid of gene B and gene C, so there is potential not just to toggle on and off existing forms of function, but actually to generate new forms of function, and we can see in the evolutionary history of our genome that many of our genes have undergone this kind of process. So in thinking about this talk, I was very much reminded of a quote from Sydney Brenner, who you heard about earlier, and ever since I learned this quote, it has really stuck with me: progress in science depends on new techniques, new discoveries, and new ideas, probably in that order. And the key thing is that last segment, because science is often portrayed as a very hypothesis-testing kind of approach. You think of an idea and you go out and do the experiment, but that's to underplay the value of exploring the new vistas on nature that new technologies give you. I think the examples that I'm going to tell you about today very much exemplify what Sydney was talking about, because we were very fortunate that new techniques came along about ten years ago which enabled us to probe the genome to look for the segments of DNA that might be gained or lost. This technology involved what you see in this picture on the left, which is known as a microarray: a microscope glass slide which has small portions of our DNA spotted on it, and the colour of those spots is telling us whether there is more or less of that segment of DNA in a given individual.
And most of them are yellow, which tells you that most of us, for most of our genome, have exactly the same amount of DNA, but there are a few spots in green or red which are regions that have been gained or lost. By applying these technologies to looking at normal individuals, normal populations, we could make new discoveries. This picture is a map of the gains and losses of DNA that we've found in normal individuals, so you will hopefully recognise the chromosomes here, and every blue line tagged on to those chromosomes is a gain or loss of DNA in the first map that we generated of these gains and losses. This was in collaboration with my long-time collaborator Nigel Carter, who has since retired. He really drove this new technology, and together we made these new discoveries. So the important thing about this is it suddenly gave us an insight into the fact that actually, everyone in this room does not have the same genome. We don't even have the same number of genes. We probably vary by several hundred genes between us, each of us not having some, and having others. And this was much more extensive than we previously thought. Some of these variants are very common in the population and some are very rare, but there are hundreds of genes that are affected. So when we discovered this, we were somewhat disturbed to see the headline in the newspaper, ‘the book of life is rewritten’. We were slightly disturbed because this suggested that we had done some kind of genome engineering. The book of life had done nothing at all, it just sat there; we had just re-read it and our understanding of it had got better. So it was striking that there were all these differences, that we're somewhat more different from each other than we thought, but the question remains, should we really care about them, are they biologically important? Now one can investigate this biological importance at different scales of biology.
One can look at cells, one can look at individuals, and one can look at populations. So by looking at cells, and I don't have time to go into all of it, one can essentially say I know this cell does not have a copy of this gene, or only has one copy of the gene rather than two copies, because we should have two copies of every gene, one inherited from each parent. And one could go in and say, well, do we see less of the protein that that gene encodes in that cell? Typically we do, but not always. If we have extra copies of that gene do we see more copies of that protein? And typically we do, but not all the time. So there is clearly a molecular change that results in the cell from that change in the DNA. But we still don't know whether the cell, or even the organism, cares about that molecular change. There are molecular differences between all of us. What we really care about as individuals is, are these going to affect our health, and so I will talk a little bit later about investigating whether this type of variation plays an important role in disease. But first, I will talk a little bit about populations. So, if a type of variation is important, then we should see it under natural selection, under Darwinian selection, winnowing out harmful mutations and potentially increasing the frequency of beneficial mutations. So what happens if we look at CNVs in this way? We can think about purifying selection, which is the selection that's winnowing out the harmful mutations, and we can think about positive selection, which is amplifying the beneficial mutations. In this context it is interesting to think about those copy number variants, deletions and duplications, that affect genes, on the right, and those that fall outside of genes, on the left. Do we see a difference between the two? Because we know genes are important.
If we don't see a difference in the distribution of these variants between the genes and the non-gene regions then we're probably thinking that biology doesn't really care about this form of variation. But when we look at the effect of this purifying selection, we can see that it is acting very strongly, because remember, the mechanisms that generate this variation are blind to where the genes are, so these variants are being generated all the time, but the variants that fall within genes are getting removed from the population by natural selection. And although there are many hundreds of variants that affect genes, there are many fewer than we would expect, and the ones that we do see are typically less common in the population because they are being kept down by natural selection. If we look on the other side, positive selection, well, most variants in the human genome, if they change function, change it in a negative sense. That's because we have a highly evolved genome. It is pretty good at doing its job, evolution over billions of years has managed to achieve that, so most changes that you make are not going to be beneficial. So purifying selection dominates, but we do find examples, both for copy number variants that affect genes and for copy number variants that fall outside of genes, which show clear signatures of being selected positively by natural selection. Now, we can see those signatures but we don't necessarily know why that is. There is one really nice example which our collaborator, Charles Lee, worked up, for a particular type of genic copy number variant: the gene whose product digests starch, which exists in multiple copies.
What Charles showed was that populations that have historically been eating high starch diets have more copies of that gene, and populations that historically have had low starch diets have fewer copies of that gene, and presumably, there has been positive selection for those additional copies to drive the digestion of starch in saliva. So, moving on, then, to diseases. Well, we can broadly break down diseases where genetics plays a fundamental role into common diseases, such as diabetes and coronary artery disease, and rare diseases, such as cystic fibrosis. I will take common diseases first. In common diseases, we know, genetics plays a role, but it plays a role in concert with the environment. So, the genome and the hamburger. And what we know about the genetics of common disease is that the genetic variants that influence our risk are spread throughout the genome, for any one disease that we might think about. There are many tens, probably hundreds of variants that influence our risk of getting Type 2 diabetes, for example, but each one of those variants only has a very modest effect on our risk, maybe increasing it by 10% or 20%, certainly not causing diabetes. Those modest, subtle effects we can only see if we look at very large numbers of individuals, and so we can only really detect these effects if the variants are common in the population. So there is a fairly standard way of trying to identify whether a variant affects the risk of a common disease, and that's just to take a set of individuals with the disease, and a set of individuals without the disease, and look at the frequency of that variant in those two groups. And so, one can simply, in this case, label the individuals who have this variant in both the patients and the controls, and in this situation, as in most times we do this kind of experiment, we see there is absolutely no difference.
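As a rough illustration of the patient-versus-control comparison just described, here is a minimal Python sketch. It is not the analysis the consortium ran: the function names and the carrier counts are invented for illustration, and a Fisher exact test is simply one standard way to ask whether two carrier frequencies differ.

```python
from math import comb


def carrier_frequency(carriers, total):
    """Fraction of a group that carries the variant."""
    return carriers / total


def fisher_exact_two_sided(case_with, case_without, ctrl_with, ctrl_without):
    """Two-sided Fisher exact test on a 2x2 table of carrier counts.

    Sums the probability of every table with the same margins that is
    no more likely than the observed one.
    """
    cases = case_with + case_without
    controls = ctrl_with + ctrl_without
    carriers = case_with + ctrl_with
    total_tables = comb(cases + controls, carriers)

    def prob(k):
        # Hypergeometric probability that k of the carriers are patients.
        return comb(cases, k) * comb(controls, carriers - k) / total_tables

    observed = prob(case_with)
    lowest = max(0, carriers - controls)
    highest = min(cases, carriers)
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))


# Made-up counts: 100 patients and 100 controls, 10 carriers in each group.
p_same = fisher_exact_two_sided(10, 90, 10, 90)
# Made-up counts where carriers are three times as common among patients.
p_diff = fisher_exact_two_sided(30, 70, 10, 90)
print(carrier_frequency(10, 100), p_same, p_diff)
```

With equal carrier frequencies the p-value is essentially 1, the "absolutely no difference" outcome; real studies need far larger groups to detect the 10% to 20% changes in risk mentioned in the talk.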
This particular variant doesn't influence the risk of diabetes. So as a large-scale UK-wide consortium, we investigated whether thousands of copy number variants from the maps that we made in normal populations influenced the risk of eight different common diseases, and we found very little. We identified four copy number variants that each influenced one of those eight diseases. And all four of those were already known previously. So we were left with a rather surprising conclusion that despite the fact that these variants are very large, and they remove large segments of DNA, very, very few of them actually influence our common disease risk. But it is a different story when we come to the genetics of rare diseases. There is no hamburger on this picture, because the genetics of rare diseases is driven by very strong mutations that in and of themselves are sufficient to cause those disorders. Broadly they can be classified into two types: those where a mutation in just a single copy of the gene is sufficient to cause the disease, and those where, like cystic fibrosis, you need both copies of the gene to be damaged before the disease occurs. So there are many different rare diseases, there are thousands of rare diseases. It has been estimated, though, that although each one is individually rare, about one in fifteen or one in twenty of us has a rare disorder. So cumulatively it is quite a big impact. We focused just on one study in collaboration with Sadaf Farooqi in Cambridge, who has been working on the genetics of early-onset obesity in children. Obesity is a common trait, but the extremeness and the early onset in the patients that she works with is actually a much rarer trait.
We applied exactly the same kind of technologies that I showed you for generating those maps to the patients that she works with, and we found that there was one particular region of a chromosome, chromosome 16 in this case, where we found two different types of deletion, losses of DNA, in these families. We found a small one, where about 200,000 of these letters of DNA were lost, and we found a big one, where about 1.7 million of these letters were lost. If we looked at the families of those individuals, then that small deletion we observed in children in families where there were multiple individuals with extreme obesity; they often had siblings, and they had at least one parent, who was morbidly obese. Whereas the very large event that overlapped it we only found in individuals who were the only person in their family to be morbidly obese. So we think that there are particular genes within the interval that these two deletions share in common that are very important for the sensing of when we're hungry and when we've had enough. And so we looked at these CNVs and we tracked how they passed down through the families. What we observed, and that is shown with these green lines here, is that in the families where there were multiple individuals who were morbidly obese, every individual who was morbidly obese carried this deletion, but in the families where the individual was the only one affected, these were new mutations that the child had that neither parent had, and that explained why these individuals were the only members of the family that were obese.
Now, we cannot necessarily treat these particular diseases at the moment, but a diagnosis is very powerful in these families, and in this particular example it was extremely powerful because in the families at the top, several children had been taken into social care because they had been regarded as cases of parental neglect. You've got a morbidly obese parent, often a morbidly obese sibling, and you've got another child become morbidly obese. These kids were then given back to the families, so it couldn't be treated, but it obviously had a major impact on those families' lives. This is just one example of one rare disorder where we found that these copy number variants were important. But the types of copy number variants we're talking about here are not common. They are very rare. In fact, extremely rare. Fewer than one in a thousand individuals will have one of these rearrangements. And that's what we see in common with many different rare disorders: CNVs are not necessarily the only cause, but they account for an appreciable portion of many of the rare diseases. So, why do we see this difference? Why are rare gains and losses of DNA important in rare disorders, but pretty unimportant in common diseases? I think the reason is probably just down to the sheer weight of numbers. So if we think about common diseases, remember that in the genetics of common disease we can only look at the common variants, those that have a subtle effect on the risk of disease, and if we look at all the common variants in the people in this room, we'll see that the vast majority of them are not CNVs; for every common CNV we've got a thousand other common variants of other types. And all of those have been through the filter of natural selection. They have only become common in the population because they are not highly harmful to us.
And so, it is perhaps unsurprising that when we look for common variants that influence common disease, CNVs do not play a major role, but if we look in rare disorders, if we look at rare variants that essentially knock out an entire gene, then actually, copy number variants probably account for one in ten of those. And that underscores why probably 10% of rare disorders are caused by copy number variants. So we see this interesting dichotomy: CNVs are clearly biologically important but it is really the rare ones that are playing the major role in human disease. So I want to move on to the second part of my talk now, which is about mutation rates. I alluded in that obesity study to how new mutations can be important for disease, but new mutations occur in all of us. All of us have mutations in our DNA that our parents did not pass on to us. And we need to think a little bit about the journey that DNA takes as it goes from one generation to the next. So this, you may not recognise it, is you, at a very early stage in your life, when you were just a single cell, the fertilised egg. And we know that as you pass on your genes to the next generation, you pass them on through your sperm, or through your eggs. How does the DNA get from that original single cell down to the sperm and the eggs? What happens is you get early development, you get division of these cells, with copying of the DNA each time the cell divides, because each cell has the same DNA complement as every other, until we reach the primordial germ cell, and here the paths of men and women split. So to create eggs, there is an additional set of genome copies, but essentially all the eggs that a woman has in her life are in her ovaries when she is born. And all that happens during the menstrual cycle is that one of those just matures with – and so that means that for every single egg that a woman produces, the DNA in that egg has been copied the same number of times, roughly about 30 times.
But it is a different story with men. Men produce sperm throughout the course of their life, post puberty. And so what you see here is, at puberty, a particular special form of cells called [inaudible term], and these turn over every sixteen days in the testes, so 23 times each year the DNA is getting copied. That means that the sperm of a 40 year old man has DNA that has been copied probably twice as much, maybe three times as much, as the sperm of a 20 year old man. So if we think that this copying process, like all copying processes, is prone to error, and maybe that's where mutations come from, then it suggests not only that most new mutations might come from dads rather than mums, but also that the number of new mutations might increase as a dad gets older. And that hypothesis was made a long time ago, because you can make the hypothesis as soon as you understand this journey of cells from one generation to the next, but we can use the tools that we have to measure mutation rates and see if this is really correct. So there are a number of different strategies one could take to measure mutation rate. In concept, they are all very simple. All you are really doing is comparing the DNA of one generation to previous generations. So on the left-hand side, you are comparing the genomes in the sperm of a man to the rest of his DNA; in the middle you have the obvious example of sequencing the DNA of a child and comparing it to their parents; but equally one could look at the mutations that have happened over evolutionary time by taking two species that have had a common ancestor. That's a relatively straightforward experiment to do and, when the chimpanzee genome was sequenced and compared to the human genome, it highlighted some very intriguing observations. Focusing on this evolutionary approach first, we look specifically at the single letter changes and the differences between humans and chimps.
What was observed was that the number of differences you see between humans and chimps depends on which chromosome you are looking at. So at the top here, you've got the X chromosome, in the middle you've got all the other chromosomes, bar the Y chromosome, which is down the bottom. And if you compare the human and chimp genomes, what you see is that on the X chromosome there is a difference between humans and chimps about once every hundred letters of DNA, but at the bottom, on the Y chromosome, there is a difference about once every 50 bases, so this suggests the Y chromosome has been mutating twice as fast as the X chromosome. And the other chromosomes are in between. Now, if we think a little bit about how these chromosomes are passed on from one generation to the next, this fits with the hypothesis that I mentioned before, that most mutations come through the male line. Because mothers pass their X chromosome on to their daughters and to their sons. Fathers only pass their X chromosome on to their daughters, and that means the X chromosome's journey through evolution spends two-thirds of its time in the female germline and one-third in the male germline. If we compare that to the other chromosomes, the non-sex chromosomes here, they spend an equal amount of time in the male and female line, but the Y is only ever passed on from father to son and it has the greatest divergence between humans and chimps, so this very much fits with the idea of males being more mutagenic than females. So through this kind of study, and others, we have a picture of what the average genome might look like. So your genome here, were we to sequence it now, we would find there would be about three-and-a-half million variants that differentiate your genome from that of the person next to you, and indeed from the reference human genome.
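The chromosome argument above can be turned into a small back-of-envelope calculation. The Python sketch below is a deliberately simplified model, not something presented in the talk: it assumes divergence on each chromosome is simply proportional to the time spent in each germline, weighted by a male rate and a female rate, and it ignores every other factor. The function names are illustrative.

```python
from fractions import Fraction


def male_to_female_ratio(x_divergence, y_divergence):
    """Estimate alpha = male rate / female rate from X and Y divergence.

    Simplified model: the X spends 2/3 of its time in females and 1/3 in
    males, the Y is always in males, so
        Y / X = alpha / ((2 + alpha) / 3) = 3 * alpha / (2 + alpha).
    Solving for alpha gives alpha = 2r / (3 - r), with r = Y / X.
    """
    r = Fraction(y_divergence) / Fraction(x_divergence)
    if r >= 3:
        raise ValueError("Y/X ratio of 3 or more has no finite solution")
    return 2 * r / (3 - r)


def expected_autosome_divergence(x_divergence, y_divergence):
    """Divergence predicted for chromosomes that spend half their time in each sex."""
    alpha = male_to_female_ratio(x_divergence, y_divergence)
    female_rate = 3 * Fraction(x_divergence) / (2 + alpha)
    return female_rate * (1 + alpha) / 2


# Figures quoted in the talk: about 1 difference per 100 letters on the X,
# about 1 per 50 on the Y.
alpha = male_to_female_ratio(Fraction(1, 100), Fraction(1, 50))
autosomes = expected_autosome_divergence(Fraction(1, 100), Fraction(1, 50))
print(alpha, autosomes)  # 4 and 1/80
```

Under these assumptions the male line would be mutating about four times as fast as the female line, and the other chromosomes would be expected to show about one difference every 80 letters, which falls between the X and the Y as described.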
If we focus just on the genes, we would find a much smaller number: we would see about ten thousand variants in those genes that would affect the protein that those genes produce. And if we focused specifically on how many of those three-and-a-half million variants are new mutations, we would find that each one of us has somewhere in the region of 50 to a hundred new mutations in our genome that our parents didn't have. Now most of these mutations are single spelling errors, single base changes. And what we know from the studies comparing humans and chimps, and others, is that the mutation rate for a given letter of DNA is about one in a hundred million per generation. Now that's pretty good, given the number of times that DNA has been copied going from one generation to the next, that only one in a hundred million bases is being mutated. A hundred million is a big number. But it is a lot smaller than the number of humans on the planet today, 7 billion. And what that means is that actually every single base in the human genome, in the reference sequence, has mutated tens of times in the humans that are currently living on the planet today. Now, if we think about other forms of variation, the gains and losses that I mentioned to you before, then actually these occur at a much lower rate, so it is probably only one in 20 of us in this room who will have a deletion or a duplication of a segment of DNA that's longer than maybe, say, a thousand bases, that's new, that our parents didn't have. And we can understand a bit more about that process by taking a different strategy for measuring mutation rates that I mentioned before: comparing the sperm of a man to the rest of his DNA.
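The arithmetic behind these figures can be checked in a few lines. The Python sketch below uses the round numbers quoted in the talk (3 billion letters, one mutation per hundred million letters per generation, 7 billion people); counting two genome copies per child, and one inherited copy per person for the population estimate, are assumptions added here for illustration.

```python
from fractions import Fraction

# Round figures quoted in the talk.
PER_LETTER_RATE = Fraction(1, 100_000_000)  # mutations per letter per generation
GENOME_LETTERS = 3_000_000_000
WORLD_POPULATION = 7_000_000_000


def new_mutations_per_child(copies=2):
    """Expected new single-letter mutations in one child.

    Assumes both inherited genome copies (one per parent) mutate at the
    same average rate, which is a simplification.
    """
    return GENOME_LETTERS * copies * PER_LETTER_RATE


def hits_per_site_in_population(copies_per_person=1):
    """Expected number of living people with a new mutation at one given letter."""
    return WORLD_POPULATION * copies_per_person * PER_LETTER_RATE


print(new_mutations_per_child())        # 60
print(hits_per_site_in_population())    # 70
print(hits_per_site_in_population(2))   # 140
```

Sixty new mutations per child sits inside the 50 to 100 range given in the talk, and each letter of the genome is expected to have been newly mutated on the order of 70 times among people alive today, or about twice that if both inherited copies are counted.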
And to do this, because each mutation process is vanishingly rare (the types of deletions that I'm going to describe to you mutate a bit more rapidly than a single base, it is more like one in a hundred thousand), we have to design assays that are essentially capable of picking out the one sperm that has a deletion against a background of 99,999 sperm that have no deletion whatsoever. A really fantastic postdoc who worked in the lab, Dan Turner, actually designed eight of these different assays to look at different portions of the genome, and I'm going to describe to you what he found. He looked at four regions of the genome where we know that there are deletions or duplications that cause genetic disorders. And he designed assays for each one of these to work out how fast these are occurring in a man's sperm. What he found was that the most rapid one was occurring in about one in 24,000 sperm, this deletion here, and the slowest one, this duplication here, in about one in a million sperm. Now, the individual sperm donors that Dan looked at were just normal men, drawn from the population. They didn't have a genetic disorder, and that means that all of the men in this room have sperm in their testes now swimming around with these deletions and duplications in them. But we could also determine something of medical relevance from this, because we could compare the frequency with which we see these mutations in sperm with the frequency of the disorders that they cause. If we looked at two disorders that we think are pretty well diagnosed, where we have a good sense of how frequently they occur in the population, then they actually agree very well with the mutation rates that we identified from sperm.
But if we looked at two other disorders, here shown in green, what we observed was that the mutation rate in sperm was considerably higher than the frequency with which these disorders are being diagnosed, so we hypothesised that these are actually being under-diagnosed, and subsequent studies have shown that to be the case. We also investigated this very rare event here, which had never previously been reported in humans, but which we thought, from our understanding of the mutation process, would probably exist, and because this duplication is very similar to other duplications that we know cause disorders, we predicted it would cause a particular type of developmental disorder. And subsequent to us publishing this, Jim Lupski's group have gone on to show that's the case. What it shows is that if you have a good understanding of the biology of mutation, you can predict diseases that you haven't observed yet. We also wanted to think about mutation as a biological process itself, a bit like height. We know that height is influenced by environment, and we know that height is influenced by genetics. Is it the case that mutation rate is also influenced by environment and genetics? We looked at multiple different sperm donors for one particular rearrangement, this fairly rapid deletion that I mentioned before, and we saw quite a lot of variation. But these are all quite rare events. You have to sift through millions of sperm to find a few tens of events, so is this variation just random or not? We thought one way to answer this is by looking at sperm from twins. If we look at identical twins and they have very similar mutation rates, that suggests there is something systematic that's influencing those mutation rates, but if they have dissimilar mutation rates, that suggests that genes, and possibly a shared environment, are not so important.
So we measured the rate in the first twin and the second twin of an identical pair, and if there is no relationship between the two we would expect to see just a flat line. No obvious relationship: if the first twin has a high mutation rate, the second twin could have a low mutation rate. What we actually observed was the opposite. If one twin had a high mutation rate, the other twin had a high mutation rate. Now, it is quite hard getting sperm from twins. Harder than you would think. Quite often because one of them has had a vasectomy. And it is impossible to get sperm from twins that are all the same age, so we've got pairs of twins that are different ages. So if age is an important determinant of mutation rate, then maybe that's just explaining why some of these twins have very similar mutation rates. So we looked at that, but actually what we observed was that there is absolutely no relationship between the age of a sperm donor and the rate of deletions that occur. So this is a different mutation process from the one I described previously of single base changes, and this one doesn't appear to have a relationship with age. But if we want to understand the mutation process that's most prevalent, the one that's generating the single base mutations that I mentioned before, then we need to do the very obvious experiment of looking at children and comparing them to their parents. So we've done this, and it is quite a simple experiment. What you essentially do is you have mother, father and child, you have a DNA sequencing machine, hopefully one with a nice neon light on, and you just look for the new mutations that the child has that the mother and father don't have, and then you need to use a few genetic tricks to try and work out which ones came from dad and which ones came from mum.
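The core of the mother, father and child comparison just described can be sketched as a set difference. This is only a toy illustration: the variants below are invented, the function name is hypothetical, and the sketch ignores sequencing errors and the "genetic tricks" needed to assign each mutation to mum or dad.

```python
def find_new_mutations(child, mother, father):
    """Return variants seen in the child but in neither parent."""
    return child - (mother | father)


# Hypothetical toy variants written as (chromosome, position, alternate base).
mother = {("chr1", 1000, "A"), ("chr2", 500, "T")}
father = {("chr1", 2000, "G"), ("chr2", 500, "T")}
child = {("chr1", 1000, "A"), ("chr1", 2000, "G"), ("chr3", 42, "C")}

new_mutations = find_new_mutations(child, mother, father)
print(sorted(new_mutations))  # prints [('chr3', 42, 'C')]
```

The two variants the child shares with a parent are treated as inherited, and only the one seen in neither parent is kept as a candidate new mutation.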
And we looked at the first two children in this way as part of a large-scale international consortium called the 1000 Genomes Project. These cartoons represent an awful lot of work, but essentially, in the first child that we looked at, we could indeed see that most of the mutations, and each one of these dashed lines indicates a mutation, came from the dad, and a small number came from mum. And then we looked at the second child, and we actually found that more of them came from mum, and fewer from dad, so there is quite a lot of variation, both between the mothers in how many new mutations arose on the chromosomes that they passed on, and between the fathers. So we thought we need to try and explore this variation. Why is there this much variation? Well, some of it could occur purely by chance, because we're looking at relatively low numbers of events, 50. So if the average number of mutations in my sperm at this moment in time was 50, not every single one of those sperm would have 50; some would have 40, some would have 60. And if you think about the mutation process and how it works, broadly speaking, it is quite reasonable to expect quite a lot of variation between the sperm, even within one man at one point in time. So what we wanted to do, to try and get around this variation, was to look at families where the same mum and dad had had kids over quite a period of time. This is all published data. So we looked at three families that had had four children over more than a decade's worth of time, and the simple idea was that we wanted to compare the mutations that each child got from mum and dad and see whether that changes between the oldest and the youngest child. And this is what we observed from these three families: each one of these lines is the number of mutations we saw in the children in each one of those three families, as the father got older.
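The chance variation around an average of 50 mentioned earlier in this passage can be illustrated with a simple model. The sketch below assumes that new mutations arrive independently, so that the count per sperm follows a Poisson distribution; the talk does not name a model, so that choice and the function names are assumptions made here.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events when the expected number is `mean`."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def prob_between(low, high, mean):
    """Probability that the count falls in the inclusive range [low, high]."""
    return sum(poisson_pmf(k, mean) for k in range(low, high + 1))


MEAN_NEW_MUTATIONS = 50  # average quoted in the talk

spread = math.sqrt(MEAN_NEW_MUTATIONS)              # Poisson standard deviation
inside = prob_between(40, 60, MEAN_NEW_MUTATIONS)   # share of sperm with 40 to 60

print(f"standard deviation: about {spread:.1f} mutations")
print(f"share with 40 to 60 mutations: {inside:.2f}")
print(f"share outside that range: {1 - inside:.2f}")
```

Under this assumed model the standard deviation is about seven mutations, so counts of 40 or 60 sit well within ordinary chance variation around 50.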
So you can see in each one of these families, there is a clear relationship between the age of the father and the number of new single base changes that we see in the children. But strikingly, it's not the same pattern between the families. In these two families here, the father's sperm appear to be acquiring three mutations per year as he gets older, whereas in this family here, it is fewer than one-and-a-half mutations per year. Other people have estimated this in large numbers of samples, and they have estimated a population average of about two new mutations every year, but this evidence suggests that it may not actually be the same between families or between individuals, so we need to do more work to understand whether it is genetic or environmental factors that are influencing these mutations. We have also been able to look at what fraction of mutations were from the mum and what fraction were from the dad, and as we expected, most are from the dad. So that hypothesis, that more mutations would be from dad and that there would be more mutations as dad got older, really seems to hold true for these single base changes here, but it doesn't hold true for the particular mechanism of gains and losses of DNA that I showed you before. So why do we see this different effect of the father's age between these two mutation processes? Well, to understand that, we have to go back to the journey that DNA takes as it moves through a generation. If we look at this process here, these small changes of a single letter, then these can actually occur at any stage, because they can occur at any time that DNA is copied as it goes from one cellular generation to the next.
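The "mutations per year" figures quoted in this passage are slopes of a line fitted through each family's children. A minimal least-squares sketch follows; the ages and mutation counts are invented purely to illustrate slopes of three and one-and-a-half per year, they are not the study's data, and the function name is hypothetical.

```python
def slope_per_year(ages, counts):
    """Least-squares slope: extra new mutations per extra year of father's age."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_count = sum(counts) / n
    covariance = sum((a - mean_age) * (c - mean_count) for a, c in zip(ages, counts))
    variance = sum((a - mean_age) ** 2 for a in ages)
    return covariance / variance


# Hypothetical father's age at the birth of each of four children.
father_ages = [25, 29, 33, 37]

# Hypothetical counts of new single base changes in each child.
family_a_counts = [40, 52, 64, 76]   # made up to rise by 3 per year
family_b_counts = [45, 51, 57, 63]   # made up to rise by 1.5 per year

slope_a = slope_per_year(father_ages, family_a_counts)
slope_b = slope_per_year(father_ages, family_b_counts)

print(f"family A: {slope_a:.1f} new mutations per year of father's age")
print(f"family B: {slope_b:.1f} new mutations per year of father's age")
```

With only four children per family, a real estimate of this kind would carry wide uncertainty, which is one reason larger samples are needed to tell whether families genuinely differ.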
Whereas what I didn't tell you about the mechanism that generates this type of variation is that it only occurs in a very specific cell division that only occurs once during the maturation of an egg, and once during the maturation of a sperm, and so this type of variation doesn't have the same paternal age effect, and it also doesn't necessarily come more from dad. So there are actually different mutation processes, each one with its own properties that we need to understand, rather than a one-size-fits-all. So just in the last few moments I want to talk a little bit about the disease impact of these new mutations. I gave you a couple of examples earlier of those families that had children with extreme early-onset obesity, but new mutations are increasingly being recognised as a cause of rare disorders, especially rare developmental disorders, and I and my colleagues, some of whom are here, Caroline Wright and Helen Firth, have been working on a project called the Deciphering Developmental Disorders project, which is a collaboration with the entirety of the NHS, and the genetic services within Ireland, to try and see if we can use the kind of technologies that we have access to and the NHS doesn't, those arrays that I showed you and the sequencing machines, to diagnose children that the NHS cannot currently diagnose. The nature of this kind of clinical problem is that most of the time the child comes in to a clinical genetics centre with a severe developmental disorder and the parents are perfectly healthy, and so we can ask the question: are new mutations one of the reasons why, most of the time, the child is the only one affected? And if we understood the genetic architecture of these different disorders, because there is a whole set of different disorders involved here, then we could better inform the NHS what kind of technologies they ought to implement to cost-effectively diagnose these children.
Now, recognising that for many of these children there won't be cures available for these disorders, a diagnosis can still have a massive impact on families, in terms of informing them what the risk is of having a second child with the same disease, and potentially offering them pre-implantation genetic diagnosis to avoid having that disease in the second child. Also, many of these children are misdiagnosed and are on inappropriate treatments, so the families are really desperate for a diagnosis, and we're working very hard to try and use these new technologies to provide one. So the ultimate aim of this project is to recruit into the study 12,000 families, each with a child with a rare developmental disorder. And thus far, having analysed the first thousand of those families, we can diagnose about 20% of these children, just with our current understanding of what kinds of mutations can cause disease, and we're finding that most of those diagnoses are new mutations, as we might predict, and the reason we can pick those up is because in this study we're looking at the DNA of the children alongside the DNA of their parents. So most of these mutations are new, and most of them are the small, spelling kind of errors, the single base changes that I mentioned before. So the question then arises: we've had these new technologies, we've made new discoveries, what new ideas stem from this? Well, and this is somewhat provocative, we can think now, with our current understanding, what would it take to minimise the morbidity caused by these new mutations? What kind of things could we do as a society, or should we consider doing as a society, that might enable us to minimise them? Well, the first thing that we understand is that many of these disorders are caused by new single base changes, and we know that older men have more single base changes than younger men.
So actually, one simple thing we could do would be for all of us to donate sperm to ourselves, aged 18 or 20, freeze it, and only then use those sperm to conceive children. And of course, the problem with this is we cannot just target this at individuals at risk, because all of us have new mutations and all of our children have new mutations, so it is not possible to identify who is going to be at risk; it has to be a population-wide strategy. Because the number that I mentioned before, one in a hundred million per generation, that hundred million is a large number, but it is smaller than the number of people on the planet, and it is also smaller than the number of sperm in the testes of every man in the room. What that means is that among the sperm of every man in the room are sperm carrying every single disease mutation that we know about. And that means it is purely a matter of chance whether we go on to have children with developmental disorders caused by these new mutations, so it is not possible to identify those that are at higher risk. Other than, of course, by parental age, but of course one doesn't have a time machine to go back and get sperm from one's 18-year-old self once one has decided to have children aged 40. So the other approach that one could potentially take is using prenatal screening. Now, we already screen prenatally for developmental disorders, and we do this using ultrasound. At 20 weeks there is a scan that is offered to parents, and most parents take this up, but this will only pick up developmental disorders that manifest themselves as some kind of large-scale structural change within the fetus that can be picked up by ultrasound. It won't pick up, for example, whether that individual might have seizures for the rest of their lives or never be able to speak. It won't capture those functional deficits.
It is also very hard to counsel parents about what to do in that situation, because when there is a structural problem it could be that that structural problem comes along with all kinds of other intellectual disabilities, or it could be that that structural problem is just in isolation and could be easily repaired during the first few years of the child's life, who would then go on to have a happy life. Currently, parents who have these scans that reveal that there is a structural problem with the fetus have a very difficult decision to make, and potentially we can make that decision more accurate, or give them more precise information, if we added genetic screening into this. But this clearly is something not everyone will be comfortable with, and it is not the job of scientists to tell society what society should do, but it is the responsibility of scientists to say what society could do, based on our current understanding of what is causing these sometimes devastating disorders. So these are relatively new ideas that need discussion and debate, and that kind of squares the circle of what Sydney Brenner was talking about, that new technologies beget new discoveries, which beget new ideas. And we should recognise that there are many things that we do today that historically, or prehistorically, would have been regarded as absolutely abhorrent, and that we now take completely for granted, and this gives me a shameless opportunity to show my favourite Raymond Briggs cartoon, as he proposes to his father that perhaps floppy trousers, rather than these chiselled stone pants that his father is wearing, might be a more appropriate way of dressing. So I think Sydney was very much right: there is a lot of value in scientific progress in exploring what new technologies enable you to see about the world that you didn't previously appreciate, and then deriving from those new ideas that then beget further progress.
And with that, I would like to thank all of you for listening. I would especially like to thank many of the people that we've worked with over the course of the years whom I didn't have time to thank in the talk. All of what I have described to you is very much teamwork between different people, between clinicians, between researchers, between computer scientists, and it is only through the hard work of all of those people that I'm standing here in front of you. I mentioned a few of them as I went along, but there are many others, some in this room, to whom I apologise in advance. And it is also, I think, extremely important that we thank the families who contribute to these studies, both the families who are perfectly healthy and volunteer for the research that we do on understanding mutation rates, and the families that are desperate for a diagnosis and volunteer to be part of studies like the DDD study that I mentioned before. So last month we tried to give back to these families in some small way through charity bike rides. We organised nine around the country, with the clinicians and the researchers involved in the project, and between us we cycled about 4,000 miles that weekend, which is pretty much the equivalent of Land's End to John O'Groats, back to Land's End, back to John O'Groats and back again.
So these two family support groups, SWAN UK, that's Syndromes Without A Name, and Unique, do a completely invaluable job of working with these families to help them negotiate this tricky path as they try to find a diagnosis for their children, and the prize money for this lecture that Jean will hopefully give me in a moment will be going to this fund. I recognise that not all of you can read that, but I have a whole set of leaflets that I will be putting outside that contain that URL, and I would like to encourage you to support these charities, because what they are doing for these families is very worthwhile. And so with that I would just like to say thank you to you again, and I'd be happy to take any questions or comments. [applause]
We can thank him properly later, but Mat said he would take questions, so if you have a question, put your hand up. So here. Yes?
Thank you very much for a wonderful talk. Chromosome abnormalities: you didn't really talk about them. As far as Down's syndrome is concerned, I believe it is the mother's age that is more important. I was wondering if you could comment on that kind of abnormality.
Yes. So it is very much well recognised that Down's syndrome and the other [Inaudible] do increase with mother's age, and they don't increase in the kind of linear way that we see with father's age, but in a kind of S shape, so it ramps up dramatically after 35. That's extremely well known, and it is interesting to reflect on the fact that the way in which we orchestrate prenatal screening in this country takes account of the mother's age. It recognises that epidemiological relationship. There is no similar equivalent of taking account of father's age, for example. And yet it is an open question as to whether father's age, through new mutations of single base changes, is actually more detrimental than maternal age is through causing these chromosomal abnormalities, but that's hopefully something we'll be able to answer in the next couple of years.
Are there any environmental factors that can affect the rate of mutation?
That's a very good question. So, people have looked very hard, and found that there are no recognised environmental factors that increase germline mutation rates, but it is quite difficult work to do, because the mutations are quite rare. There are several tens of factors that are known in mouse studies to increase mutation rates, and so we would assume some exist in humans, but those are the kinds of experiments that one cannot do on humans, certainly not ethically, anyway. And so it is highly likely that there are environmental factors, but we cannot do those experiments, so we just have to rely on natural experiments, often of our own, quote-unquote, "ingenuity", things like the Chernobyl accident or nuclear test sites or accidental exposures to other types of environmental mutagens. By the nature of those experiments, it is often very difficult to do them, because we don't really know what dose people received; it is not a controlled experiment. So there are likely to be environmental mutagens, and we don't know what they are.
Any more questions? Yes? Here.
Thanks. When you are doing the prenatal screening, presumably you will only suggest rejection of a fetus or an embryo which has a recognised lesion, which is going to cause a disease. You won't just ask for an embryo to be thrown away because it has got a lot of changes which aren't in particular areas that you recognise? Because if you do that, aren't you reducing, in the long term, the genetic variation of the human population, and its ability to evolve?
So, I mean, firstly, I guess I would say that the way in which prenatal screening is done now, and I think this is the right way, is that it is non-directive.
Parents get to choose; it is up to the parents to choose what to do with the results, and scientists, doctors, no-one should really be telling them what to do. All we can be doing is giving them the most accurate information to make those choices. On the point that you made, I think given the sensitivity of prenatal diagnosis, if it were to be implemented, there would have to be absolutely rock solid evidence that that variant was really going to cause a completely deleterious change, but that's my view. And we already know from prenatal screening that there will be, you know, fetuses with heart defects that parents will decide to keep, and fetuses with heart defects that parents will decide to terminate, and that's their choice. I think that non-directive view of this is very important. And I guess as to your second question about evolution, I'm not too worried about that. I think there are plenty of new mutations being produced all the time; every single base in the genome is being mutated as we speak. So there is plenty of fertile material for evolution.
Is there one? Yes.
I have a question about sequencing. When you say something is something like CCT, can it also be read as GGA because that's the other side?
Yes. So typically, we're only showing one of the two strands of DNA.
So which ones do you show first? When you give them out and publish them, which comes first?
It depends exactly on the nature of the publication, so it depends if your gene goes that way along the chromosome or that way along the chromosome. But typically, if you don't have an imposed direction, we tend to go from left to right.
Well, which of the two halves do you take? A or T?
We take the one that is... so you can imagine one DNA strand going in that direction and the other going in that direction; we tend to take that top strand and report it.
Okay. I think we're past our finishing time, so we should stop there.
But I want to thank Mat for a lecture that was packed full of information, *** up-to-date, and thought-provoking, and I think that is reflected in the sort of reflective attitude of the audience now and in the sort of questions that we've had. I think it has been a splendid lecture, and in order to recognise that, he is going to get a cheque, but first of all, he is going to get a nice certificate, so thank you very much. [applause] And a very nice medal, the Francis Crick medal. [applause] And as he said, he is going to donate to charity, which I think is wonderful. Thanks.