The Human Genome And Individualized Medicine - David Valle

David Valle: Thank you, Greg and Bob, and it's a pleasure to be here. It's -- I've had the pleasure of a long collaboration with people at NHGRI. We recently actually have instituted a combined fellowship program in genetics with the intramural program at NHGRI and of course now Suburban and Johns Hopkins have joined together so we have lots to celebrate and to look forward to in terms of continued interactions. And so I'm pleased to be here to have a chance to talk to you about my favorite subject. I'm also pleased to be the leadoff speaker in this series, which Greg and Bob have put together, which looks quite good. And so I hope to sort of set the stage for many of the people that come later. Please interrupt me if I'm not making myself clear or if you want clarification on some point or if I'm talking about stuff that's old hat, tell me to move ahead and not spend time on it. So I'd love to make this as informal and as interactive as possible. So what I would like to talk about is the human genome and what I call individualized medicine. And so we could start off by saying, what is individualized medicine? I'm going to put this upright. And so I turn to Francis for guidance here, Francis Collins, and Francis said back in 2005 that at its most basic, personalized medicine refers to using information about a person's genetic makeup to tailor strategies for the detection, treatment or prevention of disease. And I think that sums it up quite well. The only place that I disagree with Francis is that I much prefer that term individualized as opposed to personalized and I'm going to take the prerogative here, since I have the podium, to tell you why. So personal, if you look in the dictionary, has two meanings, really. It has relating to someone's private life, a sense of intimacy. And the other is relating to one person or particular individual. Now medicine has always been personal. As physicians, we are allowed to ask people questions that no one else asks them. 
We are allowed to examine them in ways that no one else examines them. So medicine has always been personal, in my view. But it has not been individualized. And so that's what we're looking at, a way to consider each person as an individual and to adjust our thinking about the patient to account for their individual strengths and weaknesses. So for that reason and for others which I won't go into, I urge you to consider thinking of this as individualized medicine. Now, why would we consider the topic of individualized medicine now? Why is it so much in the press and in the news and so forth? And that's particularly worth asking because modern medicine really has had enormous successes. So we've had a dramatic prolongation of the lifespan and a dramatically improved quality of life. So medicine has been doing a good job. On the other hand, there are ongoing concerns. Many diseases have an increasing incidence. There's an unacceptable frequency of adverse therapeutic events. We all hear daily about the cost of medicine continuing to go up. And if you go talk to patients, and all of you do all the time, they usually say they want two things from their physician. They want a physician who's smart, has good knowledge, but they also want a physician who cares about them as an individual, as a person. So I think we have an opportunity to move medicine from a very successful level to a new plateau and I think the way we will do that is by individualizing medical care. So to put it in a different way, I like to think about a particular disease and the one I would mention is type 2 diabetes. As you know, its incidence is increasing throughout the industrialized world. And it's intertwined with the increasing incidence of obesity and it's a chronic illness with an array of complications, microvascular complications and macrovascular complications. 
But suppose a member of your family or a close friend of yours has type 2 diabetes. Would you like to know the prognosis and response to existing treatments for the average patient with type 2 diabetes? Or would you like to know, as precisely as possible, the specific features, prognosis and response to therapy for your loved one or your close acquaintance? Or, even better, could we imagine knowing ahead of time who is at high risk for type 2 diabetes and prevent the illness from ever occurring? So that's the goal here, to try to identify the individual strengths and weaknesses of our patients and as much as possible prevent them from ever getting sick. But if they do get sick, then to individualize our counseling and our treatment and optimize it as quickly as possible. So in that regard, I sort of think there are two characteristics of modern medicine, current medicine. The first is that in medical school, and currently, I think we have been trained to perform what I call average medicine. Now by average, I don't mean in the pejorative sense that it's mediocre. I mean that when we make a diagnosis we think about what is appropriate for the average patient with that diagnosis. And part of that comes from the way we're trained and I call that aspect of our training the classic case mentality. So the little boy on the left is a patient that I saw, along with his sister, and they both have a genetic syndrome that's characterized by some abnormal physical findings. And in days gone by what would happen is the family would come to the clinic, we would take a family history and history of the present illness, do a physical exam, some X-rays and routine laboratory tests, and then whoever had seen the patient would get together and say well, I think it's a case of this or I think it's a case of that. 
And usually what would happen, at least at Hopkins, would be the person with the most grey hair or the least hair would finally make a pronouncement, that I think this is a case of whatever. And then our thinking would become constrained. We would start thinking of the patient as an example of a particular disease, rather than thinking of the patient as an individual who happens to have this set of problems. And at the time, in terms of the tools that we had available, that was the way we had to practice medicine. Now the other aspect of medical practice in the 20th century is what I call trial and error medicine. That is to say that, as you all know, what we do is we see a patient, we make a diagnosis, we think about what kind of interventions we would like to make, we make some baseline measurements, we make the intervention, then we follow the patient and repeat the baseline measurements and ask whether, with that intervention, the patient is doing better, staying the same, or in fact getting worse. If they're staying the same or getting worse then perhaps we'll change that intervention and alter it and do something else. So this is sort of a trial and error kind of medicine. So the goal in the future would be to be able to predict with a fair degree of accuracy what would be the best treatment for this patient without having to go through this trial and error set of protocols. So, thinking about the patient as an individual: I think the more experience physicians have, the more they come to learn this, despite the fact that in medical school, classically, you are sort of taught, like, this is what happens with a case of this, this is what happens with a case of that. As you get more and more experience, you begin to realize that no two patients, even those patients with the same diagnosis, behave exactly alike in terms of their complications, their response to treatment and so forth. 
So that's a lesson that we tend to learn by experience in the trenches. And this point was actually emphasized by Owsei Temkin, who was a professor of the history of medicine at Hopkins, now deceased, but he said that there is no science of the individual and medicine suffers from a fundamental contradiction. Its practice deals with the individual, in other words, the person that comes to see us is an individual. While its theory, what we learned in medical school, grasps universals only. So you were sort of left, in days gone by, to individualize your approaches and your thinking after you got out of medical school. This idea is not new, actually. A colleague of mine pointed out to me that back in 350 B.C., none other than Aristotle said, the doctor does not treat man except accidentally. He treats Callias or Socrates or somebody else. So if someone knows the universal without knowing the individuals contained in it, he will often fail in his treatment, for it is the individual who has to be treated. So it's not a new idea. So I keep reminding our students, and what I think we have to think about is that when we see a large number of patients -- I'm a pediatrician, so this is where I start seeing my patients -- we just have to remind ourselves that each of these individuals has his or her own unique sampling of our species' genetic endowment. Each has a unique history of in-utero development and each is born into a family with a unique constellation of socioeconomic variables. So all of those factors, the genetic makeup, the early development and the sociocultural milieu for each patient, individualize them and have an influence on what diseases they are at risk for and how they will respond to our attempts to prevent or treat those diseases. So this is all well and good, but you could ask, what has changed? What makes it possible to contemplate moving medicine from its current successful level to an even more successful level as we go forward? 
So I would submit, and I'm a geneticist so I would submit, that the major driver for this has been the Human Genome Project and what we've learned about the genetic makeup of members of our species. So: the Human Genome Project, sequencing technology and the appreciation of sequence variation; what's come to be called whole genome sequence biology; an increasing prominence of evolutionary thinking in medicine; progress in disease gene identification; and what has been, in my view, a watershed, the ability to obtain individual genome sequences on individual patients. So I'm going to talk about each of these bullet points briefly. So first of all, let's turn to the Human Genome Project, sequencing technology and genetic variation. So you all know that the genome project really was contemplated in the mid-'80s and there was a good bit of argument initially about whether or not it was a good idea and a useful way to spend our research dollars. But eventually, the argument carried the day and the genome project got started officially on October 1, 1990 under the direction of James Watson, across the street. Francis took over in 1993, Francis Collins. And initially it focused on technology and model organisms, yeast and C. elegans and flies and so forth, but in the mid-'90s it turned its attention almost completely to the human genome and in fact it was a competition between the so-called public group headed up by Francis and the private group headed up by Craig Venter. And miraculously, both groups finished on the same day, as shown here on this front page of The New York Times. This is Tuesday, June 27, 2000, when both groups announced the fact that they had a draft sequence of the human genome. The public group went on to do more than a draft, to do a very high quality complete sequence of the human genome and that was finished in about 2003. So let me just make sure we're all on the same page with terms. 
Is there a -- I forgot to bring a pointer -- is there a pointer here? So, since it's been a while since some of you may have thought about this, what I've shown here is a diagram of a gene. So amazingly, the word gene was coined in about 1908 by a man by the name of Johannsen and if you look at how geneticists have defined what a gene is over the time since then, the definition keeps changing. And in fact, if you put 10 geneticists in a room right now and asked them for a definition of a gene, you might get 12 definitions. So let me tell you, so we're more or less on the same page, what most of us mean. So what I've shown on this diagram is a mammalian gene, a gene that encodes a protein. And it turns out, the pieces of the gene that actually account for the coding sequence are called exons. They are the pieces that -- thanks, Greg -- so here are the exons. This is a four exon gene. Those are the pieces of the gene that are transcribed into RNA, and then there's splicing that goes on, and the RNA corresponding to these four segments ends up in the mature mRNA that goes out into the cytosol. And then there are pieces of DNA between the exons and we call those introns. When the gene is transcribed to RNA, the transcription would start right here. It would go like this and then these intronic pieces would be spliced out and the mature message would be made up of a sequence that corresponds to exons one, two, three and four all stitched together. So these are the exons. The purple line is the introns. And up in the front of the gene, at the five prime end, there are some regulatory sequences. We call that the promoter. 
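The transcription and splicing steps just described can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the sequences, the segment lengths and the function names (`transcribe`, `primary_transcript`, `mature_mrna`) are invented for illustration and do not correspond to any real gene, and the promoter, enhancers and untranslated regions are left out.

```python
# Toy model of a four-exon gene. All sequences are invented for illustration.
# Introns are written in lowercase so they are easy to spot.
SEGMENTS = [
    ("exon", "ATGGCC"),
    ("intron", "gtaagtttcag"),
    ("exon", "GATTAC"),
    ("intron", "gtgagtcccag"),
    ("exon", "CCGGAA"),
    ("intron", "gtatgtaacag"),
    ("exon", "TTTTGA"),
]


def transcribe(dna: str) -> str:
    """Return the RNA copy of a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def primary_transcript(segments) -> str:
    """RNA as first transcribed: exons and introns together."""
    return "".join(transcribe(seq) for _, seq in segments)


def mature_mrna(segments) -> str:
    """RNA after splicing: introns removed, exons joined in order."""
    return "".join(transcribe(seq) for kind, seq in segments if kind == "exon")


if __name__ == "__main__":
    print(len(primary_transcript(SEGMENTS)), "bases before splicing")
    print(len(mature_mrna(SEGMENTS)), "bases after splicing")
    print(mature_mrna(SEGMENTS))
```

The point of the sketch is only that the mature message is the four exons stitched together in order, with the intronic RNA discarded.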
There might be some distant regulatory sequences way away that we call enhancers and the translation would start here once the RNA was made and all of these would give information about making the protein that corresponds to the product of this gene and then this would be the three prime UTR, untranslated region, of the message. So this is a classic protein coding gene. Now we know that there are other genes in the genome that encode RNA but the RNA is never translated into protein. So there's a set of RNA genes as well as protein genes. For most geneticists and for what I'm going to tell you today, when I say gene I mean genes that encode protein like this one here. So if we look at where we stood in 2003 in terms of understanding the human genome, there are some simple features that I just want to remind you of. First of all, if we count up the number of genes in the genome, it turns out there are about 22,000 genes in the human genome. Now this was a big surprise to everybody. We had a pool about how many genes it would take to make a human and of course because we're egocentric most of us guessed way high. I guessed 100,000 genes. And it turns out that if you look at all organisms that have bilateral symmetry, from flatworms to butterflies to fruit flies, it's about 20,000 genes. So for some reason, we don't know why, this is sort of the sweet spot for genes in the biological kingdom. Right now we know the function of about 75 percent of these 22,000 human genes, so there are still a good number of genes in the genome where we don't have a clue as to what their function is. Those exons, those pieces of genes that actually get spliced and retained in the messenger RNA and go out into the cytosol and are used to direct the synthesis of proteins, we can actually count them up now because we've got the sequence of the whole genome and there are about 220,000 exons in the genome and those exons are distributed over about 50 megabases. 
The entire genome is about 3,000 megabases or three gigabases. And over here is a comparison to the mouse and there's actually pretty good similarity between the mouse and the human genome. So 22,000 genes, about 220,000 exons. And the exome, which is what we've come to call that portion of the genome that comprises all the exons, is about 50 megabases. That's only about 1.5 percent of the total genome. So there's a lot of the genome that doesn't seem to have much function. If you put an evolutionary test to it and ask how much of the genome is conserved over evolutionary time, it's about five percent. The exons are very conserved, so there's an additional 3.5 percent that's conserved. That means it must have some function. We're not exactly sure what that function is. So there's still a lot to learn but at least we have a list of the parts at this point. Now once the reference sequence was done, roughly 2003, we said okay, we've got one human reference sequence but if we look around the room we can see no two people are alike. So if we wanted to move forward, what we needed to do at that point was to understand something about the extent of genetic variation in our species. And so the people involved in the genome project turned their attention to enumerating human genetic variation. We knew early on that one human to the next is pretty similar. The current number is around 99.5 or 99.6 percent identical, one person in the room to the next person in the room. And some people said wow, that's an extreme degree of similarity, but if you think about it from an evolutionary point of view, Homo sapiens is a very young species; it started from a very small number of founders. And so this is about the evolutionary spread you would guess over that period of time and we're actually pretty close to our relatives. 
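The round numbers quoted here can be checked against one another with a little arithmetic. The sketch below is illustrative only: the constants are the approximate figures from the talk, the name `genome_summary` is made up for this example, and the mean exon length is a derived figure that the talk does not state.

```python
# Approximate figures as quoted in the talk (rounded, circa 2003-2011).
GENOME_MB = 3_000          # whole genome, in megabases
EXOME_MB = 50              # all exons together, in megabases
GENES = 22_000             # protein-coding genes
EXONS = 220_000            # exons across the genome
CONSERVED_FRACTION = 0.05  # share of the genome conserved over evolution
IDENTITY = 0.996           # sequence identity between two people


def genome_summary() -> dict:
    """Derive a few ratios from the rounded figures above."""
    exome_fraction = EXOME_MB / GENOME_MB
    return {
        "exome_fraction": exome_fraction,
        "exons_per_gene": EXONS / GENES,
        # Derived here, not stated in the talk.
        "mean_exon_bp": EXOME_MB * 1_000_000 / EXONS,
        "conserved_noncoding_fraction": CONSERVED_FRACTION - exome_fraction,
        "differing_bases": (1 - IDENTITY) * GENOME_MB * 1_000_000,
    }


if __name__ == "__main__":
    for name, value in genome_summary().items():
        print(f"{name}: {value:,.4f}")
```

With these rounded inputs the exome comes out at roughly 1.7 percent of the genome and the conserved non-coding share at roughly 3.3 percent, which agrees with the quoted "about 1.5" and "3.5" percent to the precision of the inputs. A 0.4 percent difference between two people corresponds to about 12 million bases.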
For example, in the coding sequence we're between 70 and 90 percent identical with a mouse and we're 98.5 percent identical with our closest living relative, the chimp. On the other hand, 0.4 percent of three billion bases is actually a pretty big number, right? So there's a lot of chance for a difference, one person to the next. So the genome project turned to enumerating that difference and the first project was called the HapMap, which studied three populations of humans from around the world -- Northern Europeans, West Africans and Asians -- trying to find all the common variation. That was followed by the current project, which is called the 1000 Genomes Project, and actually the current goals of the 1000 Genomes Project are to study about 2,500 individuals from about 50 populations around the world. And the idea is to catalog at least 90 percent of the variants that have a frequency somewhere in the world of at least one percent amongst human populations and, in the coding sequence, that exome part of the genome, to catalog all variants that have a frequency of at least 0.1 percent. So in other words, when the 1000 Genomes Project gets done, we can look forward to having a pretty good handle on all variation that's common in the genome across our species in various places in the world. There's tons of rare variation that won't be detected by this strategy so we'll continue to find the rare variation as we go forward. But at least we'll begin to have the common variation in our species. So what kind of variation is there? So there are several categories and I'm just going to briefly mention them and focus on two. First of all, there are small insertions and deletions. This would be like a few bases are inserted in one place in the genome. And very often where they're inserted is some part of the genome that's nonfunctional so it doesn't make any difference. 
Geneticists call these insertions or deletions indels and they make up about 10 percent of the variation in sequence. There are some length polymorphisms; these tend to be sequences that are also short, maybe two nucleotides or three nucleotides repeated over and over again, typically in nonfunctional parts of the genome but not always. That makes up about five percent of variation. The variants that make up a large chunk of the variation, and that I think you have read and heard about, are single nucleotide polymorphisms and I'll talk a little bit more about those. They make up about 40 to 45 percent of the variation. And the other big kind of variation, one that we didn't really even anticipate in 2003 but that we've learned about since then and that we know accounts for a lot of variation, is so-called copy number variants and I'll show you what those are. They make up about another 40 or 45 percent of variation. So most of the variation is in these two categories, at least as far as we know. Single nucleotide polymorphisms are SNPs and copy number variants are CNVs. Now there are also variants where a chunk of the genome was broken at both ends and flipped around. That's called an inversion and it can cause a problem if the break points are in protein coding genes. Those are hard to detect and we don't really know the extent of inversions as a contribution to variation yet. We know certainly of some inversions that make a difference but that's an area we need to learn a lot more about. And of course, at each generation, the chromosomes undergo recombination so the variants are reshuffled in terms of how they're distributed from one generation to the next. So there's a lot of genetic variation. Now let me just make sure we're all on the same page in terms of understanding single nucleotide polymorphisms and copy number variants. So here's a typical single nucleotide polymorphism. 
Here's part of a sequence, GATCA, and at this particular place, this T, there's a second form of the gene, a different allele -- allele meaning a form of the gene -- that's exactly the same here and here but differs at one position. And in this case it's a T in the one form and a G in the other form. So it's a single nucleotide variant or polymorphism. Polymorphism means it's relatively frequent. And these single nucleotide polymorphisms occur about one in every thousand base pairs in the genome. In some areas they're a little bit more common; in some areas they're a little bit less common, but that's enough to give you about three million or so variants per haploid genome per individual. So that's a lot of variants, and they matter to the extent that they occur in key functional regions of the genome. Moreover, the technology's been developed to very easily and accurately measure at this position, let's say, whether the person has on one chromosome a T or a G and whether they have a T or a G on the other chromosome at that position. So that's called single nucleotide polymorphism, or SNP, genotyping and we have chips that do that and the standard platform right now measures about a million SNPs across the genome. We have a big center over at Hopkins called the Center for Inherited Disease Research. And we do this sort of genotyping on thousands of patients per day, measuring these variants. And so we use the SNPs as little tags to identify regions of the genome and how they've been transmitted down through the generations. So we'll come back to that in a minute. Now let me say a word about copy number variations. So here's a chromosome pair, and so think of this as perhaps the chromosome that you inherited from your mother and here's the corresponding chromosome that you inherited from your father. 
And in this region there's a little deletion in this chromosome so that this piece of DNA that's here on your mother's chromosome is not there on your father's chromosome. Now it turned out that cytogenetics, looking at chromosomes in the microscope, has a resolution down to about three million base pairs, three megabases. That means a really good laboratory can see a deletion or a duplication in a chromosome if that change is three megabases or bigger. And standard molecular techniques, of course, were geared to find changes in the sequence of a few base pairs, one or two or three base pairs. So if we had been smart enough a few years ago someone would have said, well, wait a minute. You're looking at the genome with two technologies, one that has a resolution down to about three megabases and another where the sweet spot of resolution is on the order of a single base or a few bases, so you're not looking at changes in the size interval between those two technologies. And sure enough, it turns out that these copy number variations fall in that interval. Here's the other kind, a duplication in this region of the genome. So this chromosome is actually shorter by the amount that's deleted, and this chromosome is longer because that region is duplicated. So it turns out that there's a lot of copy number variation in our genome. That means that in certain regions, if there happened to be a gene here in this little piece of DNA that's deleted off of this chromosome, then this individual, instead of having two copies of this gene, would have one on this chromosome and would not have any copy of that gene on this chromosome. So that means that for regions of the genome that are affected by copy number variation, we may have, instead of two copies of a gene, one copy. Or if it's a duplication we may have three copies instead of two copies. So that makes a lot of variation in the genome. 
It exposes genes that are sensitive to dosage. In other words, in some genes it's important that you have two functional copies. For other genes, one is certainly adequate so they're relatively insensitive to dosage. We don't really know how many genes are dosage sensitive, but we think maybe a few percent of genes are dosage sensitive. For deletions, the other way that this can be important from a medical point of view is that if you have a normal variation in a gene on this chromosome, and there's no deletion over here, that normal variation may not be very important because you have two copies of the gene. But if you have a CNV over here that deletes a copy of the gene, then some variation on this chromosome that normally is not too significant becomes more significant, because that's the only copy of that gene that you have. So deletions can expose otherwise normal variation on the remaining allele, and you can also have fusion of genes at the junction where the repair of the deletion or the duplication occurs. So there are lots of ways in which copy number variation can perturb genetic function and not surprisingly, as we've appreciated this, we've found that this is a rich source for producing human disease. The bottom line of all this is there's a lot of variation in our genomes. In fact in 2007, Science magazine said that the breakthrough of the year was human genetic variation. And so we know that there are about 30 million single nucleotide polymorphisms in our species, about three million differences between each individual and the reference sequence, and in terms of copy number variations there are three to seven large copy number variations per individual. About five to 10 percent of us have a copy number variation bigger than 100 kb. The average gene is 30 kb. And one to two percent of us in this room have a copy number variation bigger than a megabase, which could affect 10 or 20 genes. 
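The two main kinds of variation discussed here, SNPs and copy number variants, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names (`snp_positions`, `gene_copies`, `variant_is_unmasked`) are hypothetical, the second allele `GAGCA` is simply the quoted `GATCA` with the T changed to a G, and each chromosome is assumed to carry a gene as either normal, deleted or duplicated.

```python
def snp_positions(allele_a: str, allele_b: str) -> list[int]:
    """Return 0-based positions where two equal-length alleles differ."""
    if len(allele_a) != len(allele_b):
        raise ValueError("alleles must be the same length to compare base by base")
    return [i for i, (a, b) in enumerate(zip(allele_a, allele_b)) if a != b]


# Copies of a gene carried by one chromosome under each (assumed) CNV state.
COPIES_ON_CHROMOSOME = {"normal": 1, "deletion": 0, "duplication": 2}


def gene_copies(maternal: str, paternal: str) -> int:
    """Total copies of a gene given the CNV state of each parental chromosome."""
    return COPIES_ON_CHROMOSOME[maternal] + COPIES_ON_CHROMOSOME[paternal]


def variant_is_unmasked(maternal: str, paternal: str) -> bool:
    """True when only one copy remains, so a variant on it has no partner copy."""
    return gene_copies(maternal, paternal) == 1


if __name__ == "__main__":
    print(snp_positions("GATCA", "GAGCA"))           # the T/G difference
    print(gene_copies("normal", "normal"))           # usual two copies
    print(gene_copies("normal", "deletion"))         # one copy
    print(gene_copies("duplication", "normal"))      # three copies
    print(variant_is_unmasked("normal", "deletion"))
```

The dosage model is deliberately crude: it only counts copies and says nothing about which genes are actually dosage sensitive.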
So there's a lot of variation, both at the single nucleotide level and at the copy number variation level. So in fact, different members of our species are genetically different even though we only differ on the order of one base pair per thousand bases. Now, the last thing I want to say in this category is about the advances in technology, which I think many of you have heard about, but I'll just use this single slide to emphasize the rapid change in DNA sequencing technology that's gone on since 2003, when we said we'd finished the genome project. So down here are years and this is 2000 over here, this is 2010 over here. Just pay attention to this red line, which is the cost per million high quality base pairs of DNA sequence. So up here at the start, in 2000, it was about $10,000 per million base pairs. And you see the curve has come down so that in 2005 it was about $1,000; in 2006 it was about $100; in 2008 it was $10 and in 2010 it was $1. So the cost of DNA sequencing is coming down, down, down, down very rapidly. And not shown in the slide, but perhaps you can get it from the rate of accumulation of sequence here, the throughput is actually going up and up and up. So the technology is advancing so that we can sequence DNA faster and faster and more and more accurately and cheaper and cheaper and cheaper. So DNA sequencing is becoming a very practical tool to enumerate the genetic variation that we just talked about and to begin to understand the genetic differences between people. Consequently, we've begun to see in the literature and in other places the availability of DNA sequences of single individuals. 
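To get a feel for how steep that cost curve is, the figures read off the slide can be turned into a rough halving time. This is a back-of-the-envelope sketch, not a cost model: the dollar values are the approximate ones quoted in the talk, the function names are invented for this example, and the per-genome figure assumes a single pass over a 3,000 megabase genome, whereas real sequencing reads each base many times.

```python
import math

# Approximate cost in dollars per million high-quality bases, as quoted.
COST_PER_MB = {2000: 10_000, 2005: 1_000, 2006: 100, 2008: 10, 2010: 1}
GENOME_MB = 3_000  # genome size in megabases, as quoted earlier in the talk


def single_pass_genome_cost(year: int) -> int:
    """Cost of reading every base once (assumption: 1x coverage only)."""
    return COST_PER_MB[year] * GENOME_MB


def halving_time_years(start: int, end: int) -> float:
    """Average time for the cost to halve between two quoted years."""
    fold_drop = COST_PER_MB[start] / COST_PER_MB[end]
    return (end - start) / math.log2(fold_drop)


if __name__ == "__main__":
    for year in sorted(COST_PER_MB):
        print(year, f"${single_pass_genome_cost(year):,}")
    print(f"halving time 2000-2010: {halving_time_years(2000, 2010):.2f} years")
```

On these numbers a single pass over the genome falls from about $30 million in 2000 to about $3,000 in 2010, which works out to the cost halving roughly every nine months on average over the decade.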
So these little figures here show that by the end of 2010, we had about 25 or 30 individuals whose whole genome sequence was available and it's estimated that at the end of 2011, so we're one month away, there will probably be on the order of 30,000 whole genome sequences of particular individuals available in various databases. So DNA sequencing is really making a huge impact in enumerating genetic variation. What we have to learn is how to interpret all that. So that's all I'm going to say about the genome project, genetic sequence variation and technology. And let's turn to one point on what I've called whole genome sequence biology. So it's interesting. Remember I said that at the start of the genome project, there was an argument about whether or not it would be useful and would it stimulate research and would we learn anything from it. And now, some 20 years or so later from when those arguments were going on, any biologist who's studying any species wants to have a whole genome sequence of their favorite organism. So it's a complete flip in the mindset and it's hard to keep track of. I sort of use this tree of life to keep track of it. We have whole genome sequences from eukaryotes, animals like ourselves, from bacteria, prokaryotes, and from members of archaea, which is the third kingdom of life, which we only recently found out about. And it's really pretty hard to be sure but I think that we have certainly more than 2,500 organisms whose whole genome sequence has been obtained and deposited in various databases. So we've gone from arguing, is it useful? To now, everybody's got to have it and use it for their favorite biology. And it's turned out to be a very potent stimulus for understanding biology. And the pace continues. The other thing that's important to realize is that the sequence that's used, the protein coding language, really holds true across all biology. 
So once you have the sequence of a eukaryote, you can use that sequence information to go look for the corresponding genes in organisms that are evolutionarily as far removed as bacteria. So the DNA sequence provides a language of biology that allows us to look at what particular genes do across all biology. And so we gain a huge amount of information by having that language, that universal language across all biology. Okay. Now that's all I'm going to say about whole genome biology. Let's talk just for a minute about evolutionary thinking in medicine. Now when I went to medical school, evolution was not mentioned. I think the whole four years I was in medical school, I doubt that the word evolution was ever uttered. And if you asked me, and I was very interested in biology, about evolution, I would immediately start thinking about dinosaurs and fossils and things that were pretty far removed from medicine. But, as Dobzhansky said, "Nothing in biology makes sense except in the light of evolution." And I think now nothing in medicine will make sense except in the light of evolution. We are part of the biological kingdom. We are the end products of evolutionary biology. So, what do I mean by this? Well, if you start looking at how evolution works and then think about what it means for medicine, for evolution a central theme is variation, and I've just shown you that we're now focusing on human genetic variation, so the centrality of variation in terms of how things change over time. Then the continuity and consequences of natural selection, that is to say, natural selection is going on all the time. We all partook of a little natural selection when we went outside and had that very great breakfast that was served up to us, in terms of the caloric increase, the kinds of nutrients that we exposed ourselves to and so forth. So natural selection is going on all the time. 
Biological systems have developed mechanisms by which they respond to environmental changes to evolve. And that's turned our focus to systems biology, putting organisms back together; instead of using reduction, actually using an integrative approach to understand biology, as shown here. And an emphasis on individuality, because if you look at how selection works in whatever species you're thinking about, the selection actually occurs on individual members of a species. So that is what goes on in our species as well, and that's selection, which in other species we applaud because it serves to make wonderful biological characteristics in different species. In our species, the people who are on the short end of the stick for natural selection are the patients that come to see you with medical problems. So it's natural selection that's going on. In our species we care about those individuals; in other species we don't worry about that. If you're interested in that, there's a review of this in PNAS in 2010 about evolutionary biology in medicine. So from the point of view of what we've learned from the genome sequence in evolutionary biology, it's really been very interesting because we can look and see how we compare with our closest living relatives, the chimps. Remember I said we were 98.5 percent identical, so it's interesting to know how we differ, what makes us different from the chimps. We can even now sequence our closest relative ever, which was Neanderthal. So the genome of Neanderthal was sequenced last year by Svante Pääbo and his colleagues. And you can ask, okay, what are the differences, the major differences between us and Neanderthal? And if you enumerate them and lump them together, it turns out there are a bunch of genes that show sequence variation between us and Neanderthal that are involved in energy metabolism. There's another bunch of genes that are expressed in the nervous system and are thought to be important in cognition. 
There's another bunch of genes, one that I'm particularly interested in, that are involved in neural development. And then the last category that's particularly variable is microRNAs, these new RNA molecules that are important in regulating gene expression. So you can begin to get the idea of what it is that has changed over evolutionary time to give Homo sapiens different properties and different characteristics as compared to Neanderthal, our last relative. So that's evolutionary thinking in medicine. Let me just say a word now about disease gene identification. If you look at when disease genes were first identified, it was roughly the time between 1900 and 1910. There was some knowledge of color blindness before then, but we began to think about genes causing specific human phenotypes in that first decade of the 20th century. But progress was very slow and I plotted it here. This is a modification of a plot that originally was published by Joe Nadeau. Just look at this pink curve, this scale over here. And what I've plotted is the identification of genes that are responsible for rare Mendelian conditions. So these are things like PKU, cystic fibrosis, Marfan syndrome, LDL receptor defects, all of those strong phenotypes that are inherited as Mendelian traits in exactly the way that Mendel showed in pea plants. And so you can see the number has gone up pretty dramatically, and currently there are about 2,600 genes in the human genome that have been shown to have variations that account for particular human diseases. We'll come back to the common complex traits later, but that's on a different scale. You see here we're sort of way behind on common complex traits. Focus for a minute on the Mendelian disorders in this category. And you can actually look at the progress, and there's an online resource called Online Mendelian Inheritance in Man. 
This was started by Victor McKusick at Hopkins and currently it's maintained by Dr. Ada Hamosh and her team at Hopkins, and it accurately lists these genes; here I say 2,500, but today's count is around 2,650. And there's another online resource called GeneTests that tracks the number of these genes that can actually be tested or sequenced to make a diagnosis, and it's about 2,000 now. This plot came from an article by Art Beaudet. I can't read it, I guess, here, but this looks at the number of genetic tests going from 1990 up to the year 2000, and you can see this tremendous increase, so that here we had fewer than 50 genetic tests, and we now have about 2,000 genetic tests. This is causing a radical change, at least at Hopkins, in the way pathology deals with this. So a patient is seen and the doctor wants to send off a test for some very rare disorder, and the pathology department has to find a laboratory that does the test and make sure that they're certified and so forth. So one of the things we're wrestling with is how to modernize the way we handle requests for genetic tests and how we interpret those results. Molecular cytogenetics, the analysis of copy number variations, is moving along. So we're making a lot of progress in this whole effort. The one thing I want to point out, though, is that although this number is big, it's only about 15 percent of the total number of genes. So we have a lot of work left to do. There's no reason to think that the other 85 percent of genes won't also have Mendelian phenotypes associated with them. If you're interested in keeping track of this, I urge you to go to this catalog, Online Mendelian Inheritance in Man, which I already mentioned. It's very user friendly. You go to www.OMIM.org and it has a search box here on the first page, and you can punch into that search box either a disease name or a clinical symptom. 
So here I've written in Marfan syndrome and I get a list of entries, and those entries include fibrillin, that's the gene that's responsible for Marfan syndrome. Here's the clinical phenotype, Marfan syndrome. So if we're in the clinic and we see a patient we think might have Marfan syndrome, and we want to look up the clinical features, we put in Marfan, we click on this and we get it, and I'll show you what you go to. If you want to learn something about the molecular biology, we put in Marfan syndrome, it comes up with a gene, we click on this and we go to the gene and so forth. These are other syndromes that have similar overlapping features that might be considered in the differential diagnosis. That's if you put in Marfan. You could also put in, not the name of the syndrome, but just some clinical features. So here I put in tall stature and dislocated ocular lenses, and you'll see I get pretty much the same thing. The first thing that turns up is fibrillin, and that's the gene responsible for Marfan syndrome. The third entry is Marfan syndrome. The second thing that comes up in the list is homocystinuria, which is a phenotype that is very similar to Marfan syndrome. So you can use this as a tool for trying to figure out what your patient has, based on the clinical findings that you observe. If you actually go to the entry, so here I've gone to the entry for Marfan syndrome. This is a long, long entry. I've just shown you the top of it, but it describes the history, the clinical features and so forth. You have this table of contents over here, so if you're just interested in how to make the diagnosis, you pull that down, click on that and it will tell you how to make the molecular diagnosis, where to send the test and so forth. Or if you want to know whether there are animal models and what we learn from them, you just click on that. So it's a very useful tool. It's free and very easy to use. Just go to OMIM.org and try it out. 
Now, what about identifying the genes, not for these rare Mendelian disorders, but identifying the genetic variants that contribute risk for common, complex traits like diabetes that we already mentioned, or coronary artery disease, or neuropsychiatric disease? And there, notice that the scale is different and progress has been very slow, although recently it has spiked up tremendously. So we now have variants that we think are responsible or contribute risk for at least 200 of these phenotypes. By and large, these variants are not causative. They're actually susceptibility variants and either increase or decrease one's risk for a particular phenotype. And the method that's used for this, I'm sure you've heard about it, is genome-wide association studies, or GWAS. And these studies are agnostic approaches that identify SNP markers enriched in cases as compared to controls. So typically, if you want to do this, you have a large group of cases and a large group of controls. And you do that SNP genotyping that I mentioned earlier across the whole genome, and you look for particular SNP genotypes that are enriched in your cases as compared to your controls. And it's agnostic in the sense that it makes no assumptions about what the genes are or what the variants are that are responsible. The only assumption it makes is that somewhere in the genome there is a variant that contributes risk for it. So it finds stuff not by looking under the light post but by looking across the whole genome without any sort of preconceived notions. It's a very powerful aspect of this that's been very informative to us. So you find markers, SNP markers, and then you look around those markers and you try to find the causative variants that are actually responsible for the change in susceptibility. 
And once you find those causative variants, that gives you a particular gene and tells you something about the biological perturbation of that gene function that increases the risk or decreases the risk for a particular trait. And, to the extent that you know what biological system that gene product works in, it also identifies a biological system that, when perturbed, changes your risk for a particular disease of interest. So this agnostic approach has proved very powerful in terms of illuminating biological systems that are responsible for certain phenotypes where we had no prior knowledge that they played a role. In the case of type 2 diabetes, years ago, we all thought that that was insulin resistance, that it was a peripheral problem, that the peripheral tissues were resistant to insulin. There is some degree of that, but it turns out most of the variants that contribute risk for type 2 diabetes are in insulin production, not in insulin resistance. And then of course understanding the pathophysiology gives us a better way of dealing with the patients. This is sort of a diagram of how this might happen. This is from a paper by Teri Manolio at the Genome Institute, and it is in this series, and I'll come to this at the end, in the New England Journal that Greg Feero is one of the editors for, and it really is a wonderful collection of papers, a sort of state of the art of genomic technologies. This diagram, I don't know how well you can read it, but it shows a region of the genome, and maybe the distance between these two single nucleotide polymorphisms is one kb. And it shows it in three individuals, so you get the genotype of this SNP and the genotype of this SNP in these three individuals plus a whole bunch of individuals, and you look at the frequency of those genotypes in your cases and compare them to controls. 
And let's say in the cases the particular variant is more common, and so you see more heterozygotes and more homozygotes for that variant. You might want to do a replication study, in a different population, to make sure it's not something due to population stratification. And as the end result, then, you plot out all of those variants and you ask, are there any variants that are associated at a statistically significant level with a particular disease phenotype? This has come to be called the Manhattan plot because it looks like the skyline of New York. All the variants of each chromosome are color coded, and you see they are all more or less clumped around the bottom here, except in one region on chromosome nine there's a bunch of variants whose P value is exceedingly low, that is, here P less than 10 to the minus eight. And so they're statistically significant, even with all of the tests that one has done. So that says, in this region of the genome defined by these two markers, there is some variant or some set of variants that contributes risk for this particular disease. So now we go look at that region very carefully, identify the causative variants and move forward in our understanding of the biology of the disease. If you're interested in this, NHGRI maintains a great website, and here's the whole genome and all of the variants that have achieved statistical significance for all of the phenotypes. This was up to date as of March 2011. I think there is a newer version online now. The interesting thing is that many of these variants, as I've already indicated, tag genes or biological systems that we did not previously know were important for particular disease phenotypes. 
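The case-versus-control comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in any study mentioned in the talk: the SNP names and allele counts are hypothetical, the test is a plain chi-square test on allele counts with one degree of freedom, and the cutoff is the P less than 10 to the minus eight figure quoted for the Manhattan plot.

```python
import math

# Significance cutoff quoted in the talk for the Manhattan plot example.
GENOME_WIDE_P = 1e-8


def allelic_p_value(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """P value of a chi-square test (1 degree of freedom) on a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the allele were equally common in cases and controls.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the chi-square tail probability is erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical allele counts: (case variant, case reference, control variant, control reference).
snps = {
    "snp_A": (1200, 2800, 900, 3100),   # variant allele clearly enriched in cases
    "snp_B": (1010, 2990, 1000, 3000),  # essentially the same frequency in both groups
}

# Keep only the markers that pass the genome-wide cutoff.
hits = [name for name, counts in snps.items() if allelic_p_value(*counts) < GENOME_WIDE_P]
print(hits)
```

With these made-up counts only the first marker passes the cutoff. A real analysis would also need the replication and population stratification checks mentioned above, which this sketch leaves out.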
The other thing that we've noticed is that most of these variants for the common complex traits are not in protein coding space, not in those exons that we talked about, which are usually hit for the Mendelian disorders, but are in fact in regions of the genome that seem to regulate gene expression. So you remember the little diagram of a gene I showed you, there are upstream sequences in the promoter, or more remotely related to the gene, called enhancers. And I think that most of the variants that are involved in this actually perturb gene regulation, so they're in the non-coding regulatory regions of the genome. That's important because our state of knowledge is weaker in terms of understanding regulatory variants as opposed to protein coding variants. So it's an important area of research going on right now. And in aggregate, if you look at a lot of disease phenotypes, we haven't found all of the variation yet, so much of the heritability, that is, the genetic variation that contributes to a phenotype, remains to be explained. That has come to be called the dark matter. The fraction explained by the variants so far identified for particular complex traits may vary from as high as 60 percent, that's probably where we are for age related macular degeneration, to less than five percent, that's probably where we are for type 2 diabetes. So for some disorders, we've only explained a small fraction of the genetic variation. For other disorders, the methods so far have allowed us to explain quite a substantial fraction of the genetic variation. So this comes to, then, if we've found variants but they only explain a small fraction of the risk, people have said, well, what have you learned? Well, one thing you've learned is, you've identified biological systems, and those systems become important to study to understand the mechanisms of the disease in a more complete way. 
The other thing is that the risks that we calculate by present methods, I think, are underestimates for a variety of reasons. So the common sort of critique of this is that the risk allele of this single nucleotide polymorphism confers a risk to individuals that is only 1.2 times greater, let's say, for example. So it's hard to change medical management. It's hard to get a person to change their lifestyle based on a tiny change in risk. So we have to do better than that. Now, remember that these risks are calculated in populations. They are not calculated in individuals. So for a given individual, a particular variant may be much riskier or, in fact, it may not be risky at all. We're just talking about the risk across populations, so we have to learn how to individualize these risks, and we have to recognize that the way these variants work is in complex biological systems. So this shows a complex biological system, each dot representing a protein product and the interactions between these protein products represented by the lines connecting the dots. So the systems are complicated and involve many components. So what we really need is not to look at a particular variant, but we need to learn how to look at sets of variants characteristic of a particular individual, and also integrate that with the environmental exposures of that individual, to really calculate an individualized risk. And we just are not able to do that yet, although I must say people are making considerable progress in developing new analytical methods, a more biologically based, I would say, set of analytical methods to really calculate accurate individualized risks. And in fact, one strategy that has very recently been applied and is turning out to identify much greater risks, actually, is looking not at the clinical phenotype but looking at biochemical markers. 
So this has come to be called metabolomics, and this is a study that was just published that looked at about 3,000 individuals, used the whole genome SNP genotyping that we talked about, measured about 250 metabolites very precisely in these individuals, and found 25 loci with effect sizes explaining anywhere from 10 to 60 percent of the biological variation for those small molecules. So it suggests that if we sharpen up the phenotype that we're looking at, in this case measuring a biological marker, the risks would be much more predictive and much more significant. So this is just the top of that list of all the variants, and you can see P values here on the order of 10 to the minus 250. So these are highly statistically significant variants that influence the level of these metabolites. The metabolites, in turn, are involved in a variety of complex traits. So we're making a good bit of progress in this area. I'm not going to say anything more about identifying variants for complex traits. I just want to say a word about individual genome sequencing and what this means currently for the practice of medicine. So individual genome sequences, we've already mentioned that. The first one that was published was Craig Venter's, and I think that's because part of his genome sequence was what his company sequenced in the race to get a whole genome. So, we had the reference sequence, but of course that was an anonymous person. So what we really want to know is, what is the sequence of my patient? So that's why I think obtaining the sequence of individuals really is a change in the way we look at patients. So for example, Venter had 4.1 million variants as compared to the reference sequence. That included 3.2 million SNPs and about 300,000 copy number variants. There were 90 inversions, and the total amount of sequence covered by the variants was about 123 megabases. 
That's a huge chunk, or rather about 12.3 megabases, a huge chunk of the genome. So to put it in personal terms, you could look at this individual and you could look at his lactase genotype and ask, is he someone who can tolerate ice cream or not? You could look at his dopamine DRD4 receptor, that's associated with risk taking behavior, and you probably could have made a guess about Craig Venter's risk taking behavior genotype before doing it, but you could actually make that measurement. Or you could look at his ApoE genotype and understand whether or not he has an increased risk for Alzheimer's disease. So it's a completely different level of information about individual patients. So I think all of these things, all of these changes and advancements, will have profound effects for medicine. This is a picture of a Dr. Ceriani who was rounding [spelled phonetically] in Kansas in the middle of the 20th century. The picture was taken by Eugene Smith. This is sort of the idea that I had of what I would be doing when I decided to go into medicine. And of course it's far from what we do. So, we sort of have summed it up in terms of developing what might be called the science of the individual, how we're going to use this information to understand our individual patients. So what have we learned about the science of the individual currently? So first of all, it exposes the pitfalls of typological thinking. In other words, remember those kids I showed you, where you say okay, this is an example of a certain disease. Rather, we think this is a patient who has features of a particular disease, and we understand that no two patients have exactly the same manifestations of that disease and no two patients will have the same responses to our attempts to treat them. So it confirms what in the past has been called the physiologic view of disease. Each individual has their own disease. 
It emphasizes the importance of asking, why does this particular patient have this particular problem at this particular time? So it turns the focus more on trying to understand why people get sick, and what we can learn from that exercise in terms of managing the patient as we go forward, in terms of the best treatment for this particular patient. However, moving it into medicine and making it practical and bringing it to the clinic and to your offices is a challenge. And you all understand that. It's interesting to look at a paper that came out recently that attempted to do this. This is a scientist called Steve Quake. He had a relative who dropped dead of sudden cardiac death in his 20s. Here's Quake over here. So he went to his cardiologist out at Stanford and he said, look, I have this relative who just dropped dead in their 20s and I want to know if I'm at risk for that. So that's a reasonable question to ask. So they got a big pedigree and they went ahead and sequenced his genome, and then they tried to use that information to give him a more informed understanding of his risks, not only for sudden cardiac death but for other common medical problems. And it turned out that was a really daunting exercise. It took all of these people. Here's Quake, he got to be an author on his own sequence. Here's the person who led the study, Russ Altman, who's a cardiologist. There is one medical geneticist and one genetic counselor. It took the genetic counselor five and a half or six hours to sit down with Steven Quake, who is a very accomplished molecular biologist, and explain to him all the variation that was found in his genome. So you can imagine doing that exercise with less sophisticated individuals. And in the end, at the current state, most of the information we could give him was changing his risk for certain things in modest ways. 
So it did not really change overnight how Quake would be managed, and certainly did not change much beyond what we would have done from having his pedigree. On the other hand, we're learning stuff about how to use this information as we go forward virtually every day. So I think going forward, we will increasingly learn how to use this information in a much more effective way. And I would support that argument with these examples. First of all, to do this it will require rigorous research of the kind the Genome Institute and Hopkins are doing at the basic level, at the translational level, and at the clinical level. New technology continues to accelerate the pace, but it's not going to happen overnight. It happens gradually, and let me give you these three examples. First of all, acute lymphoblastic leukemia. When I was a house officer in the late '60s and early '70s, acute lymphoblastic leukemia was the most common form of childhood leukemia and had a 95 percent mortality rate, 95 percent mortality. Nowadays, acute lymphoblastic leukemia remains the most common childhood leukemia. It has a 95 percent survival rate, 95 percent survival. So it went from 95 percent mortality to 95 percent survival. So what accounts for that change? So, actually if you look at it, the medicines that are currently being used are very similar, if not identical, to the medicines that we used all those years ago. So it's not the kinds of medicines that are being used. What it is, I would argue, is that oncologists have learned that this diagnosis, acute lymphoblastic leukemia, is actually a heterogeneous group of disorders. And they've learned how to use gene expression profiling, age at onset, DNA sequence variation and other tools to subdivide the patients. 
In other words, move from one collective diagnosis to subcategories of diagnosis, moving towards individualizing the diagnosis to individual patients and then adjusting their treatment according to which subdivision the patient falls in. And that approach, a more informed approach in terms of differences between individual patients with the same diagnosis, has had a dramatic effect on the consequences of having ALL. The same is true, but to a lesser extent, for sickle cell disease. You all know that there are patients with sickle cell disease who are very sick from infancy forward. And there are other patients that just have an occasional crisis, maybe once a year or once every few years. So there's tremendous variation among individuals with sickle cell disease, and recall that they all have exactly the same genetic defect at the disease gene locus. They all have exactly the same mutation in beta-globin. So what makes the difference between one patient with sickle cell disease and the next? So increasingly, we're finding modifying genes that modify the phenotype of sickle cell disease, and we can define a subgroup of sicklers that are much more likely to develop, let's say, certain very disastrous complications of sickle cell disease such as stroke and so forth, and we can manage that subset of patients with sickle cell disease much more aggressively when they're at risk for developing a stroke, let's say. So we're individualizing therapy for sickle cell disease and that's having better outcomes. Recently, genome scientists are sequencing tumors, so there's a lot going on now about sequencing individual cancers and the people who have the cancers. And one of the interesting things that's come out was first identified by Bert Vogelstein at Hopkins, looking at glioblastoma multiforme, the most serious brain cancer, and it turns out that a small fraction of glioblastoma multiformes had a mutation in isocitrate dehydrogenase. 
That's a gene that encodes an enzyme in the citric acid cycle. But it turned out that you could stratify the patients in terms of whether or not their tumor had an IDH1 mutation. And if you did that, it turns out that the patients with the IDH1 mutations in their tumors behave differently than the patients that don't have those mutations. So we're moving again toward stratifying a diagnosis, moving to individualize the diagnoses and adjusting our treatment and our thinking about the patients accordingly. So this is going on over and over again, and it will go on rapidly in some areas and more slowly in other areas, and eventually it will lead us to a sort of very individualized approach. Here's an example that I find sort of clever. This came from deCODE genetics in Iceland. And they said, if you look at baseline PSA levels, there's actually evidence that the genetic makeup has a big effect on your PSA level. So currently, as you know, we use this standard cut point for PSA of four. But if you look at normal individuals, for some their PSA is actually a good bit below four. And then other normal individuals have a PSA above four, so this four is sort of an average cut point. So they argued that, let's say you measured genetic variation at six loci, they recommended, and then you adjusted the cut point for the individual based on their genetic makeup, so that four would actually be too high for some individuals. And for other individuals it's acceptable. So you individualize the risk that you determine with PSA level, and that gives you a more informed way to deal with the patients. Now time is short. I'm not going to say anything about pharmacogenetics, except to say that it is a classic gene by environment interaction. The environment variable in this case, though, is very well defined. You know the drug, you know the dose, you know when the patient started it. 
And not surprisingly, there's a lot of genetic variation that influences response to drugs. So that's an area that's going to go forward very quickly. And it already has numerous positive effects. Time's short and I won't talk about it, but there are variants that influence your response to statins or your response to treatment for hepatitis C and so forth. And these variants tend to be variants of quite large effect, so that's an area where the variation really has turned out to be very important for the phenotype. The end result of all of this, I would argue, will get us to this point. So this is a picture, a painting by Sir Luke Fildes, of the doctor looking at his patient, and this is what we would like to do. We would like to understand our patient. We'd like to look at the patient and not only use our history and our physical exam but also knowledge of the genetic makeup and the patient's environmental history to really understand the patient at a level that is far better than what we currently can. So I think over the next few years, you'll see tremendous progress in this approach so that we can think of our patients, not as representatives of a particular disease, but as individuals who have a particular set of problems. So with that I'll close. Thanks for your attention. [applause] Let me give a plug to this set of articles which you can find in the New England Journal, Genomic Medicine: An Updated Primer. Greg is one of the editors. The one that came out this week is called "Genomics and Cardiovascular Disease," quite good. And I should also acknowledge my colleagues and a heavy dose [spelled phonetically] of Barton Childs shown here, now deceased, who spent his whole life really thinking about how we could incorporate genetic knowledge into making management of our patients more effective and more individualized. Thank you. [applause] Male Speaker: So I realize that people probably have to get off [unintelligible]. 
We probably have time for a few questions. Male Speaker: Have they ever sequenced embryonic stem cells? Does that completely represent a fully developed or is that sequence early enough that you can modify it at that early stage? David Valle: So the question is, have people sequenced embryonic stem cells and what's different about that sequence as compared to -- Male Speaker: [inaudible] David Valle: Yeah, and can you manipulate it? So that touches on a whole area which I did not say a word about, which is epigenomics. So if you look at the sequence of an embryonic stem cell, let's say from a particular individual, and you could develop that cell line and then follow the individual over their lifetime, the sequence would remain the same, right? We're born with the sequence that was put together at the time the sperm and the egg that made us formed a fertilized egg. But what is different, if you look at an embryonic stem cell versus cells in the adult, is sort of what's called the epigenomic imprint. So this is patterns of regulation of genes. So the way I think of it is, if you look at, let's say, the liver in an adult, when you have a liver cell and that liver cell divides, you get two liver cells. If you look at, let's say, a muscle cell and that muscle cell divides, you get two muscle cells. And yet the genetic material in those two cells, the liver cell and the muscle cell, is the same. So what's different about those cells? And the reason one cell is a liver cell and one cell is a muscle cell is that there are these programs of regulation of gene expression that are sort of turned on and turned off, and so in the liver you turn on a program that's necessary for making liver cells. You turn off everything else. In the muscle, you turn on a program that's necessary for muscle cells and turn off everything else. Those patterns or programs of regulation of gene expression are called epigenetics. 
And so what we would see a stem -- in an embryonic stem cell is a much more non committed epigenetic set of regulations and as the cell was differentiated into different cell types, the epigenetic patterning of the gene -- regulation of gene expression would become established to make the daughter cells that derive from that embryonic stem cell develop -- move them down the developmental pathway to the various pluripotent outcomes that we would expect. When you go to the next generation, all of that has to be erased because you start, not with a collection of liver cells, muscle cells and brain cells, you start with a single cell that then has to be pluripotent to become all other cells. Male Speaker: In type 2 diabetes you mentioned that the [unintelligible] regarding the incident [unintelligible] said there's no problem with the [unintelligible] the incident itself. So there's no difference between type 1 and type 2? David Valle: No, I didn't say there was no problem, that you make a good point. What I meant to say, and maybe I misspoke myself, what I meant to say is there certainly is an element of insulin resistance but it turns out that equally important, if not more important in type 2 diabetes are various aspects of insulin production. So type 2 diabetes is different from type 1 diabetes which is more of a more pure of -- you know, drop out of the beta cell, basically. Male Speaker: [unintelligible] insufficiency and the [inaudible]. David Valle: Correct. Male Speaker: Thanks. Male Speaker: [unintelligible] we're not showing cross-sequencing as really remarkable [spelled phonetically]. Are there any limits [unintelligible]? David Valle: [laughs] Yes. Male Speaker: It seems almost impossible. David Valle: Yeah, so the question is, where's the limit of this curve that has to do with the cost and throughput of DNA sequencing? And I don't know. I'm pretty sure we haven't reached -- we're not even close to the limit. 
So you know that some years ago, Francis set the audacious goal of a $1,000 genome. And certainly, we can do a whole genome -- you can order a whole genome on a patient, let's say, at Hopkins, for about $4,000 right now. So that's pretty darn close to the $1,000 genome. You can do a whole exome, that is just look at the exons, for about $1,000. Now, however, that gives you sort of a preliminary set of analysis of that sequence. It does not give you a sophisticated analysis of that sequence. And currently, in fact there was an article in The New York Times yesterday pointing out that the really expensive part of genome sequencing, particularly as what we're interested in, what does it mean for patients, is in the analysis. And that is coming along at a slower pace and so if you want -- if you have to factor in how much does it cost to pay the people to do the analysis and so forth, it's more expensive. Now -- but there are new technologies available, compared to the way we -- the current -- the next generation. There's already a next, next generation that's clearly coming down the pike. And that will clearly lower the cost and increase the throughput more. So, I think the thousand-dollar genome will easily be surpassed in the near future. And what I tell patients and medical students is, of course, if you come to Johns Hopkins -- I don't know how it is here at Suburban -- if you have some complicated problem you come to Johns Hopkins at 9:00 in the morning, you go home in the afternoon, you're going to blow $1,000 very fast, so. It's in the range of everything -- you may not be able to even get out of the parking lot for that. [laughter] Male Speaker: [inaudible] David Valle: Yes. Male Speaker: [inaudible] point of interest, how close was the Neanderthal genome to Homo sapiens? David Valle: So the question is -- Male Speaker: [inaudible] David Valle: So the question is, how close was the Neanderthal genome to Homo sapiens and would they be interfertile? 
So, it's about 99 -- first of all, the sequence quality of the Neanderthal is nowhere near the sequence quality we have for Homo sapiens but the best guess, I think, is it's about 99 point -- 99 percent -- a little bit better than 99 percent identical. And people were very interested to know if Homo sapiens -- for some reason, I don't actually know why we're so interested to know -- but people are interested to know whether there was any interbreeding between Homo sapiens and Neanderthal. And the genetic evidence we have right now suggests yes, there was interbreeding between Neanderthal and Homo sapiens. And, you know, we cohabitated and it seems to me that -- pretty likely. That's what I would have bet before we had the genetic evidence, human nature being what it is. [applause] [music playing]