David Valle: Thank you, Greg and Bob, and it's a pleasure
to be here. It's -- I've had the pleasure of a long collaboration with people at NHGRI.
We recently actually have instituted a combined fellowship program in genetics with the intramural
program at NHGRI and of course now Suburban and Johns Hopkins have joined together so
we have lots to celebrate and to look forward to in terms of continued interactions. And
so I'm pleased to be here to have a chance to talk to you about my favorite subject.
I'm also pleased to be the leadoff speaker in this series, which Greg and Bob have put
together, which looks quite good. And so I hope to sort of set the stage for many of
the people that come later. Please interrupt me if I'm not making myself clear or if you
want clarification on some point or if I'm talking about stuff that's old hat, tell me
to move ahead and not spend time on it. So I'd love to make this as informal and as interactive
as possible.
So what I would like to talk about is the human genome and what I call individualized
medicine. And so we could start off by saying, what is individualized medicine? I'm going
to put this upright. And so I turn to Francis for guidance here, Francis Collins, and Francis
said back in 2005 that at its most basic, personalized medicine refers to using information
about a person's genetic makeup to tailor strategies for the detection, treatment or
prevention of disease. And I think that sums it up quite well. The only place that I disagree
with Francis is that I much prefer that term individualized as opposed to personalized
and I'm going to take the prerogative here, since I have the podium, to tell you why.
So personal, if you look in the dictionary, has two meanings, really. It has relating
to someone's private life, a sense of intimacy. And the other is relating to one person or
particular individual. Now medicine has always been personal. As physicians, we are allowed
to ask people questions that no one else asks them. We are allowed to examine them in ways
that no one else examines them. So medicine has always been personal, in my view. But
it has not been individualized. And so that's what we're looking at, a way to consider each
person as an individual and to adjust our thinking about the patient to account for
their individual strengths and weaknesses. So for that reason and for others which I
won't go into, I urge you to consider thinking of this as individualized medicine.
Now, why would we consider the topic of the individualized medicine now? Why is it so
much in the press and in the news and so forth? And that's particularly worth asking because
modern medicine really has had enormous successes. So we've had a dramatic prolongation of the
lifespan and a dramatically improved quality of life. So medicine has
been doing a good job.
On the other hand, there are ongoing concerns. Many diseases have an increasing incidence.
There's an unacceptable frequency of adverse events, adverse therapeutic events. We all
hear daily about the cost of medicine continuing to go up. And if you go talk to patients and
all of you do all the time, they usually say they want two things from their physician.
They want a physician who's smart, has good knowledge, but they also want a physician
who cares about them as an individual, as a person. So I think we have an opportunity
to move medicine from a very successful level to a new plateau and I think the way we will
do that is by individualizing medical care.
So to put it in a different way, I like to think about a particular disease and the one
I would mention is type 2 diabetes. As you know, its incidence is increasing throughout
the industrialized world. And it's intertwined with the increasing incidence of obesity and
it's a chronic illness with an array of complications, microvascular complications and macrovascular
complications. But suppose a member of your family or a friend of yours, a close friend
of yours has type 2 diabetes, would you like to know the prognosis and response to existing
treatments for the average patient with type 2 diabetes? Or would you like to know as precisely
as possible, the specific features, prognosis and response to therapy for your loved one
or your close acquaintance? So, or even better, could we imagine knowing ahead of time who
is at high risk for type 2 diabetes and take action to prevent the illness from ever occurring?
So that's the goal here, is to try to identify the individual strengths and weaknesses of
our patients and as much as possible prevent them from ever getting sick. But if they do
get sick, then to individualize our counseling and our treatment and optimize it as quickly
as possible.
So in that regard, I sort of think there are two characteristics of modern medicine, current
medicine. The first is, in medical school -- and currently I think we have been trained
to perform what I call an average medicine. Now by average, I don't mean in the pejorative
sense that it's mediocre, I mean that we think about, when we make a diagnosis we think about
what is appropriate for the average patient with that diagnosis. And part of that comes
from the way we're trained and I call that aspect of our training the classic case mentality.
So this is -- the little boy on the left is a patient that I saw and his sister and they
both have a genetic syndrome that's characterized by some abnormal physical findings.
And in the days gone by what would happen is the family would come to the clinic, we
would take a family history and history of the present illness, a physical exam, some
X-rays and routine laboratory tests, and then whoever had seen the patient would get together
and say well, I think it's a case of this or I think it's a case of that. And usually
what would happen, at least at Hopkins, would be the person with the most grey hair or the
least hair would finally make a pronouncement, that I think this is a case of whatever. And
then our thinking would become constrained. We would start thinking of the patient as
an example of a particular disease, rather than thinking of the patient as an individual
who happens to have this set of problems. And at the time, in terms of the tools that
we had available, that was the way we had to practice medicine.
Now the other aspect of medical practice in the 20th century, is what I call trial and
error medicine. That is to say that, as you all know, what we do is we see a patient,
we make a diagnosis, we think about what kind of interventions we would like to make, we
make some baseline measurements, we make the intervention then we follow the patient and
repeat the baseline measurements and ask whether, with that intervention, the
patient is doing better, staying the same, or in fact worse. If they're staying
the same or worse then perhaps we'll change that intervention and alter it and do something
else. So this is sort of a trial and error kind of medicine. So the goal in the future
would be to be able to predict with a fair degree of accuracy what would be the best
treatment for this patient without having to go through this trial and error sort of
set of protocols.
So, thinking about the patient and -- as an individual and so I think the more experience
physicians have, they come to learn, despite the fact that in medical school, classically,
you are sort of taught, like, this is what happens with a case of this, this is what
happens with a case of that. As you get more and more experience, you begin to realize
that no two patients, even those patients with the same diagnoses, behave exactly alike
in terms of their complications, their response to treatment and so forth. So that's a lesson
that we tend to learn by experience in the trenches. And this point was actually emphasized
by Owsei Temkin who was a professor of the history of medicine at Hopkins, now deceased,
but he said that there is no science of the individual and medicine suffers from a fundamental
contradiction. Its practice deals with the individual, in other words, the person that
comes to see us is an individual. While its theory, what we learned in medical school,
grasps universals only. So you were sort of left, in days gone by, to individualize your
approaches and your thinking after you get out of medical school.
This idea is not new, actually. A colleague of mine pointed this out to me that back in
350 B.C., none other than Aristotle said, the doctor does not treat man except accidentally.
He treats Callias or Socrates or somebody else. So if someone knows the universal without
knowing the individuals contained in it, he will often fail in his treatment, for it is
the individual who has to be treated. So it's not a new idea.
So I keep reminding our students and what I think we have to think about is that when
we see a large number of patients -- I'm a pediatrician, so this is where I start seeing
my patients -- we just have to remind ourselves that each of these individuals has his or
her own unique sampling of our species' genetic endowment. Each has a unique history of in-utero
development and each is born into a family with a unique constellation of socioeconomic
variables. So all of those factors, the genetic makeup, the early development and the social
cultural milieu for each patient individualizes them and has an influence on what diseases
they are at risk for and how they will respond to our attempts to treat -- or prevent or
treat those diseases.
So this is all well and good, but you could ask, what has changed? What makes it possible
to contemplate moving medicine from its current successful level to an even more successful
level as we go forward? So I would submit -- and I'm a geneticist so I would submit
that the major driver for this is -- has been the Human Genome Project and what we've learned
about the genetic makeup of members of our species. So, the Human Genome Project sequencing
technology and the appreciation of sequence variation, what's come to be called whole
genome sequence biology, an increasing prominence of evolutionary thinking in medicine, progress
in disease gene identification and, what has been in my view a watershed,
the ability to obtain individual genome sequences on individual patients. So I'm going to talk
about each of these bullet points briefly.
So first of all, let's turn to the Human Genome Project and sequencing technology in genetic
variations. So you all know that the genome project really was contemplated in the mid-'80s
and there was a good bit of argument initially about whether or not it was a good idea and
a useful way to spend our research dollars. But eventually, the argument carried the day
and the genome project got started officially on October 1, 1990 under the direction of
James Watson, across the street. Francis took over in 1993, Francis Collins. And initially
it focused on technology and model organisms, yeast and C elegans and flies and so forth,
but in the mid-'90s it turned its attention almost completely to the human genome and
in fact it was a competition between the so called public group headed up by Francis and
the private group headed up by Craig Venter. And miraculously, both groups finished on
the same day, as shown here on this front page of The New York Times. This is Tuesday,
June 27, 2000, when both groups announced the fact that they had a draft sequence of
the human genome. The public group went on to do more than a draft, to do a very high
quality complete sequence of the human genome and that was finished in about 2003.
So let me just make sure we're all on the same page with terms. Is there a -- I forgot
to bring a pointer -- is there a pointer here? So, I just want to -- since it's been awhile
since some of you may have thought about this, what I've shown here is a diagram of a gene.
So amazingly, the word gene was coined in about 1908 by a man by the name of Johannsen
and if you look at how genetics -- geneticists have defined what a gene is over the time
since then, the definition keeps changing. And in fact, if you put 10 geneticists in
a room right now and asked them for a definition of a gene, you might get 12 definitions.
So let me tell you sort of so we're more or less on the same page, what most of us mean.
So what I've defined -- shown on this diagram is a mammalian gene, a gene that encodes a
protein. And it turns out, the pieces of the gene that actually account for the coding
sequence are called exons. They are the pieces that -- thanks, Greg -- that are -- so here
are the exons. This is a four exon gene. Those are the pieces of the gene that are
transcribed into RNA and then there's splicing that goes on, these four segments -- the RNA
corresponding to these four segments -- ends up in the mature mRNA that goes out into the
cytosol. And then there are pieces of DNA between the exons and we call those introns
and when they're transcribed, the transcription of the gene to RNA would start right here.
It would go like this and then these intronic pieces would be spliced out and the mature
message would be made up of a sequence that corresponds to exon one, two, three and four
all stitched together.
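The splicing step described above can be sketched in a few lines of Python. This is only an illustration: the sequence and the exon coordinates are made up, and the `splice` helper is a hypothetical name, not something from the talk.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) in order, dropping the introns."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Toy four-exon transcript (invented sequence): uppercase = exon, lowercase = intron.
pre_mrna = "AUGGCC" + "guaag" + "UUCGA" + "guccag" + "GGAUC" + "gucag" + "CUUAA"
exons = [(0, 6), (11, 16), (22, 27), (32, 37)]

mature = splice(pre_mrna, exons)
print(mature)  # exons one, two, three and four stitched together
```

The lowercase intronic pieces are simply skipped, which is all "spliced out" means at the level of the sequence.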
So these are the exons. These blue -- the purple line is the introns. And up in the
front of the gene, the five prime end, there's some regulatory sequences. We call
that the promoter. There might be some distant regulatory sequences way away that we call
enhancers and the translation would start here once the RNA was made and all of these
would give information about making the protein that corresponds to the product of this gene
and then this would be the three prime UTR, untranslated region, of the message. So this
is what is a classic protein coding gene. Now we now know that there are other genes
in the genome that encode RNA but the RNA is never translated into proteins. So there's
a set of RNA genes as well as protein genes. For most geneticists and for what I'm going
to tell you today, when I say gene I mean genes that encode protein like this one here.
So if we look at where we stood in 2003 in terms of understanding the human genome, there
are some simple features that I just want to remind you of. First of all, if we counted
up the number of genes in the genome, it turns out there are about 22,000 genes in the human
genome. Now this is a big surprise to everybody. We had a pool about how many genes it would
take to make a human and of course because we're egocentric we -- most of us guessed
way high. I guessed 100,000 genes. And it turns out that if you look at all organisms
that have bilateral symmetry from flatworms to butterflies to fruit flies, it's about
20,000 genes. So for some reason, we don't know why, this is sort of the sweet spot for
genes in the biological kingdom.
We know right now the function of about 75 percent of these
22,000 human genes, so there are still a good number of genes in the genome we don't have a clue as
to what their function is. Those exons, those pieces of genes that actually get spliced
and retained in the messenger RNA and go out into the cytosol and are used to direct the
synthesis of proteins, we can actually count them up now because we've got the sequence
of the whole genome and there are about 220,000 exons in the genome and those exons are distributed
over about 50 mega bases. The entire genome is about 3,000 mega bases or three giga bases.
And over here is a comparison to the mouse and there's actually pretty good similarity
between a mouse and the human genome. So 22,000 genes, about 220,000 exons. And the exome,
which we've come to call that portion of the genome that's comprised -- that comprises
all the exons, is about 50 mega bases. That's only about 1.5 percent of the total genome.
So there's a lot of the genome that doesn't seem to be -- have much function. If you look
-- if you put an evolutionary test to it and ask how much of the genome is conserved over
evolutionary time, it's about five percent. So there's an additional three point -- the
exons are very conserved, so there's an additional 3.5 percent that's conserved. That means it must
have some function. We're not exactly sure what that function is. So there's still a
lot to learn but at least we have a list of the parts at this point.
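The round numbers quoted above can be checked with a little arithmetic. The sketch below uses only the figures from the talk; the variable names are illustrative. Note that 50 out of 3,000 mega bases works out to roughly 1.7 percent, which the talk rounds to about 1.5 percent.

```python
# Round figures as quoted in the talk.
GENOME_MB = 3_000        # total genome, in mega bases
EXOME_MB = 50            # all exons together, in mega bases
GENES = 22_000
EXONS = 220_000
CONSERVED_PERCENT = 5    # share of the genome conserved over evolutionary time

exome_fraction = EXOME_MB / GENOME_MB                 # about 0.017
exons_per_gene = EXONS / GENES                        # average exons per gene
conserved_mb = GENOME_MB * CONSERVED_PERCENT / 100    # conserved sequence, in mega bases
conserved_noncoding_mb = conserved_mb - EXOME_MB      # conserved but outside the exome

print(f"exome share of genome: {exome_fraction:.1%}")
print(f"average exons per gene: {exons_per_gene:.0f}")
print(f"conserved sequence outside the exome: {conserved_noncoding_mb:.0f} mega bases")
```

So on these numbers the average gene has about 10 exons, and roughly 100 mega bases of conserved sequence lie outside the exome.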
Now once the reference sequence was done, roughly 2003, we said okay, we've got one
human reference sequence but if we look around the room we can see no two people are alike.
So what we really need to -- if we want to move forward, what we needed to do at that
point was to understand something about the extent of genetic variation in our species.
And so the genome -- the people involved in the genome project turned their attention
to enumerating human genetic variation. We knew early on that one human to the next is
pretty similar. The current number is around 99.5 or 99.6 percent identical, one person
in the room to the next person in the room. And some people said wow, that's an extreme
degree of similarity, but if you think about it from an evolutionary point of view, Homo
sapiens is a very young species; it started from a very small number of founders. And
so this is about the evolutionary spread you would guess over that period of time and we're
actually pretty close to our relatives. For example, in the coding sequence we're between
70 and 90 percent identical with a mouse and we're 98.5 percent identical with our closest
living relative, the chimp.
So on the other hand, .4 percent of three billion bases is actually a pretty big number,
right? So there's a lot of chance for a difference, one person to the next. So the genome project
turned to enumerating that difference and the first project was called the HapMap, which
studied three populations of humans from around the world -- Northern Europeans, West Africans
and Asians -- trying to find all the common variation. That was followed on by the current
project which is called the 1000 Genomes Project and actually the current goals of the 1000
Genomes Project are to study about 2,500 individuals from about 50 populations around the world.
And the idea is to catalog at least 90 percent of the variants that have a frequency somewhere
in the world of at least one percent amongst human populations and in the coding sequence,
that exome part of the genome, to catalog all variants that have a frequency of at least
.1 percent. So in other words, when the 1000 Genomes Project gets done, we can look forward
to having a pretty good handle on all variation that's common in the genome across our species
in various places in the world. There's tons of rare variation that won't be detected by
the strategy so we'll continue to find the rare variation as we go forward. But at least
we'll begin to have the common variation in our species.
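To put the "pretty big number" from earlier in this passage into figures, here is a minimal sketch, assuming a three-billion-base genome and the 99.5 to 99.6 percent identity quoted in the talk. The function name is hypothetical.

```python
GENOME_BASES = 3_000_000_000  # about three giga bases, as quoted in the talk


def differing_bases(genome_bases: int, identity_percent: float) -> int:
    """Number of positions expected to differ between two people at a given identity."""
    return round(genome_bases * (100 - identity_percent) / 100)


low = differing_bases(GENOME_BASES, 99.6)   # the ".4 percent" case
high = differing_bases(GENOME_BASES, 99.5)
print(f"{low:,} to {high:,} differing bases between two people")
```

Even at 99.6 percent identity, that is on the order of twelve million positions at which two people can differ.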
So what kind of variation is there? So there are several categories and I'm just going
to briefly mention them and focus on two. First of all, there are small insertions and
deletions. This would be like a few bases are inserted in one place in the genome. And
very often where they're inserted is some part of the genome that's nonfunctional so
it doesn't make any difference. Geneticists call these insertions or deletions indels
and that makes up about 10 percent of the variation in sequence. There are some length
polymorphisms; these tend to be sequences that are also short, maybe two nucleotides
or three nucleotides repeated over and over again, typically in nonfunctional parts of
the genome but not always. That makes up about five percent of variation.
The variants that make up a large chunk of the variation, and that I think you have read about and
heard about, are single nucleotide polymorphisms and I'll talk a little bit more
about those. They make up about 40, 45 percent of the variation. And the other big variant,
a kind of variation that we didn't really even anticipate in 2003 but we've learned
about since then and we know that it accounts for a lot of variation, are so called copy
number variants and I'll show you what those are. They make up about another 40 or 45 percent
of variations. So most of the variation is in these two categories, at least as far as
we know. Single nucleotide polymorphisms are SNPs and copy number variants are CNVs.
Now there's also variants where pieces of the genome, a chunk of the genome was broken
at both ends and flipped around. That's called an inversion and it can cause a problem if
the break points are in protein coding genes. Those are hard to detect and we don't really
know the extent of inversions as a contribution of variation yet. We know certainly of some
inversions that make a difference but that's an area we need to learn a lot more about.
And of course, at each generation, the chromosomes undergo recombination so the variants are
reshuffled in terms of how they're distributed from one generation to the next.
So there's a lot of genetic variation. Now let me just emphasize, make sure we're all
on the same page in terms of understanding single nucleotide polymorphisms and copy number
variants. So here's a typical single nucleotide polymorphism. Here's part of a sequence, GATCA,
and at this particular place, this T, there's a second form of the gene, a different allele
-- allele meaning a form of the gene -- it's exactly the same here and here but at one
position it differs. And in this case it's a T in the one form and a G in the other form.
So it's a single nucleotide variant or polymorphism. Polymorphism means it's relatively frequent.
And these single nucleotide polymorphisms occur about one in every thousand base pairs
in the genome. Some areas, they're a little bit more common; in some areas they're a little
bit less common but that's enough to give you about three million or so variants per
haploid genome per individual. So that's a lot of variants, to the extent that those
variants occur -- when those variants occur in key functional regions of the genome.
Moreover, these variants -- it's become very easy -- the technology's been developed to
very easily and accurately measure at this position, let's say, whether the person has
on one chromosome a T or a G and whether they have a T or a G on the other chromosome at
that position. So that's called single nucleotide polymorphism, or SNP genotyping and we have
chips that do that and the standard platform right now measures about a million SNPs across
the genome. We have a big center over at Hopkins called the Center for Inherited Disease Research.
And we do thousands of patients, this sort of genotyping, per day, measuring these variants.
And so we use the SNPs as little tags to identify regions of the genome and how they've been
transmitted down through the generations. So we'll come back to that in a minute.
Now let me say a word about copy number variations. So here's two chromosome pairs and so think
of this as perhaps the chromosome that you inherited from your mother and here's the
corresponding chromosome that you inherited from your father. And in this region there's
a little deletion in this chromosome so that this piece of DNA that's meant to be here
from your mother's chromosome is not there in the father's chromosome. Now it turned
out that cytogenetics, looking at chromosomes in the microscope, has a resolution down to
about three million base pairs, three mega bases. That means a really good laboratory
can see a change of a deletion or a duplication in a chromosome if that change is at least
three mega bases or bigger. And standard molecular techniques, of course, were gauged to find
changes of the sequence of a few base pairs, one or two or three base pairs.
So if we had been smart enough a few years ago someone would have said, well, wait a
minute? You're looking at the genome with two technologies, one that has a resolution
down to about three mega bases and another that sort of the sweet spot of resolution
is on the order of a single base or a few bases so you're not looking at a change that's
in the size interval between those two technologies. And sure enough, it turns out that there are these copy
number variations. Here's a different kind, a duplication in this region of the
genome. So this chromosome is actually shorter by the amount that's deleted, and this chromosome
is longer because that region is duplicated.
So it turns out that there's a lot of copy number variation in our genome. That means
that in certain regions, if there happened to be a gene here in this little piece of
DNA that's deleted off of this chromosome, then this individual, instead of having two
copies of this gene would have one on this chromosome and would not have any copy of
that gene on this chromosome. So that means that for regions of the genome that are affected
by copy number variation, we may have, instead of two copies of a gene, one copy. Or if it's
a duplication we may have three copies instead of two copies.
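The copy counting described above can be sketched as follows, assuming a deliberately simplified model in which each chromosome carries zero, one or two copies of a given gene. The names are illustrative only.

```python
# Copies of one gene carried on a single chromosome under each state (simplified model).
COPIES_ON_CHROMOSOME = {"normal": 1, "deletion": 0, "duplication": 2}


def total_gene_copies(maternal: str, paternal: str) -> int:
    """Total copies of the gene in an individual, summed over both chromosomes."""
    return COPIES_ON_CHROMOSOME[maternal] + COPIES_ON_CHROMOSOME[paternal]


for maternal, paternal in [("normal", "normal"), ("normal", "deletion"), ("normal", "duplication")]:
    print(maternal, "+", paternal, "->", total_gene_copies(maternal, paternal), "copies")
```

The usual two copies become one with a deletion on one chromosome and three with a duplication, which is the dosage change discussed next.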
So that makes a lot of variation in the genome. It exposes genes that are sensitive to dosage.
In other words, in some genes it's important that you have two functional copies. Other genes,
one is certainly adequate so it's relatively insensitive to dosage. We don't really know
how many genes are dosage sensitive, but we think maybe a few percent of genes are dosage
sensitive.
For deletions, the other thing -- the other
way that this can be important from a medical point of view is that if you have a normal
variation in a gene on this chromosome, if there's no deletion over here, that normal
variation may not be very important because you have two copies of a gene. But if you
have a CNV over here that deletes a copy of the gene, then you have some variation on
this chromosome that normally is not too significant, it becomes more significant if that's the
only copy of that gene that you have. So deletions can expose otherwise normal variation
on the remaining allele, and you can have fusion of genes at the junction where
the repair of the deletion or the repair of the duplication occurs. So
there's lots of ways in which copy number variation can perturb genetic function and
not surprisingly, as we've appreciated this, we've found that this is a rich source for
producing human disease.
The bottom line of all this is there's a lot of variation in our genomes. In fact in 2007,
Science magazine said that the breakthrough of the year was human genetic variation. And
so we know that there are about 30 million single nucleotide polymorphisms in our species,
about three million differences between each individual as compared to the reference sequence,
and in terms of copy number variations there's three to seven large copy number variations
per individual. About five to 10 percent of us have a copy number variation bigger than
100 kb. The average gene is 30 kb. And one to two percent of us in this room have a copy
number variation bigger than a mega base; could affect 10 or 20 genes. So there's a
lot of variations, both at the single nucleotide level and at the copy number variation level.
So in fact, different members of our species are genetically different even though we only
differ on the order of one base pair per thousand bases.
Now -- so there's a lot of genetic variation. Now, the last thing I want to say in this
category is the sort of advances in technology and I think many of you have heard about but
I just use this single slide to emphasize the rapid change in DNA sequencing technology
that's gone on since 2003 when we said we'd finished the genome project. So down here
are years and this is 2000 over here, this is 2010 over here. Just pay attention to this
red line which is the cost per million high quality base pairs of DNA sequence. So up
here at the start, in 2000, it was about $10,000 per million base pairs. And you see
the curve has come down so that in 2005 it was about $1,000; in 2006 it was about $100;
in 2008 it was $10 and in 2010 it was $1. So the cost of DNA sequencing is coming down,
down, down, down very rapidly.
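To see what that curve means for a whole genome, here is a rough sketch using the per-million-base costs quoted in the talk. It assumes a 3,000 mega base genome read exactly once, which is a simplification: real sequencing reads each base many times over, so actual costs would be higher.

```python
# Approximate cost in US dollars per million high quality base pairs, as quoted in the talk.
COST_PER_MEGABASE_USD = {2000: 10_000, 2005: 1_000, 2006: 100, 2008: 10, 2010: 1}
GENOME_MEGABASES = 3_000  # assumed genome size; single pass, no repeat coverage


def single_pass_genome_cost(year: int) -> int:
    """Cost of reading every base of the genome once at that year's quoted price."""
    return COST_PER_MEGABASE_USD[year] * GENOME_MEGABASES


for year in sorted(COST_PER_MEGABASE_USD):
    print(year, f"${single_pass_genome_cost(year):,}")

fold_drop = COST_PER_MEGABASE_USD[2000] // COST_PER_MEGABASE_USD[2010]
print(f"{fold_drop:,}-fold cheaper in 2010 than in 2000")
```

On these figures a single pass over the genome goes from tens of millions of dollars in 2000 to a few thousand dollars in 2010.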
And not shown in the slide, but perhaps you can get from the rate of accumulation of sequence
here, the ability, the throughput is actually going up and up and up. So the technology
is advancing so that we can sequence DNA faster and faster and more and more accurately and
cheaper and cheaper and cheaper. So DNA sequencing is becoming a very practical tool to enumerate
the genetic variation that we just talked about to begin to understand the genetic differences
between people.
Consequently, we've begun to see in the literature and in other places, the availability of sequencing
the DNA of single individuals. So these little figures here show by the end of 2010,
we had about 25 or 30 individuals whose whole genome sequence was available and it's estimated
that at the end of 2011, so we're one month away, there will probably be on the order
of 30,000 whole genome sequences of particular individuals available in various databases.
So DNA sequencing is really making a huge impact in enumerating genetic variation. What
we have to learn is how to interpret all that.
So that's all I'm going to say about the genome project, genetic sequence variation and technology.
And let's turn to one -- make one point on what I've called whole genome sequence biology.
So it's interesting that remember I said at the start of the genome project, there was
an argument about whether or not it would be useful and would it stimulate research
and would we learn anything from it. And now, some 20 years or so later from when those
arguments were going on, any biologist who's studying any species wants to have a whole
genome sequence of their favorite organism. So it's a complete flip in the mindset and
it's hard to keep track of. I sort of use this tree of life to keep track of it. We
have whole genome sequences from eukaryotes, animals like ourselves, from bacteria, prokaryotes,
and from members of archaea, which is the third kingdom of life, which we only recently
found out about. And it's really pretty hard to be sure but I think that we have certainly
more than 2,500 organisms whose whole genome sequence has been obtained and deposited in
various databases. So we've gone from arguing, is it useful? To now, everybody's got to have
it and use it for their favorite biology. And it's turned out to be a very -- a potent
stimulus for understanding biology. And the pace continues.
The other thing that's important to realize is that the sequence that's used, the protein
coding language, really holds true across all biology. So once you have the sequence
of a eukaryote, you can use that sequence information to go look for the corresponding
genes in organisms that are evolutionarily as far removed as bacteria. So the DNA sequence
provides a language of biology that allows us to look at what particular genes do across
all biology. And so we gain a huge amount of information by having that language, that
universal language across all biology. Okay. Now that's all I'm going to say about
whole genome biology.
Let's talk just for a minute about evolutionary thinking in medicine. Now when I went to medical
school, evolution was not mentioned. I think the whole four years I was in medical school,
I doubt that the word evolution was ever uttered. And if you asked me, and I was very interested
in biology, about evolution, I would immediately start thinking about dinosaurs and fossils
and things that were pretty far removed from medicine. But, as Dobzhansky
said, "Nothing in biology makes sense except in the light of evolution." And I think now
nothing in medicine will make sense except in the light of evolution. We are part of
the biological kingdom. We are the end products of evolutionary biology.
So, what do I mean by this? Well, if you start looking at how evolution works and then think
about what it means for medicine, for evolution, a central theme is variation and I've just
shown you that we're now focusing on human genetic variations, so centrality of variation
in terms of how things change over time. The continuity and consequences of natural selection,
that is to say, natural selection is going on all the time. We all partaked of the little
-- all partook of a little natural selection when we went outside and we had that very
great breakfast that was served up to us in terms of the caloric increase, the kinds of
nutrients that we exposed ourselves to and so forth. So natural selection is going on
all the time.
Biological systems have developed mechanisms by which they respond to the environmental
changes to evolve. And that's turned our focus to systems biology, putting organisms back
together instead of using reduction, actually using an integrative approach to understand
biology as shown here. And, an emphasis on individuality because if you look at how selection
works in whatever species you're thinking about, the selection actually occurs on individual
members of a species. So that is what goes on in our species as well and that's selection,
which in other species we applaud because it serves to make wonderful biological characteristics
in different species. In our species, the people who are on the short end of the stick
for natural selection are the patients that come to see you with problem -- medical problems.
So it's natural selection that's going on. In our species we care about those individuals;
in other species we don't worry about that. If you're interested in that, there's a review
of this in PNAS in 2010 about evolutionary biology in medicine.
So from the point of view of what we've learned from the genome sequence in evolutionary biology,
it's really been very interesting because we can look and see how we compare with our
closest living relatives, the chimps. Remember I said we were 98.5 percent identical so it's interesting
to know how we differ, what makes us different from the chimps. We can even now sequence
our closest relative ever which was Neanderthal. So now the genome of Neanderthal has been
sequenced last year by Svante Pääbo and his colleagues. And you can ask okay, what are
the differences, the major differences between us and Neanderthal? And if you enumerate them
and lump them together, it turns out there are a bunch of genes that show sequence variation
between us and Neanderthal that are involved in energy metabolism. There's another bunch
of genes that are expressed in the nervous system and are thought to be important in
cognition. There's another bunch of genes, one that I'm particularly interested in that
is -- that are involved in neural development. And then the last category that's more -- that's
particularly variable are in microRNAs, these new RNA molecules that are important in regulating
gene expression. So you can begin to see -- get the idea of what is it that has changed over
evolutionary time to allow Homo sapiens different properties and different characteristics as
compared to Neanderthal, our last -- our last relative.
So that's evolutionary thinking in medicine. Let me just say a word now about disease gene
identification. So, disease gene identification, if you look at when disease genes were first
identified, roughly 1900, the time between 1900 and 1910. There was some knowledge of
color blindness before then, but we began to think about genes causing specific human
phenotypes in that first decade of the 20th century. But progress was very slow and I
plotted it here. This is a modification of a plot that originally was published by Joe
Nadeau. Just look at this pink curve, this scale over here. And what I've plotted is
the identification of genes that are responsible for rare Mendelian conditions. So these are
things like PKU, cystic fibrosis, Marfan syndrome, LDL receptor defects, all of those strong
phenotypes that are inherited as Mendelian traits just as exactly the way that Mendel
showed in pea plants. And so you can see the number has gone up pretty dramatically and
currently there are about 2,600 genes in the human genome that have been shown to have
variations that account for particular human diseases.
We'll come back to the common complex traits later but that's on a different scale. You
see here sort of way behind on common complex traits. This is -- focus for a minute on the
Mendelian disorders in this category. And you can actually look at the progress and
there's an online resource called Online Mendelian Inheritance in Man. This was started
by Victor McKusick at Hopkins and currently it's maintained by Dr. Ada Hamosh and her
team at Hopkins and accurately lists -- here I say 2,500 but today's count is around 2,650.
And there's another online resource called GeneTests that tracks the number of these
genes that can actually be measured to make diagnoses or sequenced to make diagnosis and
it's about 2,000 now.
This plot, which was -- came from an article by Art Beaudet. I can't
read it, I guess, here but this looks at the number of genetic tests going from 1990 up
to the year 2000 and you can see this tremendous increase so that here we had less than 50
genetic tests, now have about 2,000 genetic tests. This is causing a radical change, particularly
-- at least at Hopkins for the way pathology deals with this. So a patient is seen and
the doctor wants to send off a test for some very rare disorder and the pathology department
has to find a laboratory that does the test and make sure that they're certified and so
forth. So one of the things we're wrestling with is how to modernize the way we handle
requests for genetic tests and how we interpret those results.
Molecular cytogenetics, the copy number variations, is moving along. So we're making
a lot of progress in this whole effort. Now, if you want to find out -- the one thing I
want to point out though, that although this number is big, it's only about 15 percent
of the total number of genes. So we have a lot of work left to do. There's no reason
to think that the other 85 percent of genes won't also have Mendelian phenotypes associated
with them. If you're interested in keeping track of this, I urge you to go to this catalog,
Online Mendelian Inheritance in Man, which I already mentioned. It's very user friendly. You go
to www.OMIM.org and it has a search box here on the first page and you can punch in that
search box either a disease name or a clinical symptom.
So here I've written in Marfan syndrome and I get a list of entries and those entries
include fibrillin, that's the gene that's responsible for Marfan syndrome. Here's the
clinical phenotype, Marfan syndrome. So if we're in the clinic we see a patient we think
might have Marfan syndrome, we want to look up the clinical features, we put in Marfan,
we click on this and we get it and we -- I'll show you what you go to. If you want to learn
something about the molecular biology, we put in Marfan syndrome, it comes up with a
gene, click on this and we go to the gene and so forth. These are other symptoms -- other
syndromes that are -- have similar overlapping features that might be considered in the differential
diagnosis. That's a -- if you put in Marfan. You could also put in, not the name of the
syndrome, but just some clinical features. So here I put in tall stature and dislocated
ocular lenses and you'll see I get pretty much the same thing. The first thing that
turns up is fibrillin and that's the gene responsible for Marfan syndrome. The
third entry is Marfan syndrome. The second thing that comes up in the list is homocystinuria
which is a phenotype that is very similar to Marfan syndrome. So you can use this as
a tool for trying to figure out what your patient has, based on the clinical findings
that you observe.
If you actually go to the entry--so here I've gone to the entry for Marfan syndrome. This
is a long, long entry. I've just shown you the top of it but it describes the history,
the clinical features and so forth. You have this table of contents over here so if you're
just interested in how to make the diagnosis, you pull that down, click on that and it will
tell you how to make the molecular diagnosis, where to send the test and so forth. Or if
you want to know are there animal models and what do we learn from that, you just click
on that. So it's a very useful tool. It's free and very easy to use. Just go to OMIM.org
and try it out.
Now, what about identifying the genes, not for these rare Mendelian disorders but identifying
the genetic variants that contribute risk for common, complex traits like diabetes that
we already mentioned or coronary artery disease or neuropsychiatric disease? And there, notice
that the scale is different and progress has been very slow, although recently it has spiked
up tremendously. So we now have variants that we think are responsible or contribute risk
for at least 200 of these phenotypes.
By and large, these variants are not causative. They're actually susceptibility variants and
either increase or decrease one's risk for a particular phenotype. And the method that's
used for this is -- I'm sure you've heard about is genome-wide association studies or
GWAS studies. And these studies are agnostic approaches that identify SNP markers enriched
in cases as compared to controls. So typically, if you want to do this, you have a large group
of cases and a large group of controls. And you do that SNP genotyping that I mentioned
earlier across the whole genome and you look for particular SNP genotypes that are enriched
in your cases as compared to your controls. And it's agnostic in the sense that it makes
no assumptions about what the genes are or what the variants are that are responsible.
The only assumption it makes is that somewhere in the genome there is a variant that contributes
risk for it. So it finds stuff that's not looking under the light post but looking across
the whole genome without any sort of preconceived notions. It's a very powerful aspect of this
that's been very informative to us.
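The case-control comparison just described can be sketched for a single SNP. This is only a minimal illustration in Python, assuming invented allele counts and a plain allelic chi-square test; the function name `allelic_association` and all the numbers are hypothetical, and real studies use more careful models than this.

```python
import math

def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Compare allele counts at one SNP between cases and controls.

    Returns (odds_ratio, chi_square, p_value) for the 2x2 table of counts.
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table (1 degree of freedom, no continuity correction).
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value

# Hypothetical counts: 1,000 cases and 1,000 controls, two alleles per person.
odds_ratio, chi_square, p_value = allelic_association(
    case_alt=900, case_ref=1100, ctrl_alt=800, ctrl_ref=1200
)
print(f"odds ratio={odds_ratio:.2f}, chi-square={chi_square:.2f}, p={p_value:.2g}")
```

In a genome-wide study the same comparison is repeated at every genotyped SNP, which is why no prior guess about the responsible gene is needed.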
So you find markers, SNP markers and then you look around those markers and you try
to find the causative variants that are actually responsible for the change in susceptibility.
And once you find those causative variants, that gives you a particular gene and tells
you something about the biological perturbation of that gene function that increases the risk
or decreases the risk for a particular trait. And it also gives you, to the extent that
you know what the biological system that gene product works in, it identifies a biological
system that is perturbed that gives you -- changes your risk for a particular disease of interest.
So this agnostic approach has proved very powerful in terms of illuminating biological
systems that are responsible for certain phenotypes that we had no prior knowledge that they played
a role in that. In the case of type 2 diabetes, years ago, we all thought that that was insulin
resistance, that it was a peripheral problem, that the peripheral tissues were resistant to insulin.
There was some degree of that, but it turns out most of the variants that contribute risk for
type 2 diabetes are in insulin production, not in insulin resistance.
So -- and then of course understanding the pathophysiology gives us a better way of treating
-- of dealing with the patients. This is sort of a diagram of how this might happen. This
is from a paper by Teri Manolio at the Genome Institute, and it is in this series -- and
I'll come to this at the end -- in the New England Journal that Greg is one of the editors
for -- Greg Feero is one of the editors for and it really is a wonderful collection of
papers, a sort of state of the art genomic technologies.
This diagram -- I don't know how well you can read it -- but it shows a region of the
genome and maybe the distance between these two single nucleotide polymorphisms is one
kb. And it shows it in three individuals so you get the genotype of this SNP and the genotype
of this SNP in these three individuals plus a whole bunch of individuals and you look
at the frequency of those genotypes in your cases and compare them to controls. And let's
say in the cases the particular variant is more common and so you see more heterozygotes
and more homozygotes for that variant. You might want to do a replication study, a different
population, to make sure it's not something due to population stratification. And the end
result, then, you plot out all of those variants and you ask, are there any variants that statistically
are associated with a statistic -- at a statistically significant level with a particular disease
phenotype. This has come to be called the Manhattan plot because it looks like the skylight
-- the skyline of New York. Each chromo -- all the variants of each chromosome are color
coded and you see they are all more or less clumped around the bottom here except in the one region
on chromosome nine where there's a bunch of variants whose P value is exceedingly low, that is,
here P less than 10 to the minus eight. And so they're statistically significant, even
with all of the tests that one has done. So that says, in this region of the genome defined
by these two markers, there is some variant or some set of variants that contributes risk
for this particular disease. So now we go look at that region very carefully, identify
the causative variants and move forward in our understanding of the biology of the
disease.
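Why the P value has to be so low "even with all of the tests" can be shown with a small sketch. It assumes a simple Bonferroni correction and one million tested SNPs, which is an assumed figure and not one given in the talk; the chromosome positions and P values below are invented for illustration.

```python
import math

ALPHA = 0.05
N_TESTS = 1_000_000  # assumed number of SNPs tested; not a figure from the talk

# Bonferroni correction: divide the overall error rate by the number of tests.
threshold = ALPHA / N_TESTS

# Hypothetical (chromosome, position, p-value) results, invented for illustration.
results = [
    ("chr1", 1_204_311, 0.31),
    ("chr9", 22_098_574, 3e-12),
    ("chr9", 22_115_503, 8e-10),
    ("chr12", 5_603_112, 2e-5),
]

for chrom, pos, p in results:
    height = -math.log10(p)  # the y-axis of a Manhattan plot
    flag = "significant" if p < threshold else ""
    print(f"{chrom}:{pos}\t-log10(p)={height:.1f}\t{flag}")

# Variants that clear the corrected threshold; these form the "skyscraper".
hits = [(chrom, pos) for chrom, pos, p in results if p < threshold]
```

Under these assumptions the cutoff works out to about 5 in 100 million, so only the two chromosome nine variants stand out.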
If you're interested in this, NHGRI maintains a great website and
here's the whole genome and all of the variants that have achieved statistical significance
for all of the phenotypes. This was up to date as of March 2011. I think there is a
newer version online now.
The interesting thing is that many of these variants, as I've already indicated, tag genes
or biological systems that we did not previously know were important for particular disease
phenotypes. The other thing that we've noticed is that most of these variants for the common
complex traits, are not in protein coding space, not in those exons that we talked about,
which are usually hit for the Mendelian disorders, but are in fact in regions of the genome that
seem to regulate gene expression. So you remember the little diagram of a gene I showed you,
there are upstream sequences in the promoter or more remotely related to the gene called
enhancers. And I think that most of the variants that are involved in this actually perturb
gene regulation, so they're in the non-coding regulatory regions of the genome. That's important
because we don't know -- we're more -- our state of the knowledge is weaker in terms
of understanding regulatory variants as opposed to protein coding variants. So it's an important
area of research going on right now.
And in aggregate, if you look at a lot of disease phenotypes, we haven't found all of
the variation yet, so much of the heritability, that is, the genetic variation that contributes
to a phenotype, remains to be explained. That has come to be called the dark matter. The fraction
explained by variants so far identified for particular complex traits may vary from as high as 60 percent, that's
probably where we are for age related macular degeneration, to less than five percent, that's
probably where we are for type 2 diabetes. So for some disorders, we've only explained
a small fraction of the genetic variation. Other disorders, the methods so far have allowed
us to explain quite a substantial fraction of the genetic variation.
So, this comes to, then, if we've found variants but they only explain a small fraction of
the risk, people have said, well, what have you learned? Well, one thing you've learned
is, you've identified biological systems and those systems become important to study to
understand the mechanisms of the disease in a more complete way. The other thing is that
the risks that we calculate by present methods, I think, are underestimates for a variety
of reasons. So the common sort of critique of this is that the risk allele, this
single nucleotide polymorphism, confers a risk to individuals that is only 1.2 times greater,
let's say, for example. So it's hard to change medical management. It's hard to get a person
to change their lifestyle based on a tiny change in risk. So we have to do better than
that.
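To see why a 1.2-fold risk is hard to act on, it helps to turn it into an absolute number. The sketch below assumes a baseline risk of 5 percent, which is a made-up figure chosen only for illustration; the helper name `risk_with_variant` is likewise hypothetical.

```python
def risk_with_variant(baseline_risk, relative_risk):
    """Absolute risk for carriers, given the non-carrier risk and a relative risk."""
    return baseline_risk * relative_risk

baseline = 0.05        # assumed risk without the variant (hypothetical)
relative_risk = 1.2    # the example figure from the talk
carrier = risk_with_variant(baseline, relative_risk)

print(f"without variant: {baseline:.1%}, with variant: {carrier:.1%}, "
      f"difference: {carrier - baseline:.1%}")
```

Under that assumption the variant moves a person from a 5 percent risk to a 6 percent risk, a one-point change that is an average over a population rather than a statement about any one patient.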
Now, remember that these risks are calculated in populations. They are not calculated in
individuals. So for a given individual, a particular variant may be much riskier or,
in fact, it may not be risky at all. We're just talking about the risk across populations
so we have to learn how to individualize these risks and we have to recognize that the way
these variants work is in complex biological systems. So this shows a complex biological
system, each dot representing a protein product and the interactions between these protein
products represented by the lines connecting the dots.
So the systems are complicated and involve many components. So what we really need is
not to look at a particular variant but we need to learn how to look at sets of variants
characteristic of a particular individual, and also integrate that with the environmental
exposures of that individual to really calculate an individualized risk. And we just are not
able to do that yet, although I must say people are making considerable progress in developing
new analytical methods, a more biologically based, I would say, set of analytical methods
to really calculate accurate individualized risks. And in fact, one strategy that has
very recently been applied and is turning out to be much more -- identifying much greater
risks, actually, is looking not at the clinical phenotype but looking at biochemical markers.
So this has come to be called metabolomics and this is a study that was just published,
looked at about 3,000 individuals, used the whole genome SNP genotyping that we talked
about, measured about 250 metabolites very precisely in these individuals and found 25
loci with effect sizes anywhere from 10 to 60 percent of explaining the biological variation
for those small molecules. So it suggests that if we sharpen up the phenotype that we're
looking at, in this case measuring a biological marker, the risks would be much more predictive
and much more significant.
So this is just the top of that list of all the variants and you can see P values here
on the order of 10 to the minus 250. So these are highly statistically significant variants
that influence the level of these metabolites. The metabolites, in turn, are involved in
a variety of complex traits. So we're making a good bit of progress in this area.
I'm not going to say anything more about identifying variants for complex traits. I just want to
say a word about individual genome sequencing and then how -- and what this means currently
for the practice of medicine. So individual genome sequences, we've already mentioned
that. The first one that was published was Craig Venter's, and I think that's because
part of his genome sequence was what his company sequenced in the race to get a whole genome.
So, we had the reference sequence but of course that was an anonymous person. So what we really
want to know is what is the sequence of my patient? So that's why I think individualizing
-- obtaining the sequence of individuals really is a change in the way we look at
patients.
So for example in Venter's, he had 4.1 million variants as compared to the reference sequence.
That included 3.2 million SNPs and about 300,000 copy number variants. There were 90 inversions
and the total amount of space covered by the variants was about 123 megabases. That's
a huge chunk -- or about 12.3 megabases -- a huge chunk of the genome. So to put it in
personal terms, you could look at this individual and you could look at his lactase genotype
and ask, is he someone who can tolerate ice cream or not? You could look at his dopamine D4
receptor (DRD4), that's associated with risk taking behavior, and you probably
could have made a guess about Craig Venter's risk taking behavior genotype before doing
it, but you could actually make that measurement. Or you could look at his ApoE genotype and
understand his risk for whether or not he has an increased risk for Alzheimer's disease.
So it's a completely different level of information about individual patients.
So I think this will have -- all of these things, all of these changes and advancements
will have profound effects for medicine. This is a picture of a Dr. Ceriani who was rounding
[spelled phonetically] in Kansas in the middle of the 20th century. The picture was taken
by Eugene Smith. This is sort of the idea that I had, what I would be doing when I decided
to go into medicine. And of course it's far from -- far from what we do. So, we sort of
have summed it up in terms of developing what might be called the science of the individual,
how we're going to use this information to understand our individual patients.
So what have we learned about the science of the individual currently? So first of all,
it exposes the pitfalls of typological thinking. In other words, remember those kids I showed
you where you say okay, this is an example of a certain disease. Rather, we think this
is a patient who has features of a particular disease and we understand that no two patients
have exactly the same manifestations of that disease and no two patients will have the
same responses to our attempts to treat them. So it confirms what in the past has been
called the physiologic view of disease. Each individual has their own disease. It emphasizes
the importance of asking, why does this particular patient have this particular problem at this
particular time? So it turns the focus more on trying to understand why people get sick
and what we can learn from that exercise in terms of managing the patient as we go
forward in terms of the best treatment for this particular patient?
However, moving it into medicine and making it practical and bringing it to the clinic
and to your offices is a challenge. And you all understand that. It's interesting to look
at a paper that came out recently that attempted to do this. This is a scientist called Steve
Quake. He had a relative who dropped dead of sudden cardiac death in his 20s. Here's
Quake over here. So he went to his cardiologist out at Stanford and he said, look, I have
this relative who just dropped dead in their early -- in their 20s and I want to know if
I'm at risk for that. So that's a reasonable question to ask. So they got a big pedigree
and they went ahead and sequenced his genome and then they tried to use that information
to give him a more informed understanding of his risks, not only for sudden cardiac
death but for other common medical problems.
And it turned out that was a really daunting exercise. It took all of these people. Here's
Quake, he got to be an author on his own sequence. Here's the person who led the study, Russ
Altman, who's a cardiologist. There is one medical geneticist and one genetic counselor.
It took the genetic counselor five and a half or six hours to sit down with Steven Quake
who is a very accomplished molecular biologist, and explain to him all the variation that
was found in his genome. So you can imagine doing that exercise to less sophisticated
individuals. And in the end, at the current state, most of the information we could give
him was changing his risk for certain things in modest ways. So it did not really overnight
change how Quake would be managed and certainly did not change much beyond what we would have
done from having his pedigree.
On the other hand, we're learning stuff about how to use this information as we go forward
virtually every day. So I think going forward, we will increasingly learn how to use this
information in a much more effective way. And I would support that argument with this
-- with these examples. First of all, to do this it will require rigorous research of
the kind the Genome Institute and Hopkins are doing both at the basic level, at the translational
level, and at the clinical level. New technology continues to accelerate the pace and it's
not going to happen overnight. It happens gradually and let me give you these three
examples.
First of all, acute lymphoblastic leukemia. When I was a house officer in the late '60s
and early '70s, acute lymphoblastic leukemia was the most common form of childhood leukemia
and had a 95 percent mortality rate -- 95 percent mortality. Nowadays, acute lymphoblastic
leukemia remains the most common childhood leukemia. It has a 95 percent survival rate,
95 percent survival. So it went from 95 percent mortality to 95 percent survival. So what
accounts for that change? So, actually if you look at it, the medicines that are currently
being used are very similar, if not identical, to the medicines that we used all those years
ago. So it's not the kinds of medicines that are being used.
What it is, I would argue, is that oncologists have learned that this diagnosis, acute lymphoblastic
leukemia, is actually a heterogeneous group of disorders. And they've learned how to use
gene expression profiling, age at onset, DNA sequence variation and other tools to subdivide
the patients. In other words, move from one collective diagnosis to subcategories of diagnosis,
moving towards individualizing the diagnosis to individual patients and then manipulating
their treatment according to which subdivision the patient falls in. And that approach, a
more informed approach in terms of differences between individual patients with the same
diagnosis, has had a dramatic effect on the consequences of having ALL.
The same is true, but to a lesser extent, for sickle cell disease. You all know that
there are patients with sickle cell disease who are very sick from infancy forward. And
there are other patients that just have an occasional crisis, maybe once a year or once
every few years. So there's tremendous variation among individuals with sickle cell disease
and recall that they all have exactly the same genetic defect at the disease gene locus.
They all have exactly the same mutation in beta-globin. So what makes the difference between
one patient with sickle cell disease and the next?
So increasingly, we're finding modifying genes that modify the phenotype of sickle cell disease
and we can define a subgroup of sicklers that are much more likely to develop,
let's say, certain very disastrous complications of sickle cell disease such
as stroke and so forth, and we can manage that subset of patients with sickle cell disease
much more aggressively when they're at risk for developing a stroke, let's say. So we're
individualizing therapy for sickle cell disease and that's having better outcomes.
Recently, the genome project, genome scientists are sequencing tumors so there's a lot going
on now about sequencing individual cancers and the people who have the cancers. And one
of the interesting things that's come out, first identified by Bert Vogelstein at Hopkins,
looking at glioblastoma multiforme, the most serious brain cancer, and it turns out that
a small fraction of glioblastoma multiformes had a mutation in isocitrate dehydrogenase.
That's a gene that encodes an enzyme in the citric acid cycle. But it turned out that
you could stratify the patients in terms of whether or not their tumor had an IDH1
mutation. And if you did that, it turns out that the patients with the IDH1 mutations
in their tumors behave differently than the patients that don't have those mutations.
So we're moving again toward stratifying a different -- a diagnosis, moving to individualize
the diagnoses and adjusting our treatment and our thinking about the patients accordingly.
So this is going on over and over again and it will go on rapidly in some areas and more
slowly in other areas and eventually it will lead to a sort of a very individualized approach.
Here's an example that I find sort of clever. This came from deCODE genetics in Iceland.
And they said, if you look at baseline PSA levels, there's actually evidence that the
genetic makeup has a big effect on your PSA level. So currently, as you know, we use
this standard cut point for PSA of four. But if you look at normal individuals, four is
actually -- their PSA is actually a good bit below four. And then other normal individuals
have a PSA above four, so this four is sort of an average cut point. So they argued that,
let's say you measured genetic variation at six loci, they recommended, and then you adjusted
the cut point for the individual based on their genetic makeup so that four would actually
be too high for some individuals. And for other individuals it's acceptable. So you
individualize the risk that you determine with PSA level and that gives you a more informed
way to deal with the patients.
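The idea of a genotype-adjusted cut point can be sketched as follows. This is not the deCODE method itself, only a minimal illustration that assumes each PSA-raising allele multiplies a man's baseline PSA by a fixed factor; the per-allele effects and allele frequencies are invented, and only the cut point of four and the count of six loci come from the talk.

```python
STANDARD_CUTOFF = 4.0  # the standard PSA cut point mentioned in the talk

# Hypothetical per-allele multiplicative effects on baseline PSA at six loci.
# These numbers are invented for illustration, not taken from the deCODE study.
PER_ALLELE_EFFECT = [1.10, 1.08, 1.15, 1.05, 1.12, 1.07]
# Hypothetical population frequencies of the PSA-raising allele at each locus.
ALLELE_FREQ = [0.30, 0.45, 0.20, 0.50, 0.25, 0.40]

def adjusted_cutoff(allele_counts):
    """Scale the standard cutoff by a man's genetically expected baseline PSA,
    relative to the population average (allele_counts holds 0, 1 or 2 per locus)."""
    factor = 1.0
    for effect, freq, count in zip(PER_ALLELE_EFFECT, ALLELE_FREQ, allele_counts):
        expected_count = 2 * freq  # average number of raising alleles per person
        factor *= effect ** (count - expected_count)
    return STANDARD_CUTOFF * factor

low_baseline = adjusted_cutoff([0, 0, 0, 0, 0, 0])   # carries no raising alleles
high_baseline = adjusted_cutoff([2, 2, 2, 2, 2, 2])  # carries two at every locus
print(f"low-baseline cutoff: {low_baseline:.2f}, high-baseline cutoff: {high_baseline:.2f}")
```

Under these assumptions a man with a genetically low baseline gets a cut point below four, and a man with a genetically high baseline gets one above it.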
Now time is short. I'm not going to say anything about pharmacogenetics, except to say that
it is a classic gene by environment interaction. The environment variable in this case, though,
is very well defined. You know the drug, you know the dose, you know when the patient started
at it. And not surprisingly, there's a lot of genetic variation that influences response
to drugs. So that's an area that's going to go forward very quickly. And it already has
numerous positive effects. Time's short and I won't talk about it but variants that influence
your response to statins or your response to treatment for hepatitis C and so forth.
And these variants tend to be variants of quite large effect, so that's an area where
the variation really has turned out to be very important for the phenotype.
The end result of all of this, I would argue, will get us to this point. So this is a picture,
a painting by Sir Luke Fildes, of the doctor looking at his patient and this is what we
would like to do. We would like to understand our patient. We'd like to look at the patient
and not only use our history and our physical exam but knowledge of the genetic makeup and
the patient's environmental histories to really understand the patient at a level that is
far better than what we currently can understand the patient. So I think over the next few
years, you'll see tremendous progress in this approach so that we can think of our patients,
not as representatives of a particular disease, but as individuals who have a particular set
of problems.
So with that I'll close. Thanks for your attention.
[applause]
Let me give a plug to this set of articles which you can find in the New England Journal,
Genomic Medicine: An Updated Primer. Greg is one of the editors. The one that came out
this week is called "Genomics and Cardiovascular Disease," quite good. And I should also acknowledge
my colleagues and a heavy dose [spelled phonetically] of Barton Childs shown here, now deceased,
who spent his whole life really thinking about how we could incorporate genetic knowledge
into making management of our patients more effective and more individualized. Thank you.
[applause]
Male Speaker: So I realize that people probably have to
get off [unintelligible]. We probably have time for a few questions.
Male Speaker: Have they ever sequenced embryonic stem cells?
Does that completely represent a fully developed or is that sequence early enough that you
can modify it at that early stage?
David Valle: So the question is, have people sequenced
embryonic stem cells and what's different about that sequence as compared to --
Male Speaker: [inaudible]
David Valle: Yeah, and can you manipulate it?
So that touches on a whole area which I did not say a word about, which is epigenomics. So
if you look at the sequence of an embryonic stem cell, let's say from a particular individual,
and you could develop that cell line and then follow the individual over their lifetime,
the sequence would remain the same, right? We're born with the sequence that was put
together at the time of the sperm and the egg that made us form a fertilized egg. But
what is different, if you look at an embryonic stem cell versus cells in the adult, is sort
of what's called the epigenomic imprint. So this is patterns of regulation of genes. So
it's easy to -- the way I think of it is, if you look at, let's say, the liver in an
adult, when you have a liver cell and that liver cell divides, you get two liver cells.
If you look at a -- let's say a muscle cell and that muscle cell divides, you get two
muscle cells. And yet the genetic material in those two cells, the liver cell and the
muscle cell, is the same.
So what's different about those cells? And the reason one cell is a liver cell and one
cell is a muscle cell is that there are these programs of regulation of gene expression
that are sort of turned on and turned off, and so in the liver you turn on a program
that's necessary for making liver cells. You turn off everything else. In the muscle, you
turn on a program that's necessary for muscle cells and turn off everything else. That -- those
patterns or programs of regulation of genome expression are called epigenetics. And so
what we would see in a stem -- in an embryonic stem cell is a much more noncommitted epigenetic
set of regulations and as the cell was differentiated into different cell types, the epigenetic
patterning of the gene -- regulation of gene expression would become established to make
the daughter cells that derive from that embryonic stem cell develop -- move them down the developmental
pathway to the various pluripotent outcomes that we would expect.
When you go to the next generation, all of that has to be erased because you start, not
with a collection of liver cells, muscle cells and brain cells, you start with a single cell
that then has to be pluripotent to become all other cells.
Male Speaker: In type 2 diabetes you mentioned that the
[unintelligible] regarding the insulin [unintelligible] said there's no problem with the [unintelligible]
the insulin itself. So there's no difference between type 1 and type 2?
the incident itself. So there's no difference between type 1 and type 2?
David Valle: No, I didn't say there was no problem, that
you make a good point. What I meant to say, and maybe I misspoke myself, what I meant
to say is there certainly is an element of insulin resistance but it turns out that equally
important, if not more important in type 2 diabetes are various aspects of insulin production.
So type 2 diabetes is different from type 1 diabetes, which is more of a pure
-- you know, dropout of the beta cell, basically.
Male Speaker: [unintelligible] insufficiency and the [inaudible].
David Valle: Correct.
Male Speaker: Thanks.
Male Speaker: [unintelligible] we're not showing cross-sequencing
as really remarkable [spelled phonetically]. Are there any limits [unintelligible]?
David Valle: [laughs] Yes.
Male Speaker: It seems almost impossible.
David Valle: Yeah, so the question is, where's the limit
of this curve that has to do with the cost and throughput of DNA sequencing? And I don't
know. I'm pretty sure we haven't reached -- we're not even close to the limit. So you know that
some years ago, Francis set the audacious goal of a $1,000 genome. And certainly, we
can do a whole genome -- you can order a whole genome on a patient, let's say, at Hopkins,
for about $4,000 right now. So that's pretty darn close to the $1,000 genome. You
can do a whole exome, that is just look at the exons, for about $1,000.
Now, however, that gives you sort of a preliminary set of analysis of that sequence. It does
not give you a sophisticated analysis of that sequence. And currently, in fact there was
an article in The New York Times yesterday pointing out that the really expensive part
of genome sequencing, particularly as what we're interested in, what does it mean for
patients, is in the analysis. And that is coming along at a slower pace and so if you
want -- if you have to factor in how much does it cost to pay the people to do the analysis
and so forth, it's more expensive.
Now -- but there are new technologies available, compared to the way we -- the current -- the
next generation. There's already a next, next generation that's clearly coming down the
pike. And that will clearly lower the cost and increase the throughput more. So, I think
the $1,000 genome will easily be surpassed in the near future. And what I tell patients
and medical students is, of course, if you come to Johns Hopkins -- I don't know how
it is here at Suburban -- if you have some complicated problem you come to Johns Hopkins
at 9:00 in the morning, you go home in the afternoon, you're going to blow $1,000
very fast, so. It's in the range of everything -- you may not be able to even get out of
the parking lot for that.
[laughter]
Male Speaker: [inaudible]
David Valle: Yes.
Male Speaker: [inaudible] point of interest, how close was
the Neanderthal genome to Homo sapiens?
David Valle: So the question is --
Male Speaker: [inaudible]
David Valle: So the question is, how close was the Neanderthal
genome to Homo sapiens and would they be interfertile?
So, it's about 99 -- first of all, the sequence quality of the Neanderthal is nowhere near
the sequence quality we have for Homo sapiens but the best guess, I think, is it's about
99 point -- 99 percent -- a little bit better than 99 percent identical. And people were
very interested to know if Homo sapiens -- for some reason, I don't actually know why we're
so interested to know -- but people are interested to know whether there was any interbreeding
between Homo sapiens and Neanderthal. And the genetic evidence we have right now suggests
yes, there was interbreeding between Neanderthal and Homo sapiens. And, you know, we cohabitated
and it seems to me that -- pretty likely. That's what I would have bet before we had
the genetic evidence, human nature being what it is.
[applause]
[music playing]