[music playing]
Larry Thompson: On April 14, 2003, the founding director
of the National Human Genome Research Institute, Dr. James Watson, and his successor, Dr. Francis
Collins, declared during a press conference that the Human Genome Project had reached
its goal of producing the first complete reference sequence of the human genome, and
that the project was officially over. At the same time, the Genome Institute published
a strategic plan that laid out an ambitious research agenda for the field.
My name is Larry Thompson, and with me is Dr. Eric Green, the current director of the
National Human Genome Research Institute, and one of the co-authors of their research
agenda, which was published in Nature. So, Dr. Green, you know, a decade ago, NHGRI laid
out a lot of audacious big challenges for the whole field of genomics, so I'd like to
ask you about some of those and how we've been doing. Let's start with the really basic
stuff, which is where the plan itself started. You went into the field, needed to sort of
know, now we've got the sequence, you know, what's in there? What are the parts of the
genome, and how do they work?
Eric Green: Yeah, so the Genome Project, of course, just
ordered the letters but didn't tell us what their meaning was. Many efforts have been
pursuing that for the past decade. At NHGRI, we've launched the ENCODE Project, the Encyclopedia
of DNA Elements Project. ENCODE has revealed over and over again many surprises, all sorts
of information about, you know, how much our genome actually gets made into RNA, and lots
of questions, and some answers about what that RNA might be doing. Lots of information
about all the different parts of non-coding DNA that get bound by different proteins that
are probably playing a role in choreographing the expression of genes in different tissues
at different times, and just remarkable insights about how complicated the human genome really
is. Only about 1.5 percent of the genome accounts for the protein-coding areas, but there's
this other 5 to 8 percent that's highly conserved across all mammals.
Larry Thompson: So, yeah, so what's going on with that?
Eric Green: So, that was a surprise, and when the insights
originally came out on that, when we sequenced the mouse genome, immediately following sequencing
the human genome, there were some that sort of almost couldn't believe what they were
seeing, and then you added the rat genome, and then the dog genome, and others, and that
number held up, you know, something on the order of 5 to 8 percent, and now we've
done analyses of several dozen mammalian species. And there seems to be sort of a core
set of sequences that are highly, highly conserved across virtually all mammals. And as you pointed
out, the minority of those are protein-coding regions, and many of those non-protein-coding
regions have been conserved through evolution with almost as tight a grip on them as
on the protein-coding sequence.
Larry Thompson: We're at the -- we're 10 years after the completion
of the human genome. How many genes are there, and how has the definition of a gene evolved
over this time?
Eric Green: Ten years later, that number's pretty much
settled down into about 20,000, and, you know, plus or minus maybe a few hundred; people
argue around the edges. The other thing that I think has evolved over the past decade is
the definition of a gene. What does a gene mean? Once upon a time, we believed in the
central dogma, it was DNA made RNA, and RNA made protein, and the definition of a gene
was that segment of DNA that contained the information for making a protein. But that
was before we realized that there's more complexity there because DNA can make RNA, and RNA can
go off and do all sorts of important biological things other than making proteins.
Larry Thompson: So one of the other things that was really
clear at the end of the Human Genome Project was that there was this very small percentage
of human variation, maybe about one-tenth of 1 percent was the estimate, and it would
seem clear that within that small amount was the key to understanding why some people are
at risk to disease and other people are not. So what do we do to go about trying to understand
that level of human variation, and what do we know now?
Eric Green: Well -- and, of course, the reason we wanted
to know about variation is we weren't just interested in a hypothetical human genome
sequence that had been generated by the Human Genome Project, but eventually, we wanted
to analyze individuals' genome sequences, eventually, perhaps as part of clinical care,
and we were going to be most interested in the differences. But while we had a
rough estimate that individuals differed at maybe 1 in 1,000 bases, we also recognized that it's not
that each of us has unique differences, but, in fact, across groups of people, there's
a lot of common variants that reside out there. And that became a bit of a finite problem;
if you just started sequencing or analyzing enough individuals' genomes, you could develop
catalogs of those variants, and knowledge of those variants could then be a useful tool
for pursuing studies to figure out which of those variants are biologically relevant and
which ones are not biologically relevant.
Larry Thompson: How has the field gone from identifying these
variants across these populations to applying them in the clinic?
Eric Green: It was reassuring when, about seven or eight
years ago, the first demonstrated success story of a genome-wide association study was
published in Science, describing, not only -- they actually were quite fortunate
in that study, they actually got down to the gene that had a variant in it conferring risk
for age-related macular degeneration. And that was about 2005, that first example. And,
wow, all you have to do is follow the literature ever since. There has just been just an avalanche,
if you will, of publications describing successful genome-wide association studies. Over 1,400
publications since that time have been implicating hundreds and hundreds of regions of the human
genome, at least at a statistical association level, with literally hundreds of different
diseases or traits of interest.
Larry Thompson: There's been clearly an investment by the
Institute in technology; how has that transformed how this work can be done?
Eric Green: It's been remarkable, but I think what's happened
in the arena of DNA sequencing technology development in the past 10 years, almost precisely
the past 10 years, has been truly spectacular. If we just go back 10 years, we had just generated
that first sequence of the human genome, and the active sequencing part took about six
to eight years and consumed about a billion dollars; that was about the cost for the actual act
of organizing for sequencing and actually doing the sequencing. You know, 10 years ago, when
the Genome Project ended, if those same groups immediately would have produced a second sequence
of the human genome, hypothetically, they estimated it would probably take them maybe
three to five months to do instead of six to eight years, but it would still cost about
$10 to $15 million.
But now fast forward 10 years, after these spectacular new technologies have been developed,
and we're well under $10,000; in fact, the current estimates for getting a sequence of
the human genome are something on the order of $3,000, $4,000, $5,000, en route to $1,000,
I think, within a year or two. And remarkably, you could do it today in a couple days, and
probably by the end of this calendar year, I'm being told, probably within a day.
Larry Thompson: What could we say about the genetics of diseases?
Eric Green: The real success stories in terms of really
reaching the finish line with respect to understanding the genomic basis of disease so far have been
limited, almost exclusively, to rare diseases, not the complex diseases that have multiple
genetic components. But what's going on with rare genetic diseases has been truly remarkable.
When the Genome Project began, we knew the genetic basis of maybe on the order of 60
diseases that were caused by defects in a single gene. The Genome Project accelerated
the pace at which we were able to discover those genes and those mutations, so that by
the time the Genome Project ended, that number was about 2,200. Over the last 10 years, it's
more than doubled; we're almost up to 5,000 disorders for which we now know the genetic
basis.
Larry Thompson: And we are already seeing that the FDA is
using genetic information in advising doctors how to prescribe certain drugs,
more than a hundred?
Eric Green: More than a hundred now. When the Genome Project
began, there were only about four where we thought genetics really was at all relevant
for the decision making about what drugs to give patients. And we are very confident that
list is going to grow.
Larry Thompson: Two years ago, the Institute created, with
the help of the community, sort of a new strategic plan, sort of looking forward. So what's the
future vision of where the field is going?
Eric Green: The strategic plan two years ago broadens
that and gets much more specific, in thinking much more critically about that logical progression
going from understanding the genome structure and understanding genome biology; and then
a heavy emphasis on then using genomic approaches and knowledge to understand human disease;
and then using that information to advance medical science, but then not stopping, recognizing
there's a responsibility to continue to do studies to then use that knowledge about advanced
medical science and studying it when you actually go to deliver it in a health care system, which
I think is very complicated. And so it's sort of now that full view, everything from
bases of the genome to actual health care implementation of genomic approaches that
is articulated in this current strategic plan, that, you know, everything we think about
really hangs off of in some way.
Larry Thompson: Thank you, Dr. Green. The Nobel Prize winner
Walter Gilbert famously calculated that it would take 15 years to sequence the human
genome for the first time, but predicted that it would take another 100 years of basic biological
research to understand it. The first 10 years appear to have gotten off to a pretty good
start. I'm Larry Thompson, at the National Human Genome Research Institute.
[music playing]