Dr. Evan Eichler: So it really is, indeed, a pleasure to be
here. I was here, I think, five years ago, and my message has stayed, hopefully, very much consistent. As Jim alluded to, our lab has been focused
on what some people have called the darkest matter of the Human Genome. We’re specifically
interested in regions that have changed very, very rapidly, specifically within the human species:
areas of the Genome that have proven to be dynamic in terms of their structure,
their organization, and their evolution.
As he mentioned, one aspect of our work focuses on regions of the Genome that are highly duplicated. These come in two different flavors: sequences duplicated within a chromosome, known as intrachromosomal duplications, or sequences duplicated between non-homologous chromosomes, known as interchromosomal duplications.
The reasons we’re interested in this are really two-fold. First, these regions are dynamic because their very high sequence identity promotes non-allelic homologous recombination, which leads to additional rearrangement events at these specific sites. The second reason is that, if you believe the work of Susumu Ohno and others, duplication is the primary force by which new genes and gene families evolve. So we’re interested in these regions first from the perspective of dynamic mutation, De novo mutation associated with disease, and second from the perspective of the potential evolution of new genes and gene families within humans. I want to discuss both of those topics today.
So, just to summarize the work that came from analyzing the finished Human Genome: this is the pattern of the largest and most identical duplications within our Genome, the blue lines representing large blocks of intrachromosomal duplication, greater than 95 percent identity and greater than 20 kb in size. You’ll notice from this that a lot of our duplications are essentially interspersed. If you look at chromosome seven, you find that many of the pairwise alignments, the large ones, are separated by megabases of sequence. If you add the interchromosomal pattern you get something like this. This pattern in the Human Genome has stayed relatively constant across the newer assemblies.
And most of this data, I should point out, came from BAC-based sequencing, that is, the sequencing of large insert clones. If you go back and look at some of the first published whole Genome shotgun assemblies of the Human Genome, these areas of the Genome are completely missing. The important point is that about 60 percent of the large duplications within our Genome are interspersed. That is to say, they are separated by at least a megabase from their nearest neighbor, or they map to another chromosome. If you contrast this with some recent data that we’ve generated with Deanna Church, looking at the comparable finished version of the mouse Genome, this is the pattern that you see for the most identical and the largest duplications within mouse.
In mouse, the total amount of duplicated sequence turns out to be very similar to what we saw initially with human: roughly five percent of the Genome. But you’ll notice two things. First, the highly duplicated sites are fewer in number; there are about half as many of them in the mouse Genome. Second, most of the blue lines, which indicate intrachromosomal duplications, sit right on top of one another, suggesting that most of the duplications in mouse, about 82 percent of them, are tandem, that is to say, clustered. So this difference between man and mouse, seen in finished BAC-based sequence assemblies, has important ramifications, both for evolution, because you can juxtapose different pieces of DNA, creating complex configurations that you don’t see in closely related species, and for disease.
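The interspersed-versus-clustered distinction can be made concrete with a small classifier. This is a minimal sketch under stated assumptions, not the pipeline behind these maps: the `PairwiseDup` record and `classify` function are hypothetical, and "interspersed" is approximated as two copies on the same chromosome at least 1 Mb apart (the talk's definition uses distance to the nearest neighbor). The 95 percent identity and 20 kb thresholds are the ones quoted in the talk.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in the talk for the largest, most identical duplications.
MIN_IDENTITY = 0.95           # greater than 95 percent identity
MIN_LENGTH = 20_000           # greater than 20 kb
INTERSPERSED_GAP = 1_000_000  # assumption: copies at least 1 Mb apart count as interspersed


@dataclass
class PairwiseDup:
    """One pairwise alignment between two copies of a duplication (hypothetical record)."""
    chrom_a: str
    start_a: int
    end_a: int
    chrom_b: str
    start_b: int
    end_b: int
    identity: float  # fraction identical, e.g. 0.97


def classify(dup: PairwiseDup) -> Optional[str]:
    """Return 'interchromosomal', 'interspersed', 'clustered', or None if below threshold."""
    length = min(dup.end_a - dup.start_a, dup.end_b - dup.start_b)
    if dup.identity <= MIN_IDENTITY or length <= MIN_LENGTH:
        return None
    if dup.chrom_a != dup.chrom_b:
        return "interchromosomal"
    # Distance between the two copies on the same chromosome (negative if they overlap).
    gap = max(dup.start_a, dup.start_b) - min(dup.end_a, dup.end_b)
    return "interspersed" if gap >= INTERSPERSED_GAP else "clustered"
```

Counting the labels over all pairs gives a rough interspersed fraction to compare between two genomes; a real analysis would also merge overlapping alignments first.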
Its importance for disease comes from some of the seminal work of Jim Lupski and others in the early 1990s, and the idea is very straightforward.
If you have duplicated sequences within a Genome, you can trick the recombination machinery during meiosis into recombining where it shouldn’t. Shown here are two of the four chromatids aligning during meiosis, with the duplicated sequence shown in green, and non-allelic homologous recombination, also known as an unequal crossing-over event, occurring. This leads to gametes that have gained an additional copy of that duplicated sequence or have lost a copy of it.
The really important part is this: if these duplications are interspersed, say intrachromosomal duplications flanking unique sequence that encodes genes A, B, and C, then genes A, B, and C get taken along for the ride. So in addition to producing gametes that have additional copies of the duplicated sequence, we now have gametes with additional copies of genes A, B, and C, and gametes that have lost copies of A, B, and C. If those genes are triplosensitive, haploinsufficient, or imprinted, the result is disease. There are at this point about 30 different syndromes in the human population that are caused precisely by this mechanism. It is not really an inherited genetic disease, because it doesn’t have to be transmitted. It’s something that goes on in all of us as we sit in this room and produce gametes, either eggs or sperm. And an architecture that has a lot of these interspersed configurations is obviously going to be predisposed to these types of events at a much higher frequency.
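The mechanism just described can be illustrated with a toy model of unequal crossing over between two directly oriented repeats flanking genes A, B, and C. This is a hypothetical sketch: a chromatid is a list of segment labels, `nahr_products` and `dosage` are illustrative names, and the fused repeat copy is simply labeled `D*`.

```python
from typing import List, Tuple

HYBRID = "D*"  # hypothetical label for the recombinant (fused) repeat copy


def nahr_products(chromatid: List[str], left_repeat: str,
                  right_repeat: str) -> Tuple[List[str], List[str]]:
    """Reciprocal products of unequal crossing over between two direct repeats.

    One chromatid's right_repeat misaligns with the other chromatid's left_repeat;
    a crossover inside the repeats yields one duplication and one deletion product.
    """
    i = chromatid.index(left_repeat)
    j = chromatid.index(right_repeat)
    if i >= j:
        raise ValueError("left_repeat must lie upstream of right_repeat")
    # Chromatid 1 up to its right repeat, fused repeat, then chromatid 2 after its left repeat.
    duplication = chromatid[:j] + [HYBRID] + chromatid[i + 1:]
    # Chromatid 2 up to its left repeat, fused repeat, then chromatid 1 after its right repeat.
    deletion = chromatid[:i] + [HYBRID] + chromatid[j + 1:]
    return duplication, deletion


def dosage(gamete: List[str], other_homolog: List[str], gene: str) -> int:
    """Copies of a gene in a zygote formed from this gamete plus a normal homolog."""
    return gamete.count(gene) + other_homolog.count(gene)


normal = ["x", "D1", "A", "B", "C", "D2", "y"]
dup, dele = nahr_products(normal, "D1", "D2")
```

A zygote from the duplication gamete carries three copies of A, B, and C, and one from the deletion gamete carries one; that change in dosage is what matters for triplosensitive or haploinsufficient genes.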
These are some of the diseases. I’m sure many of you have heard of some of them: velocardiofacial/DiGeorge syndrome, Williams syndrome, Prader-Willi, and so on. There are three interesting aspects of these diseases. Shown here is the actual size of the duplication that mediates the rearrangement, and the first point is that most of the events are large: the duplicated sequences have to be greater than 10 kb, often greater than 100 kb in size, to mediate De novo events at a high frequency. The second point is that the degree of sequence identity is also very high. Typically most of the diseases are caused by duplications that are greater than 95 percent identical, and the vast majority are greater than 98 percent. And the third component, which you can kind of see here, is that the vast majority of diseases described thus far involve some type of neurologic component: either the peripheral nervous system, or the central nervous system with cognitive deficits in these kids.
So the hypothesis was very straightforward. If we had a beautiful duplication map of the Genome, which was built on the sweat of a lot of wonderful people working on this project over the last 10 years, could we use it as a morbidity map to predict the sites of disease associated with these specific regions? And, specifically, could we focus on children with mental retardation to find new, previously unknown diseases?
So this is the duplication map I showed you again. It was not just a quality control exercise for the Human Genome Project; we actually viewed it as a disease map. Here is our roadmap. All of the gold bars represent blocks of sequence where the architecture is such that you would predict a high frequency of De novo mutation, based on very large, very identical sequences at these positions. At that time there were roughly 130 such regions of the Genome. Twenty-three of them, the gold bars with letters next to them, were already associated with disease, and we were betting that at least a subset of the remaining regions would be associated with De novo disease.
The way we did this (this is kind of old technology now, but we began this work about two and a half years ago) was to target all of the regions that had between 50 kb and five megabases of unique sequence flanked by duplications of greater than 95 percent identity and greater than 10 kb in size. We took BACs from these regions and built a specialized microarray containing about 2,000 BACs from these roughly 130 regions of the Human Genome. We simply tested a normal DNA sample labeled with one fluorochrome against an affected individual’s DNA labeled with another fluorochrome, and looked for signal intensity differences in hybridization to this chip as evidence of gain or loss of that specific region.
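The region-selection criteria above can be sketched as a filter over segmental duplications on one chromosome. This is a simplified illustration with hypothetical names (`SegDup`, `candidate_regions`), not the study's actual code; in particular it treats any two adjacent qualifying duplications as a flanking pair and does not check that the flanks are paralogs of each other or share an orientation.

```python
from typing import List, NamedTuple, Tuple


class SegDup(NamedTuple):
    """One segmental duplication on a single chromosome (hypothetical record)."""
    start: int
    end: int
    identity: float  # fraction identical to its paralog


def candidate_regions(dups: List[SegDup],
                      min_identity: float = 0.95,
                      min_dup_len: int = 10_000,
                      min_unique: int = 50_000,
                      max_unique: int = 5_000_000) -> List[Tuple[int, int]]:
    """Unique intervals of 50 kb to 5 Mb flanked by duplications >95% identity and >10 kb."""
    flanks = sorted(
        (d for d in dups if d.identity > min_identity and d.end - d.start > min_dup_len),
        key=lambda d: d.start,
    )
    regions = []
    for left, right in zip(flanks, flanks[1:]):
        gap = right.start - left.end  # unique sequence between adjacent flanks
        if min_unique <= gap <= max_unique:
            regions.append((left.end, right.start))
    return regions


dups = [SegDup(0, 20_000, 0.98), SegDup(500_000, 530_000, 0.97),
        SegDup(540_000, 545_000, 0.99), SegDup(8_000_000, 8_050_000, 0.96)]
regions = candidate_regions(dups)
```

Here the 5 kb duplication is too short to count as a flank, and the last pair is more than 5 Mb apart, so only the first interval qualifies.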
In terms of the study populations, we used a normal control group, which people have argued maybe isn’t the best normal control group, but it was what we had available at that time: all of the HapMap samples, as well as an additional diversity panel of roughly 45 individuals. We used these normal individuals to establish the pattern of variation in individuals without disease, or at least without disease associated with mental retardation. I’m not going to go over those details, other than to say that we found lots of copy number variation. So, harkening back to something that Claire mentioned, the Human Genome is riddled with copy number differences, gains and losses of sequence in different individuals. We then focused on a collection of kids that essentially the clinical community, or at least the diagnostic community, had given up on. There were roughly 500 children: children who had been tested for Fragile X and come back negative, children tested for [unintelligible] rearrangements, and children whose karyotype was normal. We tested them using this platform.
So, some of the results. After screening the normal collection, we followed up with roughly 300 of these children, initially the first 291 children from Oxford, and found regions of the Genome that look like this. What you’re looking at here is a log2 relative hybridization intensity plot for four different individuals, all children with mental retardation. We’re looking for things that deviate from a log2 ratio of zero, which would indicate no difference.
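As a rough illustration of the log2 readout, the sketch below computes a log2 ratio and flags runs of consecutive probes that deviate from zero. It is a toy caller written for this example, not the study's method (published array analyses typically use statistical segmentation); the 0.3 threshold and the three-probe minimum are assumptions.

```python
import math
from typing import List, Tuple


def log2_ratio(test_intensity: float, ref_intensity: float) -> float:
    """log2(test / reference): 0 means no difference, about -1 a one-copy loss in unique sequence."""
    return math.log2(test_intensity / ref_intensity)


def call_runs(ratios: List[float], threshold: float = 0.3,
              min_probes: int = 3) -> List[Tuple[int, int, str]]:
    """Call gains/losses as runs of >= min_probes consecutive probes beyond +/-threshold.

    Returns (first_probe, last_probe, 'gain' or 'loss') tuples.
    """
    calls = []
    run_start, run_sign = 0, 0
    for i, r in enumerate(list(ratios) + [0.0]):  # trailing 0.0 closes any open run
        sign = 1 if r >= threshold else -1 if r <= -threshold else 0
        if sign != run_sign:
            if run_sign != 0 and i - run_start >= min_probes:
                calls.append((run_start, i - 1, "gain" if run_sign > 0 else "loss"))
            run_start, run_sign = i, sign
    return calls


ratios = [0.05, -0.02, -0.9, -1.1, -0.8, -0.95, 0.1, 0.6, 0.7, 0.0]
calls = call_runs(ratios)
```

Requiring several adjacent probes is one simple way to keep isolated noisy probes from being called; the two-probe gain in this example is ignored for that reason.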
You’ll probably notice that there’s a lot of noise over these regions. That’s because about a third of our probes were selected right from the duplicated regions, where the copy number in the denominator really isn’t 2n but is actually more than that, and this creates some background. But clearly there’s something different about these four kids. Each has about five BACs showing evidence of a microdeletion in a region where we never once saw a deletion in the normal control group. These were validated by FISH. Probably the most interesting aspect is that we could then go back and build a higher-density, customized [unintelligible]-nucleotide microarray: instead of using five BACs in the region, we designed 11,000 [unintelligible] over that specific region to see whether the breakpoints were identical.
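The point about the denominator not being 2n can be checked with a little arithmetic. This sketch assumes a probe hybridizes equally to every paralogous copy and that the reference carries no copy-number variation of its own; `expected_log2_one_copy_loss` is a hypothetical helper. Under those assumptions, losing one copy gives log2((N-1)/N), so the signal shrinks as the number of copies N matched by the probe grows.

```python
import math


def expected_log2_one_copy_loss(diploid_copies: int) -> float:
    """Expected log2 ratio when one copy is lost at a probe matching `diploid_copies`
    copies in a diploid reference (2 for unique sequence).

    Assumes equal hybridization to every copy and no reference copy-number variation.
    """
    if diploid_copies < 2:
        raise ValueError("need at least two copies in a diploid reference")
    return math.log2((diploid_copies - 1) / diploid_copies)


for copies in (2, 4, 6, 10):
    print(copies, round(expected_log2_one_copy_loss(copies), 3))
```

A one-copy deletion under a unique probe drops to about -1, while the same loss under a probe matching ten copies barely moves, which is one reason probes in duplicated sequence read as noisy.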
This shows those four children once again. This is the log2 relative hybridization intensity, with significant deviations indicated by the red signal. You can see a couple of things here. First, if we compare an affected child with the parents, mom and dad are normal over that area, but the child has a deletion of roughly 450 kb precisely at that site. You’ll also notice the segmental duplications here. These are very large, highly identical duplications, which share about 99 percent identity over 100 kb. So the duplications roughly demarcate the boundaries, or breakpoints. But you’ll also notice that in the regions underneath these duplications you see a lot of variation in the normal population.
The important point here was that we had essentially an identical critical region in four children identified from this study of mental retardation. All of them had haploinsufficiency, at least that’s our model, and in fact all of those we’ve been able to test so far were De novo events. In other words, the parents did not have this lesion; it was seen specifically in the kids.
On top are the genes; there are five genes mapping into that region. We don’t know which one causes the disease, but obviously there are some great candidates. One of the most interesting is MAPT, also known as tau, a gene in which point mutations have been associated with Parkinson’s, Alzheimer’s, and frontotemporal dementia. So we’re now screening patients who essentially phenocopy the disease, looking for point mutations.
I’d just like to emphasize that even though we screened only about 300 kids, this deletion accounted for roughly one and a half percent of the idiopathic collection that we looked at.
This is what the kids look like. In collaboration with our former competitors, Bert de Vries in Holland, we’ve now had the opportunity to look at roughly 21 children, all of whom have the microdeletion. For nineteen of them we’ve been able to look at the parents and show that these are De novo events. If you look at these kids you can see some similarities in phenotype. One of the most pronounced, believe it or not, is this very bulbous nose that you see in almost all of the kids. You see a pronounced philtrum, sometimes a protruding tongue, as well as a fairly happy disposition, which has been noted in many of the clinical records. So the children have a better outlook on life than most of us. And, in fact, by showing clinicians the data, we’ve now been able to go back to De novo collections and identify additional kids.
One of the interesting parts of what we think is a new deletion syndrome is that the exact same region we identified as being deleted was described a year and a half earlier by Curry Stephenson [spelled phonetically] at deCODE as the site of a common inversion polymorphism in humans. Shown here, blown up, is the region once again. This is the CEPH Diversity Panel, with black indicating the frequency of that inversion. The inversion is restricted largely to Caucasian populations: both European and Mediterranean populations have it, and that is where it is most common. Once you get into Africa, Asia, and India, you see very low frequencies of the inversion. Their data suggest that, for completely unknown reasons, this inversion was associated with increased fecundity and increased recombination in these populations. That was based on genealogical data from, I believe, the Icelandic population.
So we went back and looked at our kids to see whether the deletions came from haplotypes carrying the inversion, and to date 19 out of 19 cases come from the inversion haplotype. I want to make it clear that we don’t know whether it’s the inversion itself that predisposes to this microdeletion event or something else on that haplotypic background. But the data are overwhelming that this ethnically stratified inversion polymorphism, or at least the inversion haplotype, predisposes to disease. This obviously has some ramifications. One is that this is largely a Caucasian-specific idiopathic mental retardation syndrome. And our screening so far of 500 African-American kids has shown no cases of this particular deletion.
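To see why 19 of 19 cases on one haplotype is called overwhelming, consider a simple null model in which the haplotype has no effect and each De novo deletion arises on a randomly drawn chromosome. The 0.20 haplotype frequency below is an illustrative assumption, not a figure from the talk, and the model ignores how "comes from the haplotype" was actually scored; `prob_all_on_haplotype` is a hypothetical helper.

```python
def prob_all_on_haplotype(haplotype_freq: float, n_cases: int) -> float:
    """Probability that n independent de novo events all arose on chromosomes carrying
    a haplotype of the given frequency, if the haplotype had no effect (null model)."""
    if not 0.0 <= haplotype_freq <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    return haplotype_freq ** n_cases


ASSUMED_FREQ = 0.20  # illustrative haplotype frequency; NOT a value given in the talk
N_CASES = 19         # 19 of 19 cases, as stated in the talk

for freq in (ASSUMED_FREQ, 0.40):
    p = prob_all_on_haplotype(freq, N_CASES)
    print(f"freq={freq:.2f}: P(all {N_CASES} on haplotype) = {p:.1e}")
```

Even doubling the assumed frequency leaves a chance probability far below any usual significance threshold, which is the intuition behind the claim.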
That wasn’t the only one we found. Here’s another region, on 15q24.1-q24.2, four megabases in size. These are the children, and these are their actual genotypes based on a [unintelligible] GH over [unintelligible]. Breakpoints in three of the four cases occur precisely at regions of high sequence identity, and in these three cases we know that each event is De novo. These kids are fairly high functioning, with IQs of around 65 to 70. They have been described as having autism spectrum disorder, but they have extra features such as growth [unintelligible].
Here’s yet a third example. This is distal to the Prader-Willi region, on 15q13.3. In our initial screening we skipped over this region because of our criteria. This index patient had a deletion between breakpoint three and breakpoint five. It was actually not a De novo event: when we looked at the parents, the mother had this very large deletion over the region, but it turned out that the mother also had mild mental retardation as well as epilepsy. So we screened this region and got two additional cases. Both were smaller, between breakpoint four and breakpoint five. Both were De novo, and in both there’s also mild mental retardation or developmental delay, and epilepsy. We don’t know for sure that this is a genetic disorder, but we’re betting it is. What’s particularly interesting is that there is one gene located here, [unintelligible] seven, a gated ion channel gene that has been implicated in, though I don’t think ever proven to be associated with, myoclonic epilepsy. So we believe that haploinsufficiency of this region also causes disease, and once again the breakpoints map to these very large, highly identical duplications.
The last example I’ll show you is a recurrent deletion not associated with mental retardation. We’ve now moved outside of kids with mental retardation and started screening kids with other types of pediatric disease, and this is an analysis of some of those children: a collection of roughly 80 pediatric patients with renal disease. What we found in this case was once again a De novo deletion. I should point out that in all of these cases the breakpoints are embedded right within the segmental duplications. What’s particularly remarkable about this disease is that, at least in the studies and samples we’ve looked at, largely with Christine Belan Chantello [spelled phonetically] in Paris, it accounts for about 20 percent of the pediatric patients with renal disease in their collection. So it’s what we think is a common microdeletion. Interestingly enough, it has never been observed once in a control group of 927 individuals. On top of that, about 36 percent of children with maturity-onset diabetes of the young type 5 (MODY5) also have the same microdeletion. There is a gene in this region, TCF2, a transcription factor, in which point mutations have been shown to be associated with both renal disease and MODY5 diabetes.
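"Never observed once" in 927 controls still leaves room for a small control frequency. As a worked sketch (standard binomial reasoning, not an analysis reported in the talk), the exact one-sided 95 percent upper bound solves (1 - p)^n = 0.05, and the familiar "rule of three" approximates it as 3/n; `upper_bound_zero_events` is a hypothetical helper.

```python
def upper_bound_zero_events(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided (1 - alpha) upper bound on a frequency when 0 of n are affected.

    Solves (1 - p) ** n == alpha for p.
    """
    return 1.0 - alpha ** (1.0 / n)


n_controls = 927                # control group size quoted in the talk
bound = upper_bound_zero_events(n_controls)
rule_of_three = 3 / n_controls  # common approximation, 3/n
print(f"95% upper bound: {bound:.3%} (rule of three: {rule_of_three:.3%})")
```

So the data are compatible with a control frequency of up to roughly 0.3 percent, still far below the roughly 20 percent seen in the renal disease collection.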
So, in summary, we’ve now looked at a large number of kids, particularly from the IMR study; these numbers are based largely on the initial set of 300 from Oxford. In these patients we identified what we think are roughly 16 sites of novel structural variation. I wouldn’t claim that the majority of those are causative, but I do feel comfortable saying that we have at least three novel genomic disorders with recurrent De novo events and phenotypic similarities that allow us to designate each as a new disorder. We have one example of a microdeletion associated with diabetes and renal disease. And I’d be willing to bet that if we screen more children with more forms of pediatric disease, we’ll find additional genomic disorders associated with a wide range of phenotypes.
I’ll just leave this one slide here as an example of why I think this is so important. With Debbie Nickerson and Greg Cooper in my group, we just finished screening a large number of normal individuals using the Illumina platform. These are individuals who came in essentially for lipid testing, as part of a study known as the “Park Study” [spelled phonetically]. Shown here are the hotspot regions that we find deleted or duplicated within this normal control group: duplications in pink and deletions in blue. These are the numbers of chromosomes, out of this collection of roughly 1,920 chromosomes, carrying each event. Here is the absolute number, here’s the one percent frequency cut-off, and here are a bunch of events at roughly 0.1 and 0.2 percent frequency.
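To make those frequency cut-offs concrete, here is the arithmetic for a collection of roughly 1,920 chromosomes. The helper names are hypothetical; the only input taken from the talk is the chromosome count.

```python
N_CHROMOSOMES = 1_920  # roughly 1,920 chromosomes, as stated in the talk


def frequency(carrier_chromosomes: int, n: int = N_CHROMOSOMES) -> float:
    """Frequency of an event seen on `carrier_chromosomes` out of n chromosomes."""
    return carrier_chromosomes / n


def expected_carriers(freq: float, n: int = N_CHROMOSOMES) -> float:
    """Expected number of chromosomes carrying an event at frequency `freq`."""
    return freq * n


for f in (0.01, 0.002, 0.001):
    print(f"{f:.1%} of {N_CHROMOSOMES} chromosomes -> {expected_carriers(f):.1f} carriers")
```

In other words, the one percent line corresponds to about 19 chromosomes, while events at 0.1 to 0.2 percent are seen on only about two to four chromosomes.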
So, coming back to a point that Richard made, there are two issues I want you to think about. Roughly six to 12 percent of the normal individuals in this group have big deletions precisely over regions predisposed to non-allelic homologous recombination. We have an excess of deletions relative to duplications, which we don’t understand, and an absence of events at around one to three percent frequency. I would bet that these events are being fed by De novo mutations at a high frequency into the normal pool. And the question remains open: what is their impact in terms of disease or susceptibility?
So one of the things you might ask yourself is, why? If you compare the mouse and human Genome architectures, why do we have all of these large blocks of interchromosomal and intrachromosomal duplications, if they predispose 10 percent of our Genome to microdeletion and microduplication at a high frequency? We’ve tried to address this question over the years, and I’ll go over it fairly quickly. One of the important things to realize about the duplication architecture in these regions is that it’s not just one piece of sequence. It’s heterogeneous, made up of many different parts that have had different evolutionary histories and trajectories. This is one of the roughly 400 such regions in our Human Genome, and everything shown in color or grey represents duplicated sequence; this full 790 kb stretch of DNA is entirely duplicated. When you reconstruct the evolutionary history of this region, you find that each colored segment comes from a different area of the Genome. So we have a hodgepodge mosaic over these specific regions, made up of pairwise alignments from all over the Genome. To complicate matters, these regions then duplicate between the large duplication blocks and can share large blocks of homology with one another. These secondary events are the ones that actually predispose to the microdeletions and microduplications associated with disease.
So given this architecture, can we systematically reconstruct the evolutionary history of these regions? Working with Pavel Pevzner, we came up with an approach that takes all of the individual pairwise alignments within the Human Genome that make up these duplication blocks and decomposes them into minimal evolutionary shared segments, breaking the alignments into individual subunits, or duplication subunits. Then, largely using data from UCSC, we compared all of the duplicated positions in the Human Genome to see if we could identify the ancestral segment where each duplication began, and therefore provide directionality for the duplications.
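The idea of breaking overlapping pairwise alignments into minimal shared segments can be sketched in one dimension: collect every alignment endpoint within a duplication block and cut the block at those breakpoints. This is a deliberately simplified, hypothetical illustration (`elementary_segments` is not from the talk); the method developed with Pevzner is not described here and works on alignments across the whole Genome, not on a single coordinate axis.

```python
from typing import List, Tuple


def elementary_segments(alignments: List[Tuple[int, int]]) -> List[Tuple[int, int, List[int]]]:
    """Split a block at every alignment endpoint.

    alignments: (start, end) intervals of pairwise alignments on one duplication block.
    Returns (start, end, covering_alignment_indices) for each minimal segment that at
    least one alignment covers.
    """
    points = sorted({p for start, end in alignments for p in (start, end)})
    segments = []
    for left, right in zip(points, points[1:]):
        covering = [i for i, (s, e) in enumerate(alignments) if s <= left and right <= e]
        if covering:  # skip gaps that no alignment covers
            segments.append((left, right, covering))
    return segments


segments = elementary_segments([(0, 100), (50, 150), (120, 200)])
```

Each resulting segment is covered by a fixed set of alignments, so it can be treated as a single unit when asking where that unit came from.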
Here the logic is pretty straightforward. Most of the duplications that we’re studying are primate-specific. So if we look at out-group species such as rat, dog, and mouse, which should not have these regions duplicated, we should see a single hit. Moreover, because the derivative copies have moved by this multistep duplication process while the ancestral human copy has stayed in place, we should see more orthologous anchors between the ancestral human copy and the mammalian out-group sequence. Using this approach we defined the ancestral origin for 67 percent of the duplications within the Human Genome. We validated these ancestral origins by FISH: we take an out-group species and a probe from the derivative locus, and we hybridize to see whether it goes back to the spot we predicted. In this case, a relatively small number of experiments, 12, confirmed our predictions. We also compared the experimental maps we had generated over the years with our in silico predictions with Pavel, and you can see that there’s pretty good correspondence between the duplicons that we identified.
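The ancestral-locus logic described above (a single hit in each out-group, plus the human copy sharing the most orthologous anchors) can be written as a small decision rule. This is a hedged sketch with hypothetical inputs and names, not the published method; the `min_margin` tie-breaking rule is an assumption added so ambiguous cases return no call.

```python
from typing import Dict, Optional


def infer_ancestral_copy(anchor_counts: Dict[str, int],
                         outgroup_hits: Dict[str, int],
                         min_margin: int = 1) -> Optional[str]:
    """Pick the human copy most likely to be ancestral, or None if ambiguous.

    anchor_counts: human copy -> number of orthologous anchors it shares with the
                   out-group locus (hypothetical input).
    outgroup_hits: out-group species -> number of hits for the duplicon.
    """
    # Primate-specific duplications should be single-copy in every out-group.
    if not outgroup_hits or any(n != 1 for n in outgroup_hits.values()):
        return None
    ranked = sorted(anchor_counts.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    # Require a clear winner rather than a tie (assumed rule).
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
        return None
    return ranked[0][0]


copies = {"copy_chr16_a": 12, "copy_chr16_b": 3, "copy_chr15": 1}
hits = {"mouse": 1, "rat": 1, "dog": 1}
ancestral = infer_ancestral_copy(copies, hits)
```

Returning no call when the out-groups are not single-copy mirrors the expectation in the talk: a duplicon that is already multicopy in rodents or dog probably predates the primate-specific expansion.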
So what did we learn from this analysis? Shown here is a map of the intrachromosomal duplication blocks: all of the blocks that have emerged in the last 25 million years on chromosome 15. About a third of these cause disease. What we find in almost all cases, maybe with one or two exceptions, is that located almost precisely in the center of these blocks is a common sequence, present in about 90 percent of the blocks, at least for this specific chromosome. This is what we call a core duplicon, and it has a number of interesting properties. It’s the most abundant and, as you might expect, the most ancient of the duplications. Even though the blocks all have heterogeneous histories, the core is common to the vast majority of them and seems to be the focal point for intrachromosomal duplication formation. Cores are frequently duplicated as solo elements in the Genome, but the flanking duplicons rarely are.
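Finding a core duplicon amounts to asking which duplicon is shared by the largest fraction of duplication blocks. The sketch below assumes each block has already been annotated as a set of duplicon IDs (hypothetical data) and uses the roughly 90 percent figure from the talk as its default cut-off.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple


def find_core(blocks: Iterable[Set[str]],
              min_fraction: float = 0.9) -> Optional[Tuple[str, float]]:
    """Return (duplicon, fraction of blocks containing it) for the most widely shared
    duplicon, if it is present in at least min_fraction of the blocks."""
    blocks = [set(b) for b in blocks]
    if not blocks:
        return None
    counts = Counter(duplicon for block in blocks for duplicon in block)
    if not counts:
        return None
    duplicon, n = counts.most_common(1)[0]
    fraction = n / len(blocks)
    return (duplicon, fraction) if fraction >= min_fraction else None


# Hypothetical annotation: nine blocks share "core" with different flanks; one does not.
blocks = [{"core", f"flank{i}"} for i in range(9)] + [{"other"}]
result = find_core(blocks)
```

With a lower cut-off the same function would also pick out a core like the one later described in 14 of 16 blocks on another chromosome.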
The flanking duplicons almost always exist in association with a core. And when you look at the cores, they are enriched four- to five-fold both for annotated genes per base pair and for ESTs. So these seem to be the most transcriptionally active, most dynamic areas of the Genome. We find cores on about half a dozen human chromosomes that have experienced this burst of intrachromosomal duplication, and when we compare them, we find that they are often associated with Great Ape and human-specific gene families described in the literature over the last five or six years. We described one of the first, called nuclear pore interacting protein, which evolves about 50 times faster than most normal genes, at least based on dN/dS ratios, and a number of other such genes have been described. The common features of these genes are that they do not have orthologs in mouse; they have multiple copies in human and chimp; and they show dramatic changes in their expression profile compared with out-group species such as baboon or macaque. At least three of the four examples here show signatures of positive selection, and two are very dramatic examples of positive selection.
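For readers unfamiliar with dN/dS (also written Ka/Ks), here is a minimal worked version. It assumes the numbers of nonsynonymous and synonymous sites and differences have already been counted (for example with a Nei-Gojobori-style procedure) and applies a Jukes-Cantor correction to each proportion. It is an illustrative sketch, not the pipeline used in these studies, and the example counts are made up.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance from a proportion of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_sites: float, syn_sites: float,
          nonsyn_diffs: float, syn_diffs: float) -> float:
    """dN/dS from precomputed site and difference counts."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        return math.inf  # no synonymous change: ratio undefined, reported as infinite
    return dn / ds


neutral_like = dn_ds(300, 100, 30, 10)   # equal per-site rates give a ratio of 1
selected_like = dn_ds(300, 100, 60, 5)   # excess amino acid change gives a ratio above 1
```

A ratio near 1 is what neutral evolution predicts, values well below 1 indicate constraint, and values well above 1, like the "on the order of 10" mentioned later in the talk, point to positive selection.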
I’ll finish by sharing with you some of the work we’ve been able to do with Eric and NISC in this regard. Because these regions are so complicated, we really can’t get a handle on their architecture from whole Genome shotgun sequence assemblies of chimpanzee, gorilla, macaque, and so on. So, working with Eric, we’ve been able to target these regions and re-sequence them systematically in a number of primate species. Shown here is another core region, just to give you an idea. These are all of the locations of these duplication blocks. This is about 250 kb in size, and there’s a core of roughly 20 kb that is in 14 out of 16 of the blocks shown on this chromosome. This core is particularly interesting, as it has a very rapidly evolving gene family embedded within it, nuclear pore interacting protein. If you look at the degree of sequence identity in sliding windows across this region of the Genome, comparing any two copies, you will find troughs and peaks in sequence identity. What’s most remarkable is that the troughs correspond precisely to the positions of the exons of this eight-exon gene, which has no known function. The other thing I’ll point out is that 98 percent of these changes have resulted in amino acid changes between the copies. This is an extreme example of positive selection.
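A sliding-window identity profile like the one described above can be computed from two already-aligned copies. This is a minimal sketch: the window and step sizes are arbitrary assumptions, gap characters never count as matches, and `sliding_identity` is a hypothetical helper rather than the tool used in the study.

```python
from typing import List, Tuple


def sliding_identity(seq_a: str, seq_b: str, window: int = 100,
                     step: int = 50) -> List[Tuple[int, float]]:
    """Percent identity in sliding windows along two pre-aligned, equal-length sequences.

    Gap characters ('-') never count as matches. Returns (window_start, percent) pairs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(seq_a) - window + 1, step):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        profile.append((start, 100.0 * matches / window))
    return profile


profile = sliding_identity("A" * 200, "A" * 100 + "C" * 100)
```

Plotting such a profile across two paralogous copies is what makes exon-shaped troughs visible: in a gene under strong positive selection, identity drops specifically where the coding sequence sits.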
Working with Eric, we drilled down and looked at a lot of other copies in other primates, particularly gorilla, chimpanzee, orangutan, and baboon. We sequenced and annotated all of the sequences we got back, both experimentally and computationally, and then reconstructed the phylogeny of these segments. I hope you don’t go blind, but this is the actual phylogeny, based on a neighbor-joining analysis of 2 kb of non-coding sequence from the core, with HSA representing human, PTR chimp, GGO gorilla, and so on. What we get from this bewildering complexity is really a couple of things. Number one, all of this architecture, which we now know causes disease, is only about 10 million years old: all of the events occurred in the common ancestor of chimp, human, and gorilla, or immediately after the separation of those species. The second thing, which I think is really, really interesting, is that when we look at orangutan, we see the core once again present, but with completely different flanking duplicons, which are unique in human and all of the other Great Ape species. So orangutan has done the exact same thing our Genome did seven million years ago, using the same core, but has picked up completely different flanking sequences, which are unique in chimpanzee, unique in gorilla, and unique in human.
So this tells us that this core is, we think, actively transducing segments of the Genome around. To give you a perspective back to 25 million years ago: this is the architecture that we see in human, and each one of these blocks of sequence is essentially unique in baboon and unique in macaque. So we think these all began as unique-copy sequences, with the core beginning to jump probably about 20 million years ago, picking up flanks and continuing to grow, such that it now occupies LCR16A and its associated duplicons, about 10 percent of the euchromatin of human chromosome 16, or at least, in this case, 16p. With large-insert sequences, we can also map the locations in orangutan, and this is the orangutan picture. This is human 16, with very limited activity. But here on chromosome 13 you see that the core has jumped to a new chromosome and begun its dance again, creating a very complex architecture of interspersed duplications distributed across, in this case, chromosome 13. So here I think the important point is that the cores are mobile. They can jump to new chromosomes, and they can transduce flanking sequences as part of their trajectory.
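The phylogeny mentioned above was built by neighbor joining. As a hedged illustration of that technique only, here is a compact implementation of the standard neighbor-joining algorithm run on a textbook five-taxon distance matrix; it does not use the primate core sequences, and turning 2 kb of aligned sequence into distances is left out.

```python
from typing import List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted neighbor-joining tree; returns Newick with branch lengths."""
    if len(names) < 2:
        raise ValueError("need at least two taxa")
    nodes = list(names)
    d = [list(map(float, row)) for row in dist]
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Q-criterion: join the pair (i, j) minimizing (n - 2) * d_ij - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and to j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.3f},{nodes[j]}:{lj:.3f})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to every remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    return f"({nodes[0]},{nodes[1]}:{d[0][1]:.3f});"


names = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
tree = neighbor_joining(names, dist)
```

Neighbor joining is fast and does not assume a molecular clock, which suits a first look at many paralogous copies; ties in the Q-criterion are broken here by taking the first pair found.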
I don’t have time to go into all of this data, so I’ll just summarize what we know about this particular core. It began as a single-copy sequence about 25 million years ago, and data I don’t have time to show you indicate that it was actually testis-specific: it was expressed only in the testis, and it showed no evidence of selection by any of the normal Ka/Ks tests. So this is a little bit heretical, I think, because most people would teach you that all genes are born from other genes. Our data suggest that this gene was born from a transcript that was probably evolving neutrally.
Then, about 25 million to 12 million years ago, in the common ancestor of orangutan,
gorilla, chimpanzee, and human, it began to move; it began to duplicate, with some copies
on this lineage duplicating heavily on chromosome 16 and here duplicating on chromosome 13.
At this point, when it began to duplicate, expression analysis in orangutan, chimpanzee,
and human shows a ubiquitous pattern of expression. It's expressed in every tissue in
orangutan and in human that we've analyzed to date. So we've looked at about a dozen tissues
in orangutan and about 32 different tissues in human.
Between seven and 12 million years ago, not on this lineage but on the African Great Ape
and human lineage, we see extreme positive selection: Ka/Ks values on the order of 10
compared to the Old World monkey sequence. This suggests that at that point some mutation
must have occurred that led to an open reading frame under strong positive selection, which
then rose to fixation in the population. So the 98 percent amino acid changes that
I mentioned are occurring right here at this branch.
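For readers unfamiliar with the Ka/Ks ratio mentioned here: it compares the rate of nonsynonymous (amino-acid-changing) substitutions per nonsynonymous site (Ka) with the rate of synonymous substitutions per synonymous site (Ks). A ratio near 1 is expected for a neutrally evolving sequence, a ratio well below 1 indicates purifying selection, and a ratio well above 1, like the value of about 10 quoted in the talk, indicates positive selection. The sketch below is a simplified illustration only. It assumes the differences and sites have already been counted (for example by a Nei-Gojobori-style counting procedure) and applies a Jukes-Cantor correction; the function names, the `tolerance` cutoff, and the counts are hypothetical and are not data from this study. Real analyses use likelihood methods with formal significance tests.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected substitutions per site for an observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Ratio of corrected nonsynonymous (Ka) to synonymous (Ks) divergence."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0.0:
        raise ValueError("no synonymous differences: Ka/Ks is undefined")
    return ka / ks


def interpret(ratio: float, tolerance: float = 0.1) -> str:
    """Crude verbal reading of a ratio; the tolerance is an arbitrary illustrative cutoff, not a statistical test."""
    if ratio > 1.0 + tolerance:
        return "positive selection"
    if ratio < 1.0 - tolerance:
        return "purifying selection"
    return "consistent with neutrality"


# Hypothetical counts: 30 nonsynonymous differences over 300 sites,
# 1 synonymous difference over 100 sites.
ratio = ka_ks(30, 300, 1, 100)
print(f"Ka/Ks = {ratio:.1f} -> {interpret(ratio)}")
```

With these made-up counts, amino-acid-changing substitutions accumulate roughly ten times faster per site than silent ones, which is the kind of pattern the talk describes on the African Great Ape and human branch.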
So we believe that the movement of this core led to the emergence of a novel gene family
about seven million years ago, probably one of the youngest and most rapidly evolving
genes in the human species.
So in summary, I've talked about the architecture of the Human Genome with respect to these
large blocks of duplication. I talked about how complex they are, and specifically showed
you some examples of how all of this complex architecture can predispose to large de novo
deletions, probably of large effect, or selective effect, within the population. Our
targeted approach has uncovered four new microdeletion syndromes, and we've shown that
they're recurrent de novo events. And I think the question remains unanswered: what is the
importance of this mechanism in complex disease? If you think about it, none of these
events can be tagged using a tag SNP, because they arise de novo and occur, in some cases,
on different haplotypes.
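The tag-SNP point can be illustrated with a toy calculation. Tag SNPs rely on linkage disequilibrium, usually measured as r², between a variant and nearby SNPs. A deletion that arose once, on a single haplotype background, travels with that background and can be tagged perfectly. The same deletion arising independently on several backgrounds is split across them, so no single SNP captures it well. The sketch below assumes phased haplotypes coded 0/1; the eight haplotypes, the three SNPs, and the function names are made up for illustration and are not data from the study. In practice, tagging commonly requires r² of about 0.8 or higher.

```python
from typing import List, Sequence


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Linkage disequilibrium r^2 between two biallelic loci coded 0/1 on the same phased haplotypes."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x * y for x, y in zip(a, b)) / n  # frequency of haplotypes carrying both alleles
    d = p_ab - p_a * p_b                          # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def best_tag_r2(event: Sequence[int], snps: List[Sequence[int]]) -> float:
    """Highest r^2 any single SNP achieves with the event, i.e. how well one SNP could tag it."""
    return max(r_squared(event, snp) for snp in snps)


# Hypothetical phased data: 8 haplotypes; each SNP marks one haplotype background.
snps = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
]
single_origin = [1, 1, 0, 0, 0, 0, 0, 0]  # deletion arose once, on the first background
recurrent = [1, 0, 1, 0, 1, 0, 0, 0]      # same deletion arose independently on three backgrounds

print(f"single origin: best r^2 = {best_tag_r2(single_origin, snps):.2f}")
print(f"recurrent:     best r^2 = {best_tag_r2(recurrent, snps):.2f}")
```

Because the recurrent deletion sits on several backgrounds, each SNP marks only one of its copies and none comes close to a tagging threshold. This is a deliberately simplified model; real haplotype structure is far messier.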
Then I talked about the evolutionary significance of these regions, particularly the core
architecture that we think has emerged to account for the expansion of intrachromosomal
duplications within the human and Great Ape lineage. And I'll leave you with this final
thought: maybe the negative selection against the microdeletion and microduplication events
that exist in our species is partially offset by the positive effect of having newly minted
genes, many copies of them, at new locations. And if you think about it, there's a huge
challenge ahead: even though these genes are few in number, we need to work out the
functions of genes that don't exist in outgroup species and that are embedded in these very
complex regions of the Genome where STS and SNP mappers fear to tread.
We hope to continue. I think my students say it's going to be on my epitaph that he found
these genes but never actually showed the function of a single one. Maybe five years from
now I can come back and share with you some evidence that they're actually functional. So,
I'd like to acknowledge these folks: Andy Sharp [spelled phonetically] and Heather Mefford
[spelled phonetically] were the postdocs who did most of the work on the human disease
angle. Matt Johnson [spelled phonetically] and Zo Xi Jang [spelled phonetically] are both
students who did all of the work with respect to the evolution of these core regions.
Thanks also to good colleagues in sequencing centers, Baylor Washoe [spelled phonetically]
and specifically at NISC, who rose to the challenge of sequencing some of these very
difficult clones. I think I have a reputation, probably rightly deserved, for bringing
sequencing centers some of the nastiest clones to sequence. I'm grateful for the patience
of people like Bashali Mascari [spelled phonetically], Bob Blakesly [spelled phonetically],
and really Jerry Bouchard [spelled phonetically], who took these on and took them to
completion, at least within the primates. And thanks to a lot of great colleagues, clinical
colleagues, particularly overseas, who have been very forthcoming in providing samples and
working on collaborations. Thank you.
Dr. Eric Green: We have time for quick questions. Any questions
from the floor?
Dr. Evan Eichler: Richard’s got one.
Dr. Eric Green: Richard, you’ve got one? Come over here.
Dr. Richard Gibbs: Yeah. Evan, what about wild mice? Have you
got any duplication data there? Have they got the same low level of these events?
Dr. Evan Eichler: Nothing on -- you mean wild outbred mice?
Yeah, so no information on wild mice yet. We have a lot of information from the inbreds over
the duplicated regions, and they show as much variation as humans do. The only difference
is that the variation is restricted to the tandem duplications and doesn't influence
these unique stretches between the interspersed duplications.
Dr. Eric Green: So I want to ask a question, and we're trying
to get a PowerPoint loaded, so I'll also stall a little. Evan, about the screen that you
did of the pediatric patients: you made some comment that if you screened more pediatric
diseases you'd likely find more of these copy number changes, but there's no reason to
think you wouldn't find similar changes if you screened unusual adult-onset diseases, is there?
Dr. Evan Eichler: Right. No, we've toyed with the idea, and
we're thinking about looking at a more adult-oriented disease. I guess I have this
fundamental belief that if we can show it at a pediatric level, there'll be a stronger
genetic component, and so I'm more interested in screening more kids with disease for
which we don't have a good explanation than in looking at diseases where environment
will probably play a bigger role and genetics might play less.