Human Genome Structural Variation, Disease, and Evolution - Evan Eichler

[applause] Dr. Evan Eichler: So it really is, indeed, a pleasure to be here. I was here, I think, five years ago, and my message has stayed very much consistent. As Jim alluded to, our lab has been focused on what some people have called the darkest matter of the human genome. We’re specifically interested in regions that have changed very, very rapidly, specifically within the human species; areas of the genome that have proven to be dynamic, both in terms of their structure and organization and in terms of their evolution. As he mentioned, one aspect of that work focuses on regions of the genome that are highly duplicated. These come in two different flavors: sequences duplicated within a chromosome, known as intrachromosomal duplications, and sequences duplicated between non-homologous chromosomes, known as interchromosomal duplications. The reasons we’re interested in these are really two-fold. First, they are dynamic: their sequence identity is high enough to promote non-allelic homologous recombination, which drives additional rearrangement events at these specific sites. Second, if you believe the work of Susumu Ohno and others, duplication is the primary force by which new genes and gene families evolve. So we’re interested in these regions, first, from the perspective of dynamic mutation, that is, de novo mutation associated with disease, and second, from the perspective of the evolution of new genes and gene families within humans. I want to discuss both of those topics today. To summarize the work that came from analyzing the finished human genome, this is the pattern of the largest and most identical duplications within our genome; the blue lines represent large blocks of intrachromosomal duplication, greater than 95 percent identity and greater than 20 kb in size.
And you’ll notice from this that a lot of our duplications are interspersed. If you look at chromosome 7, you find that many of the pairwise alignments, the large ones, are separated by megabases of sequence. If you add the interchromosomal pattern, you get something like this. This pattern has stayed relatively constant through the newer assemblies of the human genome. And most of this data, I should point out, came from BAC-based sequencing, that is, the sequencing of large-insert clones. If you go back and look at some of the first published whole-genome shotgun assemblies of the human genome, these areas are completely missing. The important point is that about 60 percent of the large duplications within our genome are interspersed; that is to say, they are separated by at least a megabase from their nearest neighbor, or they map to another chromosome. Contrast this with some recent data that we’ve generated with Deanna Church, looking at the comparable finished version of the mouse genome: this is the pattern you see for the most identical and largest duplications within mouse. The total amount of duplicated sequence turns out to be very similar to what we saw initially in human, roughly five percent of the genome. But you’ll notice two things. First, the duplicated sites are fewer in number; there are about half as many highly duplicated sites in the mouse genome. Second, most of the blue lines, which indicate intrachromosomal duplications, sit right on top of one another, suggesting that most of the duplications in mouse, about 82 percent of them, are tandem, that is to say, clustered in their organization.
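The interspersed-versus-tandem distinction can be made concrete with a small classifier. This is a minimal sketch, assuming a hypothetical `DupPair` record for each pairwise alignment and applying the 1 Mb cutoff to each pair rather than to each copy’s nearest neighbor; it is not the pipeline behind the human and mouse figures quoted in the talk.

```python
from dataclasses import dataclass


# Hypothetical record: one pairwise alignment between two copies of a duplication.
@dataclass(frozen=True)
class DupPair:
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int


INTERSPERSED_MIN_GAP = 1_000_000  # "separated by at least a megabase"


def classify(pair: DupPair) -> str:
    """Label a pair as 'interchromosomal', 'interspersed' or 'tandem'."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    # Distance between the two copies on the same chromosome (0 if they overlap or abut).
    gap = max(pair.start2 - pair.end1, pair.start1 - pair.end2, 0)
    return "interspersed" if gap >= INTERSPERSED_MIN_GAP else "tandem"


def interspersed_fraction(pairs: list[DupPair]) -> float:
    """Fraction of pairs that are not tandem (>= 1 Mb apart or on different chromosomes)."""
    labels = [classify(p) for p in pairs]
    return sum(label != "tandem" for label in labels) / len(labels)


# Toy input; coordinates are made up for illustration.
pairs = [
    DupPair("chr7", 0, 50_000, "chr7", 3_000_000, 3_050_000),      # ~3 Mb apart
    DupPair("chr7", 100_000, 130_000, "chr7", 140_000, 170_000),   # 10 kb apart
    DupPair("chr16", 500_000, 520_000, "chr1", 9_000_000, 9_020_000),
]
print([classify(p) for p in pairs])  # ['interspersed', 'tandem', 'interchromosomal']
print(interspersed_fraction(pairs))
```

Applied genome-wide, the same tally is what separates a mostly interspersed architecture from a mostly tandem one.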
This difference between man and mouse in finished BAC-based sequence assemblies has important ramifications, both for evolution, since you can juxtapose different pieces of DNA and create complex configurations that you don’t see in closely related species, and also for disease. Its importance for disease comes from some of the seminal work of Jim Lupski and others in the early 1990s, and the idea is very straightforward. If you have duplicated sequences within a genome, you can trick the recombination machinery during meiosis into recombining where it shouldn’t. Here are two of the four chromatids aligning during meiosis, with the duplicated sequence shown in green, and a non-allelic homologous recombination event, also known as an unequal crossing-over event, occurring. This leads to gametes that have gained an additional copy of that duplicated sequence or have lost a copy of it. The really important part is what happens when these duplications are interspersed. Imagine intrachromosomal duplications flanking unique sequence that encodes genes A, B, and C: genes A, B, and C get taken along for the ride. So in addition to producing gametes with additional copies of the duplicated sequence, we now have gametes with additional copies of genes A, B, and C, and gametes that have lost copies of A, B, and C. If those genes are triplosensitive, haploinsufficient, or imprinted, the result is disease. At this point there are about 30 different syndromes in the human population that are caused precisely by this mechanism. It is not really an inherited genetic disease, because it doesn’t have to be transmitted. It is something that goes on in all of us sitting in this room as we produce gametes, either egg or sperm.
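The unequal crossing-over described above can be sketched as a toy model. Here a chromosome is just a list of segment labels, with `D` standing for the two copies of a duplication flanking the hypothetical genes A, B and C; the function name is illustrative, and the model ignores orientation, chromatid pairing and everything else about real meiosis.

```python
from collections import Counter

# A chromosome as an ordered list of segments; "D" marks the two copies of a duplication.
NORMAL = ["cen", "D", "A", "B", "C", "D", "tel"]


def crossover(h1: list[str], h2: list[str], i: int, j: int) -> tuple[list[str], list[str]]:
    """Reciprocal exchange inside segment h1[i] paired with segment h2[j].

    i == j models normal allelic recombination; i != j models non-allelic
    homologous recombination (unequal crossing over) between duplicate copies.
    """
    if h1[i] != h2[j]:
        raise ValueError("recombination needs homologous segments at both positions")
    # Each product keeps one homolog's left arm and the other's right arm;
    # the paired repeat itself becomes a single fused copy.
    return h1[:i] + h2[j:], h2[:j] + h1[i:]


# Misalignment: the first D copy of one homolog pairs with the second D copy of the other.
deletion, duplication = crossover(NORMAL, NORMAL, 1, 5)
print(deletion)  # ['cen', 'D', 'tel']
print(duplication)  # ['cen', 'D', 'A', 'B', 'C', 'D', 'A', 'B', 'C', 'D', 'tel']
print(Counter(duplication)["A"], Counter(deletion)["A"])  # 2 0
```

The two reciprocal products mirror the slide: one gamete loses a copy of A, B and C, the other carries an extra copy, and allelic recombination (`crossover(NORMAL, NORMAL, 1, 1)`) returns two normal chromosomes.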
And an architecture with a lot of these interspersed configurations is obviously going to be predisposed to these types of events at a much higher frequency. These are some of the diseases, and I’m sure many of you have heard of some of them: velocardiofacial/DiGeorge syndrome, Williams syndrome, Prader-Willi syndrome, and so on. There are a few interesting aspects of these diseases. Shown here is the size of the duplication that mediates each rearrangement, and the important point is that most of these duplications are large. The duplicated sequences have to be greater than 10 kb, often greater than 100 kb, in size to mediate de novo events at a high frequency. The second point is that the degree of sequence identity is also very high: most of these diseases are caused by duplications with greater than 95 percent identity, and the vast majority with greater than 98 percent. And the third point, which you can see here, is that the vast majority of diseases described thus far involve some type of neurologic component, either in the peripheral nervous system or in the central nervous system, with cognitive deficits in these kids. So the hypothesis was very straightforward. If we had a beautiful duplication map of the genome, which was built on the sweat of a lot of wonderful people working on this project over the last 10 years, could we use it as essentially a morbidity map to predict sites of disease associated with these specific regions? And, specifically, could we focus on children with mental retardation to find new, previously unknown diseases? So this is the duplication map I showed you again. It was not just a quality-control exercise for the Human Genome Project; we actually viewed it as a disease map. And so here is our roadmap.
All of the gold bars represent blocks of sequence where the architecture, very large and very identical duplications at these positions, would predict a high frequency of de novo mutation. There were roughly 130 such regions of the genome at that time. Twenty-three of them, the gold bars with letters next to them, were already associated with disease, and we were betting that at least a subset of the remaining regions would be associated with de novo disease. So the way we did this -- this is kind of old technology now, but we began this work about two and a half years ago -- we targeted all regions with at least 50 kb and less than five megabases of unique sequence that were flanked by duplications of greater than 95 percent identity and greater than 10 kb in size. We took BACs from these regions and built a specialized microarray containing about 2,000 BACs from these roughly 130 regions of the human genome. We simply tested a normal DNA sample labeled with one fluorochrome against DNA from an affected individual labeled with another fluorochrome, looking for signal intensity differences in hybridization to this chip as evidence of gain or loss of that specific region. In terms of study populations, we used a normal control group, which people have argued maybe isn’t the best normal control group, but it was what we had available at that time: all of the HapMap samples, as well as an additional diversity panel of roughly 45 individuals. We used these individuals to establish the normal pattern of variation in people without disease, or at least without disease associated with mental retardation. I’m not going to go over those details, other than to say that we found lots of copy number variation.
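The region-selection criteria just described amount to a simple filter. Below is a minimal sketch, assuming a hypothetical `FlankingPair` record for two copies of an intrachromosomal duplication on one chromosome, left copy first; the thresholds are the ones quoted in the talk, while the record layout, function name and coordinates are illustrative. It does not check the relative orientation of the two copies.

```python
from typing import NamedTuple


class FlankingPair(NamedTuple):
    """Two copies of one intrachromosomal duplication, left copy first (hypothetical format)."""
    chrom: str
    left_start: int
    left_end: int
    right_start: int
    right_end: int
    identity: float  # fraction of identical aligned bases, e.g. 0.99


# Thresholds quoted in the talk.
MIN_IDENTITY = 0.95        # flanking duplications > 95% identity
MIN_DUP_LENGTH = 10_000    # ... and > 10 kb
MIN_UNIQUE = 50_000        # at least 50 kb of intervening unique sequence
MAX_UNIQUE = 5_000_000     # ... but less than 5 Mb


def is_candidate_hotspot(p: FlankingPair) -> bool:
    """True if the region between the two copies meets the array-design criteria."""
    dup_length = min(p.left_end - p.left_start, p.right_end - p.right_start)
    unique_length = p.right_start - p.left_end
    return (
        p.identity > MIN_IDENTITY
        and dup_length > MIN_DUP_LENGTH
        and MIN_UNIQUE <= unique_length < MAX_UNIQUE
    )


# Made-up coordinates for illustration.
candidates = [
    FlankingPair("chr17", 1_000_000, 1_100_000, 1_550_000, 1_650_000, 0.99),  # passes
    FlankingPair("chr17", 1_000_000, 1_100_000, 1_550_000, 1_650_000, 0.93),  # identity too low
    FlankingPair("chr15", 0, 20_000, 6_020_000, 6_040_000, 0.98),            # unique span too long
]
print([is_candidate_hotspot(p) for p in candidates])  # [True, False, False]
```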
So, harkening back to something that Claire mentioned, the human genome is riddled with copy number differences, gains and losses of sequence in different individuals. We then focused on a collection of kids that the clinical community, or at least the diagnostic community, had essentially given up on. There were roughly 500 children to test using this platform: children who had been tested for Fragile X and come back negative, children tested for [unintelligible] rearrangements, and children whose karyotype was normal. So, some of the results. After screening the normal collection, we followed up with these children, initially the first 291 from Oxford, and found regions of the genome that look like this. What you’re looking at here is a log2 relative hybridization intensity plot for four different individuals, all children with mental retardation. We’re looking for things that deviate from a log2 ratio of zero, which would indicate no difference. You’ll probably notice that there’s a lot of noise over these regions; that’s because about a third of our probes were selected right from the duplicated regions, where the denominator really isn’t 2n but more than that, which creates some background. But clearly there’s something different about these four kids. They have about five BACs apparently showing evidence of a microdeletion in a region where we never once saw a deletion in the normal control group. These were validated by FISH. Probably the most interesting aspect is that we could go back and use a higher-density custom oligonucleotide microarray: instead of five BACs in the region, we designed 11,000 [unintelligible] over that specific region to see whether the breakpoints were identical. Shown here are those four children once again.
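To make the log2 ratio reasoning concrete, the sketch below computes the ratio expected for a given copy number and calls runs of consecutive probes that deviate beyond a threshold. The threshold, the minimum run length, the function names and the ratios are assumptions for illustration; this is a naive run-based caller, not the segmentation method used in the study.

```python
import math


def expected_log2_ratio(test_copies: int, reference_copies: int = 2) -> float:
    """Ideal test/reference log2 ratio for a probe, ignoring noise and cross-hybridization."""
    return math.log2(test_copies / reference_copies)


def call_runs(log2_ratios: list[float], threshold: float = 0.3, min_probes: int = 3):
    """Return (start, end_exclusive, 'loss' | 'gain') for runs of consecutive
    probes whose log2 ratio is at or beyond +/- threshold."""
    calls = []
    run_start, run_state = 0, None
    # A trailing 0.0 sentinel closes any run that reaches the last probe.
    for i, ratio in enumerate(list(log2_ratios) + [0.0]):
        if ratio <= -threshold:
            state = "loss"
        elif ratio >= threshold:
            state = "gain"
        else:
            state = None
        if state != run_state:
            if run_state is not None and i - run_start >= min_probes:
                calls.append((run_start, i, run_state))
            run_start, run_state = i, state
    return calls


# Made-up ratios for probes ordered along a region; probes 3-7 mimic a five-BAC deletion.
ratios = [0.05, -0.1, 0.02, -0.9, -1.1, -0.8, -0.95, -1.0, 0.1, 0.0, 0.6, -0.05]
print(call_runs(ratios))                    # [(3, 8, 'loss')]
print(expected_log2_ratio(1))               # -1.0 (one copy lost from two)
print(round(expected_log2_ratio(5, 6), 3))  # -0.263 (one copy lost from six)
```

The last line illustrates the noise point from the talk: for a probe in a duplicated region where the reference carries, say, six copies, losing one copy moves the ideal ratio only to about -0.26 rather than -1, so such probes separate gains and losses much less cleanly.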
This is the log2 relative hybridization intensity, with significant deviations indicated by the red signal. And you can see a couple of things here. First, if we compare an affected child with the parents, mom and dad are normal in that area, but the child has a deletion of roughly 450 kb precisely at that site. You’ll also notice the segmental duplications here. These are very large, highly identical duplications, which share about 99 percent identity over 100 kb, and they roughly demarcate the boundaries, or breakpoints. But you’ll also notice that the regions underneath these duplications show a lot of variation in the normal population as well. The important point is that we had essentially an identical critical region in four children identified from this study of mental retardation. All of them have haploinsufficiency, at least in our model, and all of those we’ve been able to test so far were de novo events. In other words, the parents did not have this lesion; it was seen specifically in the kids. On top here are the genes; there are five genes mapping into that region. We don’t know which one causes the disease, but there are obviously some great candidates. One of the most interesting is MAPT, also known as tau, a gene in which point mutations have been associated with Parkinson’s, Alzheimer’s, and frontotemporal dementia. So we’re now screening patients with a similar phenotype for point mutations. I’d just like to emphasize that even though we screened only about 300 kids, this deletion accounted for roughly one and a half percent of the idiopathic collection that we looked at.
This is what the kids look like. In collaboration with our former competitors, Bert de Vries in Holland, we’ve now had the opportunity to look at roughly 21 children, all of whom have the microdeletion. For nineteen of them, we’ve been able to test the parents and show that the events are de novo. If you look at these kids, you can see some similarities in phenotype. One of the most pronounced, believe it or not, is a very bulbous nose that you see in almost all of them. You see a pronounced philtrum, sometimes a protruding tongue, as well as a fairly happy disposition, which has actually been noted in many of the clinical records. So the children have a better outlook on life than most of us. And, in fact, by showing clinicians the data, we’ve now been able to go back to de novo collections and identify additional kids. One of the interesting parts of what we think is a new deletion syndrome is that the exact same region we identified as deleted was described a year and a half earlier by Hreinn Stefansson from deCODE as the site of a common inversion polymorphism in humans. Shown here is the region blown up once again. This is the CEPH Diversity Panel, with black indicating the frequency of that inversion. That inversion is largely restricted to Caucasian populations: European and Mediterranean populations have it most commonly. Once you get into Africa, Asia, and India, you see very low frequencies of this inversion. Their data suggest that this inversion, for completely unknown reasons, was associated with increased fecundity and increased recombination in these populations. That was based on genealogical data from, I believe, the Icelandic population.
So we went back and looked at our kids to see whether the deletions arose on haplotypes that carried the inversion, and to date 19 out of 19 cases come from this inversion haplotype. I want to make it clear that we don’t necessarily know whether it’s the inversion itself that predisposes to this microdeletion event or something else on that haplotype background. But the data are overwhelming that this ethnically stratified inversion polymorphism, or at least the inversion haplotype, predisposes to disease. This obviously has some ramifications. One is that this is largely a Caucasian-specific idiopathic mental retardation syndrome, and our screening of 500 African-American kids so far has shown no cases of this particular deletion. That wasn’t the only one we found. Here’s another region, on 15q24.1-q24.2, four megabases in size. These are the children, and these are their actual genotypes based on [unintelligible] GH over [unintelligible]. Breakpoints in three of the four cases occur precisely at regions of high sequence identity, and in these three cases we know that each event is de novo. These kids are fairly high functioning, with IQs of around 65 to 70. They have been described as having autism spectrum disorder, but they have extra features, such as growth deficiency. Here’s yet a third example, distal to the Prader-Willi region on 15q13.3. Our initial screen skipped over this region because of our selection criteria. This index patient had a deletion between breakpoint three and breakpoint five. It was actually not a de novo event: when we looked at the parents, the mother had this very large deletion over the region, but it turned out that the mother also had mild mental retardation as well as epilepsy. So we screened for this deletion and got two additional cases.
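How striking 19 of 19 is can be gauged with a simple binomial calculation. The sketch below assumes that, with no association, each de novo deletion arises on a chromosome drawn at random from the population, so it carries the inversion haplotype with probability equal to the haplotype frequency. The frequency of 0.2 is a hypothetical value chosen for illustration, not a figure from the talk.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical inversion-haplotype frequency on the chromosome of origin.
HAPLOTYPE_FREQ = 0.2
N_CASES = 19          # de novo deletions with haplotype data, per the talk
N_ON_INVERSION = 19   # all of them on the inversion haplotype

p_value = binomial_upper_tail(N_ON_INVERSION, N_CASES, HAPLOTYPE_FREQ)
print(f"{p_value:.2e}")  # 5.24e-14, i.e. 0.2**19
```

Even with a much more common haplotype, say a frequency of 0.5, the same calculation gives about 1.9e-6. As the talk stresses, though, a result like this cannot say whether the inversion itself or another variant on that haplotype is responsible.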
Both of these cases were smaller, between breakpoint four and breakpoint five. Both were de novo, and in both there is also mild mental retardation or developmental delay, and epilepsy. We don’t know for sure whether this is a genetic disorder, but we’re betting it is. What’s particularly interesting is that there is one gene located here, CHRNA7, a ligand-gated ion channel gene, which has been implicated in myoclonic epilepsy, though I don’t think it has ever been proven to be associated. So we believe that haploinsufficiency of this region also causes disease, and once again the breakpoints map to these very large, highly identical duplications. The last example I’ll show you is a recurrent deletion not associated with mental retardation. We’ve now moved beyond kids with mental retardation and started screening kids with other types of pediatric disease. This is an analysis of a collection of roughly 80 pediatric patients with renal disease. What we found in this case was, once again, a de novo deletion. We should point out that all of these cases are de novo, with breakpoints embedded right within the segmental duplications. What’s particularly remarkable about this deletion is that, at least in the samples we’ve looked at, and this is largely with Christine Bellanné-Chantelot in Paris, it accounts for about 20 percent of the pediatric patients with renal disease in their collection. So it’s what we think is a common microdeletion. Interestingly enough, it has never once been observed in a control group of 927 individuals.
And on top of that, about 36 percent of children with maturity-onset diabetes of the young type 5 (MODY5) also have the same microdeletion. There is a gene in this region, the transcription factor TCF2, in which point mutations have been shown to be associated with both renal disease and MODY5 diabetes. So, in summary, we’ve now looked at a large number of kids, particularly from the IMR study; these numbers are based largely on the initial set of 300 from Oxford. In these patients we identified what we think are roughly 16 sites of novel structural variation. I wouldn’t claim that the majority of those are causative, but I do feel comfortable saying that we have at least three novel genomic disorders in which we see recurrent de novo events and phenotypic similarities that allow us to assign them as new disorders. We have one example of a microdeletion associated with diabetes and renal disease. And I’d be willing to bet that if we screen more children with more forms of pediatric disease, we’ll find additional genomic disorders associated with a wide range of phenotypes. I’ll leave this one slide here as an example of why I think this is so important. Using the Illumina platform, we just finished screening a large number of normal individuals, with Debbie Nickerson and Greg Cooper in my group. These are individuals who came in essentially for lipid testing as part of a study known as the PARC study. Shown here are the hot-spot regions that we find deleted or duplicated in this normal control group, with duplications in pink and deletions in blue. These are the numbers of chromosomes, out of roughly 1,920 in this collection, carrying each event at various frequencies.
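The frequency classes on this slide follow directly from carrier counts over the roughly 1,920 chromosomes screened. A minimal sketch of that arithmetic is below; the function names are illustrative, and the counts are example values rather than data from the study.

```python
import math

TOTAL_CHROMOSOMES = 1_920  # roughly 960 individuals x 2 chromosomes each


def event_frequency(carrier_chromosomes: int, total: int = TOTAL_CHROMOSOMES) -> float:
    """Fraction of screened chromosomes carrying a given deletion or duplication."""
    return carrier_chromosomes / total


def min_count_for(frequency: float, total: int = TOTAL_CHROMOSOMES) -> int:
    """Smallest number of carrier chromosomes that reaches a frequency cutoff."""
    return math.ceil(frequency * total)


print(min_count_for(0.01))  # 20 chromosomes for the 1% cutoff
for count in (2, 4):        # example counts
    print(count, f"{event_frequency(count):.2%}")  # 0.10% and 0.21%
```

So an event seen on only two to four chromosomes in this collection already sits in the 0.1 to 0.2 percent range discussed on the slide.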
So here is the absolute number, here’s the one percent frequency cutoff, and here are a bunch of events at roughly 0.1 to 0.2 percent frequency. Coming back to a point that Richard made, there are two issues I want you to think about. Roughly six to 12 percent of normal individuals in this group carry big deletions precisely over regions predisposed to non-allelic homologous recombination. We have an excess of deletions over duplications, which we don’t understand, and we have an absence of events at around one, two, or three percent frequency. I would bet that these are being fed by de novo mutations at a high frequency in the normal pool. And the question remains open: what is the impact of these events in terms of disease or susceptibility? So one of the things you might ask yourself is: why? If you compare the mouse and human genome architectures, why do we have all of these large blocks of interchromosomal and intrachromosomal duplications if they predispose 10 percent of our genome to microdeletion and microduplication at a high frequency? We’ve tried to address this question over the years, and I’ll go over it fairly quickly. One of the important things to realize about the duplication architecture in these regions is that it’s not just one piece of sequence. It’s heterogeneous, made up of many different parts that have had different evolutionary histories and trajectories. This is one of roughly 400 such regions in the human genome, and each of the segments colored in grey represents duplicated sequence. So this full 790 kb stretch of DNA is entirely duplicated. When you reconstruct the evolutionary history of this region, you find that everything in color comes, as we’ve been able to show, from a different area of the genome.
So we have, essentially, a hodgepodge mosaic over these regions, made up of pairwise alignments from all over the genome. To complicate matters, these regions then duplicate between the large duplication blocks, which can share large stretches of homology with one another. These secondary events are the ones that predispose to microdeletions and microduplications associated with disease. Given this architecture, can we systematically reconstruct the evolutionary history of these regions? Working with Pavel Pevzner, we came up with an approach that takes all of the individual pairwise alignments within the human genome that make up these duplication blocks and decomposes them into minimal evolutionarily shared segments, so that we could break the pairwise alignments into individual subunits, or duplication subunits. Then, using data largely from UCSC, we compared all of the duplicated positions in the human genome to see whether we could identify the ancestral segment from which each duplication began, thereby giving the duplications a direction. The logic is pretty straightforward. Most of the duplications we’re studying are primate-specific, so if we look at outgroup species such as rat, dog, and mouse, which should not have these regions duplicated, we should see a single hit. Moreover, because the derived copies moved away from the ancestral locus through this multistep process of duplication, we should see more orthologous anchors between the ancestral human copy and the mammalian outgroup sequence. Using this approach, we defined the ancestral origin for 67 percent of the duplications within the human genome, and we validated these ancestral origins by FISH.
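The directionality argument above can be expressed as a scoring rule: among the human copies of a duplicon, the ancestral one should be the copy that keeps the most orthologous anchors with single-copy outgroup genomes. The sketch below is a hypothetical simplification of that idea with made-up anchor counts; it is not the actual method developed with Pavel Pevzner, and the `min_margin` parameter is an assumption.

```python
from typing import Optional

# anchor_counts[human_copy][outgroup] = number of orthologous anchors (e.g. flanking
# single-copy markers) shared between that human copy and the outgroup genome.
AnchorCounts = dict[str, dict[str, int]]


def infer_ancestral_copy(anchor_counts: AnchorCounts, min_margin: int = 2) -> Optional[str]:
    """Return the copy with clearly the most outgroup anchors, or None if ambiguous."""
    totals = {copy: sum(per_outgroup.values()) for copy, per_outgroup in anchor_counts.items()}
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None  # no copy has any outgroup support
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
        return None  # top two copies are too close to call
    return ranked[0][0]


# Made-up example: copy_A keeps synteny with all three outgroups.
counts = {
    "copy_A": {"mouse": 12, "rat": 11, "dog": 14},
    "copy_B": {"mouse": 1, "rat": 0, "dog": 2},
    "copy_C": {"mouse": 0, "rat": 0, "dog": 1},
}
print(infer_ancestral_copy(counts))  # copy_A
```

Returning `None` for close calls reflects the fact that only a fraction of duplications (67 percent in the talk) could be assigned an ancestral origin.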
We take an outgroup species and a probe from the derivative locus, and we hybridize to see whether it goes back to the spot we predicted. In the relatively small number of experiments done so far, this confirmed our prediction 12 times. We also compared the experimental maps we had generated over the years with our in silico predictions with Pavel, and you can see pretty good correspondence between the duplicons that we identified. So what did we learn from this analysis? If we look at the intrachromosomal duplication blocks that cause disease, we find the same thing in almost all cases, maybe with one or two exceptions. Shown here is a map of all of the duplication blocks that have emerged in the last 25 million years on chromosome 15; about a third of these cause disease. Located almost precisely in the center of these blocks, in about 90 percent of the cases, at least for this chromosome, is a common sequence. This is what we call a core duplicon, and it has a number of interesting properties. It’s the most abundant and the most ancient of these duplicons, as you might expect. Even though these blocks all have heterogeneous histories, the core is common to the vast majority of them and seems to be the focal point for intrachromosomal duplication formation. Cores are frequently duplicated as solo elements in the genome, but the flanking duplicons rarely are; the flanking duplicons almost always exist in association with a core. And when you look at the cores, they are enriched four- to five-fold for both annotated genes per base pair and ESTs. So these seem to be the most transcriptionally active, most dynamic areas of the genome.
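The core-duplicon idea can be illustrated with a small sketch: represent each duplication block as the set of duplicon subunits it contains, find the subunit shared by the largest fraction of blocks, and compare feature density inside versus a background. The block contents, labels and numbers below are invented for illustration.

```python
from collections import Counter


def find_core(blocks: dict[str, set[str]]) -> tuple[str, float]:
    """Return the duplicon present in the most blocks and the fraction of blocks containing it."""
    counts = Counter(duplicon for members in blocks.values() for duplicon in members)
    core, n_blocks = counts.most_common(1)[0]
    return core, n_blocks / len(blocks)


def fold_enrichment(features_in: int, bp_in: int, features_bg: int, bp_bg: int) -> float:
    """Feature density (e.g. genes per bp) in a sequence class relative to a background."""
    return (features_in / bp_in) / (features_bg / bp_bg)


# Invented example: ten blocks, each a mosaic of different flanking duplicons.
blocks = {f"block_{i}": {"core", f"flank_{i}", f"flank_{i + 1}"} for i in range(9)}
blocks["block_9"] = {"flank_20", "flank_21"}  # one block lacks the core

print(find_core(blocks))  # ('core', 0.9)
print(round(fold_enrichment(9, 200_000, 2, 200_000), 2))  # 4.5
```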
When we compare those cores, which we find on about half a dozen human chromosomes that have experienced this burst of intrachromosomal duplication, we find that they are often associated with great ape- and human-specific gene families described in the literature over the last five or six years. We described one of the first, called nuclear pore interacting protein (NPIP), which evolves about 50 times faster than most normal genes, at least based on dN/dS ratios, and a number of other genes have been described. The common features of these genes are that they do not have orthologs in mouse, they have multiple copies in human and chimp, and they show dramatic changes in their expression profile when compared with outgroup species such as baboon or macaque. At least three of the four examples here show signatures of positive selection, in two cases very dramatic ones. I’ll finish off by sharing some of the work we’ve been able to do with Eric and NISC in this regard. Because these regions are so complicated, we really can’t get a handle on their architecture from whole-genome shotgun assemblies of chimpanzee, gorilla, macaque, and so on. So, working with Eric, we’ve been able to target these regions and resequence them systematically in a number of primate species. Shown here is another core region, just to give you an idea. These are all of the locations of these duplication blocks. This one is about 250 kb in size, and there is a core of roughly 20 kb, which is found in 14 out of 16 of the blocks shown on this chromosome. This core is particularly interesting because it has a very rapidly evolving gene family embedded within it, the nuclear pore interacting protein.
If you look at the degree of sequence identity in sliding windows across this region, comparing any two copies, you find troughs and peaks in sequence identity. What’s most remarkable is that the troughs correspond precisely to the positions of exons. This is an eight-exon gene with no known function. The other thing I’ll point out is that 98 percent of these changes have resulted in amino acid changes between the copies, an extreme example of positive selection. So, working with Eric, we drilled down and looked at copies in other primates, focusing on gorilla, chimpanzee, orangutan, and baboon. We sequenced and annotated all of the sequences we got back, both experimentally and computationally, and then reconstructed the phylogeny of these segments. I hope you don’t go blind, but this is the actual phylogeny, based on a neighbor-joining analysis of 2 kb of non-coding sequence from the core. Shown here is the structure, with HSA representing human, PTR chimp, GGO gorilla, and so on. What we get from this bewildering complexity is really a couple of things. Number one, all of this architecture, which we now know causes disease, is only about 10 million years old: all of the events occurred in the common ancestor of chimp, human, and gorilla, or immediately after the separation of those species. The second thing, which I think is really, really interesting, is that when we look at orangutan, we see the core present once again, but with completely different flanking duplicons, sequences that are single-copy in human and all of the other great ape species.
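Both observations on this slide, identity troughs over exons and the share of changes that alter the protein, are simple to compute from an alignment. The sketch below assumes two pre-aligned sequences of equal length (gaps as `-`) and, for the coding comparison, in-frame coding sequences; it only classifies codons that differ at a single position, which is a simplification. Window sizes, names and toy sequences are illustrative.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def sliding_identity(seq1: str, seq2: str, window: int = 1000, step: int = 500):
    """Percent identity in sliding windows of two aligned sequences; gap columns are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(seq1) - window + 1, step):
        columns = [
            (a, b)
            for a, b in zip(seq1[start:start + window], seq2[start:start + window])
            if a != "-" and b != "-"
        ]
        if columns:
            identical = sum(a == b for a, b in columns)
            profile.append((start, 100.0 * identical / len(columns)))
    return profile


def replacement_fraction(cds1: str, cds2: str) -> float:
    """Share of single-difference codons whose change alters the encoded amino acid."""
    replacement = silent = 0
    for i in range(0, min(len(cds1), len(cds2)) - 2, 3):
        c1, c2 = cds1[i:i + 3].upper(), cds2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # skip codons with gaps or ambiguous bases
        if sum(x != y for x, y in zip(c1, c2)) != 1:
            continue  # identical codons, or multiple hits we don't try to resolve
        if CODON_TABLE[c1] != CODON_TABLE[c2]:
            replacement += 1
        else:
            silent += 1
    total = replacement + silent
    return replacement / total if total else float("nan")


print(sliding_identity("AAAAAAAAAA", "AAAAATTTTT", window=5, step=5))  # [(0, 100.0), (5, 0.0)]
print(replacement_fraction("ATGAAACTGGGC", "ATGAGACTAGGC"))  # 0.5 (AAA->AGA is K->R; CTG->CTA is silent)
```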
So orangutan has done the exact same thing our genome did seven million years ago, using the same core, but has picked up completely different flanking sequences, which are single-copy in chimpanzee, gorilla, and human. This tells us that the core is, we think, actively transducing segments of the genome around. And to give you a perspective going back 25 million years, this is the architecture that we see in human. Each one of these blocks of sequence is single-copy in baboon and in macaque. So we think these all began as unique sequences, with the core beginning to jump about 20 million years ago, picking up flanks, and continuing to grow, such that LCR16a and its associated duplicons now occupy about 10 percent of the euchromatin of human chromosome 16, at least in this case 16p. With large-insert sequences, we can also map the locations in orangutan. This is the orangutan picture. On the chromosome corresponding to human 16 there is very limited activity. But on chromosome 13 the core has essentially jumped to a new chromosome and begun its dance again, creating a very complex architecture once again, with interspersed duplications distributed across chromosome 13. So the important point is that the cores are mobile: they can jump to new chromosomes, and they can transduce flanking sequences as part of their trajectory. I don’t have time to go into all of this data, so I’ll just summarize what we know about this particular core. We know that it began as a single-copy sequence about 25 million years ago, and data I don’t have time to show you indicate that it was actually testis-specific: it was expressed only in the testis and showed no evidence of selection by any of the normal Ka/Ks tests.
So this is a little bit heretical, I think, because most people would teach you that all genes are born from other genes. Our data suggest that this one was born from a transcript that was probably evolving neutrally. Then, from about 25 million to 12 million years ago, in the common ancestor of orangutan, gorilla, chimpanzee, and human, it began to move; it began to duplicate, with some copies on this lineage duplicating heavily on chromosome 16 and others, here, duplicating on chromosome 13. At this point, when it began to duplicate, expression analysis in orangutan, chimpanzee, and human shows a ubiquitous pattern of expression. It’s expressed in every tissue in orangutan and in human that we’ve analyzed to date: we’ve looked at about a dozen tissues in orangutan and about 32 different tissues in human. Between seven and 12 million years ago, not on this lineage but on the African Great Ape and human lineage, we see extreme positive selection: Ka/Ks values on the order of 10 compared with the Old World monkey sequence, suggesting that at that point some mutation must have created an open reading frame that was selected for and then rose to very high frequency and became fixed in the population. So the 98 percent amino acid changes that I mentioned are occurring right here on this branch. So we believe that the movement of this core led to the emergence of a novel gene family about seven million years ago; probably one of the youngest and most rapidly evolving genes in the human species. So in summary, I’ve talked about the architecture of the Human Genome with respect to these large blocks of duplication, I’ve talked about how complex they are, and specifically I showed you some examples of how all of this complex architecture can predispose to large de novo deletions, probably with major phenotypic or selective effects within the population.
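The Ka/Ks (dN/dS) ratio referred to here compares amino-acid-changing substitutions per nonsynonymous site with silent substitutions per synonymous site: values near 1 suggest neutral evolution, values below 1 purifying selection, and values well above 1, like the figure of about 10 quoted for this branch, positive selection. The sketch below assumes the site and difference counts have already been tallied (for example, with a Nei–Gojobori-style counting method) and applies a Jukes–Cantor correction; the function names and example counts are hypothetical, not data from the talk.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance from an uncorrected proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float, syn_diffs: float, syn_sites: float) -> float:
    """Ka/Ks from pre-counted differences and sites (hypothetical helper)."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)  # nonsynonymous substitutions per site
    ks = jukes_cantor(syn_diffs / syn_sites)        # synonymous substitutions per site
    if ks == 0:
        raise ValueError("no synonymous differences; Ka/Ks is undefined")
    return ka / ks


# Made-up counts: 30 nonsynonymous differences over 300 nonsynonymous sites,
# 1 synonymous difference over 100 synonymous sites.
print(ka_ks(30, 300, 1, 100))
```

With these made-up counts the ratio comes out at roughly 10.7, the kind of value that would be read as strong positive selection; identical nonsynonymous and synonymous proportions give exactly 1.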
Our targeted approach has uncovered four new microdeletion syndromes, and we’ve shown that they arise as recurrent de novo events. And I think the question remains unanswered: what is the importance of this mechanism for complex disease? Because if you think about it, none of these events can be tagged using a tag SNP: because they’re de novo, they’re occurring, in some cases, on different haplotypes. Then I talked about the evolutionary significance of these regions, particularly the core architecture that we think has emerged to account for the expansion of intrachromosomal duplications within the human and Great Ape lineage. And I’ll leave you with this final thought: the negative selection against the microdeletion and microduplication events that exist in our species may be partially offset by the positive effect of having newly minted genes, many copies of them, at new locations. And if you think about it, there’s a huge challenge ahead: even though these genes are few in number, we have to work out the functions of genes that don’t exist in outgroup species and that are embedded in these very complex regions of the Genome where STS and SNP mappers fear to tread. [laughter] And we hope to continue -- I think my students say it’s going to be on my epitaph, that he found these genes but never actually showed the function of a single one. Maybe five years from now I can come back and share with you some evidence that they’re actually functional. So, I’d like to acknowledge these folks: Andy Sharp [spelled phonetically] and Heather Mefford [spelled phonetically] were the postdocs who did most of the work on the human disease angle. Matt Johnson [spelled phonetically] and Zo Xi Jang [spelled phonetically] are both students who did all of the work with respect to the evolution of these core regions.
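The tag-SNP point can be made concrete with the standard linkage-disequilibrium statistic r², computed between a SNP allele and the presence of a deletion across haplotypes. In this minimal sketch with made-up haplotypes, a deletion inherited on a single SNP background is perfectly tagged (r² = 1), whereas the same deletion arising independently on two different backgrounds is not tagged at all (r² = 0). The function name and toy data are hypothetical illustrations.

```python
def r_squared(haplotypes: list[tuple[str, bool]], snp_allele: str) -> float:
    """r^2 between one SNP allele and deletion presence (hypothetical helper).

    Each haplotype is (allele_at_snp, carries_deletion).
    """
    n = len(haplotypes)
    p_a = sum(1 for allele, _ in haplotypes if allele == snp_allele) / n
    p_del = sum(1 for _, has_del in haplotypes if has_del) / n
    p_a_del = sum(1 for allele, has_del in haplotypes if allele == snp_allele and has_del) / n
    denom = p_a * (1 - p_a) * p_del * (1 - p_del)
    if denom == 0:
        raise ValueError("both the SNP and the deletion must be polymorphic")
    d = p_a_del - p_a * p_del  # linkage disequilibrium coefficient D
    return d * d / denom


# Deletion inherited on one haplotype background: every "A" haplotype carries it.
inherited = [("A", True)] * 4 + [("G", False)] * 4
# Recurrent de novo deletion: arose once on an "A" and once on a "G" background.
recurrent = [("A", True), ("A", False), ("A", False), ("A", False),
             ("G", True), ("G", False), ("G", False), ("G", False)]
print(r_squared(inherited, "A"))  # 1.0
print(r_squared(recurrent, "A"))  # 0.0
```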
Good colleagues in sequencing centers, Baylor Washoe [spelled phonetically] and specifically at NISC, rose to the challenge of sequencing some of these very difficult clones. I think I have a reputation, probably rightly deserved, for sending sequencing centers some of the nastiest clones to sequence. Thanks for the patience of people like Bashali Mascari [spelled phonetically], Bob Blakesly [spelled phonetically], and really Jerry Bouchard [spelled phonetically], who took these on and took them to completion, at least within the primates. And thanks to a lot of great clinical colleagues, particularly overseas, who have been very forthcoming in providing samples and working on collaborations. Thank you. [applause] Dr. Eric Green: We have time for quick questions. Any questions from the floor? Dr. Evan Eichler: Richard’s got one. Dr. Eric Green: Richard, you’ve got one? Come over here. Dr. Richard Gibbs: Yeah. Evan, what about wild mice? Have you got any duplication data there? Do they have the same low level of these events? Dr. Evan Eichler: Nothing on -- you mean wild outbred mice? Yeah, so no information on wild mice yet. We have a lot of information from the inbred strains over the duplicated regions, and they show as much variation as humans do. The only difference is that the variation is restricted to duplications that are tandem and doesn’t affect the unique stretches between interspersed duplications. Dr. Eric Green: So I want to ask a question while we’re trying to get a PowerPoint loaded, so I’ll also stall a little. Evan, about the screen that you did of the pediatric patients: you commented that if you screened more pediatric diseases you’d likely find more of these copy number changes, but is there any reason to think you wouldn’t find similar changes if you screened unusual adult-onset diseases? Dr. Evan Eichler: Right. No, we’ve toyed with the idea, and we’re thinking about looking at a more adult-oriented disease, I guess.
I guess I kind of have this fundamental belief that if we can show it at a pediatric level, there’ll be a stronger genetic component, and so I’m more interested in screening more kids with diseases for which we don’t have a good explanation than in looking at diseases where environment probably plays a bigger role and genetics a smaller one.